Metainformation Strategies for Electronic Resources

Susan Schreibman, University College Dublin, Eire

This paper will address the theoretical and practical issues in devising and implementing a project-specific metainformation scheme for electronic resources. While one can argue that a scheme like the Text Encoding Initiative (TEI) provides for encoding which greatly enhances plain-text retrieval, in practice, without extensive use of the keyword or indexing elements, retrieval of information is limited to what is explicit in the text. Searching for what is explicit in the text, even if that text has been encoded logically (as opposed to physically), does not provide the kind of functionality most humanists expect from digital archives. This paper, then, is an exploration of the advantages and disadvantages of creating a meta-metainformation, or classification, scheme for electronic resources.

For this talk I will draw heavily on theoretical models (both pre- and post-computer indexing models) from library and information studies. I will also assume that creators of electronic resources are encoding their primary material in an SGML- or XML-based metainformation scheme, such as the Text Encoding Initiative, and that project directors have already made specific decisions about encoding what is explicit in the text in accordance with the project's goals. In other words, I am assuming that a digital project is already taking advantage of the tagging structure afforded by a scheme like the TEI for encoding titles of texts and personal, place, geographic and organisation names, etc., as deemed important to a particular project. There can be no doubt that this type of tagging greatly enhances retrieval, for example by distinguishing the occurrence of WB Yeats as a title from its occurrence as a personal name, or by facilitating the searching of all strings within a <placename> element.

Although this type of encoding gives users unprecedented access in locating very specific strings of text, in practice users are frustrated by limited and relatively simplistic search and retrieval strategies. In most electronic resources, users are limited to retrieving only what is explicit in the text, i.e. strings of text, some of which have been encoded logically. In the case of images, the situation is even more problematic: unless a project has developed a header consisting of detailed metainformation, most images can only be retrieved by image title. Boolean and proximity searches go a small way towards retrieving more than single words, but they do not provide the conceptually and theoretically rigorous searches most scholars in the humanities want and expect from electronic resources. Specifically, this paper will address the practical and theoretical issues raised by devising a classification or indexing scheme which facilitates search and retrieval by going beyond encoding what is explicit in the text. To this end, several points will be raised:

although encoding what is implicit in the text facilitates retrieval of concepts not possible by explicit encoding, this process is much more subjective;
how this subjectivity influences retrieval;
the concept of granularity, and the problems of encoding to various levels;
the problems of encoding implicit metainformation in a way that is transparent to users.
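
The difference between retrieving what is explicit in the text and retrieving what an indexer has judged to be implicit in it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fragment below is a hypothetical, much-simplified TEI-like sample (no header, no namespace, not valid TEI), the `id` values and index terms are invented, and the function names are my own. The element names `persName` and `placeName` follow the TEI's XML spelling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like fragment (assumption: not valid TEI).
# <keywords>/<term> hold subject terms assigned by an indexer; they are
# placed inside each <div> here only to keep the sketch short.
SAMPLE = """
<text>
  <div id="d1">
    <keywords><term>modernism</term><term>Irish poetry</term></keywords>
    <p><persName>WB Yeats</persName> lectured in <placeName>Dublin</placeName>.</p>
  </div>
  <div id="d2">
    <keywords><term>biography</term></keywords>
    <p>A notice of <title>WB Yeats</title>, printed in <placeName>London</placeName>.</p>
  </div>
</text>
"""

root = ET.fromstring(SAMPLE)


def find_explicit(root, tag, text):
    """Return ids of divisions where `text` is the content of a `tag` element."""
    hits = []
    for div in root.iter("div"):
        for el in div.iter(tag):
            if "".join(el.itertext()).strip() == text:
                hits.append(div.get("id"))
                break  # one hit per division is enough
    return hits


def find_in_running_text(root, word):
    """Return ids of divisions whose paragraph text contains `word`."""
    return [
        div.get("id")
        for div in root.iter("div")
        if any(word in "".join(p.itertext()) for p in div.iter("p"))
    ]


def find_implicit(root, concept):
    """Return ids of divisions that an indexer has assigned to `concept`."""
    return [
        div.get("id")
        for div in root.iter("div")
        if any(term.text == concept for term in div.iter("term"))
    ]


# Explicit encoding: the same string is distinguished by its element.
print(find_explicit(root, "title", "WB Yeats"))
print(find_explicit(root, "persName", "WB Yeats"))
# Implicit encoding: the concept never occurs in the running text,
# so only the assigned index term makes the division retrievable.
print(find_in_running_text(root, "modernism"))
print(find_implicit(root, "modernism"))
```

In this sketch the search for "modernism" succeeds only through the assigned term, which is exactly where the indexer's subjectivity and the chosen level of granularity (here, one set of terms per division) enter the retrieval process.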

While many papers at past ALLC/ACH conferences have discussed the difficulty of encoding explicit text consistently in large projects in which many people participate in the encoding process, the possibilities for inconsistent encoding of implicit text multiply exponentially. Yet I would argue that without the development of classification or indexing schemes, digital archives remain hidden behind front ends which may look resplendent, but which barely reveal the archives' complexity and richness. To this end, the rest of the paper will be divided into three parts. Part I will provide an overview of some of the major metainformation schemes which were developed in a pre-digital environment, such as AACR2, the Dewey Decimal Classification, and the Library of Congress Subject Headings. Topics to be covered will include:

the theoretical impetus behind these schemes;
how and why these schemes were conceived and made extensible;
why these schemes cannot be transferred to a digital environment without adaptation.

Part II will explore current applications of schemes of this kind to a digital environment, such as the Art and Architecture Thesaurus and the Thesaurus for Graphic Materials. Specifically, I will address how these schemes have been adapted from indexing codex-based texts to indexing digital ones. The special case of indexing images will also be discussed. Part III will explore metainformation schemes devised for several specific digital archives, including The Blake Archive and The Thomas MacGreevy Archive, both published at the Institute for Advanced Technology in the Humanities at the University of Virginia. In the case of The Thomas MacGreevy Archive, I will demonstrate how we, working within the TEI, developed a metainformation scheme which facilitated very specific genre searching for both texts and images.