“Metainformation Strategies for Electronic Resources”
Susan Schreibman
University College Dublin, Eire
This paper will address the theoretical and practical issues involved in devising
and implementing a project-specific metainformation scheme for electronic
resources. One can argue that a scheme like the Text Encoding Initiative (TEI)
provides encoding which greatly enhances plain-text retrieval. In practice,
however, unless the keyword or indexing elements are used extensively, retrieval
is limited to what is explicit in the text. Searching for what is explicit in the
text, even if that text has been encoded logically (as opposed to physically),
does not provide the kind of functionality most humanists expect from digital
archives.
This paper, then, explores the advantages and disadvantages of creating a
meta-metainformation, or classification, scheme for electronic resources. In this
talk I will draw heavily on theoretical models from library and information
studies, both pre- and post-computer indexing models. I will adopt the position
that creators of electronic resources are encoding their primary material in an
SGML- or XML-based metainformation scheme, such as the Text Encoding Initiative.
I will also assume that the project directors have already made specific
decisions about encoding what is explicit in the text in accordance with the
project's goals. In other words, I am assuming that a digital project already
takes advantage of the tagging structure afforded by a scheme like the TEI to
encode titles of texts and personal, place, geographic and organisation names,
as deemed important to a particular project.
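As a rough illustration of the explicit encoding assumed here, the Python sketch below uses the standard library's `xml.etree.ElementTree` to query a simplified, TEI-like fragment. The sample sentences are invented, the TEI namespace and header are omitted for brevity, and `find_in_element` and `plain_text_search` are hypothetical helper names rather than part of any TEI toolkit.

```python
import xml.etree.ElementTree as ET

# Simplified, TEI-like fragment with invented sentences; the TEI namespace and
# header are omitted so that plain tag names can be used below.
SAMPLE = """
<text>
  <body>
    <p>This chapter discusses a study entitled <title>W. B. Yeats</title>.</p>
    <p><persName>W. B. Yeats</persName> wrote the letter from <placeName>Dublin</placeName>.</p>
  </body>
</text>
"""


def find_in_element(root, tag, query):
    """Return the text of every <tag> element whose content contains query."""
    q = query.lower()
    hits = []
    for el in root.iter(tag):
        text = "".join(el.itertext())
        if q in text.lower():
            hits.append(text)
    return hits


def plain_text_search(root, query):
    """Ignore the markup: return every paragraph whose full text contains query."""
    q = query.lower()
    texts = ("".join(p.itertext()) for p in root.iter("p"))
    return [t for t in texts if q in t.lower()]


root = ET.fromstring(SAMPLE.strip())
print(plain_text_search(root, "Yeats"))            # both paragraphs match
print(find_in_element(root, "title", "Yeats"))     # ['W. B. Yeats'] (the title only)
print(find_in_element(root, "persName", "Yeats"))  # ['W. B. Yeats'] (the person only)
print(find_in_element(root, "placeName", ""))      # ['Dublin'] (every place name)
```

A plain string search cannot tell the two occurrences apart; the element-scoped search can, but it still finds only strings that are present in the text.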
There can be no doubt that this type of tagging greatly enhances retrieval, for
example by distinguishing an occurrence of W. B. Yeats as a title from one as a
personal name, or by making it possible to search all strings within a
<placename> element. Yet although such encoding gives users unprecedented access
to very specific strings of text, in practice users are frustrated by limited and
relatively simplistic search and retrieval strategies. In most electronic
resources, users can retrieve only what is explicit in the text, i.e. strings of
text, some of which have been encoded logically. For images, the situation is
even more problematic: unless a project has developed a header containing
detailed metainformation, most images can be retrieved only by image title.
Boolean and proximity searches go only a small way toward retrieving more than
single-word matches, and they do not provide the conceptually and theoretically
rigorous searches most scholars in the humanities want and expect from
electronic resources.
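A minimal sketch of the difference a descriptive header makes for images, assuming a hypothetical record structure: the `ImageRecord` fields `subjects` and `persons` stand in for whatever metainformation a project might devise, and the two sample records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """An image plus optional descriptive metainformation (hypothetical fields)."""
    title: str
    subjects: list[str] = field(default_factory=list)
    persons: list[str] = field(default_factory=list)


def search_titles(records, query):
    """Title-only retrieval: all that is possible without a detailed header."""
    q = query.lower()
    return [r.title for r in records if q in r.title.lower()]


def search_metadata(records, query):
    """Retrieval over the title and every descriptive field in the header."""
    q = query.lower()
    return [
        r.title
        for r in records
        if any(q in value.lower() for value in (r.title, *r.subjects, *r.persons))
    ]


records = [
    ImageRecord("Photograph 12"),
    ImageRecord("Photograph 13", subjects=["theatre"], persons=["W. B. Yeats"]),
]
print(search_titles(records, "Yeats"))    # []
print(search_metadata(records, "Yeats"))  # ['Photograph 13']
```

Every descriptive field simply participates in matching here; the harder question of which terms to assign, and from what vocabulary, is left open.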
Specifically, this paper will address the practical and theoretical issues raised
by devising a classification or indexing scheme that facilitates search and
retrieval beyond what is explicit in the text. To this end, several points will
be raised:
- although encoding what is implicit in the text enables retrieval of concepts that explicit encoding cannot reach, the process is much more subjective;
- how this subjectivity influences retrieval;
- granularity, and the problems of encoding to various levels;
- the problems of encoding implicit metainformation which is transparent to users;
- the theoretical impetus behind library and information studies classification schemes;
- how and why these schemes were conceived and made extensible;
- why these schemes cannot be transferred to a digital environment without adaptation.
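To make the points about implicit encoding, subjectivity and granularity concrete, the sketch below attaches subject terms that do not occur in the prose, loosely modelled on the TEI `<index>`/`<term>` elements; the `indexName` value, the sample passages and the helper names are assumptions for illustration. Whether the first paragraph is "about" nationalism is exactly the kind of editorial judgement at issue, and because that term is attached at the `<div>` level, a paragraph-level search cannot find it.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: an editor has attached subject terms that never occur
# in the prose. <index>/<term> loosely follow the TEI; indexName="subject" is an
# assumption made for this sketch.
SAMPLE = """
<body>
  <div n="1">
    <index indexName="subject"><term>nationalism</term></index>
    <p n="1.1">The speech was applauded in the hall.</p>
    <p n="1.2"><index indexName="subject"><term>censorship</term></index>The second act was cut before the opening night.</p>
  </div>
</body>
"""


def prose(el):
    """Return the text of el, skipping any <index> annotations it contains."""
    parts = [el.text or ""]
    for child in el:
        if child.tag != "index":
            parts.append(prose(child))
        parts.append(child.tail or "")  # text after a child belongs to el
    return "".join(parts)


def text_search(root, query, unit):
    """Explicit retrieval: the n of every <unit> whose prose contains query."""
    q = query.lower()
    return [el.get("n") for el in root.iter(unit) if q in prose(el).lower()]


def subject_search(root, term, unit):
    """Implicit retrieval: the n of every <unit> with term in an <index> inside it."""
    hits = []
    for el in root.iter(unit):
        terms = {
            t.text.strip().lower()
            for idx in el.iter("index")
            for t in idx.iter("term")
            if t.text
        }
        if term.lower() in terms:
            hits.append(el.get("n"))
    return hits


root = ET.fromstring(SAMPLE.strip())
print(text_search(root, "nationalism", "p"))       # [] : the word never appears
print(subject_search(root, "nationalism", "div"))  # ['1']
print(subject_search(root, "nationalism", "p"))    # [] : term assigned at div level only
print(subject_search(root, "censorship", "p"))     # ['1.2']
```

The same term retrieves different units depending on the level at which the encoder chose to attach it, which is why decisions about granularity have to be made before, not after, the subject encoding is done.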