“Perdita's Progress: Raising Standards in a TEI-based
Approach to Cataloguing Early Modern Manuscripts”
Jill Seal
Nottingham Trent University, UK
Claire Warwick
Sheffield University, UK
Elizabeth Clarke
Nottingham Trent University, UK
The Perdita Project was established in 1997 at Nottingham Trent University by
Elizabeth Clarke and Victoria Burke of the English & Media Studies department,
with Martyn Bennett of the History department, and was funded initially by the
university. A substantial AHRB grant in 1999 allowed us to appoint research
fellow Jonathan Gibson and researchers Jill Seal and Gillian Wright. Claire
Warwick, from the Department of Information Studies at Sheffield, is electronic
publication consultant to the project, which runs until 2002.
We regard this paper as a collaborative exercise. In it we aim to introduce some
of the dilemmas which we have faced during the project. We hope that our
progress may be of interest to those at the conference, and that our discussions
with the humanities computing community will also be of help to us in the
continuation of the project.
The Perdita Project is producing a comprehensive guide to manuscript compilations
of early modern women. We are carrying out research on over 450 manuscripts
written or compiled by women, which include miscellanies, commonplace books,
account books, medical and cookery receipt books, religious writing, and
autobiographical material. Our descriptions of the manuscripts will be encoded
in SGML to allow extensive searching, and will be published on the Internet in
2002.
The interdisciplinary nature of the project is vital to our work, and is
something that we will discuss throughout the paper. By describing previously
unpublished materials by women of the sixteenth and seventeenth centuries, we
aim to provide a resource for both literary and historical scholars, enabling
access to manuscript sources which are often very difficult to trace in
comparison with published texts. We see ourselves as part of the movement to rewrite literary
history, moving the emphasis from printed text and male and/or canonical works
to a fuller picture of the writing of the period (Ezell, 1993). Electronic
publication also allows us to address important issues to do with the
dissemination of text in an electronic medium. This seems uniquely appropriate,
since, as Woodmansee (1994) argues, the transmission of scholarly electronic text
shares some features with the coterie traditions of manuscript dissemination
which we are studying in the early modern period.
In the paper we shall explore some of the challenges we face in attempting to
combine the two traditions and when working with the different media of
manuscript, print and electronic text. We will discuss why we have chosen to
encode manuscript descriptions rather than the texts themselves, and how far we
should interpret the manuscripts in the descriptions that we provide. To what
level, and in what way, should these descriptions be encoded, and how will this
affect their usefulness? How far should we try to consider the present user
community? Standards, whether in manuscript description or in electronic text
encoding, are also vital to our work.
Projects which deal with the electronic publication of manuscripts tend to
provide users with digitised images of manuscript pages, to transcribe and
encode the text, or both. However, we have chosen a different strategy, since we are
not presently intending to transcribe entire manuscripts. Rather, we will
present descriptions, which will take the form of an extended catalogue entry,
including a list of contents, a physical description, and a biographical article
on the compiler(s). We believe that there are valid scholarly reasons for doing
so.
There is already a certain amount of literary text available in electronic form,
much of which is lacking any kind of commentary or contextual material. Given
the time and financial constraints of our project, we therefore preferred to
concentrate on a more novel research area. Our methodology has been designed as
a response to the shift in focus in manuscript studies from the search for
authoritative texts to the historical circumstances of manuscript production and
circulation (Beal, 1980-; Marotti, 1995; Woudhuysen, 1996; Hobbs, 1992; Love,
1993). Rather than simply producing large amounts of transcribed text with no
accompanying commentary or contextual research, we prefer to make an important
scholarly contribution to this research area.
We also consider it important that our resource should lead scholars to visit
archives and consult the manuscripts themselves, when possible. Since the
provision of digital surrogates tends to increase the usage of the original
material (Lee, 1998; Chapman, Kingsley & Dempsey, 1999), we feel that our
efforts should be directed to descriptive scholarly research to aid
researchers in their use of original documents.
We do, however, acknowledge the problematic nature of our task and of
classifications such as authorship, function and gender in looking at manuscript
compilations, and pledge ourselves to giving "as much information as possible to
facilitate useful readings of the manuscript compilations". But what are
"useful" readings, how much information should we give, and in what form? We
therefore intend to conduct a study of our potential user community in
collaboration with Sheffield University DIS to try to answer these, and other,
questions.
However, we are aware of the potential problems of trying to ensure that the
resource remains usable and accessible by a community of future users whose
needs we cannot hope to predict. This means that we need to apply, and in some
cases set, standards in various areas. Most obviously, we must apply the highest
standards in manuscript cataloguing and description. There are also other areas
which we are particularly well-equipped to explore, for instance, a standard
vocabulary for describing handwriting. Most fascinating of all is the question
of what a woman's hand might look like. Electronic delivery will provide us with
an ideal opportunity to contribute to this discussion, by providing visual
samples of women's hands.
We are also concerned with the standards necessary for electronic publication. We
must be aware of how far the standards necessary for text encoding and useful
searching impose interpretations on the manuscript, since already, in our
editing and cataloguing process, we are working at several removes from the
original text (see fig. 1). We are conscious that the decisions we make in
encoding the descriptive material must not be so prescriptive that they hinder
usage, but at the same time we aim to aid searching by appropriate markup. We
therefore approach this project in the spirit of the Text Encoding Initiative (TEI).
However, further complications are caused by the fact that what we are encoding
is essentially metadata, not simply transcribed text.
We would like to explore the ideological, conceptual and practical differences
between the TEI, metadata systems, and the use of controlled vocabulary. In the
world of electronic resources there appears to be a culture clash between a
post-structuralist, qualitative 'search for anything you like - create your own
text' ideology and the controlled, quantitative approach to classifying objects
taken by museums. This may represent the difference between dealing with text
and dealing with objects, or the difference between English and History, where
computing projects tend to deal with statistics. However, we at Perdita are
faced with the problem of trying to balance the two approaches. We are aware
that the TEI header is ideal for a project which wants to shape the data in the
form of the original text, even if it does not encode it. Some historians are
now beginning to recognise that 'data' is deeply embedded in text, which is why
we believe that TEI is the best ideological option.
Yet database pioneers such as The Getty Institute in LA are encouraging us to
adopt database methodology and to use controlled terminology and thesauri,
because of their more systematic nature. This view is supported by research by
DIS at Sheffield, which suggests that, when constructing and using metadata,
many users find that the lack of a controlled vocabulary inhibits the ease with
which they can search electronic resources (Whittaker, 1999). At the British
Women Writers' Conference at Albuquerque in September, there was general
agreement on the need for a standard set of keywords. Unfortunately, no such
resource yet exists.
We have therefore decided to view our work in the light of future users. We will
describe our attempts to combine both standards and to provide some sense of the
original text, using TEI markup, with a standardised search vocabulary for ease
of searching.
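As a rough illustration of the combination described above, the following Python sketch indexes a small marked-up description under a controlled vocabulary. It is a sketch under stated assumptions, not the project's actual encoding scheme: the element names (msDesc, msContents, msItem, title, term) are only loosely modelled on TEI conventions, and the vocabulary mapping and the two manuscript items are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: free-text variant -> preferred term.
# These mappings are invented for illustration only.
CONTROLLED_VOCABULARY = {
    "receipt book": "recipes",
    "cookery": "recipes",
    "physic": "medicine",
    "medical": "medicine",
    "meditation": "devotional writing",
    "prayer": "devotional writing",
}

# Invented sample description; element names are assumptions, not a real DTD.
SAMPLE = """
<msDesc>
  <msContents>
    <msItem n="1">
      <title>Receipts for physic</title>
      <term>physic</term>
      <term>receipt book</term>
    </msItem>
    <msItem n="2">
      <title>Meditations</title>
      <term>meditation</term>
    </msItem>
  </msContents>
</msDesc>
"""


def normalise(term):
    """Map a free-text term to its preferred form, or keep it unchanged."""
    key = term.strip().lower()
    return CONTROLLED_VOCABULARY.get(key, key)


def index_items(xml_text):
    """Build an index from preferred terms to the titles of matching items."""
    root = ET.fromstring(xml_text)
    index = {}
    for item in root.iter("msItem"):
        title = item.findtext("title", default="")
        for term in item.findall("term"):
            index.setdefault(normalise(term.text or ""), []).append(title)
    return index


def search(index, query):
    """Look up a query after normalising it with the same vocabulary."""
    return index.get(normalise(query), [])
```

In this sketch the markup preserves the order and titles of the items as they appear in the description, while the vocabulary lets a search for "cookery" reach an item that the cataloguer described as a "receipt book".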
References
P. Beal. Index of English Literary Manuscripts. London: Mansell, 1980. (2 vols, 2 pts).
A. Chapman, N. Kingsley, and L. Dempsey. Full Disclosure. Releasing the Value of Library and Archive Collections. Bath: UKOLN, 1999.
M. Ezell. Writing Women's Literary History. Baltimore: Johns Hopkins U. P., 1993.
M. Hobbs. Early Seventeenth-Century Verse Miscellany Manuscripts. Aldershot: Scolar Press, 1992.
S. Lee. Scoping the Future of Oxford's Digital Collections. 1998.
H. Love. Scribal Publication in Seventeenth-Century England. Oxford: Clarendon Press, 1993.
A. Marotti. Manuscript, Print, and the English Renaissance Lyric. Ithaca: Cornell U. P., 1995.
S. Whittaker. “The construction of Dublin Core Metadata by non-specialist users.” University of Sheffield, 1999.
M. Woodmansee. “On the Author Effect: Recovering Collectivity.” The Construction of Authorship: Textual Appropriation in Law and Literature. Ed. Martha Woodmansee and Peter Jaszi. Durham, NC: Duke U.P., 1994.
H. R. Woudhuysen. Sir Philip Sidney and the Circulation of Manuscripts 1558-1640. Oxford: Clarendon Press, 1996.