Digital Humanities Abstracts

“Perdita's Progress: Raising Standards in a TEI-based Approach to Cataloguing Early Modern Manuscripts”
Jill Seal, Nottingham Trent University, UK; Claire Warwick, Sheffield University, UK; Elizabeth Clarke, Nottingham Trent University, UK

The Perdita Project was established in 1997 at the Nottingham Trent University by Elizabeth Clarke and Victoria Burke of the English & Media Studies department, with Martyn Bennett of the History department, and was initially funded by the university. A substantial AHRB grant in 1999 enabled us to appoint a research fellow, Jonathan Gibson, and two researchers, Jill Seal and Gillian Wright. Claire Warwick of the Department of Information Studies at Sheffield is electronic publication consultant to the project, which runs until 2002. We regard this paper as a collaborative exercise. In it we aim to introduce some of the dilemmas we have faced during the project. We hope that our progress will interest those at the conference, and that our discussions with the humanities computing community will help us as the project continues.
The Perdita Project is producing a comprehensive guide to manuscript compilations by early modern women. We are researching over 450 manuscripts written or compiled by women, including miscellanies, commonplace books, account books, medical and cookery receipt books, religious writing, and autobiographical material. Our descriptions of the manuscripts will be encoded in SGML to allow extensive searching, and will be published on the Internet in 2002. The interdisciplinary nature of the project is vital to our work, and we return to it throughout the paper. By describing previously unpublished material by women of the sixteenth and seventeenth centuries, we aim to provide a resource for both literary scholars and historians, giving access to manuscript sources that are often far harder to trace than printed texts. We see ourselves as part of the movement to rewrite literary history, shifting the emphasis from printed texts and male and/or canonical works towards a fuller picture of the writing of the period (Ezell, 1993).
Electronic publication also allows us to address important issues concerning the dissemination of text in an electronic medium. This seems uniquely appropriate since, as Woodmansee (1994) argues, the transmission of scholarly electronic text shares some features with the coterie traditions of manuscript dissemination that we are studying in the early modern period. In the paper we shall explore some of the challenges we face in combining the two traditions and in working across the different media of manuscript, print and electronic text. We will discuss why we have chosen to encode manuscript descriptions rather than the texts themselves, and how far we should interpret the manuscripts in the descriptions we provide. To what level, and in what way, should these descriptions be encoded, and how will this affect their usefulness? How far should we take account of the present user community? Standards, whether in manuscript description or in electronic text encoding, are also vital to our work.
Projects that deal with the electronic publication of manuscripts tend to provide users with digitised images of manuscript pages, to transcribe and encode the text, or to do both. We have chosen a different strategy, since we do not at present intend to transcribe entire manuscripts. Instead, we will present descriptions in the form of an extended catalogue entry, comprising a list of contents, a physical description, and a biographical article on the compiler(s). We believe there are valid scholarly reasons for this choice. A certain amount of literary text is already available in electronic form, much of it lacking any commentary or contextual material. Given the time and financial constraints of our project, we preferred to concentrate on a more novel research area.
Our methodology responds to the shift in focus in manuscript studies from the search for authoritative texts to the historical circumstances of manuscript production and circulation (Beal, 1980-; Marotti, 1995; Woudhuysen, 1996; Hobbs, 1992; Love, 1993). Rather than simply producing large amounts of transcribed text with no accompanying commentary or contextual research, we aim to make an important scholarly contribution to this research area. We also consider it important that our resource should lead scholars to visit archives and consult the manuscripts themselves whenever possible. Since the provision of digital surrogates tends to increase use of the original material (Lee, 1998; Chapman, Kingsley & Dempsey, 1999), we feel that our efforts should be directed towards descriptive scholarly research that helps researchers use the original documents. We do, however, acknowledge the problematic nature of our task, and of classifications such as authorship, function and gender, in looking at manuscript compilations, and we pledge to give "as much information as possible to facilitate useful readings of the manuscript compilations". But what are "useful" readings, how much information should we give, and in what form? To help answer these and other questions, we intend to conduct a study of our potential user community in collaboration with the Department of Information Studies (DIS) at Sheffield University. However, we are aware that the resource must also remain usable and accessible to future users whose needs no such study can hope to predict. This means that we need to apply, and in some cases set, standards in various areas. Most obviously, we must apply the highest standards of manuscript cataloguing and description. There are also areas that we are particularly well equipped to explore, such as a standard vocabulary for describing handwriting.
Most fascinating of all is the question of what a woman's hand might look like. Electronic delivery gives us an ideal opportunity to contribute to this discussion by providing visual samples of women's hands.
We are also concerned with the standards necessary for electronic publication. We must be aware of how far the standards required for text encoding and useful searching impose interpretations on the manuscript, since in our editing and cataloguing process we are already working at several removes from the original text (see fig. 1). We are conscious that the decisions we make in encoding the descriptive material must not be so prescriptive that they hinder use, but at the same time we aim to aid searching through appropriate markup. We therefore approach this project in the spirit of the Text Encoding Initiative (TEI). A further complication, however, is that what we are encoding is essentially metadata, not simply transcribed text. We would like to explore the ideological, conceptual and practical differences between the TEI, metadata systems, and the use of controlled vocabulary. In the world of electronic resources there appears to be a culture clash between a post-structuralist, qualitative 'search for anything you like - create your own text' ideology and the controlled, quantitative approach to classifying objects taken by museums. This may reflect the difference between dealing with texts and dealing with objects, or between English and History, since computing projects in History tend to deal with statistics. At Perdita, we must balance the two approaches. We are aware that the TEI header is ideal for a project that wants to shape the data in the form of the original text, even if it does not encode that text. Some historians are now beginning to recognise that 'data' is deeply embedded in text, which is why we believe that the TEI is the best ideological option.
Yet database pioneers such as the Getty Institute in Los Angeles are encouraging us to adopt database methodology and to use controlled terminologies and thesauri, because of their more systematic nature. This view is supported by research by DIS at Sheffield, which suggests that when constructing and using metadata, many users find that the lack of a controlled vocabulary makes electronic resources harder to search (Whittaker, 1999). At the British Women Writers' Conference in Albuquerque in September, there was general agreement on the need for a standard set of keywords. Unfortunately, no such resource yet exists. We have therefore decided to view our work in the light of future users. We will describe our attempts to combine both standards, using TEI markup to provide some sense of the original text together with a standardised vocabulary for ease of searching.
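The combination of free-text description under TEI-style markup with a controlled vocabulary for searching can be illustrated with a minimal sketch. It is not the project's encoding: the element names (`msDescription`, `genre`, `physDesc`), the sample record, and the small vocabulary mapping variant terms such as "recipe book" to a preferred term "receipt book" are all hypothetical. Python's standard `xml.etree.ElementTree` module stands in for an SGML toolchain, so the record is written as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: maps variant free-text terms to a
# single preferred term (a tiny stand-in for a thesaurus).
VOCABULARY = {
    "receipt book": "receipt book",
    "recipe book": "receipt book",
    "cookery book": "receipt book",
    "commonplace book": "commonplace book",
    "miscellany": "miscellany",
    "verse miscellany": "miscellany",
}

# Hypothetical catalogue entry; element names are illustrative only.
RECORD = """
<msDescription id="ms001">
  <contents>
    <item>Medicinal and culinary recipes</item>
  </contents>
  <genre>cookery book</genre>
  <genre>commonplace book</genre>
  <physDesc>Paper, 120 leaves, several hands</physDesc>
</msDescription>
"""


def preferred_term(term):
    """Return the preferred vocabulary term, or None if the term is not listed."""
    return VOCABULARY.get(term.strip().lower())


def index_record(xml_text):
    """Collect preferred genre terms from one description.

    The XML itself is only read, never modified, so the cataloguer's
    original wording survives alongside the normalised index terms.
    """
    root = ET.fromstring(xml_text.strip())
    terms = set()
    for genre in root.iter("genre"):
        term = preferred_term(genre.text or "")
        if term is not None:
            terms.add(term)
    return root.get("id"), sorted(terms)


def search(records, query):
    """Return ids of records whose index terms include the query's preferred term."""
    wanted = preferred_term(query)
    hits = []
    for xml_text in records:
        record_id, terms = index_record(xml_text)
        if wanted in terms:
            hits.append(record_id)
    return hits


print(index_record(RECORD))
print(search([RECORD], "Recipe Book"))
```

Keeping the vocabulary separate from the markup means the descriptive text is never overwritten; only the search index is normalised, so a query for "recipe book" still finds a manuscript catalogued as a "cookery book".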

References

P. Beal. Index of English Literary Manuscripts. London: Mansell, 1980. (2 vols, 2 pts).
A. Chapman, N. Kingsley, L. Dempsey. Full Disclosure. Releasing the Value of Library and Archive Collections. Bath: UKOLN, 1999.
M. Ezell. Writing Women's Literary History. Baltimore: Johns Hopkins U. P., 1993.
M. Hobbs. Early Seventeenth-Century Verse Miscellany Manuscripts. Aldershot: Scolar Press, 1992.
S. Lee. Scoping the Future of Oxford's Digital Collections. 1998.
H. Love. Scribal Publication in Seventeenth-Century England. Oxford: Clarendon Press, 1993.
A. Marotti. Manuscript, Print, and the English Renaissance Lyric. Ithaca: Cornell U. P., 1995.
S. Whittaker. “The construction of Dublin Core Metadata by non-specialist users.” University of Sheffield, 1999.
M. Woodmansee. “On the Author Effect: Recovering Collectivity.” The Construction of Authorship: Textual Appropriation in Law and Literature. Ed. Martha Woodmansee and Peter Jaszi. Durham, NC: Duke U.P., 1994.
H. R. Woudhuysen. Sir Philip Sidney and the Circulation of Manuscripts 1558-1640. Oxford: Clarendon Press, 1996.