Digital Humanities Abstracts

“Figura: A Tool for the Collaborative Editing of Non-nesting Content”
Rafael Alvarado, Princeton University, alvarado@princeton.edu
Sarah-Jane Murray, Princeton University, sjmurray@princeton.edu

Figura is a web-based database application designed to support the Charrette Project at Princeton University (Uitti 1997). In particular, it supports the work of creating a TEI-compliant critical edition of the Old French manuscript tradition of Chrétien de Troyes’s Le Chevalier de la Charrette. The application addresses the shortcomings of a traditional, purely document-centric approach to humanities computing applications (described below) by using a database management system as a pre-processor for marked-up documents. The motivation for using a database has not been to supplant the primary role of XML and the TEI in the development of digital critical editions, but rather to avoid an “extreme markup” solution that would further complicate an already difficult set of technologies. Instead, Figura targets the black box of traditional humanities computing applications, the indexing engine, and replaces it with something more accommodating to the specific needs of scholars, all the while keeping document encoding standards intact.
The traditional approach to humanities computing applications is one in which primary source materials are marked up using a textual encoding standard, such as the TEI (Sperberg-McQueen, Burnard et al. 1994), in order to produce “thick documents” that contain both primary and interpretive content in a single document or collection of documents. These documents are then made available to the scholarly public by means of a transformation engine that generates content in a standard, viewable format, such as HTML, and an indexing engine that allows the document web to be searched, presumably taking advantage of the rich markup found in the source documents.
A primary task of the Charrette Project has been to encode a set of rhetorical figures (e.g. instances of chiasmus, adnominatio, and enjambment) in the critical edition of the text (Uitti and Foulet 1989). Because of their large number and radically non-nesting character, however, the prospect of directly encoding the figures into the text, using the technique of segmenting and splicing elements, has seemed impractical at best. In addition, the Charrette Project has been collaborative from the outset, involving the work of many editorial assistants, both over time and at any given time. Each editor has been in charge of a figure type, and has been responsible for locating figure instances in the entire document. In the traditional approach, this division of labor would have to be carried out serially, and therefore the length of time required to complete the project would multiply by the number of assistants involved. Each of these problems was solved through the use of a database to store the textual content of the Foulet-Uitti edition.
The problem of encoding non-hierarchical structures and multiple hierarchies of content objects using markup technologies such as SGML and XML is well documented (Renear, Mylonas et al. 1996; Alvarado 1999; Sperberg-McQueen and Huitfeldt 1999). In areas where this problem cannot be ignored, such as the analysis of qualitative data and discourse, the principle of standoff markup has been developed by several groups (Thompson and McKelvie 1997; Müller and Strube 2001; Glass and Eugenio 2002). Figura employs a methodology similar to that of these examples, which makes sense given the kinship between discourse analysis and rhetoric, but makes use of a relational database, rather than a collection of documents, to store the links between textual elements and their affiliations in figural elements. The advantage of this approach is that editors are freed from the well-formedness constraint of XML. Once the data is entered, an algorithm may be applied to join the source document and its non-nesting elements, to produce well-formed XML (or even MECS) using a variety of techniques, such as automated splicing or CSS grouping.
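The database-backed standoff approach and the splicing step described above can be illustrated with a minimal sketch. It is not Figura's actual schema or algorithm: the table names (token, figure, figure_token), the editor labels, the seg element with its figs attribute, and the placeholder tokens are all assumptions made for illustration. Two overlapping figures are stored as link rows, and the serializer splits each verse line into runs of tokens that share the same set of figures, so that the output nests properly.

```python
import sqlite3
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr

# Hypothetical standoff schema: the text is stored once, as tokens, and each
# figure is a set of links to tokens, so figures may overlap freely.
SCHEMA = """
CREATE TABLE token (id INTEGER PRIMARY KEY, line_no INTEGER, text TEXT);
CREATE TABLE figure (id INTEGER PRIMARY KEY, type TEXT, editor TEXT);
CREATE TABLE figure_token (
    figure_id INTEGER REFERENCES figure(id),
    token_id  INTEGER REFERENCES token(id),
    PRIMARY KEY (figure_id, token_id)
);
"""


def demo_db():
    """Build an in-memory database with placeholder tokens and two figures."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO token VALUES (?, ?, ?)", [
        (1, 1, "alpha"), (2, 1, "beta"), (3, 1, "gamma"),
        (4, 2, "delta"), (5, 2, "epsilon"), (6, 2, "zeta"),
    ])
    # Each editor records one figure type independently of the others.
    db.executemany("INSERT INTO figure VALUES (?, ?, ?)", [
        (1, "chiasmus", "editor_a"), (2, "enjambment", "editor_b"),
    ])
    # Figure 1 covers tokens 2-4 and figure 2 covers tokens 3-5: they overlap
    # without nesting, and both cross the line boundary.
    db.executemany("INSERT INTO figure_token VALUES (?, ?)", [
        (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5),
    ])
    return db


def figure_tokens(db, figure_id):
    """Return the words belonging to one figure, in text order."""
    rows = db.execute(
        "SELECT t.text FROM token t "
        "JOIN figure_token ft ON ft.token_id = t.id "
        "WHERE ft.figure_id = ? ORDER BY t.id", (figure_id,))
    return [text for (text,) in rows]


def to_xml(db):
    """Join the text and its figures into well-formed XML by splicing."""
    figs = {}  # token id -> list of figure labels, in figure-id order
    links = db.execute(
        "SELECT figure_id, token_id FROM figure_token ORDER BY figure_id")
    for fig_id, tok_id in links:
        figs.setdefault(tok_id, []).append(f"f{fig_id}")
    tokens = db.execute(
        "SELECT id, line_no, text FROM token ORDER BY id").fetchall()
    out = ["<text>"]
    for line_no, line in groupby(tokens, key=lambda t: t[1]):
        parts = []
        # A new fragment starts wherever the set of active figures changes.
        for ids, run in groupby(line, key=lambda t: tuple(figs.get(t[0], ()))):
            words = escape(" ".join(t[2] for t in run))
            if ids:
                attr = quoteattr(" ".join(ids))
                parts.append(f"<seg figs={attr}>{words}</seg>")
            else:
                parts.append(words)
        out.append(f'<l n="{line_no}">{" ".join(parts)}</l>')
    out.append("</text>")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_xml(demo_db()))
```

With the placeholder data, the first line is serialized as alpha, then a fragment for figure f1 (beta), then a fragment shared by f1 and f2 (gamma). The figures themselves remain whole in the database, so figure_tokens(db, 2) still returns gamma, delta, epsilon in order. Fragmenting by shared figure set is only one possible splicing strategy; the same link table could feed other serializations.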
Standoff markup also allows for rich, secondary content to be stored independently