“Figura: A Tool for the Collaborative Editing of
Non-nesting Content”
Rafael
Alvarado
Princeton University
alvarado@princeton.edu
Sarah-Jane
Murray
Princeton University
sjmurray@princeton.edu
Figura is a web-based database application designed to support the Charrette
Project at Princeton University (Uitti 1997). In particular, it supports the
work of creating a TEI-compliant critical edition of the Old French manuscript
tradition of Chrétien de Troyes’s Le Chevalier de la
Charrette. The application addresses the shortcomings of a
traditional, purely document-centric approach to humanities computing
applications (described below) through the use of a database management system
that acts as a pre-processor for marked up documents. The motivation behind the
use of a database has not been to supplant the primary role of XML and the TEI
in the development of digital critical editions, but rather to avoid having to
employ an “extreme markup” solution that would further complicate an already
difficult set of technologies. Instead, Figura targets the black box of
traditional humanities computing applications (the indexing engine) and replaces
it with something that is more accommodating to the specific needs of scholars,
all the while keeping document encoding standards intact.
The traditional approach to humanities computing applications is one in which
primary source materials are marked up using a textual encoding standard, such
as the TEI (Sperberg-McQueen, Burnard et al. 1994), in order to produce “thick
documents” that contain both primary and interpretive content in a single
document or collection of documents. These documents are then made available for
use by the scholarly public by means of a transformation engine that can
generate content in a standard, viewable format, such as HTML, and an indexing
engine that will allow the document web to be searched, presumably taking
advantage of the rich markup found in the source documents.
A primary task of the Charrette Project has been to encode a set of rhetorical
figures—e.g. instances of chiasmus, adnominatio, enjambment, etc.—in the
critical edition of the text (Uitti and Foulet 1989). Because of their large
number and radically non-nesting character, however, the prospect of directly
encoding the figures into the text, using the technique of segmenting and
splicing elements, has seemed impractical at best. In addition, the Charrette
Project has been collaborative from the outset, involving the work of many
editorial assistants, both over time and at any given time. Each editor has been
in charge of a figure type, and has been responsible for locating figure
instances in the entire document. In the traditional approach, this division of
labor would have to be carried out serially, so the time required to complete
the project would grow in proportion to the number of assistants involved. Both
problems were solved by using a database to store the textual content of the
Foulet-Uitti edition.
The problem of encoding non-hierarchical structures and multiple hierarchies of content
objects using markup technologies such as SGML and XML is well documented
(Renear, Mylonas et al. 1996; Alvarado 1999; Sperberg-McQueen and Huitfeldt
1999). In areas where this problem cannot be ignored, such as the analysis of
qualitative data and discourse, the principle of standoff markup has been
developed by several groups (Thompson and McKelvie 1997; Müller and Strube 2001;
Glass and Eugenio 2002). Figura employs a methodology similar to that of these
examples—which makes sense, given the kinship between discourse analysis and
rhetoric—but makes use of a relational database, rather than a collection of
documents, to store the links between textual elements and their affiliations in
figural elements. The advantage of this approach is that editors are freed from
the well-formedness constraint of XML. Once the data is entered, an algorithm
may be applied to join the source document and its non-nesting elements and
produce well-formed XML, or even MECS, using a variety of techniques, such as
automated splicing or CSS grouping.
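The following is a minimal sketch of the idea, not Figura's actual schema or
algorithm. It assumes a hypothetical three-table layout (token, figure,
figure_token), placeholder tokens w1 to w6 in place of the edition's text, and
a simplified form of automated splicing in which each figure is fragmented at
line boundaries and wherever the set of overlapping figures changes. The use of
seg elements with an ana attribute to tie fragments back to their figures is
likewise an illustrative choice.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr

# Hypothetical schema (not Figura's): tokens of the base text, figure
# instances, and a link table recording which tokens each figure covers.
SCHEMA = """
CREATE TABLE token (id INTEGER PRIMARY KEY, line_no INTEGER, form TEXT);
CREATE TABLE figure (id INTEGER PRIMARY KEY, type TEXT, editor TEXT);
CREATE TABLE figure_token (figure_id INTEGER, token_id INTEGER,
                           PRIMARY KEY (figure_id, token_id));
"""

def figure_refs_by_token(conn):
    """Map each token id to the references of the figures that cover it."""
    refs = {}
    query = "SELECT token_id, figure_id FROM figure_token ORDER BY figure_id"
    for token_id, figure_id in conn.execute(query):
        refs.setdefault(token_id, []).append(f"#f{figure_id}")
    return refs

def splice(conn):
    """Join the base text with its standoff figures as well-formed XML."""
    refs = figure_refs_by_token(conn)
    tokens = conn.execute(
        "SELECT id, line_no, form FROM token ORDER BY id").fetchall()
    lines = []
    for line_no, line_tokens in groupby(tokens, key=lambda t: t[1]):
        parts = []
        # A new fragment starts whenever the set of covering figures changes.
        for figs, run in groupby(line_tokens,
                                 key=lambda t: tuple(refs.get(t[0], ()))):
            text = escape(" ".join(form for _, _, form in run))
            if figs:
                text = f"<seg ana={quoteattr(' '.join(figs))}>{text}</seg>"
            parts.append(text)
        lines.append(f'<l n="{line_no}">{" ".join(parts)}</l>')
    return "<lg>" + "".join(lines) + "</lg>"

# Demo data: two verse lines and two figures that overlap each other and
# cross the line boundary. Editor names are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO token VALUES (?, ?, ?)",
                 [(1, 1, "w1"), (2, 1, "w2"), (3, 1, "w3"),
                  (4, 2, "w4"), (5, 2, "w5"), (6, 2, "w6")])
conn.executemany("INSERT INTO figure VALUES (?, ?, ?)",
                 [(1, "chiasmus", "editor_a"), (2, "enjambment", "editor_b")])
conn.executemany("INSERT INTO figure_token VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5)])

spliced = splice(conn)
ET.fromstring(spliced)  # raises ParseError if the output is not well-formed
print(spliced)
```

Because each figure is a set of rows rather than a pair of tags, two editors
can record overlapping figures independently, and well-formedness only has to
be established once, at export time. Collecting the fragments whose ana value
mentions a given figure recovers that figure's full extent.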
Standoff markup also allows for rich, secondary content to be stored
independently