“Encoding Renditional Information in Primary Source Texts”

Julia Flanders, Brown University Women Writers Project, Julia_Flanders@brown.edu
Paul Caton, Brown University, paul@swansong.stg.brown.edu

Introduction

Renditional information--by which we mean broadly any facts about the appearance, ornamentation, or layout of text on a page--occupies an uncertain position in current theories of humanities text encoding. At the heart of the problem lie longstanding philosophical questions about the nature of 'text'. Does a particular physical instantiation of a work combine an essential component with non-essential features that can vary without altering the essence? On the surface it would seem that markup schemes where the defined tag set is weighted towards capturing structure rather than appearance (such as the TEI and most other SGML-based schemes) lend support to a practicable distinction between the 'essential text' and the renditional 'packaging' of any one instance (see Renear, Renear et al., DeRose et al.). On the other hand it might be argued that much of what we commonly consider renditional information--line-spacing, indentation, font family, use of italics, small caps, etc.--serves to impose structure and draw attention to particular types of content. It would follow that capturing the structure of a document, together with identifiable content objects like quotes, foreign words, technical terms, and so on, makes it unnecessary to capture the renditional details per se. Many scholarly encoding projects, however, capture primary source data for quasi-archival purposes, and their transcriptions need to supply information to people with a variety of critical interests and a corresponding variety of opinions as to what constitutes significant textual information. These projects have to confront the problem of dealing with visual information on its own terms (as opposed to treating it as a cue for structure and content).
A clear methodological framework is essential even for a project with ambitions to capture all possible renditional information, let alone one with more modest and realistic goals; without such a framework it is impossible to determine what to record and how to record it. This paper focuses on the problem of defining a rationale for the capture of renditional information. On what grounds do we decide what kinds of rendition to record? And if we use meaning as the test of whether rendition is important enough to record, what is the horizon of meaningfulness?

Methodological frameworks

The possible criteria by which renditional features will be deemed worthy or unworthy of capture emerge from a variety of different sources: some of them pragmatic, some deriving from aesthetic theory or literary criticism. Some of the most significant are listed below, and will be discussed in more detail in the finished paper. These do not represent mutually exclusive categories, but rather overlapping conceptual axes which may interact in various ways: for instance, the criterion of meaningfulness requires one to specify a user population for whom meaning is being defined (linguistic meaning? literary meaning? cultural meaning?). The criteria we will consider are as follows:

the criterion of use: what kinds of rendition need to be recorded in order to provide for the needs of an identified user population.
the criterion of meaningfulness: whether a given piece of renditional information affects our understanding or interpretation of the text.
the criterion of substantiveness, distinguished either from accident or from the decorative: whether information considered decorative or accidental contributes to our understanding of the text.
the criterion of measurability or perceptibility and the possibility of accuracy: under this criterion, differences which are too small or difficult to measure can be omitted; this also raises issues of the meaningful units of measurement.
the criterion of intentionality, in which case we need to ask whose intention: authorial? printer's house style? compositor's effort to get all the words onto the page?

A phenomenon like wrong-font letters might fare very differently depending on the criteria chosen: on the grounds of meaningfulness or substantiveness we might not record it, but on grounds of measurability and serving a certain user population (say, analytical bibliographers) we might well include it, and on the issue of intentionality the inclusion or omission would carry a strong theoretical message.
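The wrong-font example can be made concrete with a minimal sketch. It assumes, purely for illustration, that a project states its capture policy as the set of criteria it has adopted and records a feature when that feature satisfies at least one of them. The names `Criterion` and `should_record`, and the two sample policies, are hypothetical and are not drawn from any existing encoding scheme.

```python
from enum import Enum


class Criterion(Enum):
    """The five criteria listed above (labels are illustrative)."""
    USE = "use"
    MEANINGFULNESS = "meaningfulness"
    SUBSTANTIVENESS = "substantiveness"
    MEASURABILITY = "measurability"
    INTENTIONALITY = "intentionality"


def should_record(satisfied, policy):
    """Return True if the feature satisfies at least one adopted criterion."""
    return bool(set(satisfied) & set(policy))


# Hypothetical assessment of wrong-font letters: measurable, and useful to
# analytical bibliographers, but not judged meaningful or substantive.
wrong_font = {Criterion.MEASURABILITY, Criterion.USE}

# Two hypothetical project policies.
literary_policy = {Criterion.MEANINGFULNESS, Criterion.SUBSTANTIVENESS}
bibliographic_policy = {Criterion.USE, Criterion.MEASURABILITY}

print(should_record(wrong_font, literary_policy))       # False
print(should_record(wrong_font, bibliographic_policy))  # True
```

The "at least one criterion" rule is only one possible way of combining criteria; a project could equally require several to hold at once, and the question of whose intention counts is not something a set membership test can settle.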

Creating a taxonomy of rendition

Each project must develop its own taxonomy of renditional characteristics, however simple, according to where it stands among the criteria described above. If renditional information is to be used for any kind of processing, retrieval, or comparison, it must be described systematically, using terms which identify the significant boundaries between phenomena (for instance, the boundary between alignment and justification). It is also important (as with any kind of data capture) to decompose complex phenomena into their basic significant parts so that each may be described distinctly.
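As a hedged illustration of such decomposition, the sketch below records each renditional property on its own axis, so that alignment and justification are kept distinct and each can be queried independently. The class `Rendition`, its field names, its value vocabulary, and the sample passages are all invented for this example; they are not the TEI's vocabulary or any project's actual taxonomy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Alignment(Enum):
    LEFT = "left"
    RIGHT = "right"
    CENTER = "center"


@dataclass(frozen=True)
class Rendition:
    """One independent field per basic phenomenon; None means 'not recorded'."""
    alignment: Optional[Alignment] = None  # where the line sits in the measure
    justified: Optional[bool] = None       # whether lines are spaced to fill the measure
    slant: Optional[str] = None            # e.g. "italic" or "upright"
    case: Optional[str] = None             # e.g. "small-caps"


# Invented sample data: (transcribed text, decomposed rendition).
passages = [
    ("THE PREFACE", Rendition(alignment=Alignment.CENTER, case="small-caps")),
    ("To the Reader", Rendition(alignment=Alignment.CENTER, slant="italic")),
    ("It is not my purpose", Rendition(alignment=Alignment.LEFT, justified=True)),
]

# Retrieval on one axis, regardless of the values on the others.
centered = [text for text, rend in passages if rend.alignment is Alignment.CENTER]
print(centered)  # ['THE PREFACE', 'To the Reader']
```

Because "centered italic" is stored as two separate facts rather than one compound label, a search for centered text does not need to know about every combination in which centering occurs.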

References

Stephen DeRose et al. “What is Text, Really?” Journal of Computing in Higher Education 1 (1990).

Allen Renear. “Out of Praxis: Three (Meta)Theories of Textuality.” Electronic Text: Investigations in Method and Theory. Ed. Kathryn Sutherland. Oxford, 1997.

Allen Renear et al. “Refining Our Notion of What Text Really Is: The Problem of Overlapping Hierarchies.” Research in Humanities Computing. Ed. N. Ide. Oxford, 1995.