DHQ: Digital Humanities Quarterly
Volume 15 Number 1
Audiated Annotation from the Middle Ages to the Open Web

Tanya E. Clement <tclement_at_utexas_dot_edu>, University of Texas at Austin
Liz Fischer <lfischer_at_utexas_dot_edu>, University of Texas at Austin


Current theories about the significance of annotations in literary studies are based primarily on assumptions developed in print culture about verbal texts. In these textual theories, the text is typically present, authorized, and centralized as the ideal text for an ideal reader, and to annotate is to add authorized comments within a sociotechnical system that includes publication, dissemination, and reception. To audiate is to imagine a song that is not playing. In music learning theory, audiation rests on the concept that musicians learn to play music by developing their own musical aptitude: an individual interpretation of a musical score grounded in their particular experience of the music. This short article introduces audiation as an alternative theoretical framing for articulating the significance of personal literary annotations. Comparing commentary on the psalms in the Middle Ages to IIIF (International Image Interoperability Framework) web annotations, we use the concept of audiation to situate annotations within literary study in terms of a more capacious understanding of the individual's interpretation of a text and of the reading experience as part of an unauthorized, distributed, and decentralized system. By bringing together theories and technologies of annotation with sound, we offer the concept of audiated annotations as a means to re-evaluate modes of access, discovery, and analysis of cultural objects in digital sound studies.

Annotations in Textual Theory (paratext, marginalia, metadata)

Scholars have long theorized annotation in the creation and analysis of literature as paratext, marginalia, and mark-up [Audenaert 2010] [Bernstein 1998] [Bernstein 2011] [Bradley and Vetch 2007] [Bray et al. 2000] [Hillesund 2010] [Jackson 2001]. Paratext includes both peritext, the materials within the interstices of the book such as chapter titles or notes, and epitext. Epitexts are texts outside of the central text that influence the ideal or typical reader's interpretation, such as "interviews, conversations, and confidences" (Genette 10). Revealing the political and social perspectives about the "ideal" reader that such contexts engage, Genette includes in this category "contextual paratexts," which carry politicized information about the author, such as "Proust's part-Jewish ancestry and his homosexuality" (8). Not surprisingly, then, paratext is authorized by the author or by his (in this case) social demographics or discourse community: it "is always the conveyor of a commentary that is authorial or more or less legitimated by the author," Genette writes (2). Indeed, "by definition, something is not a paratext unless the author or one of his associates accepts responsibility for it" (9). Authorized by the surrounding discourse community, paratext is also authorized by the text itself through physical association. Paratext is "situated in relation to the location of the text itself: around the text and either within the same volume or at a more respectful (or more prudent) distance" (4). Genette's concept of paratext thus essentializes authorial intention and the immediacy of the textual object.
Marginalia in literary study is also interpreted according to its proximity to the text and the extent to which it is authorial or authorized. H.J. Jackson argues that significant marginalia consists of notes inside books, not outside them, citing "significant differences between notes made on separate sheets of paper or in a notebook and notes made in the book that becomes part of the book and accompany it ever after" [Jackson 2001, 14]. Beyond its location, marginalia is more or less important depending on whether the person creating the notes is "authorial." According to Jackson, notes by authors themselves are the most significant. Next come marginalia by other authors of equal or greater literary importance. Finally, granted the least status, come general readers' notes: "Our own notes we like, or have learned to live with," Jackson writes, "those we resist are always written by somebody else" (235). Marginalia plays a minor role in textual theory. It reflects an association with other authors in reception theory and in the history of reading, or it serves as biography when authors correct what has been written about them (243). In general, however, Jackson argues that marginalia has not been considered significant enough to study because it is usually non-authorial and often ephemeral, not physically attached to the authorized text.
Metadata, or data about data, is also a significant form of annotation in recent literary study, deemed more or less important depending on its authority and textual proximity. In the digital realm, several activities share a common reliance on metadata:

- searching and retrieving texts in library systems,
- sharing scholarly or pedagogical work with students and researchers, and
- using artificial intelligence and machine learning to discover patterns.

Much like paratext in the publishing industry, metadata has more official functions in libraries (for access) and archives (for context) [Gilliland 2008, 2–3]. In both cases, metadata supplies information that is lacking in the "information object" within a sociotechnical system. In the library setting, metadata might include the author's name or genre information, gathered in order to facilitate finding that object. In an archive, metadata might include a previous researcher's notes, which can provide important contextual clues for future researchers. In general, Gilliland notes that "[i]n all these diverse interpretations, metadata not only identifies and describes an information object; it also documents how that object behaves, its function and use, its relationship to other information objects, and how it should be and has been managed over time" [Gilliland 2008, 7]. Consequently, metadata's function is entangled with making an "information object" system-aware, whether that system is a human-readable metadata standard or a technological process. Unauthorized, "user-generated" metadata, such as community-generated "folksonomies," is different. It is a valuable record of general users' experiences, but it is system-averse, since such metadata often does not fit the established sociotechnical system at hand.
Indeed, such out-of-system metadata is "idiosyncratic" and, therefore, "untrustworthy," Gilliland argues, because it can "negatively affect interoperability between metadata and the resources it is intended to describe" [Gilliland 2008, 8–9]. Like paratextual and marginal annotations, metadata has been considered meaningful in literary study when it is authorized and in proximity to the text. [1]
In each of these examples (paratext, marginalia, and metadata), the sociotechnical systems in which annotations circulate represent discourse fields where authority is crucial to an annotation's significance or signifying capacity. Yet there are other, under-theorized examples in literary study in which annotations reflect an individual, unauthorized reader's interpretation of an absent text, and the reading experience is part of an unauthorized, distributed, and decentralized system. Below, we discuss two seemingly disparate examples of such annotations across time: medieval psalm commentary and the open web IIIF (International Image Interoperability Framework) standards for annotation. We call these audiated annotations to emphasize three principles that such unauthorized and extra-textual annotations share. They are often (1) self-described and independent, removed from the object of comment itself and reflecting a textual condition [McGann 1991]. In that condition, (2) annotations are understood as compound objects that are (3) embedded in a particular, user-generated reading experience rather than an authorized, ideal reading experience.

Annotations in Medieval Literary Culture

In pre-print, medieval literary culture, annotations still appeared alongside full texts, but commentary forms did not rely on the centrality of an ideal text. In the medieval tradition, orality and aurality were central to literacy, and psalm commentaries circulated in an unauthorized, distributed, and decentralized community of texts, readers, orators, and listeners. For Benedictine monks in particular, the psalms were a major part of monastic life, and weekly recitation of the psalms was recommended. Later protégés learn to audiate using Edwin Gordon's theories of musical education [Gordon 2007]. In a similar way, monks in training who could memorize the psalms were seen as showing an early indication of intellect [Dyer 1989]. Psalms were not simply recited like other prayers and readings; they were nearly always sung. As a natural consequence of years of daily recitation, monks were expected to have memorized the verses and tunes of all 150 psalms.
Consequently, the practice of audiation, of using inner hearing to imagine what a song sounds like, was key to medieval psalm-singing. Medieval devotional practice includes a concept of "the inner senses." These operate separately from, but are related to, the physical senses of sight and sound; an inner sense of sight lies at the origin of the phrase "the mind's eye." Beth Williamson discusses the way the physical sense of sight and the inner sense of hearing work together in medieval music, especially psalm-singing [Williamson 2013]. Psalms often have two sections separated by a pause, represented on the page as a space and musically as a breath. Williamson observes that this pause is not an absence but a shift in the site of meaning:

[A]t such a point, the music may not be sounding, but it has not stopped. The singers are aware, within their own interiority, of its continuation, and though they do not hear in their physical ears they hear it still inwardly. In this moment of silence, music does not disappear, but functions temporarily — and temporally — on a different level.  [Williamson 2013, 31]

What Williamson describes is similar to Gordon's descriptions of audiation — the presence of meaning (the concept and construct of music) in the absence of sound. While Williamson regards that state of inner hearing as temporary in the moment of performance, the implication is that the singers, like Gordon's students, hear the psalms when reading the text.
Annotations were common in the psalm commentary tradition. The psalter was the most commented-upon book of the Middle Ages, and all monks would have had access to at least some kind of commentary in their library [Dyer 1989]. These commentaries are presented on the page in several ways. The most standard presentation of medieval commentary is the way the Glossa ordinaria (the standard biblical commentary) is usually written: the main text is in one column in a large script, with commentary in the surrounding margins in a smaller script. Privileging the main text, this layout looks much like annotated texts today. Because audiation was a common mode of interacting with the psalms, however, other commentary forms function as audiated annotations.
The popular psalm commentary of Gilbert of Poitiers, for example, privileges the commentary over the primary text. The Gilbert Psalter comes in two layouts: cum textu ("with the text") and catena ("chain"). In the cum textu format, the page is divided into two columns: the inner column (near the spine) for the main psalm text, and the outer for the commentary. In this format, the relative widths of the columns are adjusted, and the main text is sometimes abbreviated, so that the main text and the relevant section of commentary stay in sync [Salomon 2012, 43]. This layout gives the main text and the commentary a more equal status. The main text is usually still written larger, but it takes up less of the page, is not centered, and may be altered to accommodate the commentary. The catena format of the Gilbert Psalter places even more emphasis on the commentary. The page is still divided into two columns, but the commentary occupies both. When the commentary on a new verse starts, the first few words of the psalm text are given in extreme abbreviation. Aside from these opening words, the psalm text itself is absent from the page. Theresa Gross-Diaz describes the appearance of this layout in her study of the Gilbert Psalter as follows:

[T]he first words of the verse given in full, the end of the text sometimes disintegrating into a string of initials in the interest of economy of space, time, and parchment. Despite this interpolated repetition of the psalms in this 'simple' format, one would be hard-pressed to reconstruct each psalm from the lemmata provided, since the order of words and even of verses is often scrambled beyond recognition.  [Gross 1996, 48]

Such extreme abbreviation of a commented-upon text is only possible if the reader either has a separate copy of the text to use side by side or can call the text to mind with minimal prompting. In the case of the psalms, the latter is more likely: as discussed, readers who memorized the psalms as text and as sound encounter them aurally, with the mind's ear. A medieval reader who knows his psalms does not need the psalm to be present on the page, or audible, when coming to Gilbert's commentary, because it is present in the mind.
Gilbert's Psalter offers an early example of two things. It shows how commentaries are held at a remove from the text through extreme abbreviation. It also shows how these audiated annotations function as compound objects that reflect a particular, rather than a general, reader's experience. Commentaries in the catena layout are "chains" in two senses. They move across the page as an unbroken string of commentary, and they link together previous commentaries. A Glossa ordinaria-style commentary isolates the words of each commentator (in one corner what Augustine said, in another what St. Hippolytus said). The catena, by contrast, makes a new, continuous commentary text by pulling together pieces of existing, multi-authored commentaries. As David Salomon says, a catena's author "joins the links in the chain but does not necessarily have a hand in constructing those links themselves" (47). According to Salomon and Gross-Diaz, Gilbert's Psalter is not a "true" or typical catena for precisely this reason: the commentary seems to be his own rather than a compilation of pieces of existing commentaries. Finally, catena psalm commentary is embedded in a particular rather than a general reading experience. Psalm commentaries like Gilbert's are sometimes "authorized" in that they were widely read and copied. Some catena texts, however, were unauthorized: individuals created them for their private use, and they were not widely copied and are not currently discoverable [Salomon 2012].

Annotations in IIIF

Today, the most ubiquitous audiated annotations are web-based. Audiated (unauthorized and extra-textual) annotations in open Web standards such as IIIF (International Image Interoperability Framework) extend the use, shareability, and accessibility of online cultural artifacts. The IIIF consortium adopts the principles of Linked Data and the Architecture of the Web through the Shared Canvas data model and the use of JSON-LD "in order to provide a distributed and interoperable framework" [Appleby] for the presentation of Web content. Essentially, linked data on the Web are interrelated: they refer to and "are aware of" other, similar data, which makes semantic queries across platforms more productive and useful. The IIIF standard places particular emphasis on facilitating links (references between bits of data) that are unauthorized or user-generated annotations of content. Such contextual information is often not well described by current metadata schemas, especially in cultural heritage institutions such as libraries, archives, and museums. The "Introduction to IIIF" [Crane 2017] claims:

While a multitude of different standards and practices are expected and even desirable for descriptive metadata, they do nothing for the content itself. There has been no standardized way of referring to a page of a book, or a sentence in a handwritten letter, from one digitised collection to the next. Descriptive metadata standards don't help us. It is not their job to enable us to refer to parts of the work, down to the tiniest detail - interesting marginalia, a single word on a page - and make statements about those parts in the web of linked data. It is not their job to present content, or share it, or refer to it.

This statement functions as a kind of manifesto for a distributed and unauthorized annotation environment that is not beholden to the kind of authorized, often ideal-text-centric, metadata standards on which library, archive, and museum systems typically depend.
In IIIF, the manifest is the primary document. The manifest is a plain-text file written in JSON that privileges a reader's perception of how an object should be presented on a Web page. Institutions can create and share manifests, and presentation software reads them. Readers can also create manifests, or copy and reshare existing ones, when they wish to reorient how an object is presented online. A reader can reference or link to even the tiniest detail of an object such as an image or an audio file: the brightest star in Van Gogh's "Starry Night," or one phrase in a poem spoken by Maya Angelou. In doing so, the reader can create a manifest that completely reorients how that object is read or accessed. In a IIIF manifest, all of the instructions about how the object should be presented are conceptualized as annotations on a canvas. Even what we might consider the main object of study is recorded in the JSON manifest as an annotation on this canvas of the reader's mind. That object might be an image of the page of a book, a photograph of a painting, or a snippet of sound or video. In this way, the idea of the idealized text is reoriented toward a privileging of the reader's instructions, in the manifest, about the presentation of that object.
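The sketch below makes the canvas-and-annotation structure concrete. It is written in Python and assembles a manifest as a dictionary, then serializes it to JSON. It assumes the IIIF Presentation API 3.0 vocabulary: `Manifest`, `Canvas`, `AnnotationPage`, `Annotation`, the `painting` and `commenting` motivations, and `#xywh=` region fragments. Every URL, dimension, and coordinate is a hypothetical placeholder, and the helper names are ours, not part of any IIIF library. The point is structural. The image is not a privileged container but a `painting` annotation targeting the canvas, and the reader's note on "the brightest star" sits beside it as another annotation.

```python
import json

# Hypothetical location where a reader might publish this manifest.
BASE = "https://example.org/iiif/starry-night"


def painting_annotation(canvas_id, image_url, width, height):
    """The image enters the manifest as one more annotation:
    motivation 'painting', body = the image, target = the canvas."""
    return {
        "id": f"{canvas_id}/annotation/image",
        "type": "Annotation",
        "motivation": "painting",
        "body": {"id": image_url, "type": "Image", "format": "image/jpeg",
                 "width": width, "height": height},
        "target": canvas_id,
    }


def region_comment(canvas_id, n, text, x, y, w, h):
    """A reader's note on one region, addressed by appending a
    media-fragment selector (#xywh=) to the canvas URI."""
    return {
        "id": f"{canvas_id}/annotation/comment-{n}",
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


def build_manifest(label, image_url, width, height, comments):
    """comments: list of (text, x, y, w, h) tuples in canvas coordinates."""
    canvas_id = f"{BASE}/canvas/1"
    canvas = {
        "id": canvas_id,
        "type": "Canvas",
        "width": width,
        "height": height,
        # Content "painted" onto the canvas: here, the image itself.
        "items": [{
            "id": f"{canvas_id}/page/painting",
            "type": "AnnotationPage",
            "items": [painting_annotation(canvas_id, image_url, width, height)],
        }],
        # The reader's commentary, kept in a separate annotation page.
        "annotations": [{
            "id": f"{canvas_id}/page/comments",
            "type": "AnnotationPage",
            "items": [region_comment(canvas_id, i, *c)
                      for i, c in enumerate(comments, start=1)],
        }],
    }
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{BASE}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [canvas],
    }


manifest = build_manifest(
    "A reader's Starry Night",
    "https://example.org/images/starry-night.jpg",  # hypothetical image URL
    1000, 800,                                       # hypothetical dimensions
    [("the brightest star", 700, 60, 80, 80)],       # invented coordinates
)
print(json.dumps(manifest, indent=2))
```

Consider what happens when you swap the image URL or add further comments. That would change the presentation a viewer builds from this file without touching any file held by the institution that owns the image.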
The IIIF manifest is a capacious document, containing multiple links brought together to create a particular reading, viewing, or listening experience; it treats the object as made up of many parts, as a composite. For example, the manifest for a particular presentation of a medieval manuscript might include canvases that link to several kinds of material:

- images of every manuscript page and the binding,
- multispectral images showing text that had been erased and written over,
- transcriptions, and
- explanatory notes that refer to each.

This textual constellation, linked from the manifest and presented seamlessly on the Web page by software, may or may not be created or owned by the same people. Pieces of manuscripts that were cut apart and sold to different libraries can be reunited virtually on a new page as directed by a IIIF manifest.[2] Suppose a reader's primary object of interest is the digitized Gilbert Psalter manuscript in the Parker Library. That reader can create annotations describing the large, decorated initials in the book and present those annotations on the Web using IIIF, with or without the manuscript image. Without the image, the reader would not see the illuminated initials, but audiated annotations describing them can still be shown in spatial relation to one another. With IIIF, readers can create audiated annotations for the present, absent text. In both medieval and online cultures, audiated annotations circulate as composite, unauthorized, and decentralized objects for study.
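The composite, dispersed-manuscript case can be sketched the same way. The Python below is again a hypothetical illustration under the IIIF Presentation API 3.0 assumptions. The leaf labels, the two library hosts (`library-a.example.org`, `library-b.example.net`), and the coordinates of the decorated initials are invented; they do not describe the Parker Library's copy of the Gilbert Psalter. Leaves whose images live on different servers are reunited as consecutive canvases in one manifest. A leaf with no `image_url` still carries the reader's `describing` annotations, positioned by `#xywh=` regions, so the descriptions of the initials keep their spatial relation even when the image is absent.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    """One leaf of a dispersed manuscript, as a reader records it."""
    label: str
    width: int
    height: int
    image_url: Optional[str] = None  # None: no image is painted on this canvas
    notes: List[Tuple[str, int, int, int, int]] = field(default_factory=list)


def leaf_canvas(base: str, index: int, leaf: Leaf) -> dict:
    canvas_id = f"{base}/canvas/{index}"
    canvas = {"id": canvas_id, "type": "Canvas", "label": {"none": [leaf.label]},
              "width": leaf.width, "height": leaf.height}
    if leaf.image_url is not None:
        # The image may sit on any institution's server; the manifest only links to it.
        canvas["items"] = [{
            "id": f"{canvas_id}/page/painting",
            "type": "AnnotationPage",
            "items": [{
                "id": f"{canvas_id}/annotation/image",
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": leaf.image_url, "type": "Image", "format": "image/jpeg"},
                "target": canvas_id,
            }],
        }]
    # The reader's descriptions keep their positions whether or not an image is present.
    canvas["annotations"] = [{
        "id": f"{canvas_id}/page/notes",
        "type": "AnnotationPage",
        "items": [{
            "id": f"{canvas_id}/annotation/note-{n}",
            "type": "Annotation",
            "motivation": "describing",
            "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
            "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
        } for n, (text, x, y, w, h) in enumerate(leaf.notes, start=1)],
    }]
    return canvas


def composite_manifest(base: str, label: str, leaves: List[Leaf]) -> dict:
    """Order the leaves as consecutive canvases in one reader-made manifest."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [leaf_canvas(base, i, leaf) for i, leaf in enumerate(leaves, start=1)],
    }


leaves = [
    # Two leaves now held by different (hypothetical) libraries, reunited in sequence.
    Leaf("leaf 1", 1000, 1400,
         image_url="https://library-a.example.org/iiif/leaf1/full/max/0/default.jpg",
         notes=[("large decorated initial", 90, 120, 300, 320)]),
    Leaf("leaf 2", 1000, 1400,
         image_url="https://library-b.example.net/iiif/leaf2/full/max/0/default.jpg",
         notes=[("large decorated initial", 100, 110, 280, 300)]),
    # A leaf published with no image at all: only the reader's notes appear.
    Leaf("leaf 3", 1000, 1400,
         notes=[("large decorated initial", 95, 115, 290, 310)]),
]
manifest = composite_manifest("https://example.org/iiif/reunited",
                              "Reunited leaves (hypothetical)", leaves)
print(json.dumps(manifest, indent=2))
```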


In the digital environment, collections that include manuscripts or musical, spoken, or bioacoustical artifacts will require audiated annotations in order to be discoverable. Often, for privacy or copyright reasons, audiovisual cultural heritage objects such as historical audio and film are not freely available online. In the analog world, we cannot find or know what is in or on a sound or image artifact unless someone has written a name on the back of a Polaroid or on the label of an audio reel or cassette tape. Similarly, without metadata or descriptive information embedded in or associated with a digital file, we cannot search for or discover that object. As a result, audiated annotations (annotations that are unauthorized, decentralized, and composite) sometimes serve as the only access point into important cultural objects in literary study.
Likewise, consider an audio object that may never circulate freely for copyright or privacy reasons. Annotations on it can be anchored in temporal relation to that absent object and shared widely. In this they resemble a playlist on an old mixtape, or liner notes on an album, which point to and tell us more about the present, absent content. Such community-based, unauthorized sharing of scholarly annotations already exists in free, minimally produced scholarly editions that use Jekyll to generate GitHub Pages sites. Scholarly editors who follow the tenets of Minimal Computing in DH favor this approach. Examples include Minimal Editions [Minimal Computing n.d.], Wax (which builds IIIF-based static exhibits mimicking Omeka's functionality) [Nyröp n.d.], and the Versioning Machine [Schreibman 2015]. The AudiAnnotate project is developing similar workflows for producing the same kind of community-based and composite annotations for audio [HiPSTAS 2020]. This ability to share audiated annotations on an inaccessible object increases discoverability. It also expands that object's circulation in our cultural imaginary through scholarship, teaching, and learning. Untethered from a "main text," which is decentered as yet another annotated link on the IIIF canvas, readers can compile any Frankenstein canvas, that beautiful corpse.
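For audio, the same model anchors annotations in time rather than space. The sketch below is in Python and again assumes the IIIF Presentation API 3.0 vocabulary. It builds a manifest for a recording that is never itself published. The canvas declares only a `duration` and carries no `painting` annotation. It holds the reader's comments, each targeted with a `#t=start,end` fragment in seconds. The sketch assumes the reader keeps notes as tab-separated start, end, and note lines, the layout audio editors such as Audacity use for exported label tracks. A small `find` function then shows the annotations, rather than the withheld audio, serving as the point of search. The recording, URLs, timings, and function names are hypothetical, and this is not the AudiAnnotate project's code.

```python
import json


def parse_labels(text: str) -> list:
    """Parse 'start<TAB>end<TAB>note' lines (times in seconds) into tuples."""
    notes = []
    for line in text.strip().splitlines():
        start, end, note = line.split("\t", 2)
        notes.append((float(start), float(end), note.strip()))
    return notes


def audio_manifest(base: str, label: str, duration: float, notes: list) -> dict:
    """A manifest whose canvas stands in for an unpublished recording:
    it has a duration but no 'painting' annotation, only timed comments."""
    canvas_id = f"{base}/canvas/1"
    comments = [{
        "id": f"{canvas_id}/annotation/{i}",
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": note, "format": "text/plain"},
        # Media-fragment time range, in seconds, on the canvas timeline.
        "target": f"{canvas_id}#t={start},{end}",
    } for i, (start, end, note) in enumerate(notes, start=1)]
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "duration": duration,
            "annotations": [{"id": f"{canvas_id}/page/1",
                             "type": "AnnotationPage", "items": comments}],
        }],
    }


def find(manifest: dict, term: str) -> list:
    """Return the time-range targets of comments mentioning `term`
    (case-insensitive): the annotations, not the audio, are searched."""
    hits = []
    for canvas in manifest["items"]:
        for page in canvas.get("annotations", []):
            for anno in page["items"]:
                if term.lower() in anno["body"]["value"].lower():
                    hits.append(anno["target"])
    return hits


labels = "12.0\t20.5\topening stanza\n95.25\t101.0\tlaughter from the audience"
manifest = audio_manifest("https://example.org/iiif/reading",
                          "Notes on an unpublished reading (hypothetical)",
                          1800.0, parse_labels(labels))
print(json.dumps(manifest, indent=2))
print(find(manifest, "laughter"))
# → ['https://example.org/iiif/reading/canvas/1#t=95.25,101.0']
```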


[1] One exception is the history of conversations in the Text Encoding Initiative (TEI) surrounding stand-off markup. See [Spadini and Turska 2019] and [TEI 2003].
[2] For an example of this, see Lisa Fagin Davis' work with books "broken" by Otto Ege in the early twentieth century (Davis 2016).

