Abstract
Access to audio collections is often restricted by institutions for copyright,
privacy, and preservation reasons, but it is the lack of descriptive metadata and
annotations that stands in the way of all levels of access and use. Libraries,
archives, and museums (LAMs) often hold physical audio artifacts that are unmarked
and lacking important identifiable information such as title, date, location,
subject, participants, or context. Annotating is only one of a list of scholarly
primitives including discovering, comparing, referring, sampling, illustrating, and
representing [Unsworth 2000]. IIIF (International Image
Interoperability Framework) is one standardized solution that LAMs have adopted for
giving users the ability to perform these primitives with images held in cultural
heritage institutions. The AudiAnnotate project builds on the new IIIF standards for
AV to address the gaps in engaging with audio by developing a solution to bring
together free audio annotation tools and the Web as a standardized collaboration and
presentation platform. The AudiAnnotate use case presented here includes a
presentation by Tanya Clement titled “Zora Neale Hurston's WPA
field recordings in Jacksonville, FL (1939),” which provides context for
three recordings of Hurston created during the Works Progress Administration (WPA) Federal
Writers’ Project from 1937 to 1942 and made available online at the Library of Congress
as part of the Florida Folklife Collections’ Florida Memory (FM) project.
Introduction
In recent years, increased concern over media degradation and obsolescence combined
with the decreasing cost of digital storage has led libraries, archives, and museums
(LAMs) to digitize audiovisual (AV) materials for improved access and long-term
preservation. Yet, improving preservation and access must go far beyond digitization.
The fact that digital AV collections are not well-represented in our national and
international digital platforms, such as Europeana and the Digital Public Library of
America (DPLA), points to the complicated factors that shape how LAM institutions
manage and facilitate access to digital surrogates. As of July 2020, Europeana
comprised 55% images and 42% text objects, but only 1% sound objects and 0.5% video
objects [Europeana]. DPLA included 67% images and 32% text, with sound objects and
moving image objects each making up less than 1% [Digital Public Library of America]. AV collections
often include lectures, panels, and speeches; performances such as story-telling,
oral histories, and poetry or dance performances; and other documentary AV historical
artifacts. Yet, even while they are sometimes the only record of an event or an
aural, visual, or performance tradition, AV digital artifacts remain underused and
understudied. The goal of the AudiAnnotate Extensible Workflow (AWE) project is to
accelerate access to, promote scholarship and teaching with, and extend understanding
of significant digital AV collections in the humanities.
The State of Recorded Sound Preservation in the United States: A
National Legacy at Risk in the Digital Age (2010), by the Council on
Library and Information Resources and the Library of Congress, reports that if AV
collections go unused, libraries and archives that hold AV collections from a diverse
range of time periods, cultures, and contexts will not prioritize their preservation
[CLIR 2010]. One successful response to the 2010 report has been the
Radio Preservation Task Force (RPTF) of the Library of Congress. Since its creation in 2014, the
RPTF’s primary goal has been “to support collaboration between
faculty researchers and archivists toward the preservation of radio
history” by developing an online inventory of extant American radio
archival collections and pedagogical guides for utilizing radio and sound archives
[Radio Preservation Task Force]. While significant, this kind of inventory only
provides surface-level access to limited information about some artifacts. A
persistent lack of descriptive metadata about the content of AV materials continues
to stand in the way of further levels of access and use.
Even as more AV objects are digitized, under-resourced LAMs must
still spend valuable human labor listening to or watching AV media in real time to
generate the basic metadata required to make these items indexable, searchable, and
accessible online. LAMs often hold physical media artifacts that are unmarked and
lack important identifying information such as title, date, location, subject,
participants, or context. Beyond creating access and discovery points for
researchers, this basic information can help LAM professionals organize these
materials and decide whether cultural sensitivity, privacy, or
copyright concerns are at play in providing access to them. Generating the needed metadata
by hand is prohibitively time-consuming, and automatic, machine-generated metadata remains
expensive, still very much in research and development, and certainly not
accessible to all.[1]
Even with simple metadata, AV materials may not include enough information to pique
researcher and student engagement. Annotating is what John Unsworth has called
a scholarly primitive: an essential humanities method for adding
context and meaning to cultural objects of study for use in research, teaching, and
publication [Unsworth 2000]. Researchers annotate books when they are
taking notes; students annotate print-outs of poems when they are discussing them in
class; friends annotate faces in images on social media when they want to
direct attention to a person in their post. With AV materials, users may want to
annotate particular events such as when a speaker is speaking and who they are; the
presence of chickens, gunshots, helicopters, or feedback from the crowd for a better
sense of context; or when a speaker laughs, sings, yodels, plays an instrument, or
switches languages, in order to understand the genre of or audience for a performance.
Annotations have been the basis for engaging audiences with cultural objects from the
era of monks writing commentary on medieval manuscripts to current online scholarly
pages, editions, and exhibits [Clement and Fischer 2021]. When an AV
object is not available online, annotations can still provide context, much as
liner notes can for a missing album. Further possibilities for access include
enabling scholars, students, or members of the public involved in larger cross-institutional
projects to annotate systematically and collaboratively, and enabling LAMs to
showcase user annotations by incorporating them back into their digital asset management
(DAM) systems. Presently, however, even when AV materials are made accessible by LAM
institutions, these digital objects remain inaccessible for annotation and therefore
inaccessible for learning, public comment, scholarship, and general use.
Annotating is only one of a list of scholarly primitives, which also includes discovering,
comparing, referring, sampling, illustrating, and representing [Unsworth 2000].[2] IIIF
(International Image Interoperability Framework) is one standardized solution that
LAMs have adopted to give users the ability to perform these primitives with images
held in cultural heritage institutions. Comprising 56 global members, including major
research universities, national libraries, world-renowned museums and archives,
software companies, and other organizations, the IIIF Consortium has worked together
since fall 2011 to create, test, refine, implement, and promote the IIIF
specifications for interoperable functionality and collaboration across repositories.
IIIF uses linked data and W3C web standards to facilitate sharing digital image data,
migrating across technology systems, and using third-party software to enhance access
to images, allowing users to view, zoom, compare, manipulate, and work with
annotated images on the Web. With IIIF, users can load images hosted by LAMs
into software that allows them to manipulate the images in new ways without affecting
the institution’s presentation of the item. Universal Viewer, for example, a
community-developed open source project that has been under development by Digirati
since 2012, allows users to zoom into an image using the IIIF Image application
programming interface (API), create annotations, and generate links to the zoomed,
annotated regions. As a result, a user can focus on one part of Vincent van Gogh’s
painting Irises at the J. Paul Getty Museum and annotate
a particular brush-stroke. She can save this view, compare it against another part of
the painting or another painting, and share this view with others. Storiiies is
another project that demonstrates how third-party software can help users engage with
images held by institutions to create digital stories. In both cases, the
institutions’ use of IIIF allows researchers to implement a broad range of online
tools to discover, compare, refer, sample, illustrate, and represent their
interpretations of these cultural heritage objects, which in turn encourages their
broader use.
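Because the IIIF Image API expresses region, size, and rotation entirely in the request URL, a "link to a zoomed region" is simply a URL. The Python sketch below assembles one following the Image API 3.0 URI template ({identifier}/{region}/{size}/{rotation}/{quality}.{format}). The server base URL and the identifier `irises` are hypothetical placeholders, not the Getty’s actual IIIF endpoint, and the helper names are illustrative.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 3.0 request URL.

    URI template: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Percent-encode the identifier, including any "/", so that it occupies
    # exactly one path segment.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def pixel_region(x, y, w, h):
    """Format a rectangle (in full-image pixels) as the "x,y,w,h" region parameter."""
    if min(x, y) < 0 or w <= 0 or h <= 0:
        raise ValueError("region needs non-negative x, y and positive w, h")
    return f"{x},{y},{w},{h}"


# Hypothetical server and identifier: a 600x400 detail returned 600 pixels wide.
detail = iiif_image_url("https://images.example.org/iiif", "irises",
                        region=pixel_region(1200, 800, 600, 400), size="600,")
print(detail)
# https://images.example.org/iiif/irises/1200,800,600,400/600,/0/default.jpg
```

Since the view is fully described by the URL, saving, comparing, or sharing a detail amounts to storing or sending that string.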
By June 2020, the IIIF-AV Technical Specification Group had extended the existing
IIIF Presentation API (version 3) to accommodate rendering AV in a web browser. The
AV group actively welcomes contributions to its collection of AV user stories,
mockups, and prototypes to ensure that IIIF-AV serves a broad audience.
While IIIF has shared use cases for different kinds of AV manifests, including
album covers, oral histories, multi-track recordings, and AV with sign language,
tools for exposing and playing these manifests are still under development.[3]
Universal Viewer, which is freely available, displays
annotations as captions on AV materials, but the annotations cannot be used to
navigate the object or be shown separately from it, a necessity for
AV materials, to which access is often restricted. The AWE project builds on these IIIF accomplishments,
addressing the gaps in engaging with AV by developing a solution to bring together
free AV annotation tools and the Web as a standardized collaboration and presentation
platform.
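In the Presentation API 3.0, a Canvas can carry a `duration`, the recording is attached to it by a `painting` annotation, and a note about one passage can target a temporal media fragment such as `#t=12.5,40.0`. The minimal Python sketch below builds such a manifest as a plain dictionary. The URLs, the `audio/mpeg` format, and the use of the `commenting` motivation are assumptions for illustration; this is not presented as the manifest AudiAnnotate itself emits.

```python
import json

PRESENTATION_3_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def audio_manifest(base, audio_url, duration, label, notes):
    """Build a minimal IIIF Presentation 3.0 manifest for one audio file.

    `notes` is a list of (start_seconds, end_seconds, text) tuples; each becomes
    a commenting annotation targeting a temporal fragment (#t=start,end).
    """
    canvas_id = f"{base}/canvas/1"
    painting = {
        "id": f"{canvas_id}/page/1/annotation/1",
        "type": "Annotation",
        "motivation": "painting",  # the audio itself is "painted" onto the canvas
        "body": {"id": audio_url, "type": "Sound",
                 "format": "audio/mpeg", "duration": duration},
        "target": canvas_id,
    }
    comments = []
    for i, (start, end, text) in enumerate(notes, start=1):
        if not 0 <= start <= end <= duration:
            raise ValueError(f"note {i} falls outside the recording")
        comments.append({
            "id": f"{canvas_id}/annotations/1/annotation/{i}",
            "type": "Annotation",
            "motivation": "commenting",
            "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
            "target": f"{canvas_id}#t={float(start)},{float(end)}",
        })
    return {
        "@context": PRESENTATION_3_CONTEXT,
        "id": f"{base}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "duration": duration,
            "items": [{"id": f"{canvas_id}/page/1", "type": "AnnotationPage",
                       "items": [painting]}],
            "annotations": [{"id": f"{canvas_id}/annotations/1",
                             "type": "AnnotationPage", "items": comments}],
        }],
    }


# Hypothetical URLs and timings, for illustration only.
manifest = audio_manifest(
    "https://example.github.io/hurston", "https://example.org/audio/reel1.mp3",
    duration=1800.0, label="Recording 1",
    notes=[(12.5, 40, "Hurston introduces the song")])
print(json.dumps(manifest, indent=2))
```

Keeping the commentary in a separate `annotations` page is what lets a viewer list, navigate, or display notes apart from the media itself.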
In response to the need for a workflow that supports IIIF manifest creation,
collaborative editing, flexible modes of presentation, and permissions control, the
AudiAnnotate Extensible Workflow (AWE) connects open source tools for annotation
(such as Audacity), public code and document repositories (GitHub), and the
AudiAnnotate web application for creating and sharing IIIF manifests and annotations.
Researchers, who are usually limited by proprietary software and by LAM systems that restrict
access to AV, can use AWE as a complete sequence of tools and transformations for
accessing, identifying, annotating, and sharing annotations. LAMs will benefit from
AWE because it facilitates metadata generation, is built on the W3C web standards used in IIIF for
sharing online scholarship, and generates static web pages that are lightweight and
easy to preserve and harvest. AWE represents a new kind of AV ecosystem in which
exchange is open among institutional repositories, annotation software, online
repositories and publication platforms, and researchers.
Architecture
The AudiAnnotate web application architecture is lightweight. Researchers create
their own time-stamped annotations, provide a URL to the AV item, and upload the
annotations to the AudiAnnotate application, which creates a static site that
includes a playable edition or exhibit where the artifact, the annotations, and any
introductory or other explanatory material can be viewed together. As a Ruby on Rails
application that does not store data locally, the AudiAnnotate application eliminates
the need to run a database or datastore. The application can be installed on a small
cloud instance, such as a 1 GB Linode shared instance costing US$5 per month. A further
advantage of a databaseless architecture is that if the AudiAnnotate web application
goes offline, the artifacts it produces will still be available on GitHub. Users with
a basic understanding of Jekyll or Markdown can edit AudiAnnotate-created sites
without the web application running at all. Furthermore, any other installation of
the AudiAnnotate web application will have access to the same data created by
previous installations, allowing projects to be transported from one institution to
another without data migration.
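What makes the databaseless design work is that everything the application produces is an ordinary text file in a repository. As a minimal sketch of that idea, assuming a Jekyll site whose layouts read `title` and `manifest` keys from a page’s YAML front matter (hypothetical key names; AudiAnnotate’s actual templates are not described here), a generated page could be written like this:

```python
import json
import tempfile
from pathlib import Path


def write_edition_page(site_dir, slug, title, manifest_url, intro_markdown):
    """Write one page of a static Jekyll site as plain Markdown.

    The front-matter keys (layout, title, manifest) are illustrative; a real
    theme decides which keys its layouts read.
    """
    page = Path(site_dir) / f"{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    front_matter = "\n".join([
        "---",
        "layout: default",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"title: {json.dumps(title)}",
        f"manifest: {json.dumps(manifest_url)}",
        "---",
    ])
    page.write_text(front_matter + "\n\n" + intro_markdown.strip() + "\n",
                    encoding="utf-8")
    return page


# Write into a throwaway directory standing in for a cloned site repository.
with tempfile.TemporaryDirectory() as tmp:
    page = write_edition_page(
        tmp, "recording-1", "Recording 1",
        "https://example.github.io/hurston/manifest.json",
        "Introductory notes, written in *Markdown*.")
    text = page.read_text(encoding="utf-8")
print(text)
```

Because the result is plain Markdown under version control, anyone who can edit a text file can revise it, and another installation only needs access to the repository to pick it up.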
AWE is built on what coders call glue code:
code that sticks a bunch of things together to make something useful. Annotation
tools such as Audacity give users robust audio analysis and labeling tools from which a
simple text file can be exported and used in AudiAnnotate. The AudiAnnotate
application wraps references to online AV files and annotations in IIIF manifests to
make them presentable in the Universal Viewer and publishes the sites using Jekyll to
generate static pages. Even the app itself takes advantage of the Ruby on Rails web
framework, open source libraries for authentication and displaying tables, and GitHub
to store everything it produces.
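The first link in that chain, turning an exported label file into annotations, can be sketched in a few lines. Audacity’s label export writes one label per line as start time, end time, and label text separated by tabs (times in seconds); lines beginning with a backslash, which carry optional frequency ranges, are skipped here. The function names and URLs are hypothetical, and the output follows the W3C/IIIF annotation model in general rather than reproducing AudiAnnotate’s own code, which is written in Ruby.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    start: float  # seconds
    end: float    # seconds; equal to start for a point label
    text: str


def parse_audacity_labels(export):
    """Parse tab-separated Audacity label text: start<TAB>end<TAB>text per line."""
    labels = []
    for lineno, line in enumerate(export.splitlines(), start=1):
        if not line.strip() or line.startswith("\\"):
            continue  # blank line, or a frequency-range line
        fields = line.split("\t", 2)
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected start<TAB>end<TAB>text")
        start, end = float(fields[0]), float(fields[1])
        if end < start:
            raise ValueError(f"line {lineno}: end precedes start")
        labels.append(Label(start, end, fields[2] if len(fields) == 3 else ""))
    return labels


def labels_to_annotation_page(labels, canvas_id, page_id):
    """Wrap labels as annotations on temporal fragments (#t=start,end) of a canvas."""
    return {
        "id": page_id,
        "type": "AnnotationPage",
        "items": [
            {
                "id": f"{page_id}/annotation/{i}",
                "type": "Annotation",
                "motivation": "commenting",
                "body": {"type": "TextualBody", "value": lab.text,
                         "format": "text/plain"},
                "target": f"{canvas_id}#t={lab.start},{lab.end}",
            }
            for i, lab in enumerate(labels, start=1)
        ],
    }


# A made-up export: a region label, a frequency line, and a point label.
export = "12.500000\t40.000000\tHurston sings\n\\\t120.0\t3000.0\n65.0\t65.0\tlaughter\n"
labels = parse_audacity_labels(export)
page = labels_to_annotation_page(labels, "https://example.org/canvas/1",
                                 "https://example.org/canvas/1/annotations/1")
for item in page["items"]:
    print(item["target"], item["body"]["value"])
# https://example.org/canvas/1#t=12.5,40.0 Hurston sings
# https://example.org/canvas/1#t=65.0,65.0 laughter
```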
Disadvantages
Dependency on GitHub is the biggest limitation on the process of producing
AudiAnnotate sites. Since AudiAnnotate relies not only on Git but on the GitHub API,
producing new AudiAnnotate sites would be impossible without GitHub. In addition,
legal sanctions prevent scholars in Iran, Cuba, and other countries from using GitHub
[GitHub and Trade Controls]. However, once the AudiAnnotate application creates a
static site and publishes it to GitHub, that site does not have to stay on GitHub. A
git clone command or a zip download copies the site to a local
computer, after which it can be run locally, served from any web server, or stored in
a digital preservation system.
While the system works well when only a few people are building AudiAnnotate sites,
its reliance on GitHub can become a bottleneck when larger groups, such as a workshop or
classroom, all attempt to build sites at the same time. AudiAnnotate’s first workshop
ran into a “Too many requests” error from GitHub because participants were
pinging the GitHub API simply to list their existing AudiAnnotate projects. This was
a limitation of the “no database” architecture: the AudiAnnotate
app had no data to show without a connection to GitHub. Consequently, the
application’s GitHub API connection was updated to use the authenticated user
wherever possible, which reduced the number of requests made through the unauthenticated API
connection. While the issue was resolved for the second workshop, delays still
occurred in responses to API requests and when building sites. Because GitHub did not
return a web page immediately after a request was initiated, some users experienced
errors. Ultimately, using the freely available services of a corporate-funded system
like GitHub can be useful, especially for projects and teams with less funding, but
it comes with limitations.
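One plausible reason the first workshop hit GitHub’s limit is the way GitHub counts requests. Its documented primary REST API limit is 60 requests per hour for unauthenticated calls, counted per originating IP address, versus 5,000 per hour for calls authenticated as a user. An application server sends all of its unauthenticated calls from one address, so every participant draws on the same small budget; this is an inference rather than a reported finding. The Python sketch below shows the two mitigations in general form: authenticate each request with the user’s token, and, when GitHub signals a rate limit (a 403 or 429 carrying `Retry-After` or `x-ratelimit-remaining: 0`), wait before retrying. The `send` function is a hypothetical stand-in for any HTTP client, and the helper names are illustrative.

```python
import time


def github_headers(token=None):
    """Request headers for the GitHub REST API; a token makes the call authenticated."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def rate_limit_delay(status, headers, attempt, now):
    """Seconds to wait before retrying, or None if the response should be returned."""
    if status not in (403, 429):
        return None
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:  # GitHub says how long to wait
        return float(h["retry-after"])
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        # Primary limit exhausted: wait until the reset time (epoch seconds).
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    if status == 429:
        return float(2 ** attempt)  # simple exponential backoff as a fallback
    # Simplification: a 403 without rate-limit headers is treated as a
    # permissions error, which retrying will not fix.
    return None


def call_with_backoff(send, url, token=None, max_attempts=4,
                      sleep=time.sleep, clock=time.time):
    """Call send(url, headers) -> (status, headers, body), retrying on rate limits."""
    headers = github_headers(token)
    for attempt in range(max_attempts):
        status, resp_headers, body = send(url, headers)
        delay = rate_limit_delay(status, resp_headers, attempt, clock())
        if delay is None:
            return status, body
        if attempt < max_attempts - 1:
            sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")


# A fake transport: first rate-limited, then successful. No network is used.
responses = iter([(429, {"Retry-After": "2"}, b""), (200, {}, b"[]")])
waits = []
result = call_with_backoff(lambda url, hdrs: next(responses),
                           "https://api.github.com/user/repos",
                           token="hypothetical-token", sleep=waits.append)
print(result, waits)  # (200, b'[]') [2.0]
```

Injecting `sleep` and `clock` keeps the retry logic testable without real delays; in a web application, long waits would more sensibly be handed to a background job than held open in a request.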
AudiAnnotate Use Case Examples
The use case examples presented here include four projects that demonstrate
how AWE provides new kinds of access to audio and video artifacts. The first example
is an audio scholarly edition by Tanya Clement titled
“Zora Neale Hurston's WPA field recordings in Jacksonville, FL
(1939),” which includes three recordings created during the Works
Progress Administration (WPA) Federal Writers’ Project from 1937 to 1942 and made
available online at the Library of Congress as part of the
Florida Folklife Collections’ Florida Memory project. Zora Neale Hurston was an
African American ethnographer, novelist, and dramatist; a collector, speaker,
performer, and writer of other people’s stories and her own. In the 1930s alone,
Hurston wrote numerous short stories, journal articles, books, and musicals based on
her ethnographic field work in Alabama, the Bahamas, Florida, Georgia, Haiti,
Jamaica, and New Orleans.
[7] These specific recordings
of Hurston performing songs she had collected were created on June 18, 1939, in the
WPA offices in Jacksonville, Florida, under the direction of Herbert Halpert with
Carita Doggett Corse and Stetson Kennedy. In order to make Hurston’s performances
more accessible, the Florida Memory project extracted from the longer recordings
21 moments in which Hurston sings and talks.[8] In
contrast to the Florida Memory abridged “playlist,” Clement’s scholarly edition
is based on the full recordings and, consequently, allows listeners to hear Hurston
sing and talk within the context of the other songs, stories, and people recorded
that day. By including stories shared by Beatrice Lange, which were told to her by the
descendant of a rice plantation owner from South Georgia, along with contributions from
Art Pages, the pianist for a Cuban band, and Rev. H. W. Stuckey of Buford County,
South Carolina, the edition shows that listening to Hurston’s performances in context is important.
As an African American woman and ethnographer from Florida, educated at Columbia
University as well as in turpentine camps and juke joints in the Jim Crow South,
Hurston played multiple roles in the WPA office: as both collector and performer, as
subject and object of inquiry, and as an authorized and unauthorized agent of the
narrativization of African American folklore. Presenting Hurston’s songs in
the context of the other performances allows Clement to highlight conversations with
Herbert Halpert, the white male lead and authorized “collector,” and with
Carita Doggett Corse, Hurston’s white female benefactor, as well as the songs and stories
of the other performers, both white and Black, male and female, who together provide a complex
picture of the racialized and gendered endeavor that was folklore collecting in
Florida in 1939.
The second example,
“Example Sensitive Audio Lesson: John Beecher, McComb
‘Criminal Syndicalism’ Case,” is a lesson plan on using sensitive
archival materials in the classroom developed by Bethany Radcliff and Kylie
Warkentin. Radcliff and Warkentin base the lesson plan around a 1964 recording of a
Civil Rights activism event from the John and Barbara Beecher Collection at the Harry
Ransom Center (HRC) at the University of Texas at Austin. On the HRC website, the
recording had been made accessible without any trigger or other warnings. While this
recording highlights the voices of community activists, it also includes racist
slurs, descriptions of the imprisonment of Black high schoolers, and testimonies from
concerned parents. In an attempt to practice trauma-informed pedagogy and avoid
replicating oppression, the authors use the apparatus that AWE provides to present
this audio within a context that guides students and instructors in their approaches
to the conversation. A “Lesson Introduction and Overview” for the instructor
gives resources on trauma-informed pedagogy, an overview of how to create a GitHub
account and AWE project, and a full lesson plan for engaging students in audio
annotation and working with sensitive materials. The next sections, “1. Considering
‘Criminal Syndicalism’ Audio,” “2. Analysis and Potential Annotation Routes,” and
“3. Further Analysis and Annotation in Groups,” contain student activities, including
an annotated, time-stamped clip from the audio, marked with
trigger warnings for sensitive sections, and exercises that lead students through a
collaborative process of critical analysis using audio annotation. The final
sections, “4. Presentation” and “Extension Activity Using Hypothesis,” focus
on using AudiAnnotate to present findings and show how to integrate Hypothes.is,
a third-party web browser plug-in annotation tool that can be used with AWE projects
to promote further discussion through collaborative annotation.
The third and fourth use case examples are graduate student essays that demonstrate
how AWE works with video artifacts and how the process of annotation can
shape scholarship. “The Kindergarten Teacher” by Zoe Bursztajn-Illingworth is an
investigation of poetic voice and address on screen in Sara Colangelo’s film
The Kindergarten Teacher (2018). The project shows how
annotating scenes from the film shaped the author’s dissertation chapter, “Right Voice, Wrong Body:
The Kindergarten Teacher, Poetic Address, and Voice as Possession.” Through the
process of annotation, the author observed how the film’s form reveals poetic voice
as dialogic and public rather than as the monologic, private utterance that lyric theory
often proposes. In this multimedia essay, sections present scenes from the film with
timestamped annotations alongside prose sections that discuss the scenes
further in the larger context of film and poetic theory.
The final use case example, “Camile 1921” by Janet Reinschmidt, includes annotations
for the film Camille (1921) that the author used as part of a master’s thesis on
reception studies and queer interest in early Hollywood film. Again, this author
provides access to Camille,[9] with annotations that
mark key scenes of interest, as well as a discussion of the influence that annotation
had on their scholarship. During the annotation process, the author’s perception of
silent and early sound film shifted as they re-watched scenes dozens of times for
minute details easily overlooked by audiences. The author discusses how this process
helped them reconsider film editing techniques and the industrial shifts in film editing
of a hundred years ago. This project also deploys Hypothes.is as a means to invite
public comment. The author starts the public conversation by
creating their own Hypothes.is annotations as examples, including background notes on
lesbian cinema history and on the role of production collaborators in
the filming of Camille. For this author, the ability to
facilitate and invite larger discussion around Camille
means their project “is as much about the preservation of these
endangered silent films as it is about my thesis research”
[Reinschmidt].
These four projects (a scholarly edition, a lesson plan, and two essays) show that
increasing the use of AV in research and teaching, and therefore its preservation,
is about more than just creating access. The State of Recorded
Sound Preservation in the United States: A National Legacy at Risk in the Digital
Age includes a survey of scholars who work primarily with audio, which
concluded that scholars wanted unfettered access and better discovery tools for
deep listening, or “listening for content, in
note, performance, mood, texture, and technology”
[CLIR 2010, 41] [CLIR 2010, 157]. The report also suggests that training for
archivists and librarians in sound preservation must include
critical listening skills [CLIR 2010, 147] because librarians and
archivists need to know what scholars and students want to do with sound artifacts in
order to make these collections more accessible. AWE facilitates such rich projects
because it enables collaborative AV annotation and presentation through a minimalist
computing workflow that depends on preexisting, free annotation software and on
standardized web protocols that are also used by free, community-rich
platforms and tools such as GitHub and Hypothes.is. AWE is a sustainable and
easy-to-use method that enhances how researchers and students are able to deploy
scholarly primitives that, as the use cases show, go beyond annotation to include
discovering, comparing, referring, sampling, illustrating, and representing.
Increasing the use of AV materials and their preservation requires facilitating the
production of new knowledge in sustainable ways.