INTRODUCTION
Close engagement with primary sources is foundational to humanities research,
teaching, and learning [
Falbo 2000], [
Schmiesing and Hollis 2002], [
Toner 1993]. We are fortunate
to live in an age of digital abundance: Over the past three decades, galleries,
libraries, archives, and museums (collectively known as GLAM institutions) have
undertaken initiatives to digitize their collection holdings, making millions of
primary-source archival materials freely available online and more accessible to more
people than ever before [Rosenzweig 2003], [Solberg 2012], [Ramsay 2014], [Putnam 2016]. Increasingly, the
world’s myriad cultural heritage materials — architectural plans, books, codices,
correspondence, drawings, ephemera, manuscripts, maps, newspapers, paintings,
paperwork, periodicals, photographs, postcards, posters, prints, sheet music,
sketches, slides, and more — are now only a click away.
But interacting with digitized archival materials in a Web browser can be a
frustrating experience — one that fails to replicate the embodied engagement possible
during in-person research. As users, we have become resigned to the limitations of
Web-based image viewers. How we have to fiddle with Zoom In
and Zoom Out buttons to awkwardly jump between fixed levels of
magnification. How we must laboriously click and drag, click and drag, over and over,
to follow a line of text across a page, or read the length of a newspaper column, or
trace a river over a map. How we can’t ever quite tell how big or small something is.
We have long accepted these frictions as necessary compromises to quickly and easily
view materials online instead of making an inconvenient trip to a physical archive
itself — even as we also understand that clicking around in a Web viewer pales in
comparison to the rich, fluid, embodied experience of interacting with a physical
item in a museum gallery or library reading room.
As a step toward moving past these limitations, this research team presents a new
kind of image viewer. Booksnake is a scholarly app for iPhones and iPads that enables
users to interact with existing digitized archival materials at life size in physical
space (see figure 1). Instead of displaying digital images on a flat screen for
indirect manipulation, Booksnake uses augmented reality (AR), the process of overlaying a
virtual object on physical space [
Azuma et al. 2001], to bring digitized
items into the physical world for embodied exploration. Existing image viewers
display image files as a means of conveying visual information about an archival
item. In contrast, Booksnake converts image files into virtual objects in order to
create the feeling of being in the item's presence. To do this, Booksnake
automatically transforms digital images of archival materials into custom,
size-accurate virtual objects, then dynamically inserts these virtual objects into
the live camera view on a smartphone or tablet for interaction in physical space (see
figure 2). Booksnake is available as a free app for iPhone and iPad on Apple's App
Store:
https://apps.apple.com/us/app/booksnake/id1543247176.
Booksnake's use of AR makes it feel like a digitized item is physically present in a
user’s real-world environment, enabling closer and richer engagement with digitized
materials. A Booksnake user aims their phone or tablet at a flat surface (like a
table, wall, bed, or floor) and taps the screen to anchor the virtual object to the
physical surface. As the user moves, Booksnake uses information from the device’s
cameras and sensors to continually adjust the virtual object’s relative position,
orientation, and size in the camera view, such that the item appears to remain
stationary in physical space as the user and device move. A user thus treats
Booksnake like a lens, looking through their device’s screen
at a virtual object overlaid on the physical world.
The project takes its name from book snakes, the weighted strings used by archival
researchers to hold fragile physical materials in place. Similarly, Booksnake enables
users to keep virtual materials in place on physical surfaces in the real world. By
virtualizing the experience of embodied interaction, Booksnake makes it possible for
people who cannot otherwise visit an archive (due to cost, schedule, distance,
disability, or other reasons) to physically engage with digitized materials.
Booksnake represents a proof-of-concept, demonstrating that it is possible to
automatically transform existing digital image files of cultural heritage materials
into life-size virtual objects on demand. We have designed Booksnake to easily link
with existing digitized collections via the International Image Interoperability
Framework (or IIIF, pronounced
triple-eye-eff), a
widely-supported set of open Web protocols for digitally accessing archival materials
and related metadata, developed and maintained by the GLAM community [
Cramer 2011], [
Cramer 2015], [
Snydman, Sanderson, and Cramer 2015]. The first version of Booksnake uses
IIIF to access Library of Congress digitized collections, including the Chronicling
America collection of historical newspapers. Our goal is for Booksnake to be able to
display any image file accessible through IIIF. At this point, the considerable
variability in how different institutions record dimensional metadata means that we
will have to customize Booksnake's links with each digitized collection, as we
discuss further below.
Our approach represents a novel use of AR for humanistic research, teaching, and
learning. While humanists have begun to explore the potential of immersive
technologies — primarily virtual reality (VR) and AR — they have largely approached
these technologies as publication platforms: ways of hosting custom-built experiences
containing visual, textual, spatial, and/or aural content that is manually created or
curated by subject-matter experts. Examples include reconstructions of
twelfth-century Angkor Wat [
Chandler et al. 2017], a Renaissance
studiolo [
Shemek 2018], events from the Underground Railroad [
Roth and Fisher 2019], a medieval Irish castle [
Herron 2020], and an 18th-century Paris theatre [
François et al. 2021]. These are
rich, immersive experiences, but they are typically limited to a specific site,
narrative, or set of materials. The work involved in manually creating these virtual
models can be considerable, lengthy, and expensive.
In contrast, Booksnake is designed as a general-purpose AR tool for archival
exploration. Booksnake's central technical innovation is the automatic transformation
of existing digital images of archival cultural heritage materials into
dimensionally-accurate virtual objects. We aim to enable users to freely select
materials from existing digitized cultural heritage collections for embodied
interaction. Unlike custom-built immersive humanities projects, Booksnake makes it
possible for a user to create virtual objects easily, quickly, and freely. While
Booksnake fundamentally differs from existing Web-based image viewers, it is, like
them, an empty frame, waiting for a user to fill it with something interesting.
This article presents a conceptual and technical overview of Booksnake. We first
critique the accepted method of viewing digitized archival materials in Web-based
image viewers on flat screens and discuss the benefits of embodied interaction with
archival materials, then describe results from initial user testing and potential use
cases. Next, we contextualize Booksnake, as an AR app for mobile devices, within the
broader landscape of immersive technologies for cultural heritage. We then detail the
technical pipeline by which Booksnake transforms digitized archival materials into
virtual objects for interaction in physical space. Ultimately, Booksnake's use of
augmented reality demonstrates the potential of spatial interfaces and embodied
interaction to improve accessibility to archival materials and activate digitized
collections, while its presentation as a mobile app is an argument for the untapped
potential of mobile devices to support humanities research, teaching, and
learning.
Booksnake is designed and built by a multidisciplinary team at the University of
Southern California. It represents the combined efforts of humanities scholars,
librarians, interactive media designers, and computer scientists — most of them
students or early-career scholars — extending over four years.
[1] The project has been
financially supported by the Humanities in a Digital World program (under grants from
the Mellon Foundation) in USC's Dornsife College of Letters, Arts, and Sciences; and
by the Ahmanson Lab, a scholarly innovation lab in the Sydney Harman Academy for
Polymathic Studies in USC Libraries. The research described in this article was
supported by a Digital Humanities Advancement Grant (Level II) from the National
Endowment for the Humanities (HAA-287859-22). The project’s next phase will be
supported by a second NEH Digital Humanities Advancement Grant (Level II)
(HAA-304169-25).
FROM FLAT SCREENS TO EMBODIED EXPLORATION
Flat-screen image viewers have long been hiding in plain sight. Each image viewer
sits at the end of an institution's digitization pipeline and offers an interface
through which users can access, view, and interact with digital image files of
collection materials. These image viewers' fundamental characteristic is their
transparency: The act of viewing a digital image of an archival item on a flat screen
has become so common within contemporary humanities practices that we have
unthinkingly naturalized it.
But there is nothing
natural about using flat-screen image
viewers to apprehend archival materials, because everything about the digitization
process is novel. The ability to instantly view a digitized archival item on a
computer screen rests on technological developments from the last sixty years: the
high-resolution digital cameras that capture archival material; the digital asset
management software that organizes it; the cheap and redundant cloud storage that
houses it; the high-speed networking infrastructure that delivers it; the Web
browsers with which we access it; the high-resolution, full-color screens that
display it. Even the concept of a graphical user interface, with its representative
icons and mouse-based input, is a modern development, first publicly demonstrated by
Douglas Engelbart in 1968 and first commercialized by Apple, with the Lisa and the
Macintosh, in the early 1980s [
Fisher 2019, 17–26, 85–93, 104–117]. Put another way, our now-familiar ways of using a mouse or trackpad to interact
with digitized archival materials in a Web-based flat-screen image viewer would be
incomprehensibly alien to the people who originally created and used many of these
materials — and, in many cases, even to the curators, librarians, and archivists who
first acquired and accessioned these items. What does this mean for our relationship
with these archival materials?
Despite this, and although Web-based flat-screen image viewers sustain a robust
technical development community, they have been largely overlooked by digital
humanists. A notable exception is manuscript scholars, for whom the relationship
between text, object, and digital image is particularly important. Indeed, manuscript
scholars have led the charge in identifying flat-screen image viewers as sites of
knowledge creation and interpretation — often by expressing their frustration with
these viewers' limited affordances or with the contextual information these viewers
shear away [
Nolan 2013], [
Szpiech 2014], [
Kropf 2017], [
Almas et al. 2018], [
Porter 2018], [
van Zundert 2018], [
van Lit 2020].
To see flat-screen image viewers more clearly, it helps to understand them within
the context of archival digitization practices. Digitization is usually understood as
the straightforward conversion of physical objects into digital files, but this
process is never simple and always involves multiple interpretive decisions.
[2] As Johanna
Drucker writes, “the way artifacts are [digitally] encoded
depends on the parameters set for scanning and photography. These already embody
interpretation, since the resolution of an image, the conditions of lighting under
which it is produced, and other factors, will alter the outcome”
[
Drucker 2013, 8]. Some of these encoding decisions are
structured by digitization guidelines, such as the U.S. Federal Agency Digitization
Guidelines Initiative (FADGI) standards [
Federal Agency Digitization Guidelines Initiative 2023], while other
decisions, such as how to light an item or how to color-correct a digital image,
depend on the individual training and judgment of digitization professionals.
A key digitization convention is to render an archival item from an idealized
perspective, that of an observer perfectly centered before the item. To achieve this,
a photographer typically places the item being digitized perpendicular to the
camera's optical axis, centers the item within the camera's view, and orthogonally
aligns the item with the viewfinder's edges. During post-processing, a digitization
professional can then crop, straighten, and de-skew the resulting image, or stitch
together multiple images of a given item into a single cohesive whole. These physical
and digital activities produce an observer-independent interpretation of an archival
item. Put another way, archival digitization is what Donna Haraway calls a god trick,
the act of “seeing everything from nowhere”
[
Haraway 1988, 582].
Taking these many decisions together, Drucker argues that “
digitization is not representation but interpretation”
[
Drucker 2013, 12] (emphasis original). Understanding
digitization as a continuous interpretive process, rather than a simple act of
representation, helps us see how this process extends past the production of digital
image files and into how these files are presented to human users via
traditional flat-screen image viewers.
Flat-screen image viewers encode a set of decisions about how we can (or should)
interact with a digitized item. Just as decisions about resolution, lighting, and
file types serve to construct a digitized interpretation of a physical object, so too
do decisions about interface affordances for an image viewer serve to construct an
interpretive space. Following Drucker, image viewers do not simply represent digital image files to a user, they interpret them. Decisions by designers and developers about an image
viewer’s interface affordances (how to zoom, turn pages, create annotations, etc.)
structure the conditions of possibility for a user's interactions with a digitized
item.
The implications of these decisions are particularly visible when comparing
different image viewers. For example, Mirador, a leading IIIF-based image viewer,
enables users to compare two images from different repositories side-by-side [
Project Mirador n.d.]. A different IIIF-based image viewer,
OpenSeadragon, is instead optimized for viewing individual “high-resolution zoomable images”
[
OpenSeadragon n.d.]. Mirador encourages juxtaposition, while
OpenSeadragon emphasizes attention to detail. These two viewers each represent a
particular set of assumptions, goals, decisions, and compromises, which in turn shape
how their respective users encounter and read a given item, the interpretations those
users form, and the knowledge they create.
Flat-screen image viewers generally share three attributes that collectively
structure a user's interactions with a digitized item. First, and most importantly,
flat-screen image viewers directly reproduce the idealized “view
from nowhere” delivered by the digitization pipeline. Flat-screen image
viewers could present digital images in any number of ways — upside down, canted at
an angle away from the viewer, obscured by a digital curtain. But instead,
flat-screen image viewers play the god trick. Second, flat-screen image viewers rely
on indirect manipulation via a mouse or trackpad. To zoom, pan, rotate, or otherwise
navigate a digitized item, a user must repeatedly click buttons or click and drag,
positioning and re-positioning the digital image in order to apprehend its content.
These interaction methods create friction between the user and the digitized item,
impeding discovery [
Sundar et al. 2013]. Finally, flat-screen image
viewers arbitrarily scale digital images to fit a user’s computer screen. “Digitisation doesn't make everything equal, it just makes everything
the same size,” writes Crane [
Crane 2021]. In a flat-screen image
viewer, a monumental painting and its postcard reproduction appear to be the same
size, giving digitized materials a false homogeneity and disregarding the contextual
information conveyed by an item's physical dimensions. In sum, flat-screen image
viewers are observer-independent interfaces for indirect manipulation of arbitrarily
scaled digitized materials.
In contrast, Booksnake is an observer-dependent interface for embodied interaction
with life-size digitized materials. When we encounter physical items, we do so
through our bodies, from our individual point of view. Humans are not simply “two eyeballs attached by stalks to a brain computer,” as
Catherine D'Ignazio and Lauren Klein write in their discussion of data
visceralization [
D'Ignazio and Klein 2020, §3]. We strain toward the
far corners of maps. We pivot back and forth between the pages of newspapers. We curl
ourselves over small objects like daguerreotypes, postcards, or brochures. These
embodied, situated, perspectival experiences are inherent to our interactions with
physical objects. By using augmented reality to pin life-size digitized items to
physical surfaces, Booksnake enables and encourages this kind of embodied
exploration. With Booksnake, you can move around an item to see it from all sides,
step back to see it in its totality, or get in close to focus on fine details.
Integral to Booksnake is Haraway's idea of “the particularity and
embodiment of all vision”
[
Haraway 1988, 582]. Put another way, Booksnake lets you break
out of the god view and see an object as only you can. As an image viewer with a
spatial interface, Booksnake is an argument for a way of seeing that prioritizes
embodied interaction with digitized archival materials at real-world size. In this,
Booksnake is more than a technical critique of existing flat-screen image viewers. It
is also an
intellectual critique of how these image viewers
foreclose certain types of knowledge creation and interpretation. Booksnake thus
offers
a new way of looking at digitized materials.
This is an interpretive choice in our design of Booksnake. Again, image viewers do
not simply
represent digital image files to a user, they
interpret them. Booksnake relies on the same digital image
files as do flat-screen image viewers, and these files are just as mediated in
Booksnake as they are when viewed in a flat-screen viewer. (“There is no unmediated photograph,” Haraway writes [
Haraway 1988, 583].) Indeed, Booksnake even displays these image
files on the flat screen of a smartphone or tablet — but its use of AR creates the
illusion that the object is physically present. Where existing flat-screen image
viewers foreground the digital-ness of digitized objects, Booksnake instead recovers
and foregrounds their object-ness, their materiality. Drucker writes that “information spaces drawn from a point of view, rather than as if
they were observer independent, reinsert the subjective standpoint of their
creation”
[
Drucker 2011, ¶20]. Drucker was writing about data
visualization, but her point holds for image viewers (which, after all, represent a
kind of data visualization). Our design decisions aim to create an interpretive space
grounded in individual perspective, with the goal of helping a Booksnake user get
closer to the “subjective standpoint” of an archival
item's original creators and users. Even as the technology underpinning Booksnake is
radically new, it enables methods of embodied looking that are very old, closely
resembling in form and substance physical, pre-digitized ways of looking.
INITIAL TESTING RESULTS AND USE CASES
Booksnake's emphasis on embodied interaction gives it particular potential as a tool
for humanities education. Embodied interaction is a means of accessing situated
knowledges [
Haraway 1988] and is key to apprehending cultural heritage
materials in their full complexity. As museum scholars and educators have
demonstrated, this is true both for physical objects [
Kai-Kee, Latina, Sadoyan 2020] and for virtual replicas [
Kenderdine and Yip 2019]. Meanwhile, systematic reviews show AR can support
student learning gains, motivation, and knowledge transfer [
Bacca et al. 2014]. By using AR to make embodied interaction possible
with digitized items, Booksnake supports student learning through movement,
perspective, and scale. For example, a student could use Booksnake to physically
follow an explorer's track across a map, watch a painting’s details emerge as she
moves closer, or investigate the relationship between a poster’s size and its message
— interactions impossible with a flat-screen viewer. Here, we briefly highlight
themes that have emerged in user and classroom testing, then discuss potential use
cases.
A common theme in classroom testing was that Booksnake's presentation of life-size
virtual objects made students feel like they were closer to digitized sources. A
colleague's students used Booksnake as part of an in-class exercise to explore
colonial-era Mesoamerican maps and codices, searching for examples of cultural
syncretism. In a post-activity survey, students repeatedly described a feeling of
presence. Booksnake gave one student “the feeling of actually
seeing the real thing up close.” Another student wrote that “I felt that I was actually working with the codex.” A third
wrote that “it was cool to see the resource, something I will
probably never get to flip through, and get to flip through and examine
it.” Students also commented on how Booksnake represented these items'
size. One student wrote that Booksnake “gave me the opportunity
to get a better idea of the scale of the pieces.” Another wrote that “I liked being able to see the material ‘physically’ and see the
scale of the drawings on the page to see what they emphasized and how they took up
space.” These comments suggest that Booksnake has the most potential to
support embodied learning activities that ask students to engage with an item's
physical features (such as an object's size or proportions), or with the relationship
between these features and the item's textual and visual content.
During one-on-one user testing sessions, two history Ph.D. students each separately
described feeling closer to digitized sources. One tester described Booksnake as
opening “a middle ground” between digital and physical
documents by offering the “flexibility” of digital
materials, but the “sensorial experience of closeness”
with archival documents. The other tester said that Booksnake brought “emotional value” to archival materials. “Being able to stand up and lean over it [the object] brought it to life a little
more,” this tester said. “You can’t assign research
utility to that, but it was more immersive and interactive, and in a way
satisfying.” Their comments suggest that Booksnake can enrich engagement
with digitized materials, especially by producing the feeling of physical
presence.
As a virtualization technology, Booksnake makes it possible to present archival
materials in new contexts, beyond the physical site of the archive itself. Jeremy
Bailenson argues that virtualization technologies are especially effective for
scenarios that would otherwise be rare, impractical, destructive, or expensive [
Bailenson 2018]. Here, we use these four characteristics to describe
potential Booksnake use cases. First, it is rare to have an individual, embodied
interaction with one-of-a-kind materials, especially for people who do not work at
GLAM institutions. A key theme in student comments from the classroom testing
described above, for example, was that Booksnake gave students a feeling of direct
engagement with unique Mesoamerican codices. Second, it is often logistically
impractical for a class to travel to an archive (especially for larger classes) or to
bring archival materials into classrooms outside of library or museum buildings. The
class described above, for example, had around eighty students, and Booksnake made it
possible for each student to individually interact with these codices, during
scheduled class time and in their existing classrooms. Third, the physical fragility
and the rarity or uniqueness of many archival materials typically limits who can
handle them or where they can be examined. Booksnake makes it possible to engage with
virtual archival materials in ways that would cause damage to physical originals. For
example, third-grade students could use Booksnake to explore a historic painting by
walking atop a virtual copy placed on their classroom floor, or architectural
historians could use Booksnake to bring virtual archival blueprints into a physical
site. Finally, it is expensive (in both money and time) to physically bring together
archival materials that are held by two different institutions. With Booksnake, a
researcher could juxtapose digitized items held by one institution with physical
items held by a different institution, for purposes of comparison (such as comparing
a preparatory sketch to a finished painting) or reconstruction (such as reuniting
manuscript pages that had been separated). In each of these scenarios, Booksnake's
ability to produce virtual replicas of physical objects lowers barriers to embodied
engagement with archival materials and opens new possibilities for research,
teaching, and learning.
IMMERSIVE TECHNOLOGIES FOR CULTURAL HERITAGE
Booksnake is an empty frame. It leverages AR to extend the exploratory freedom that
users associate with browsing online collections into physical space. In doing so,
Booksnake joins a small collection of digital tools and projects using immersive
technologies as the basis for interacting with cultural heritage materials and
collections.
A dedicated and technically adept user could use 3D modeling software (such as
Blender, Unity, Maya, or Reality Composer) to manually transform digital images into
virtual objects. This can be done with any digital image, but is time-intensive and
breaks links between images and metadata. Booksnake automates this process, making it
more widely accessible, and preserves metadata, supporting humanistic inquiry.
The Google Arts & Culture app offers the most comparable use of humanistic AR.
Like Booksnake, Arts & Culture is a mobile app for smartphones and tablets that
offers AR as an interface for digitized materials. A user can activate the app’s
“Art Projector” feature to “display
life-size artworks, wherever you are” by placing a digitized artwork in
their physical surroundings [
Luo 2019]. But there are three key
differences between Google’s app and Booksnake. First, AR is one of many possible
interaction methods in Google’s app, which is crowded with stories, exhibits, videos,
games, and interactive experiences. In contrast, Booksnake emphasizes AR as its
primary interface, foregrounding the embodied experience. Second, Google’s app
focuses on visual art (such as paintings and photographs), while Booksnake can
display a broader range of archival and cultural heritage materials, making it more
useful for humanities scholars. (Booksnake can also display paginated materials, as
we discuss further below, while Google's app cannot.) Finally — and most importantly
— Google’s app relies on a centralized database model. Google requires participating
institutions to upload their collection images and metadata to Google’s own servers,
so that Google can format and serve these materials to users [
Google n.d.]. In contrast, Booksnake’s use of IIIF enables institutions
to retain control over their digitized collections and expands the capabilities of an
open humanities software ecosystem.
Another set of projects approaches immersive technologies as tools for designing and
delivering exhibition content. Some use immersive technologies to enhance physical
exhibits, such as Veholder, a project exploring technologies and methods for
juxtaposing 3D virtual and physical objects in museum settings [
Haynes 2018], [
Haynes 2010]. Others are tools for using
immersive technologies to create entirely virtual spaces. Before its discontinuation,
many GLAM institutions and artists adapted Mozilla Hubs, a general-purpose tool for
building 3D virtual spaces that could be accessed using a flat-screen Web browser or
a VR headset, to build virtual exhibition spaces, although users were required to
manually import digitized materials and construct virtual replicas [
Cool 2022]. Another project, Diomira Galleries, is a prototype tool for
building VR exhibition spaces with IIIF-compliant resources [
Bifrost Consulting Group 2023]. Like Booksnake, Diomira uses IIIF to import digital
images of archival materials, but Diomira arbitrarily scales these images onto
template canvases that do not correspond to an item’s physical dimensions. As with
Booksnake, these projects demonstrate the potential of immersive technologies for new
research interactions and collection activation with digitized archival
materials.
Finally, we are building Booksnake as an AR application for existing consumer mobile
devices as a way of lowering barriers to immersive engagement with cultural heritage
materials. Most VR projects require expensive special-purpose headsets, which has
limited adoption and access [
Greengard 2019]. In contrast, AR-capable
smartphones are ubiquitous, enabling a mobile app to tap the potential of a large
existing user base, and positioning such an app to potentially mitigate racial and
socioeconomic digital divides in the United States. More Americans own smartphones
than own laptop or desktop computers [
Pew Research Center 2021]. And while Black and
Hispanic adults in the United States are less likely to own a laptop or desktop
computer than white adults, Pew researchers have found “no
statistically significant racial and ethnic differences when it comes to
smartphone or tablet ownership”
[
Atske and Perrin 2021]. Similarly, Americans with lower household incomes
are more likely to rely on smartphones for Internet access [
Vogels 2021]. Smartphones are thus a key digital platform for engaging and including the
largest and most diverse audience. Developing an Android version of Booksnake will
enable us to more fully deliver on this potential.
AUTOMATICALLY TRANSFORMING DIGITAL IMAGES INTO VIRTUAL OBJECTS
Booksnake’s central technical innovation is automatically transforming existing
digital images of archival cultural heritage materials into dimensionally-accurate
virtual objects. To make this possible, Booksnake connects existing software
frameworks in a new way.
First, Booksnake uses IIIF to download images and metadata.
[3] IIIF was proposed in 2011 and developed over the
early 2010s. Today, the IIIF Consortium is composed of sixty-five global GLAM
institutions, from the British Library to Yale University [
International Image Interoperability Framework n.d.],
while dozens more institutions offer access to their collections through IIIF because
several common digital asset management (DAM) platforms, including CONTENTdm, LUNA,
and Orange DAM, support IIIF [OCLC n.d.], [Luna 2018], [orangelogic n.d.]. This widespread use of IIIF means that Booksnake
is readily compatible with many existing digitized collections. By using IIIF,
Booksnake embraces and extends the capabilities of a robust humanities software
ecosystem. By demonstrating a novel method to transform existing IIIF-compliant
resources for interaction in augmented reality, we hope that Booksnake will drive
wider IIIF adoption and standardization.
Next, Booksnake uses a pair of Apple software frameworks, ARKit and RealityKit.
ARKit, introduced in 2017, interprets and synthesizes data from an iPhone or iPad's
cameras and sensors to understand a user's physical surroundings and to anchor
virtual objects to horizontal and vertical surfaces [
Apple n.d._a].
RealityKit, introduced in 2019, is a framework for rendering and displaying virtual
objects, as well as managing a user's interactions with them (for example, by
interpreting a user’s on-screen touch gestures) [
Apple n.d._b]. Both
ARKit and RealityKit are built into the device operating system, enabling us to rely
on these frameworks to create virtual objects and to initiate and manage AR sessions.
Developing Booksnake as a native mobile app, rather than a Web-based tool, makes it
possible for Booksnake to take advantage of the powerful camera and sensor
technologies in mobile devices. We are developing Booksnake’s first version for
iPhone and iPad because Apple’s tight integration of hardware and software supports
rapid AR development and ensures consistency in user experience across devices. We
plan to extend our work by next developing an Android version of Booksnake, improving
accessibility. Another development approach, WebXR, a cross-platform Web-based AR/VR
framework currently in development, lacks the features to support our project
goals.
Booksnake thus links IIIF with RealityKit and ARKit to produce a novel result: an
on-demand pipeline for automatically transforming existing digital images of archival
materials into custom virtual objects that replicate the physical original’s
real-world proportions, dimensions, and appearance, as well as an AR interface for
interacting with these virtual objects in physical space. (See figure 3.) How does
Booksnake do this?
Booksnake starts with the most humble of Internet components: the URL. A Booksnake
user first searches and browses an institution’s online catalog through an in-app Web
view. Booksnake offers an “Add” button on catalog pages for individual items.
When the user taps this button to add an item to their Booksnake library, Booksnake
retrieves the item page’s URL. Because of Apple’s privacy restrictions and
application sandboxing, this is the only information that Booksnake can read from a
given Web page; it cannot directly access content on the page itself. Instead,
Booksnake translates the item page URL into the corresponding IIIF manifest URL.
An IIIF manifest is a JSON file — a highly structured, computer-readable text file —
that contains a version of the item’s catalog record, including metadata and URLs for
associated images. The exact URL translation process varies depending on how an
institution has implemented IIIF, but in many cases it is as simple as appending
"/manifest.json" to the item URL. For example, the item URL for the
Library of Congress’s 1858 “Chart of the submarine Atlantic
Telegraph,” is
https://www.loc.gov/item/2013593216/, and the item’s IIIF manifest URL is
https://www.loc.gov/item/2013593216/manifest.json. In other cases,
Booksnake may extract a unique item identifier from the item URL, then use that
unique identifier to construct the appropriate IIIF manifest URL. Booksnake then
downloads and parses the item’s IIIF manifest.
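For the simple case described above, this translation amounts to little more than appending a path component to the item URL. The following Swift sketch shows that case only; the institution-specific variants that first extract an item identifier are omitted, and this is not a verbatim excerpt of Booksnake's code.

```swift
import Foundation

// Sketch only: assumes the IIIF manifest lives at {item URL}/manifest.json,
// which holds for the Library of Congress example above but not for every institution.
func manifestURL(forItemPage itemURL: URL) -> URL {
    itemURL.appendingPathComponent("manifest.json")
}

// manifestURL(forItemPage: URL(string: "https://www.loc.gov/item/2013593216/")!)
// -> https://www.loc.gov/item/2013593216/manifest.json
```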
First, Booksnake extracts item metadata from the IIIF manifest. Booksnake uses this
metadata to construct an item page in the app’s Library tab, enabling a user to view
much of the same item-level metadata visible in the host institution’s online
catalog. An IIIF manifest presents metadata in key-value pairs, with each pair
containing a general label (or key) and a corresponding entry (or value). For
example, the IIIF manifest for the 1858 Atlantic telegraph map mentioned above
contains the key “Contributors”, representing the catalog-level field listing an
item’s authors or creators, and the corresponding item-level value “Barker, Wm. J. (William J.) (Surveyor),” identifying the
creator of this specific item. Importantly, while the key-value pair structure is
generally consistent across IIIF manifests from different institutions, the key names
themselves are not. The “Contributors” key at one institution may be named
“Creators” at another institution, and “Authors” at a third. The current
version of Booksnake simply displays the key-value metadata as provided in the IIIF
manifest. As Booksnake adds support for additional institutions, we plan to identify
and link different keys representing the same metadata categories (such as
“Contributors,”
“Creators,” and “Authors”). This will enable users, for example, to sort
items from different institutions by common categories like “Author” or “Date
created,” or to search within a common category.
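As an illustration, the sketch below decodes the metadata block of a IIIF Presentation 2.x manifest into label-value pairs. It is a deliberately simplified model: real manifests may carry arrays, HTML, or language maps as values, which a production parser would need to handle. A later key-normalization step could then map institution-specific labels like “Creators” or “Authors” onto a common field.

```swift
import Foundation

// Simplified model of a IIIF Presentation 2.x manifest's metadata block.
// Real-world values can be arrays or language maps; this sketch assumes plain strings.
struct ManifestMetadataEntry: Decodable {
    let label: String
    let value: String
}

struct ManifestStub: Decodable {
    let label: String
    let metadata: [ManifestMetadataEntry]?
}

// Returns the item-level metadata as key-value pairs,
// e.g. "Contributors" -> "Barker, Wm. J. (William J.) (Surveyor)".
func metadataFields(from manifestData: Data) throws -> [String: String] {
    let manifest = try JSONDecoder().decode(ManifestStub.self, from: manifestData)
    var fields: [String: String] = [:]
    for entry in manifest.metadata ?? [] {
        fields[entry.label] = entry.value
    }
    return fields
}
```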
Second, Booksnake uses image URLs contained in the IIIF manifest to download the
digital images (typically JPEG or JPEG 2000 files) associated with an item's catalog
record. Helpfully, IIIF image URLs are structured so that certain requests — like the
image’s size, its rotation, even whether it should be displayed in color or
black-and-white — can be encoded in the URL itself. Booksnake leverages this
affordance to request images that are sufficiently detailed for virtual object
creation, which sometimes means requesting images larger than what the institution
serves by default. For example, Library of Congress typically serves images that are
25% of the original size, but Booksnake modifies the IIIF image URL to request images
at 50% of original size. Our testing indicates that this level of resolution produces
sufficiently detailed virtual objects, without visible pixelation, for Library of
Congress collections.
[4] We
anticipate customizing this value for other institutions. Having downloaded the
item’s metadata and digital images, Booksnake can now create a virtual object
replicating the physical original.
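The IIIF Image API encodes these requests in the URL path as {region}/{size}/{rotation}/{quality}.{format}. The sketch below shows a percentage-based size request of the kind described above; the base URL is a placeholder, not an actual image service.

```swift
import Foundation

// Builds a IIIF Image API request URL for a percentage-scaled image.
// The path segments follow the Image API pattern
// {region}/{size}/{rotation}/{quality}.{format}; the base URL here is hypothetical.
func iiifImageURL(imageServiceBase: String, sizePercent: Int) -> URL? {
    URL(string: "\(imageServiceBase)/full/pct:\(sizePercent)/0/default.jpg")
}

// iiifImageURL(imageServiceBase: "https://example.org/iiif/some-image-id", sizePercent: 50)
// -> https://example.org/iiif/some-image-id/full/pct:50/0/default.jpg
```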
Our initial goal was for Booksnake to display any image resource available through
IIIF, but we quickly discovered that differences in how institutions record
dimensional metadata meant that we would have to adapt Booksnake to different
digitized collections. To produce size-accurate virtual objects, Booksnake requires
either an item’s physical dimensions in a computer-readable
format, or both the item’s pixel dimensions and its
digitization resolution, from which it can calculate the item's physical dimensions
(see figures 4a and 4b). Institutions generally take one of three approaches to
providing dimensional metadata.
The simplest approach is to list an item’s physical dimensions as metadata in its
IIIF manifest. Some archives, such as the David Rumsey Map Collection, provide
separate fields for an item’s height and width, each labeled with units of measure.
This formatting provides the item’s dimensions in a computer-readable format, making
it straightforward to create a virtual object of the appropriate size. Alternatively,
an institution may use the IIIF Physical Dimension service, an optional service that
provides the scale relationship between an item’s physical and pixel dimensions,
along with the physical units of measure [
International Image Interoperability Framework 2015]. But we are
unaware of any institution that has implemented this service for its
collections.
[5]
A more common approach is to provide an item’s physical dimensions in a format that
is not immediately computer-readable. The Huntington Digital Library, for example,
typically lists an item’s dimensions as part of a text string in the “physical
description” field.
This c.1921 steamship poster, for example, is described as: “Print ; image 60.4 x 55 cm (23 3/4 x 21 5/8 in.) ; overall 93.2 x 61
cm (36 11/16 x 24 in.)” [spacing
sic]. To interpret
this text string and convert it into numerical dimensions, a computer program like
Booksnake requires additional guidance. Which set of dimensions to use, “image”
or “overall”? Which units of measure, centimeters or inches? And what if the
string includes additional descriptors, such as “folded” or “unframed”? We
are currently collaborating with the Huntington to develop methods to parse
dimensional metadata from textual descriptions, with extensibility to other
institutions and collections.
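As a rough illustration of the parsing problem, the sketch below pulls the first centimeter measurement out of a free-text physical description like the one above. It is not the method we are developing with the Huntington, and it sidesteps exactly the questions raised above: which measurement to prefer, which units to trust, and how to handle qualifiers.

```swift
import Foundation

// Naive sketch: find the first "<number> x <number> cm" pair in a physical-description
// string. Cataloging convention is usually height x width, but this should be
// confirmed per collection.
func firstCentimeterDimensions(in description: String) -> (first: Double, second: Double)? {
    let pattern = #"([\d.]+)\s*x\s*([\d.]+)\s*cm"#
    guard let range = description.range(of: pattern, options: .regularExpression) else {
        return nil
    }
    let numbers = description[range]
        .components(separatedBy: CharacterSet(charactersIn: " xcm"))
        .compactMap { Double($0) }
    guard numbers.count == 2 else { return nil }
    return (first: numbers[0], second: numbers[1])
}

// firstCentimeterDimensions(in: "Print ; image 60.4 x 55 cm (23 3/4 x 21 5/8 in.) ; ...")
// -> (first: 60.4, second: 55.0)
```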
Finally, an institution may not provide
any dimensional
metadata in its IIIF manifests. This is the case with the Library of Congress (LOC),
which lists physical dimensions, where available, in an item’s catalog record, but
does not provide this information in the item’s IIIF manifest.
[6] This presented us
with a problem: How to create dimensionally-accurate virtual objects without the
item’s physical dimensions? After much research and troubleshooting, we hit upon a
solution. We initially dismissed pixel dimensions as a source of dimensional metadata
because there is no consistent relationship between physical and pixel dimensions.
And yet, during early-stage testing, Booksnake consistently created life-size virtual
maps from LOC, even as items in other LOC collections resulted in virtual objects
with wildly incorrect sizing. This meant there
was a
relationship, at least for one collection — we just had to find it.
LOC digitizes items to FADGI standards, which specify a digitization resolution for
different item types. For example, FADGI standards specify a target resolution of 600
ppi for prints and photographs, and 400 ppi for books.
[7] We then discovered that
LOC scales down item images in some collections. (For example, map images are scaled
down by a factor of four.) We then combined digitization resolution and scaling
factors for each item type into a pre-coded reference table. When Booksnake imports
an LOC item, it consults the item’s IIIF manifest to determine the item type, then
consults the reference table to determine the appropriate factor for converting the
item’s pixel dimensions to physical dimensions. This solution is similar to a
client-side version of the IIIF Physical Dimension service, customized to LOC’s
digital collections.
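A schematic version of this client-side lookup is sketched below. The item types, resolutions, and scaling factors shown are illustrative stand-ins rather than Booksnake's actual reference table, which is derived from FADGI guidelines and per-collection testing.

```swift
import Foundation

// Illustrative per-type digitization profile: the target digitization resolution and
// the factor by which the delivered image has been scaled down. Values are examples only.
struct DigitizationProfile {
    let ppi: Double          // digitization resolution for this item type
    let scaleFactor: Double  // downscaling applied to the delivered image
}

let referenceTable: [String: DigitizationProfile] = [
    "map":        DigitizationProfile(ppi: 600, scaleFactor: 4),  // example values
    "photograph": DigitizationProfile(ppi: 600, scaleFactor: 1),
    "book":       DigitizationProfile(ppi: 400, scaleFactor: 1),
]

// Physical width = (delivered pixels * scale factor) / ppi, converted from inches to meters.
func physicalWidthMeters(pixelWidth: Int, itemType: String) -> Double? {
    guard let profile = referenceTable[itemType] else { return nil }
    let inches = Double(pixelWidth) * profile.scaleFactor / profile.ppi
    return inches * 0.0254
}
```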
As this discussion suggests, determining the physical dimensions of a digitized item
is a seemingly simple problem that can quickly become complicated. Developing robust
methods for parsing different types of dimensional metadata is a key research area
because these methods will allow us to expand the range of institutions and materials
with which Booksnake is compatible. While IIIF makes it straightforward to access
digitized materials held by different institutions, the differences in how each
institution presents dimensional metadata mean that we will currently have to adapt
Booksnake to use each institution's metadata schema.
[8]
Booksnake then uses this information to create virtual objects on demand. When a user
taps “View in Your Space” to initiate an AR session, Booksnake uses RealityKit
to transform the item’s digital image into a custom virtual object suitable for
interaction in physical space. First, Booksnake creates a blank two-dimensional
virtual plane sized to match the item’s physical dimensions. Next, Booksnake applies
the downloaded image to this virtual plane as a texture, scaling the image to match
the size of the plane. This results in a custom virtual object that matches the
original item’s physical dimensions, proportions, and appearance. This process is
invisible to the user — Booksnake “just works.” Object creation is straightforward
for flat digitized items like maps or posters, which are typically digitized from a
single perspective showing their recto (front) side.
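In RealityKit terms, this step amounts to generating a plane mesh at the item's physical size and texturing it with the downloaded image. The sketch below is a minimal version of that idea, not Booksnake's production code; it assumes an unlit material so that scene lighting does not distort the item's colors.

```swift
import Foundation
import RealityKit

// Minimal sketch: build a flat virtual object whose mesh matches the item's
// physical dimensions (in meters) and whose texture is the downloaded image.
func makeFlatVirtualObject(imageFileURL: URL,
                           widthMeters: Float,
                           heightMeters: Float) throws -> ModelEntity {
    // A plane mesh sized to the physical original.
    let mesh = MeshResource.generatePlane(width: widthMeters, height: heightMeters)

    // The digitized image becomes the plane's texture. Unlit, so the image's own
    // colors are shown rather than a lighting-dependent rendering.
    let texture = try TextureResource.load(contentsOf: imageFileURL)
    var material = UnlitMaterial()
    material.color = .init(texture: .init(texture))

    return ModelEntity(mesh: mesh, materials: [material])
}
```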
Booksnake’s virtual object creation process is more complex for compound objects,
which have multiple images linked to a single item record. Compound objects can
include books, issues of periodicals or newspapers, diaries, scrapbooks, photo
albums, and postcards. The simplest compound objects, such as postcards, have two
images showing the object’s recto (front) and verso (back) sides. Nineteenth-century
newspapers may have four or eight pages, while books may run into hundreds of pages,
with each page typically captured and stored as a single image.
Booksnake handles compound objects by creating multiple individual virtual objects,
one for each item image, then arranging and animating these objects to support the
illusion of a cohesive compound object. Our initial implementation is modeled on a
generic paginated codex, with multiple pages around a central spine. As with flat
objects, this creation process happens on demand, when a user starts an AR session.
Booksnake uses a compound object’s first image as the virtual object’s cover or first
page (more on this below). Booksnake creates a virtual object for this first image,
then creates a matching virtual object that acts as an invisible “page
zero” (see figure 5). The user sees the first page of a newspaper, for example,
sitting and waiting to be opened. When the user swipes from right to left across the
object edge to "turn" the virtual page, Booksnake retrieves the next two images,
transforms them into virtual objects representing pages two and three, then animates
a page turn with the object’s “spine” serving as the rotation axis (see figure
6). Once the animation is complete, pages two and three have replaced the invisible
“page zero” and page one, and Booksnake discards those virtual objects (see
figure 7). This on-demand process means that Booksnake only needs to load a maximum
of four images into AR space, optimizing for memory and performance. By using a swipe
gesture to turn pages, Booksnake leverages a navigation affordance with which users
are already familiar from interactions with physical paginated items, supporting
immersion and engagement. The page-turn animation, paired with a page-turn sound
effect, further enhances the realism of the virtual experience.
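The bookkeeping behind this process can be summarized as tracking which image indices belong to the current spread and which must be loaded for the next turn. The sketch below is a schematic of that logic only, under the assumption of one image per page; the RealityKit entities and the page-turn animation itself are omitted.

```swift
// Schematic of the paging logic described above (one image per page).
// Index -1 stands for the invisible "page zero" to the left of the first page.
struct CodexPagingState {
    let imageCount: Int
    var leftPageIndex: Int = -1   // left-hand page of the currently open spread

    // Image indices that must exist as virtual pages to animate the next turn:
    // the current spread plus the two pages being revealed (at most four images).
    func imagesNeededForNextTurn() -> [Int] {
        (leftPageIndex...(leftPageIndex + 3)).filter { $0 >= 0 && $0 < imageCount }
    }

    // After the animation completes, the old spread is discarded and the
    // newly revealed pages become the current spread.
    mutating func completeTurn() {
        leftPageIndex += 2
    }
}
```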
A key limitation in our initial approach to creating virtual paginated objects is
that our generic codex model is based on one particular type of object, the
newspaper. Specifically, we used LOC's Chronicling America collection of historical
newspapers as a testbed to develop the pipeline for creating virtual paginated
objects, as well as the user interface and methods for interacting with virtual
paginated objects in physical space. While the newspaper is broadly representative of
the codex form's physical features and affordances, making it readily extensible to
other paginated media, there are important differences in how different media are
digitized and presented in online collections. For example, while Chronicling America
newspapers have one page per image, some digitized books held by LOC have two pages
per image. We have adapted Booksnake's object creation pipeline to identify
double-page images, split the image in half, and wrap each resulting image onto
individual facing pages.
[9] There are also important
cultural differences in codex interaction methods: We plan to further extend the
capabilities of this model by building support for IIIF's "right-to-left" reading
direction flag, which will enable Booksnake to correctly display paginated materials
in languages like Arabic, Chinese, Hebrew, and Japanese.
A further limitation of our initial approach is our assumption that all compound
objects represent paginated material in codex form, which is not the case. Booksnake
cannot yet realistically display items like scrolls or Mesoamerican screenfold books
(which are often digitized by page, but differ in physical construction and
interaction methods from Western books). In other cases, a compound object comprises
a collection of individual items that are stored together but not physically bound to
each other. One example is
“Polish Declarations of Admiration and Friendship for the United
States”, held by LOC, which consists of 181 unbound sheets. Or an
institution may use a compound object to store multiple different images of a single
object. For example, the Yale Center for British Art (YCBA) often provides several
images for paintings in its collections, as with
the 1769
George Stubbs painting “Water Spaniel”. YCBA
provides four images: of the framed painting, the unframed painting, the unframed
image cropped to show just the canvas, and the framed painting’s verso side. We plan
to continue refining Booksnake’s compound object support by using collection- and
item-level metadata to differentiate between paginated and non-paginated compound
objects, in order to realistically display different types of materials.
Having constructed a custom virtual object, Booksnake next opens the live camera view
so that the user can interact with the object in physical space. As the user moves
their device to scan their surroundings, Booksnake overlays a transparent image of
the object, outlined in red, over horizontal and vertical surfaces, giving the user a
preview of how the object will fit in their space. Tapping the transparent preview
anchors the object, which turns opaque, to a physical surface. As the user and device
move, Booksnake uses camera and sensor data to maintain the object's relative
location in physical space, opening the object to embodied exploration. The user can
crouch down and peer at the object from the edge of the table, pan across the
object's surface, or zoom into the key details by moving closer to the object. The
user can also enlarge or shrink an object by using the familiar pinch-to-zoom
gesture—although in this case, the user is not zooming in or out, but re-scaling the
virtual object itself. The object remains in place even when out of view of the
device: A user can walk away from an anchored object, then turn back to look at it
from across the room.
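Anchoring works roughly as sketched below, using ARKit raycasting through RealityKit's ARView: the tapped screen point is cast onto a detected surface, and the virtual object is attached to the resulting anchor. This is a simplified sketch (the transparent preview and red outline are omitted), not Booksnake's exact implementation.

```swift
import ARKit
import RealityKit
import UIKit

// Sketch: anchor a prepared virtual object to the physical surface under a screen tap.
func anchorObject(_ entity: ModelEntity, atScreenPoint point: CGPoint, in arView: ARView) {
    // Cast a ray from the tapped point onto detected horizontal or vertical surfaces.
    guard let hit = arView.raycast(from: point,
                                   allowing: .estimatedPlane,
                                   alignment: .any).first else { return }

    // Attach the object where the ray met the surface; ARKit's tracking then keeps
    // it visually fixed in place as the user and device move.
    let anchor = AnchorEntity(raycastResult: hit)
    anchor.addChild(entity)
    arView.scene.addAnchor(anchor)
}
```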
Again, an object viewed with Booksnake is just as mediated as one viewed in a
traditional Web-based viewer. Throughout the development process, we have repeatedly
faced interpretive questions at the intersection of technology and the humanities.
One early question, for example: Should users be allowed to re-scale objects? The
earliest versions of Booksnake lacked this affordance, to emphasize that a given
virtual object was presented at life size. But one of our advisory board members,
Philip J. Ethington, argued that digital technology is powerful because it can enable
interactions that aren't possible in the real world. And so we built a way for users
to re-scale objects by pinching them, while also displaying a changing percentage
indicator to show users how much larger or smaller the virtual object is when
compared to the original. In approaching this and similar questions, our goal is for
virtual objects to behave in realistic and familiar ways, to closely mimic the
experience of embodied interaction with physical materials.
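A simplified sketch of that re-scaling affordance appears below. It assumes a UIPinchGestureRecognizer attached to the AR view and stands in for the on-screen percentage indicator with a print statement; it is an illustration of the idea rather than Booksnake's implementation.

```swift
import RealityKit
import UIKit

// Sketch: re-scale an anchored virtual object with a pinch gesture and report
// its size relative to the physical original (1.0 == life size).
final class RescaleController {
    private let entity: ModelEntity
    private var committedScale: Float = 1.0

    init(entity: ModelEntity) {
        self.entity = entity
    }

    @objc func handlePinch(_ gesture: UIPinchGestureRecognizer) {
        let newScale = committedScale * Float(gesture.scale)
        entity.scale = SIMD3<Float>(repeating: newScale)

        // Placeholder for the on-screen percentage indicator.
        print("Object is now \(Int((newScale * 100).rounded()))% of life size")

        if gesture.state == .ended {
            committedScale = newScale
        }
    }
}
```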