“Supporting Digital Scholarship: a Project Funded by the
Andrew W. Mellon Foundation”
John Unsworth, University of Virginia, USA
Worthy Martin, University of Virginia, USA
Thornton Staple, University of Virginia, USA
Ken Price, University of Virginia, USA
Summary:
To date, digital library efforts have focused on library-based production of digital primary resources. This project will, for the first time, address second-generation digital library problems, where the focus is on scholarly analysis, reprocessing, and the creation of digital primary resources. With $1m in support from the Andrew W. Mellon Foundation over three years (2000-2002), the University of Virginia's Institute for Advanced Technology in the Humanities (IATH) and the University of Virginia Libraries' Digital Library Research and Development Group will address three closely related problems:
- 1. scholarly use of digital primary resources;
- 2. library adoption of "born-digital" scholarly research; and
- 3. co-creation of digital resources by scholars, publishers, and libraries.
Institutional Background:
Since its inception in 1992, the Institute has focused intensive support and advanced computer resources on long-term humanities research proposed by faculty at the University of Virginia and elsewhere. To date, the Institute has supported more than forty fellows in architecture, landscape architecture, architectural history, art history, religious studies, classics, anthropological linguistics, medieval and 19th-century British literature, 19th-century American literature, American history, classical history, history of science, archaeology, film, and music, among other disciplines. The majority of this research - indeed, most of the Institute's work - involves intensive collaboration among groups of scholars, and between scholars and the Institute's technical experts. The Pompeii Forum project, for example, sends an interdisciplinary group of researchers to Pompeii each summer, where they conduct a systematic survey of the Forum using an extremely accurate surveying device known as a laser total station, with data from that device fed into a laptop in the field. These measurements are then brought back to the Institute, where they are processed into two-dimensional plans and three-dimensional CAD models. Further field research provides an extensive photographic survey of the buildings at Pompeii, and these photographs are used in conjunction with advanced photogrammetric software to create accurate, photo-realistic surfaces for the three-dimensional CAD models. Finally, using modeling tools custom-built at the Institute, the researchers are able to combine individual building models into a model of the entire site and even render the walls transparent, in order to see both sides at once, thus producing an analysis of the Forum more detailed, more accurate, and more flexible than any other to date.
The University of Virginia Libraries have established a number of electronic data centers that work closely with the Institute's staff and fellows: the Electronic Text Center, the Geospatial and Statistical Data Center, the Digital Media Center, and the Special Collections Digital Center. Library digital centers have provided support to many of the same faculty involved in research with the Institute, and staff from these centers meet regularly with IATH staff and others in a digital library interest group. Most recently, the Libraries have established a Digital Library Research and Development Group, charged with long-range planning of digital library architectures, systems, and procedures. Having begun to assemble a broad digital collection, they recognize that no library management system yet exists to handle it, and they have dedicated themselves to developing an appropriate solution to the problem. Further information about library digital centers is available on the Web at <http://www.lib.virginia.edu/ecenters.html>. Information about Digital Library Research and Development is available at <http://www.lib.virginia.edu/dl/intro/>.
Project Goals:
Much of what has taken place in digital library contexts to date has aimed at producing large collections of digital data, often - in fact usually - without the involvement of the intended audience for that data, scholars and researchers. In this project, we aim to foreground the scholarly user - something we believe we are uniquely positioned to do - and from this perspective we will look at the issues of collections development, data management, metadata, and digital library systems. We expect to complete a number of trials in these areas, and although we do not believe the scope of this project is sufficient to provide universal or definitive solutions, we do expect to arrive at a better understanding of the problems that will be involved in the next generation of digital library activities. So much hyperbole attends the current phase of digital library development that it may seem surprising to suggest there are things scholars need to do that digital libraries cannot support. Three scenarios are presented here as examples of some of those unsolved, second-generation digital library problems:
Scenario 1: Scholarly use of digital primary resources
A literary scholar researching the history of a particular poem knows that its author also painted the subject of the poem. She can find information about the poem and the painting in the digital library, and can even retrieve a digital image of the painting. The scholar knows that other dual-media works were produced by this author, and she suspects that the author's arrangements of his paintings in exhibitions might well be significant in understanding the related literary works: therefore, the scholar would like to use the digital library to find out when the painting in question was exhibited and, for a given exhibition date, would like to know what painting was to its left and what painting was to its right - and then see those paintings together in a virtual reconstruction of the exhibit. In this example, we consider the possibility that the scholar of the very near future will want to do something more than browse or perform keyword searches in the digital library. The promise of the digital library is that it will enable scholars to frame questions that would have been inconceivable without this technology. And yet, in practice, we find that digital libraries support only very narrowly defined investigative activities. Partly this is because we tend to treat objects in the digital library as though they had no other temporal or spatial contexts - as though they had always and only existed, discrete and timeless, in our information systems. Partly, too, these limitations are a sign that the digital library is mainly concerned, at this point, with providing simple access to the discrete digital object, rather than with supporting context, comparison, or analysis - the building blocks of scholarship. 
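The query imagined in this scenario can be made concrete with a small sketch. It assumes, purely for illustration, that the digital library records each painting's wall and left-to-right position for every exhibition; the `Hanging` record, its field names, the `neighbours` function, and the sample data are all hypothetical and describe no existing system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hanging:
    """One painting's place in one exhibition (hypothetical record)."""
    exhibition: str
    opened: date
    wall: str
    position: int  # left-to-right order along the wall
    painting: str


def neighbours(hangings, painting, opened):
    """Return (left, right) painting titles for the exhibition opened on `opened`.

    Either element is None when the painting hung at the end of a wall.
    """
    target = next(
        (h for h in hangings if h.painting == painting and h.opened == opened),
        None,
    )
    if target is None:
        raise LookupError(f"{painting!r} was not exhibited on {opened}")
    # Index the other paintings on the same wall of the same exhibition.
    same_wall = {
        h.position: h.painting
        for h in hangings
        if (h.exhibition, h.opened, h.wall)
        == (target.exhibition, target.opened, target.wall)
    }
    return same_wall.get(target.position - 1), same_wall.get(target.position + 1)


# Invented sample data, for illustration only.
HANGINGS = [
    Hanging("Example Exhibition", date(1900, 1, 1), "north", 1, "Painting A"),
    Hanging("Example Exhibition", date(1900, 1, 1), "north", 2, "Painting B"),
    Hanging("Example Exhibition", date(1900, 1, 1), "north", 3, "Painting C"),
    Hanging("Example Exhibition", date(1900, 1, 1), "south", 1, "Painting D"),
]

left, right = neighbours(HANGINGS, "Painting B", date(1900, 1, 1))
```

The point of the sketch is that the answer depends on metadata about spatial and temporal context (wall, position, opening date), which a catalogue of discrete objects does not normally carry.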
We could begin to grapple with this problem by producing several proof-of-concept example projects, in which data and metadata expressly support more complex kinds of "behaviors" in the digital library, and are associated with other objects in the digital library (e.g., Java applets) that actualize those behaviors on the end-user's machine. This follows the Fedora model that the library is already developing, specifically that aspect of Fedora that permits "client access to multiple views, or disseminations, of the object's data through the transparent activation of external mechanisms that execute these content type behaviors" <http://www2.cs.cornell.edu/NCSTRL/CDLRG/FEDORA.html>.
Scenario 2: Library adoption of "born-digital" scholarly research
An archaeologist spends decades producing detailed digital records of an important classical archaeological site. The records include CAD reconstructions of individual buildings, topographical maps, photographs, maps locating particular artifacts in areas and layers of excavation, and large-scale computer models of the entire site. Upon retirement, the archaeologist offers his entire collection of digital records to the library (since no publisher has ever known what to do with them) - but he offers them on the condition that the library treat these records as a special collection, catalogue them, and make them available through the web to other researchers and students of archaeology. This example makes plain the problems that libraries will inevitably face as they come to collect digital resources produced by scholars outside of library (and quite possibly, publishing) frameworks. The problem is likely to be especially acute in the areas of architecture and archaeology, where data is likely to have been produced by researchers in digital form, and where we have few (if any) established conventions for collecting, normalizing, cataloguing, providing, or preserving such data. A single map or CAD drawing could represent hundreds of hours of research, data gathering, and expert analysis - as valuable, in principle, as a monograph or a journal - and yet libraries might well be unable to accept it, for lack of appropriate systems and procedures. As a pilot project in this area, we can recruit large existing collections of digital architectural and archaeological data (from The Pompeii Forum, Victorian London, The Waters of the City of Rome, Jefferson's Architecture, and other IATH projects), and use that data to experiment with the cataloguing, collections, and preservation issues raised in such contexts.
At the end of three years, we would expect to have brought several such collections into the library.
Scenario 3: Co-creation of digital resources by scholars, publishers, and libraries
A historian, working together with technical experts in the library's Geospatial and Statistical Data Center, uses census data, eyewitness accounts, military records, and contemporary GIS information to generate a time-indexed, geo-referenced reconstruction of troop movements in a famous Civil War battle. The research is going to be published by a university press, and the press has contributed original vector data for the underlying map. At different points in this process, the press, the historian, the historian's graduate research assistants, and library experts all need to share editorial control of the evolving data set. At the end of the process, the data set needs to be published by the press, collected in the library, and connected to textual records of the event. Increasingly, we believe, scholars and libraries and publishers will enter into collaborative arrangements involving the production of digital primary resources by the library, a scholarly treatment of those resources, and electronic publication of the result. We have already seen many instances of this pattern in IATH research projects. In retrospect, it seems perfectly reasonable that the institution owning the primary resources (a rare book, a painting, a statue, a map) would want to produce its initial digital representation; once that digital representation exists, it seems inevitable that scholars will want to do what they have always done - edit, contextualize, re-present, and analyze the (now digital) object. And, if not inevitable, it seems at least likely that the result of this scholarly engagement with digital primary resources will be the stuff of scholarly publishing. There are many unanswered questions, though, behind these three reasonable assumptions: should it be a goal to have a single authoritative version of the digital object? 
If so, how might scholars and/or publishers register corrections or revisions to the original, if the original is produced (and presumably owned) by a library or museum? If several scholars disagree on the verisimilitude of the digital representation, how will their range of opinions be recorded and connected to that representation? If electronic editions of the artifact become the norm, instead of an authoritative version with apparatus, then how should those editions be derived and denoted? At IATH, we already have several projects that raise this sort of problem - the Valley of the Shadow, the Walt Whitman Archive, the Victorian London project, and others. We have a document management system (Astoria) that will help to address some of the practical procedural issues involved in managing multiple authorship; we will experiment with integrating that system into the library's production strategies, to address those situations in which a single authoritative version is necessary or desirable, but we would also expect to experiment with managing and coordinating multiple divergent editions of a single base object, or multiple perspectives on an object.
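One way to picture the bookkeeping behind "multiple divergent editions of a single base object" is the minimal sketch below. It is not a description of Astoria or of any library system: the `Version` record, the `EditionRegistry` class, and the identifiers in the example are hypothetical, and the only assumption is that each edition records who made it and which version it was derived from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """A base object or an edition derived from it (hypothetical record)."""
    version_id: str
    creator: str
    parent: Optional[str]  # None marks the base object
    note: str = ""


class EditionRegistry:
    """Tracks which editions derive from which base object."""

    def __init__(self):
        self._versions = {}

    def add(self, version):
        if version.version_id in self._versions:
            raise ValueError(f"duplicate id: {version.version_id}")
        if version.parent is not None and version.parent not in self._versions:
            raise KeyError(f"unknown parent: {version.parent}")
        self._versions[version.version_id] = version

    def lineage(self, version_id):
        """Return ids from the base object down to `version_id`."""
        chain = []
        current = version_id
        while current is not None:
            chain.append(current)
            current = self._versions[current].parent
        return list(reversed(chain))

    def editions_of(self, base_id):
        """Return the sorted ids of every edition descended from `base_id`."""
        return sorted(
            vid
            for vid in self._versions
            if vid != base_id and self.lineage(vid)[0] == base_id
        )


# Invented example: a library-produced base map with two divergent editions.
registry = EditionRegistry()
registry.add(Version("map-v0", "library", None, "initial digital representation"))
registry.add(Version("map-press", "press", "map-v0", "revised vector data"))
registry.add(Version("map-historian", "historian", "map-v0", "troop positions added"))
registry.add(Version("map-historian-2", "historian", "map-historian", "corrections"))

history = registry.lineage("map-historian-2")
editions = registry.editions_of("map-v0")
```

Because every edition points back to its parent, divergent editions can coexist without any one of them overwriting the base object, and the derivation of each remains recoverable. Whether one edition should be treated as authoritative remains an editorial question that this structure leaves open.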
In order to address the many problems - some technical, some social, some
intellectual - raised in these three scenarios, we need to move beyond
the simple production and cataloguing of digital collections, and begin
to recognize that, in the library of the future as in libraries of the
past and present, most materials will be produced by many hands, not
few; most materials will incorporate many perspectives, not one; and
most materials will need to support specialized and pointed research as
well as general, blunt queries.
Recognizing these things, we will undertake a collaborative investigation
of advanced digital library problems, including library absorption of
scholar-produced digital resources, library/scholar co-creation of such
resources, and analytical use of digital humanities data. Within this
investigation, our emphasis will be on metadata practices, library
systems, and production protocols that support scholarly use. And though
we do not promise to solve all the problems that might be raised in this
area, we will establish guidelines that will be useful to others,
produce examples that others can imitate, and learn which problems are
easy to solve and which are difficult.