Daniel Paul O'Donnell
Disciplinary Impact and Technological Obsolescence in Digital Medieval Studies1
In May 2004, I attended a lecture by Elizabeth Solopova at a workshop at the University of Calgary on the past and present of digital editions of medieval works. The lecture looked at various approaches to the digitization of medieval literary texts and discussed a representative sample of the most significant digital editions of English medieval works then available: the Wife of Bath's Prologue from the Canterbury Tales Project (Robinson and Blake 1996), Murray McGillivray's Book of the Duchess (McGillivray 1997), Kevin Kiernan's Electronic Beowulf (Kiernan 1999), and the first volume of the Piers Plowman Electronic Archive (Adams et al. 2000). Solopova herself is an experienced digital scholar and the editions she was discussing had been produced by several of the most prominent digital editors then active. The result was a master class in humanities computing: an in-depth look at markup, imaging, navigation and interface design, and editorial practice in four exemplary editions.
From my perspective in the audience, however, I was struck by two unintended lessons. The first was how easily digital editions can age: all of the CD-ROMs Solopova showed looked quite old-fashioned to my 2004 eyes in the details of their presentation and organization and only two, Kiernan's Beowulf and McGillivray's Book of the Duchess, loaded and displayed on the overhead screen with no difficulties or disabled features.
For the purposes of Solopova's lecture these failures were not very serious: a few missing characters and a slightly faulty display did not affect her discussion of the editions' inner workings and indeed partially illustrated her point concerning the need to plan for inevitable technological obsolescence and change at all stages of edition design. For end users consulting these editions in their studies or at a library, however, the problems might prove more significant: while well-designed and standards-based editions such as these can be updated in order to accommodate technological change, doing so requires skills that are beyond the technological capabilities of most humanities scholars; making the necessary changes almost certainly requires some post-publication investment on the part of the publisher and/or the original editors. Until such effort is made, the thought and care devoted by the original team to editorial organization and the representation of textual detail presumably is being lost to subsequent generations of end users.
The second lesson I learned was that durability was not necessarily a function of age or technological sophistication. The editions that worked more or less as intended were from the middle of the group chronologically and employed less sophisticated technology than the two that had aged less well: they were encoded in relatively straightforward HTML (although Kiernan's edition makes sophisticated use of Java and SGML for searching) and rendered using common commercial web browsers. The projects that functioned less successfully were encoded in SGML and were packaged with sophisticated custom fonts and specialized rendering technology: the Multidoc SGML browser in the case of the Piers Plowman Electronic Archive and the Dynatext display environment in the case of the Canterbury Tales Project. Both environments were extremely advanced for their day and allowed users to manipulate text in ways otherwise largely impossible before the development and widespread adoption of XML- and XSL-enabled browsers.
Neither of these lessons seems very encouraging at first glance to medievalists engaged in or investigating the possibilities of using digital media for new projects. Like researchers in many humanities disciplines, medievalists tend to measure scholarly currency in terms of decades, not years or months. The standard study of the Old English poem Cædmon's Hymn before my recent edition of the poem (O'Donnell 2005a) was published nearly seventy years ago. Reference works like Cappelli's Dizionario di abbreviature latine ed italiane (first edition, 1899) or Ker's Catalogue of Manuscripts Containing Anglo-Saxon (first edition, 1959) also commonly have venerable histories. In the case of the digital editions discussed above — especially those already showing evidence of technological obsolescence — it is an open question whether the scholarship they contain will be able to exert nearly the same long-term influence on their primary disciplines. Indeed, there is already some evidence that technological or rhetorical problems may be hindering the dissemination of at least some of these otherwise exemplary projects' more important findings. Robinson, for example, reports that significant manuscript work by Daniel Mosser appearing in various editions of the Canterbury Tales Project is cited far less often than the importance of its findings warrants (Robinson 2005: §11).
The lesson one should not draw from these and other pioneering digital editions, however, is that digital projects are inevitably doomed to early irrelevance and undeserved lack of disciplinary impact. The history of digital medieval scholarship extends back almost six decades to the beginnings of the Index Thomisticus by Roberto Busa in the mid-1940s (see Fraser 1998 for a brief history). Despite fundamental changes in focus, tools, and methods, projects completed during this time show enough variety to allow us to draw positive as well as negative lessons for future work. Some digital projects, such as the now more than thirty-year-old Dictionary of Old English (DOE), have proven themselves able to adapt to changing technology and have had an impact on their disciplines — and longevity — as great as the best scholarship developed and disseminated in print. Projects which have proven less able to avoid technological obsolescence have nevertheless also often had a great effect on our understanding of our disciplines, and, in the problems they have encountered, can also offer us some cautionary lessons (see Keene n.d. for a useful primer in conservation issues and digital texts).
Premature Obsolescence: the Failure of the Information Machine
Before discussing the positive lessons to be learned from digital medieval projects that have succeeded in avoiding technological obsolescence or looking ahead to examine trends that future digital projects will need to keep in mind, it is worthwhile considering the nature of the problems faced by digital medieval projects that have achieved more limited impact or aged more quickly than the intrinsic quality of their scholarship or relevance might otherwise warrant — although in discussing projects this way, it is important to realize that the authors of these often self-consciously experimental projects have not always aimed at achieving the standard we are using to judge their success: longevity and impact equal to that of major works of print-originated and disseminated scholarship in the principal medieval discipline.
In order to do so, however, we first need to distinguish among different types of obsolescence. One kind of obsolescence occurs when changes in computing hardware, software, or approach render a project's content unusable without heroic efforts at recovery. The most famous example of this type is the Electronic Domesday Book, a project initiated by the BBC in celebration of the nine-hundredth anniversary of King William's original inventory of post-conquest Britain (Finney 1986–2006; see O'Donnell 2004 for a discussion). The shortcomings of this project have been widely reported: it was published on video disks that could only be read using a customized disk player; its software was designed to function on the BBC Master personal computer — a computer that at the time was more popular in schools and libraries in the United Kingdom than any competing system but is now hopelessly obsolete. Costing over £2.5 million, the project was designed to showcase technology that it was thought might prove useful to schools, governments, and museums interested in producing smaller projects using the same innovative virtual reality environment. Unfortunately, the hardware proved too expensive for most members of its intended market and very few people ended up seeing the final product. For sixteen years, the only way of accessing the project was via one of a dwindling number of the original computers and disk readers. More recently, after nearly a year of work by an international team of engineers, large parts of the project's content finally have been converted for use on contemporary computer systems.
The Domesday Project is a spectacular example of the most serious kind of technological obsolescence, but it is hardly unique. Most scholars now in their forties and fifties probably have disks lying around their studies containing information that is for all intents and purposes lost due to technological obsolescence: content written using word processors or personal computer database programs that are no longer maintained, recorded on difficult-to-read media, or produced using computers or operating systems that ultimately lost out to more popular competitors. But the Domesday Project did not become obsolete solely because it gambled on the wrong technology: many other digital projects of the time, some written for mainframe computers using languages and operating systems that are still widely understood, have suffered a similar obsolescence even though their content theoretically could be recovered more easily.
In fact the Domesday Project also suffered from an obsolescence of approach, the result of a fundamental and still ongoing change in how medievalists and others working with digital media approach digitization. Before the second half of the 1980s, digital projects were generally conceived of as information machines: programs in which content was understood to have little value outside of its immediate processing context. In such cases, the goal was understood to be the sharing of results rather than content. Sometimes, as in the case of the Domesday Book, the goal was the arrangement of underlying data in a specific (and closed) display environment; more commonly, the intended result was statistical information about language usage and authorship or the development of indices and concordances (see for example, the table of contents in Patton and Holoien 1981, which consists entirely of database, concordance, and statistical projects). Regardless of the specific processing goal, this approach tended to see data as raw material rather than an end result.2 Collection and digitization were done with an eye to the immediate needs of the processor, rather than the representation of intrinsic form and content. Information not required for the task at hand was ignored. Texts encoded for use with concordance or corpus software, for example, commonly ignored capitalization, punctuation, or mise-en-page. Texts encoded for interactive display were structured in ways suited to the planned output (see for example the description of database organization and video collection in Finney 1986–2006). What information was recorded was often indicated using ad hoc and poorly documented tokens and codes whose meaning now can be difficult or impossible to recover (see Cummings 2006).
The problem with this approach is that technology ages faster than information: data that require a specific processing context in order to be understood will become unintelligible far more rapidly than information that has been described as much as possible in its own terms without reference to a specific processing outcome. By organizing and encoding their content so directly to suit the needs of a specific processor, information machines like the Domesday Project condemned themselves to relatively rapid technological obsolescence.
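The contrast can be sketched in a few lines of Python. Everything in the example is invented for illustration: the pipe-delimited record, the asterisk convention for capital letters, and the <line> element do not come from any of the projects discussed here. The point is only that the first record can be read solely by a program that already knows its private conventions, while the second carries its own description.

```python
import xml.etree.ElementTree as ET

# Invented ad hoc record: an identifier, a pipe, then upper-case text in
# which '*' marks a word that began with a capital in the source.
AD_HOC = "H1|*NU SCULON HERIGEAN"

# The same sample line with descriptive markup (element name is illustrative).
SELF_DESCRIBING = '<line n="1">Nu sculon herigean</line>'


def decode_ad_hoc(record):
    """Recover the text; works only because the conventions are hard-coded here."""
    line_id, text = record.split("|", 1)
    words = []
    for word in text.split():
        if word.startswith("*"):
            words.append(word[1:].capitalize())
        else:
            words.append(word.lower())
    return {"line": line_id, "text": " ".join(words)}


def decode_self_describing(record):
    """Recover the text with a general-purpose XML parser and no private rules."""
    element = ET.fromstring(record)
    return {"line": element.get("n"), "text": element.text}
```

If the documentation for the first format is lost, a later reader must guess what the pipe and the asterisk mean; the second record can still be read by any XML parser.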
Content as End-product: Browser-based Projects
The age of the information machine began to close with the development and popular acceptance of the first internet browsers in the early 1990s. In an information machine, developers have great control over both their processor and how their data is encoded. They can alter their encoding to suit the needs of their processors and develop or customize processors to work with specific instances of data. Developers working with browsers, however, have far less control over either element: users interact with projects using their own software and require content prepared in ways compatible with their processor. This both makes it much more difficult for designers to produce predictable results of any sophistication and requires them to adhere to standard ways of describing common phenomena. It also changes the focus of project design: where once developers focused on producing results, they now tend to concentrate instead on providing content.
This change in approach explains in large part the relative technological longevity of the projects by McGillivray and Kiernan. Both were developed during the initial wave of popular excitement at the commercialization of the internet. Both were designed to be used without modification by standard internet web browsers operating on the end-users' computer and written in standard languages using a standard character set recognized by all internet browsers to this day. For this reason — and despite the fact that browsers available in the late 1990s were quite primitive by today's standards — it seems very unlikely that either project in the foreseeable future will need anything like the same kind of intensive recovery effort required by the Domesday Project: modern browsers are still able to read early HTML-encoded pages and Java routines and are likely to continue to do so, regardless of changes in operating system or hardware, as long as the internet exists in its current form. Even in the unlikely event that technological changes render HTML-encoded documents unusable in our lifetime, conversion will not be difficult. HTML is a text-based language that can easily be transformed by any number of scripting languages. Since HTML-encoded files are in no way operating system or software dependent, future generations — in contrast to the engineers responsible for converting the Electronic Domesday Book — will be able to convert the projects by Kiernan and McGillivray to new formats without any need to reconstruct the original processing environment.
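As a rough illustration of how little such a conversion might involve, the following sketch uses only Python's standard library to rewrite a fragment of early HTML into a different element vocabulary. The tag mapping and the target element names are assumptions made for the example, not a description of how the Kiernan or McGillivray editions are encoded.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical mapping from early HTML tags to replacement elements.
TAG_MAP = {
    "p": ("p", ""),
    "i": ("hi", ' rend="italic"'),
    "b": ("hi", ' rend="bold"'),
}


class HtmlToXml(HTMLParser):
    """Rewrite mapped tags, drop unmapped tags, and keep all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_MAP:
            name, attr_text = TAG_MAP[tag]
            self.out.append(f"<{name}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in TAG_MAP:
            self.out.append(f"</{TAG_MAP[tag][0]}>")

    def handle_data(self, data):
        # Re-escape text so the output remains well-formed markup.
        self.out.append(escape(data))


def convert(html_text):
    parser = HtmlToXml()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Because the input is plain text, nothing in the sketch depends on the browser, operating system, or hardware for which the pages were first written.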
The separation of content from processor did not begin with the rise of internet browsers. HTML, the language which made the development of such browsers possible, is itself derived from work on standardized structural markup languages in the 1960s through the 1980s. These languages, the most developed and widely used at the time being Standard Generalized Markup Language (SGML), required developers to make a rigid distinction between a document's content and its appearance. Content and structure were encoded according to the intrinsic nature of the information and interests of the encoder using a suitable standard markup language. How this markup was to be used and understood was left up to the processor: in a web browser, the markup could be used to determine the text's appearance on the screen; in a database program it might serve to delimit it into distinct fields. For documents encoded in early HTML (which used a small number of standard elements), the most common processor was the web browser, which formatted content for display for the most part without specific instructions from the content developer: having described a section of text using an appropriate HTML tag such as <i> (italic) or <b> (bold), developers were supposed for the most part to leave decisions about specific details of size, font, and position up to the relatively predictable internal style sheets of the user's browser (though of course many early web pages misused structural elements like <table> to encode appearance).
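The division of labour between markup and processor can be sketched as follows. The record and its element names are hypothetical, and the two functions merely stand in for a display program and a database loader; neither reproduces the behaviour of any particular browser or database.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are illustrative only.
SOURCE = (
    "<entry>"
    "<title>Cædmon's Hymn</title>"
    "<language>Old English</language>"
    "</entry>"
)


def render_for_display(xml_text):
    """A display-oriented processor: the markup decides appearance."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        text = child.text or ""
        if child.tag == "title":
            lines.append(text.upper())  # titles are shown in capitals
        else:
            lines.append(f"  {child.tag}: {text}")
    return "\n".join(lines)


def load_as_fields(xml_text):
    """A database-oriented processor: the same markup delimits fields."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}
```

The encoded record is identical in both cases; only the processor's interpretation of the markup differs.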
SGML was more sophisticated than HTML in that it described how markup systems were to be built rather than their specific content. This allowed developers to create custom sets of structural elements that more accurately reflected the qualities they wished to describe in the content they were encoding. SGML languages like DocBook were developed for the needs of technical and other publishers; the Text Encoding Initiative (TEI) produced a comprehensive set of structural elements suitable for the encoding of texts for use in scholarly environments. Unfortunately, however, this flexibility also made it difficult to share content with others. Having designed their own sets of structural elements, developers could not be certain their users would have access to software that knew how to process them.
The result was a partial return to the model of the information machine: in order to ensure their work could be used, developers of SGML projects intended for wide distribution tended to package their projects with specific (usually proprietary) software, fonts, and processing instructions. While the theoretical separation of content and processor represented an improvement over the approach taken by previous generations of digital projects in that it treated content as having intrinsic value outside the immediate processing context, the practical need to supply users with special software capable of rendering or otherwise processing this content tended nevertheless to tie the projects' immediate usefulness to the lifespan and weaknesses of the associated software. This is a less serious type of obsolescence, since rescuing information from projects that suffer from it involves nothing like the technological CPR required to recover the Domesday Project. But the fact that it must occur at all almost certainly limits these projects' longevity and disciplinary impact. Users who must convert a project from one format to another or work with incomplete or partially broken rendering almost certainly are going to prefer texts and scholarship in more convenient formats.
XML, XSLT, Unicode, and Related Technologies
Developments of the past half-decade have largely eliminated the problem these pioneering SGML-based projects faced in distributing their projects to a general audience. The widespread adoption of XML, XSLT, Unicode, and similarly robust international standards on the internet means that scholars developing new digital projects now can produce content using markup as flexible and sophisticated as anything possible in SGML without worrying that their users will lack the necessary software to display and otherwise process it. Just as the projects by Kiernan and McGillivray were able to avoid premature technological obsolescence by assuming users would make use of widely available internet browsers, so too designers of XML-based projects can now increase their odds of avoiding early obsolescence by taking advantage of the ubiquity of the new generation of XML-, XSLT-, and Unicode-aware internet clients.3
Tools and Community Support
The fact that these technologies have been so widely accepted in both industry and the scholarly world has other implications beyond making digital projects easier to distribute, however. The establishment of robust and stable standards for structural markup has also encouraged the development of a wide range of tools and organizations that also make such projects easier to develop.
Perhaps the most striking change lies in the development of tools. When I began my SGML-based edition of Cædmon's Hymn in 1997, the only SGML-aware and TEI-compatible tools I had at my disposal were GNU-Emacs, an open source text editor, and the Panorama and later Multidoc SGML browsers (what other commercial tools and environments were available were far beyond the budget of my one-scholar project). None of these were very user-friendly. GNU-Emacs, though extremely powerful, was far more difficult to set up and operate than the word processors, spreadsheets, and processors I had been accustomed to use up to that point. The Panorama and Multidoc browsers used proprietary languages to interpret SGML that had relatively few experienced users and a very limited basis of support. There were other often quite sophisticated tools and other kinds of software available, including some, such as TACT, Collate, TUSTEP, and various specialized fonts like Peter Baker's original Times Old English, that were aimed primarily at medievalists or developers of scholarly digital projects. Almost all of these, however, required users to encode their data in specific and almost invariably incompatible ways. Often, moreover, the tool itself also was intended for distribution to the end user, once again causing developers to run the risk of premature technological obsolescence.
Today, developers of new scholarly digital projects have access to a far wider range of general and specialized XML-aware tools. In addition to GNU-Emacs — which remains a powerful editor and has become considerably easier to set up on most operating systems — there are a number of full-featured, easy to use, open source or relatively inexpensive commercial XML-aware editing environments available including Oxygen, Serna, and Screem. There are also quite a number of well-designed tools aimed at solving more specialized problems in the production of scholarly projects.
Several of these, such as Anastasia and Edition Production and Presentation Technology (EPPT), have been designed by medievalists. Others, such as the University of Victoria's Image Markup Tool and other tools under development by the TAPoR project, have been developed by scholars in related disciplines.
More significantly, these tools avoid most of the problems associated with those of previous decades. All the tools mentioned in the previous paragraph (including the commercial tools) are XML-based and have built-in support for TEI XML, the standard structural markup language for scholarly projects (this is also true of TUSTEP, which has been updated continuously). This means both that they can often be used on the same underlying content and that developers can encode their text to reflect their interests or the nature of the primary source rather than to suit the requirements of a specific tool. In addition, almost all are aimed at the developer rather than the end user. With the exception of Anastasia and EPPT, which both involve display environments, none of the tools mentioned above is intended for distribution with the final project. Although these tools — many of which are currently in the beta stage of development — ultimately will become obsolete, the fact that almost all are now standards compliant means that the content they produce almost certainly will survive far longer.
A second area in which the existence of stable and widely recognized standards has helped medievalists working with digital projects has been in the establishment of community-based support and development groups. Although Humanities Computing, like most other scholarly disciplines, has long had scholarly associations to represent the interests of their members and foster exchanges of information (e.g., Association for Literary and Linguistic Computing [ALLC]; Society for Digital Humanities/Société pour l'étude des médias interactifs [SDH-SEMI]), the past half-decade has also seen the rise of a number of smaller formal and informal Communities of Practice aimed at establishing standards and providing technological assistance to scholars working in more narrowly defined disciplinary areas. Among the oldest of these are Humanist-l and the TEI — both of which pre-date the development of XML by a considerable period of time. Other community groups, usually narrower in focus and generally formed after the development of XML, Unicode, and related technologies, include MENOTA (MEdieval and NOrse Text Archive), publishers of the Menota handbook: Guidelines for the encoding of medieval Nordic primary sources; MUFI (Medieval Unicode Font Initiative), an organization dedicated to the development of solutions to character encoding issues in the representation of characters in medieval Latin manuscripts; and the Digital Medievalist, a community of practice aimed at helping scholars meet the increasingly sophisticated demands faced by designers of contemporary digital projects, which organizes a journal, wiki, and mailing list devoted to the establishment and publication of best practice in the production of digital medieval resources.
These tools and organizations have helped reduce considerably the technological burden placed on contemporary designers of digital resources. As Peter Robinson has argued, digital projects will not come completely into their own until "the tools and distribution … [are] such that any scholar with the disciplinary skills to make an edition in print can be assured he or she will have access to the tools and distribution necessary to make it in the electronic medium" (Robinson 2005: abstract). We are still a considerable way away from this ideal and in my view unlikely to reach it before a basic competence in Humanities Computing technologies is seen as an essential research skill for our graduate and advanced undergraduate students. But we are also much farther along than we were even a half-decade ago. Developers considering a new digital project can begin now confident that they will be able to devote a far larger proportion of their time to working on disciplinary content — their scholarship and editorial work — than was possible even five years ago. They have access to tools that automate many jobs that used to require special technical know-how or support. The technology they are using is extremely popular and well supported in the commercial and academic worlds. And, through communities of practice like the Text Encoding Initiative, MENOTA, and the Digital Medievalist Project, they have access to support from colleagues working on similar problems around the globe.
Future Trends: Editing Non-textual Objects
With the development and widespread adoption of XML, XSLT, Unicode, and related technologies, text-based digital medieval projects can be said to have emerged from the incunabula stage of their technological development. Although there remain one or two ongoing projects that have resisted incorporating these standards, there is no longer any serious question as to the basic technological underpinnings of new text-based digital projects. We are also beginning to see a practical consensus as to the basic generic expectations for the "Electronic Edition": such editions almost invariably include access to transcriptions and full color facsimiles of all known primary sources, methods of comparing the texts of individual sources interactively, and, in most cases, some kind of guide, reading, or editorial text. There is still considerable difference in the details of interface (Rosselli Del Turco 2006), mise en moniteur, and approach to collation and recension. But on the whole, most developers and presumably a large number of users seem to have an increasingly strong sense of what a text-based digital edition should look like.
Image, sound, and animation: return of the information machine?
Things are less clear when digital projects turn to non-textual material. While basic and widely accepted standards exist for the encoding of sounds and 2D and 3D graphics, there is far less agreement as to the standards that are to be used in presenting such material to the end user. As a result, editions of non-textual material often have more in common with the information machines of the 1980s than contemporary XML-based textual editions. Currently, most such projects appear to be built using Adobe's proprietary Flash and Shockwave formats (e.g., Foys 2003; Reed Kline 2001). Gaming applications, 3D applications, and immersive environments use proprietary environments such as Flash and Unreal Engine or custom-designed software. In each case, the long-term durability and cross-platform operability of projects produced in these environments is tied to that of the software for which they are written. All of these formats require proprietary viewers, none of which are shipped as a standard part of most operating systems. As with the BBC Domesday Project, restoring content published in many of these formats ultimately may require restoration of the original hard- and software environment.
Using technology to guide the reader: three examples4
Current editions of non-textual material resemble information machines in another way, as well: they tend to be over-designed. Because developers of such projects write for specific processors, they, like developers of information machines of the 1980s, are able to control the end-user's experience with great precision. They can place objects in precise locations on the user's screen, allow or prevent certain types of navigation, and animate common user tasks.
When handled well, such control can enhance contemporary users' experience of the project. Martin Foys's 2003 edition of the Bayeux Tapestry, for example, uses Flash animation to create a custom-designed browsing environment that allows the user to consult the Bayeux Tapestry as a medieval audience might, by moving back and forth apparently seamlessly along its 68-meter length. The opening screen shows a section from the facsimile above a plot-line that provides an overview of the Tapestry's entire contents in a single screen. Users can navigate the Tapestry scene-by-scene using arrow buttons at the bottom left of the browser window, centimeter by centimeter using a slider on the plot-line, or by jumping directly to an arbitrary point on the tapestry by clicking on the plot-line at the desired location. Tools, background information, other facsimiles of the tapestry, scene synopses, and notes are accessed through buttons at the bottom left corner of the browser. The first three types of material are presented in a separate window when chosen; the last two appear under the edition's plot-line. Additional utilities include a tool for making slideshows that allows users to reorder panels to suit their own needs.
If such control can enhance a project's appearance, it can also get in the way, encouraging developers to include effects for their own sake, or to control end-users' access to the underlying information unnecessarily. The British Library Turning the Pages series, for example, allows readers to mimic the action of turning pages in an otherwise straightforward photographic manuscript facsimile. When users click on the top or bottom corner of the manuscript page and drag the cursor to the opposite side of the book, they are presented with an animation showing the page being turned over. If they release the mouse button before the page has been pulled approximately 40 percent of the way across the visible page spread, virtual "gravity" takes over and the page falls back into its original position.
This is an amusing toy and well suited to its intended purpose as an "interactive program that allows museums and libraries to give members of the public access to precious books while keeping the originals safely under glass" (British Library Board n.d.). It comes, however, at a steep cost: the page-turning system uses an immense amount of memory and processing power — the British Library estimates up to 1 GB of RAM for high quality images on a standalone machine — and the underlying software used for the internet presentation, Adobe Shockwave, is not licensed for use on all computer operating systems (oddly, the non-Shockwave internet version uses Windows Media Player, another proprietary system that shares the same gaps in licensing). The requirement that users drag pages across the screen, moreover, makes paging through an edition unnecessarily time- and attention-consuming: having performed an action that indicates that they wish an event to occur (clicking on the page in question), users are then required to perform additional complex actions (holding the mouse button down while dragging the page across the screen) in order to effect the desired result. What was initially an amusing diversion rapidly becomes a major and unnecessary irritation.
More intellectually serious problems can arise as well. In A Wheel of Memory: The Hereford Mappamundi (Reed Kline 2001), Flash animation is used to control how the user experiences the edition's content — allowing certain approaches and preventing others. Seeing the Mappamundi "as a conceit for the exploration of the medieval collective memory… using our own collective rota of knowledge, the CD-ROM" (§ I [audio]), the edition displays images from the map and associated documents in a custom-designed viewing area that is itself in part a rota. Editorial material is arranged as a series of chapters and thematically organized explorations of different medieval Worlds: World of the Animals, World of the Strange Races, World of Alexander the Great, etc. With the exception of four numbered chapters, the edition makes heavy use of the possibilities for non-linear browsing inherent in the digital medium to organize its more than 1,000 text and image files.
Unfortunately, and despite its high production values and heavy reliance on a nonlinear structural conceit, the edition itself is next-to-impossible to use or navigate in ways not anticipated by the project designers. Text and narration are keyed to specific elements of the map and edition and vanish if the user strays from the relevant hotspot: because of this close integration of text and image it is impossible to compare text written about one area of the map with a facsimile of another. The facsimile itself is also very difficult to study. The customized viewing area is of a fixed size (I estimate approximately 615 × 460 pixels), with more than half this surface given over to background and navigation: when the user chooses to view the whole map on screen, the 4-foot-wide original is reproduced with a diameter of less than 350 pixels (approximately 1/10 actual size). Even then, it remains impossible to display the map in its entirety: in keeping with the project's rota conceit, the facsimile viewing area is circular even though the Hereford map itself is pentagonal: try as I might, I never have been able to get a clear view of the border and image in the facsimile's top corner.
Future standards for non-textual editions?
It is difficult to see at this point how scholarly editions involving non-textual material ultimately will evolve. Projects that work most impressively right now use proprietary software and viewers (and face an obvious danger of premature obsolescence as a result); projects that adhere to today's non-proprietary standards for the display and manipulation of images, animation, and sound currently are in a situation analogous to that of the early SGML-based editions: on the one hand, their adherence to open standards presumably will help ensure their data is easily converted to more popular and better supported standards once these develop; on the other hand, the lack of current popular support means that such projects must supply their own processing software — which means tying their short-term fate to the success and flexibility of a specific processor. Projects in this field will have emerged from the period of their technological infancy when designers can concentrate on their content, safe in the assumption that users will have easy access to appropriate standards-based processing software on their own computers.
Collaborative Content Development
The development of structural markup languages like HTML was crucial to the success of the internet because it allowed for unnegotiated interaction between developers and users. Developers produce content assuming users will be able to process it; users access content assuming it will be suitable for use with their processors. Except when questions of copyright, confidentiality, or commerce intervene, contact between developers and users can be limited to little more than the purchase of a CD-ROM or transfer of files from server to browser.
The past few years have seen a movement toward applying this model to content development as well. Inspired by the availability of well-described and universally recognized encoding standards and encouraged no doubt by the success of the Wikipedia and the open source software movement, many projects now are looking for ways to provide for the addition and publication of user-contributed content or the incorporation of work by other scholars. Such contributions might take the form of notes and annotations, additional texts and essays, links to external resources, and corrections or revision of incorrect or outdated material.
An early, pre-wiki, model of this approach is the Online Reference Book for Medieval Studies (ORB). Founded in 1995 and run by a board of section editors, ORB provides a forum for the development and exchange of digital content by and for medievalists. Contributors range from senior scholars to graduate students and interested amateurs; their contributions cover a wide variety of genres: encyclopedia-like articles, electronic primary texts, on-line textbooks and monographs, sample syllabi, research guides, and resources for the non-specialist. Despite this, the project itself is administered much like a traditional print-based encyclopedia: it is run by an editorial board that is responsible for soliciting, vetting, and editing contributions before they are published.
More recently, scholars have been exploring the possibilities of a different, unnegotiated approach to collaboration. One model is the Wikipedia — an on-line reference source that allows users to contribute and edit articles with little editorial oversight. This approach is frequently used on a smaller scale for the construction of more specialized reference works: the Digital Medievalist, for example, is using wiki software to build a community resource for medievalists who use digital media in their research, study, or teaching. Currently, the wiki contains descriptions of projects and publications, conference programs, calls for papers, and advice on best practice in various technological areas.
Other groups, such as a number of projects at the Brown Virtual Humanities Laboratory, are working on the development of mechanisms by which members of the community can make more substantial contributions to the development of primary and secondary sources. In this case, users may apply for permission to contribute annotations to the textual database, discussing differences of opinion or evidence in an associated discussion forum (Armstrong and Zafrin 2005; Riva 2006).
A recent proposal by Espen Ore suggests an even more radical approach: the design of unnegotiated collaborative editions, i.e., projects that are built with the assumption that others will add to, edit, and revise the core editorial material: texts, introductory material, glossaries, and apparatus (Ore 2004). In a similar approach, the Visionary Cross Project has proposed building its multi-object edition using an extensible architecture that will allow users to associate their own projects with others to form a matrix of interrelated objects, texts, and commentary (Karkov et al. 2006). Peter Robinson has recently proposed the development of tools that would allow this type of editorial collaboration to take place (Robinson 2005).
These approaches to collaboration are still very much in their earliest stages of development. While the technology already exists to enable such community participation in the development of intellectual content, questions of quality control, intellectual responsibility, and especially incentives for participation remain very much unsettled. Professional scholars traditionally achieve success, both institutionally and in terms of reputation, by the quality and amount of their research publications. Community-based collaborative projects do not easily fit into this model. Project directors cannot easily claim intellectual responsibility for the contributions of others to "their" projects, which reduces the value of such projects in a profession in which monographs are still seen as a standard measure of influence and achievement. And the types of contribution open to most participants (annotations, brief commentary, and editorial work) are difficult to use in building a scholarly reputation: the time when a carefully researched entry on the Wikipedia or annotation to an on-line collaborative edition will help scholars who are beginning or building their careers is still a long way away (see O'Donnell 2006, which discusses a number of the economic issues involved in collaborative digital models).
Digital scholarship in Medieval Studies has long involved finding an accommodation between the new and the durable. On the one hand, technology has allowed scholars to do far more than was ever possible in print. It has allowed them to build bigger concordances and more comprehensive dictionaries, to compile detailed statistics about usage and dialectal spread, and to publish far more detailed collations, archives, and facsimiles. At the same time, however, the rapidly changing nature of this technology and its associated methods has brought with it the potential cost of premature obsolescence. While few projects, perhaps, have suffered this quite so spectacularly as the BBC's Domesday Book, many have suffered from an undeserved lack of attention or disciplinary impact due to technological problems. The emphasis on information as a raw material in the days before the development of structural markup languages often produced results of relatively narrow and short-term interest, often in the form of information machines that could not survive the obsolescence of their underlying technology without heroic and costly efforts at reconstruction. Even the development of early structural markup languages like SGML did not entirely solve this problem: while theoretically platform-independent and focused on the development of content, SGML-based projects commonly required users to acquire specific and usually very specialized software for even the most basic processing and rendition.
Of the projects published in the initial years of the internet revolution, those that relied on the most widely supported technology and standards — HTML and the ubiquitous desktop internet browsers — survived the best. The editions by Kiernan and McGillivray showcased by Solopova in her lecture that summer still function well — even if their user interfaces now look even more old fashioned two years on.
Inasmuch as the new XML- and Unicode-based technologies combine the flexibility and sophistication of SGML with the broad support of early HTML, text-based medieval digital scholarship is now leaving its most experimental period. There remain economic and rhetorical issues surrounding the best ways of delivering different types of scholarly content to professional and popular audiences; but on the whole the question of the core technologies required has been settled definitively.
The new areas of experimentation in medieval digital studies involve editions of non-textual material and the development of new collaborative models of publication and project development. Here technology both has even more to offer the digital scholar and carries with it even greater risks. On the one hand, the great strides made in computer-based animation, gaming, and 3D imaging in the commercial world offer projects the chance to deal with material never before subject to the kind of thorough presentation now possible. We already have marvelous editions of objects (maps, tapestries, two-dimensional images) that allow the user to explore their subjects in ways impossible in print. In the near future we can expect to see a greater use of 3D and gaming technology in the treatment of sculpture, archaeological digs, and even entire cities. With the use of wikis and similar types of collaborative technologies, such projects may also be able to capture much more of the knowledge of the disciplinary experts who make up their audiences.
For projects dealing with non-textual objects, the risk is that the current necessity of relying on proprietary software intended for the much shorter-term needs of professional game designers and computer animators will lead to the same kind of premature and catastrophic obsolescence brought on by the equally-advanced-for-its-day Domesday Project. Sixteen years from now, animation design suites like Director (the authoring suite used for producing Shockwave files) and gaming engines like the Unreal Engine (an authoring engine used to produce current generations of video games) are likely to be different from, and perhaps incompatible with, current versions in a way that XML authoring technologies and processors will not. While we can hope that reconstruction will not be as difficult as it proved to be in the case of the Domesday Project, it seems likely that few of today's non-textual editions will still be working without problems at an equivalent point in their histories, two decades from now.
In the case of experimentation with collaborative software, the challenge is more economic and social than technological. In my experience, most professional scholars initially are extremely impressed by the possibilities offered by collaborative software like wikis and other forms of annotation engines — before almost immediately bumping up against the problems of prestige and quality control that currently make them infeasible as channels of high-level scholarly communication. Indeed at one recent conference session I attended (on the future of collaborative software, no less!) the biggest laugh of the morning came when one of the speakers confessed to having devoted most of the previous month to researching and writing a long article for the Wikipedia on his particular specialism in Medieval Studies.
That current text-based digital editions seem likely to outlive the technology that produced them can be attributed to the pioneering efforts of the many scholars responsible for editions like those by Adams, Kiernan, McGillivray, and Robinson discussed by Solopova in her lecture. The current generation of scholars producing editions of non-textual objects and experimenting with collaborative forms of scholarship and publication are now filling a similar role. The solutions they are developing may or may not provide the final answers; but they certainly will provide a core of experimental practice upon which the final answers most certainly will be built.
1 The focus of this chapter is on theoretical and historical problems that have affected digital scholarship in Medieval Studies in the past and are likely to continue to do so for the foreseeable future. Scholars seeking more specific advice on technological problems or best practice have access to numerous excellent Humanities Computing societies, mailing lists, and internet sites. For some specific suggestions, see Part IV, "Methodologies," pp. 389–576, below. I thank Roberto Rosselli Del Turco for his help with this chapter.
2 Exceptions to this generalization prove the rule: pre-internet-age projects, such as the Dictionary of Old English (DOE) or Project Gutenberg, that concentrated more on content than processing, have aged much better than those that concentrated on processing rather than content. Both the DOE and Project Gutenberg, for example, have successfully migrated to HTML and now XML. The first volume of the DOE was published on microfiche in 1986, the same year as the BBC's Domesday Book; on-line and CD-ROM versions were subsequently produced with relatively little effort. Project Gutenberg began with ASCII text in 1971.
3 Not all developers of XML-encoded medieval projects have taken this approach. Some continue to write for specific browsers and operating systems (e.g., Muir 2004a); others have developed or are in the process of developing their own display environments (e.g., Anastasia, Elwood [see Duggan and Lyman 2005: Appendix]). The advantage of this approach, of course, is that, as with information machines like the BBC Domesday Book, developers acquire great control over the end user's experience (see for example McGillivray 2006 on Muir 2004b); the trade-off, however, is likely to be unnecessarily rapid technological obsolescence or increased maintenance costs in the future.
4 The discussion in this section has been adapted with permission from a much longer version in O'Donnell 2005b.
References and Further Reading
Organizations and support
Digital Medievalist. An international web-based Community of Practice for medievalists working with digital media. Operates a mailing list, peer-reviewed journal, and wiki <http://www.digitalmedievalist.org/>.
Humanist-l. An international electronic seminar on humanities computing and the digital humanities <http://www.princeton.edu/humanist/>.
MENOTA. (MEdieval and NOrse Text Archive), publishers of the Menota handbook: Guidelines for the encoding of medieval Nordic primary sources <http://www.menota.org/>.
MUFI. (Medieval Unicode Font Initiative), an organization dedicated to the development of solutions to character encoding issues in the representation of characters in medieval Latin manuscripts <http://gandalf.aksis.uib.no/mufi/>.
TEI (Text Encoding Initiative). An international and interdisciplinary standard that enables libraries, museums, publishers, and individual scholars to represent a variety of literary and linguistic texts for online research, teaching, and preservation. Also operates a mailing list <http://www.tei-c.org/>.
Adams, Robert, Hoyt N. Duggan, Eric Eliason, Ralph Hanna III, John Price-Wilkin, and Thorlac Turville-Petre (2000). Corpus Christi College Oxford MS 201 (F) [CD-ROM]. Ann Arbor: University of Michigan Press.
Armstrong, Guyda, and Vika Zafrin (2005). "Towards the Electronic Esposizioni: The Challenges of the Online Commentary." Digital Medievalist 1.1 [Online Journal]. <http://www.digitalmedievalist.org/article.cfm?RecID=1>.
British Library Board (n.d.). "Turning the Pages: Welcome" [Webpage]. <http://www.armadillosystems.com/ttp_commercial/home.htm>.
Cummings, James (2006). "Liturgy, Drama, and the Archive: Three Conversions from Legacy Formats to TEI XML." Digital Medievalist 2.1 [Online Journal]. <http://www.digitalmedievalist.org/article.cfm?RecID=11>.
Duggan, Hoyt N. (2005). "A Progress Report on The Piers Plowman Electronic Archive" with a contribution by Eugene W. Lyman. Digital Medievalist 1.1 [Online Journal]. <http://www.digitalmedievalist.org/article.cfm?RecID=3>.
Finney, Andy (1986–2006). "The Domesday Project" [Website]. <http://www.atsf.co.uk/dottext/domesday.html>.
Foys, Martin K. (2003). The Bayeux Tapestry: Digital Edition [CD-ROM]. Leicester: SDE.
Fraser, Michael (1998). "The Electronic Text and the Future of the Codex I: The History of the Electronic Text" [Unpublished Lecture]. History of the Book Seminar, Oxford University. January 1998. <http://users.ox.ac.uk/~mikef/pubs/hob_fraser_1998.html>.
Karkov, Catherine, Daniel Paul O'Donnell, Roberto Rosselli Del Turco, James Graham, and Wendy Osborn (2006). "The Visionary Cross Project" [Webpage]. <http://www.visionarycross.org/>.
Keene, Suzanne (n.d.). "Now You See It, Now You Won't" [Webpage]. <http://www.suzannekeene.info/conserve/digipres/index.htm>.
Kiernan, Kevin S. (1999). Electronic Beowulf [CD-ROM]. London: British Library.
McGillivray, Murray (2006). [Review of Muir 2004b]. Digital Medievalist 2.1 [Online Journal]. <http://www.digitalmedievalist.org/article.cfm?RecID=14>.
McGillivray, Murray (1997). Geoffrey Chaucer's Book of the Duchess: A Hypertext Edition [CD-ROM]. Calgary: University of Calgary Press.
Muir, Bernard James (2004a). The Exeter Anthology of Old English Poetry: An Edition of Exeter Dean and Chapter MS 3501. Rev. 2nd [CD-ROM] edn. Exeter: Exeter University Press.
Muir, Bernard James (2004b). A digital facsimile of Oxford, Bodleian Library MS. Junius 11. Software by Nick Kennedy. Bodleian Library Digital Texts 1. Oxford: Bodleian Library.
O'Donnell, Daniel Paul (2004). "The Doomsday Machine, or, 'If You Build It, Will They Still Come Ten Years From Now?': What Medievalists Working in Digital Media Can Do to Ensure the Longevity of Their Research." Heroic Age 7 [Online Journal]. <http://www.mun.ca/mst/heroicage/issues/7/ecolumn.html>.
O'Donnell, Daniel Paul (2005a). Cædmon's Hymn: A Multimedia Study, Archive and Edition. Society for Early English and Norse Electronic Texts A.7. Cambridge: D. S. Brewer in association with SEENET and the Medieval Academy.
O'Donnell, Daniel Paul (2005b). "O Captain! My Captain! Using Technology to Guide Readers Through an Electronic Edition." Heroic Age 8 [Online Journal]. <http://www.mun.ca/mst/heroicage/issues/8/em.html>.
O'Donnell, Daniel Paul (2006). "Why Should I Write for Your Wiki: Towards a New Economics of Academic Publishing." Unpublished Lecture: "New Technologies and Renaissance Studies IV: Publication and New Forms of Collaboration," 52nd Annual Meeting of the Renaissance Society of America, San Francisco, CA, March 23.
Ore, Espen S. (2004). "Monkey Business – or What Is an Edition?" Literary and Linguistic Computing 19: 35–44.
Patton, Peter C. and Renee A. Holoien, Eds. (1981). Computing in the Humanities. Lexington, MA: Lexington Books.
Reed Kline, Naomi (2001). A Wheel of Memory: The Hereford Mappamundi [CD-ROM]. Ann Arbor: University of Michigan Press.
Riva, Massimo (2006). "Online Resources for Collaborative Research: The Pico Project at Brown University." Unpublished Lecture: "New Technologies and Renaissance Studies IV: Publication and New Forms of Collaboration," 52nd Annual Meeting of the Renaissance Society of America, San Francisco, CA, March 23.
Robinson, Peter (2005). "Current Issues in Making Digital Editions of Medieval Texts – or, Do Electronic Scholarly Editions Have a Future?" Digital Medievalist 1.1 [Online Journal]. <http://www.digitalmedievalist.org/article.cfm?RecID=6>.
Robinson, Peter, and N. F. Blake (1996). The Wife of Bath's Prologue on CD-ROM. Canterbury Tales Project. Cambridge: Cambridge University Press.
Rosselli Del Turco, Roberto (2006). "After the Editing Is Done: Designing a Graphic User Interface for Digital Editions" [Unpublished lecture]. Delivered at: Session 640 "Digital Publication," 41st International Congress on Medieval Studies, Western Michigan University, May 6.