“Mapping from objects to markup: a springboard for
multiple-strategy electronic publishing”
Gary
F.
Simons
Summer Institute of Linguistics
gary.simons@sil.org
One of the challenges of electronic publishing is getting the information
into the right format for the particular publishing strategy being pursued.
Another is keeping up with the fast pace of change as new technologies are
developed that offer more or better ways of electronically publishing
information.
This paper reports on the experience of the Summer Institute of Linguistics
in developing electronic publishing solutions for its LinguaLinks product
(SIL 1996). LinguaLinks is an electronic performance support system designed
to assist field workers with a wide range of tasks related to language
learning, language analysis, and language development.
The paper first introduces the LinguaLinks model of performance support and
CELLAR--the object-oriented database system that is used to implement it.
Our approach to electronic publishing is to first build the information as a
structure of objects in the database, and then to use multiple CELLAR
stylesheets to map the information onto multiple markup schemes. The object
database thus serves as a springboard that allows us to vault the
information into any number of formats for publishing.
The paper illustrates this approach to electronic publishing by focusing on
one application area that LinguaLinks supports, namely, lexical database
development. It first shows how the tutorial and reference documents that
give help on how to build a dictionary are mapped onto different markup
schemes for publication as a Folio Views infobase, a Windows help system,
and an HTML Web document. It then shows how the dictionaries that are built
by using LinguaLinks are mapped onto HTML markup to provide a display format
on the Web and onto TEI markup to provide a richer format for information
interchange and archiving.
1. The LinguaLinks model of electronic performance support
The notion of electronic performance support systems (Gery 1991) is one that is gaining momentum throughout industry. An EPSS seeks to support a knowledge worker in performing his or her job by providing an electronic working environment that integrates the software tools needed to do the job with the tutorial and reference materials that are needed to know how to do the job. This goes well beyond the typical help system of a software program to include not just information on how to use the program but also a library of background information on the problem domain. LinguaLinks is designed to support tasks in the domains of anthropology, language learning, linguistics, literacy, and sociolinguistics. Work continues to add more and more resources in subsequent versions of the product. In version 1.0, one of the areas that is most deeply developed is lexical database management. This component includes a data management tool that helps the user to build a lexical database and then to use the information in that database to produce dictionaries for publication. LinguaLinks takes advantage of the object-oriented paradigm to provide performance support that is tailored to what the user is trying to do. One of the hardest problems in offering electronic performance support is knowing just what the user is trying to do. For instance, if a word processing program were being used to write a dictionary, it would be very difficult to implement performance support that could sense the context within a dictionary entry and offer appropriate help. By building a dictionary in an object-oriented database, however, two kinds of performance support fall out naturally. First, the software developers have performed an object- oriented analysis (Coad and Yourdon 1991, Booch 1994) with domain experts in order to build the conceptual model for the database. The definitions of object classes (including their attributes) that make up this model provide performance support by guiding the user to create dictionary entries that are structured like ones the domain experts would have built; these definitions also prevent ill-formed entries from being constructed. Second, the software always knows what the user is working on by observing what object and attribute is currently selected or being editing. Thus if the etymology is being edited, the system knows to offer tutorial and reference material on how to write etymologies when the user asks for help. When the user switches focus to the part of speech, then the focus of help also switches to that aspect of dictionary making, and so on for all the parts of the conceptual model of a dictionary entry. Section 2 of the paper gives an overview of the object- oriented database system that is used to implement performance support in LinguaLinks. Section 3 describes how the object database is used as a springboard to markup for electronic publishing. Section 4 then shows how the tutorial and reference materials are mapped into markup in order to support multiple strategies for providing performance support. Section 5 in turn shows how the dictionaries built by users can be mapped into markup for multiple publishing strategies.2. An overview of the CELLAR object-oriented database system
The database system used to implement LinguaLinks is called CELLAR--for Computing Environment for Linguistic, Literary, and Anthropological Research. Developed by the Summer Institute of Linguistics, it is an object-oriented database system for storing multilingual textual information. A full discussion of the user requirements that motivated the development of the system is given in Simons 1996. Rettig, Simons, and Thomson (1993) discuss some of the significant ways in which CELLAR extends the traditional object model. An application in CELLAR is a declarative model of the problem domain. A complete domain model has the following four components:- Conceptual model. Declares all the object classes in the problem domain and their attributes, including integrity constraints on attributes that store values and built-in queries on those that compute their values on-the-fly.
- Visual model. Declares one or more ways in which objects of each class can be formatted for display to the user.
- Encoding model. Declares one or more ways in which objects of each class can be encoded in plain text files so that users can import data from external sources or export them.
- Manipulation model. Declares one or more tools which translate the interactive gestures of the user into direct manipulation of objects in the knowledge base.
3. Using CELLAR as a springboard to electronic publishing markup
The strategy for implementing multiple markup systems involves the visual modeling component of CELLAR. For the single conceptual model of the information, multiple visual models are defined. This approach has been illustrated in another paper (Simons, in press) where multiple ways of displaying tagged texts and critical texts are generated from the same underlying database. In this application, each visual model generates a display format that happens to be a markup scheme for a particular electronic publishing system. For each class of object, a view is defined that declares how the information in that object is to be mapped onto the display format. All of the views that cooperate to define a single visual model are collected together into a stylesheet. In this section, the full paper will present some source code examples that illustrate how the views can be made to map objects onto markup.4. Strategies for publishing helps on building dictionaries
The tutorial and reference materials that provide performance support in LinguaLinks are objects in the underlying CELLAR database. One strategy we have followed for electronically publishing them is to present them in a view that looks like a conventional document. But this is just one strategy. We have followed three others as well:- 1. In order to make jumps to the helps virtually instantaneous, we have mapped them onto RTF markup and compiled them as a Windows help system.
- 2. In order to offer full-text boolean search and retrieval access to the library of helps, we have mapped them onto Folio Flat File format and compiled them into a Folio Views infobase.
- 3. In order to offer access to these helps on the World Wide Web, we have mapped them onto HTML markup.
5. Strategies for publishing the resulting dictionaries
Similarly, the lexical database that the user builds in LinguaLinks is a collection of objects in the underlying CELLAR database. One strategy for electronically publishing a dictionary is to deliver it as a CELLAR database with views that present it in conventional display formats. But this strategy will not reach a wide audience. We have thus implemented two other strategies as well:- 1. In order to produce a dictionary that can be read on the World Wide Web by any browser, we can map the objects onto HTML markup.
- 2. In order to produce a dictionary data file that can be archived without any loss of structural information, we can map the objects onto the TEI markup for print dictionaries (Sperberg-McQueen and Burnard 1994).
References
Grady Booch. Object-oriented analysis and design with applications. Redwood City, CA: Benjamin/Cummings Publishing Co, 1994.
Peter Coad Edward Yourdon. Object-oriented analysis. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1991.
Gloria J. Gery. Electronic performance support systems. Boston: Weingarten Publications, 1991.
Marc Rettig Gary Simons John Thomson. “Extended objects.” Communications of the ACM. 1993. 36: 19-24.
SIL. LinguaLinks: electronic helps for language field work. Dallas, TX: Summer Institute of Linguistics, 1996.
Gary F. Simons. “The nature of linguistic data and the requirements of a
computing environment for linguistic research.” Dutch studies on Near Eastern languages and literatures. 1996. 2: 111-128.
Gary F. Simons. “Conceptual modeling versus visual modeling: a
technological key to building consensus.” Computers and the Humanities. 1996. : .
C. M. Sperberg-McQueen Lou Burnard. Guidelines for electronic text encoding and interchange. Chicago and Oxford: Text Encoding Initiative, 1994.