“SCHOLIA: a web-based reading tool for the study of language and literature”
Wendell Piez
Mulberry Technologies, Inc.
wapiez@mulberrytech.com
This project was developed for a very practical reason. I was sitting down to
read a text in German, a language in which I have some facility but no great
fluency. When studying classical languages, I was schooled in a straightforward
technique that can apply to any foreign literary text: it entails looking up
unknown words in a dictionary or lexicon as I go, and writing an apparatus (in
pencil on note paper) to help me put it all together in rereadings. After some
years of (fairly feeble) practice, I have even refined this process somewhat:
the apparatus takes the form of two separate texts, a running vocabulary and a
translation. A running vocabulary, of course, takes the simple form of a
glossary, listed in order of the appearance of glossed words or phrases in the
original (thus facilitating reference to them when rereading). The translation
provides a record of the larger sense of sentences and passages.
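The ordering rule behind a running vocabulary can be stated in a few lines of code. The following Python sketch is illustrative only and is not part of the tool described here. It assumes the glosses are kept as a simple term-to-gloss mapping, and it uses a naive case-sensitive whole-word search that ignores inflection.

```python
import re


def running_vocabulary(text: str, glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Order glossary entries by where each term first appears in the text.

    Terms that never occur in the text are dropped. Matching is a naive
    whole-word search (an assumption: real use would need to handle
    inflected forms and discontinuous phrases).
    """
    positions = []
    for term, gloss in glossary.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", text)
        if match:
            positions.append((match.start(), term, gloss))
    positions.sort()  # order of first appearance in the original
    return [(term, gloss) for _, term, gloss in positions]


text = "Der Vogel sang im Hof, und die Amsel antwortete."
glossary = {"antwortete": "answered", "Amsel": "blackbird", "Vogel": "bird"}
for term, gloss in running_vocabulary(text, glossary):
    print(f"{term}: {gloss}")
```

The glossary can be recorded in any order. Only the output follows the text, which is what makes the list useful on rereading.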
Of course, sitting down to do this recently, I found the work just as laborious
and inefficient as I had found it when trying to learn languages as an undergraduate.
Reflecting on the shortcomings of my process -- and comparing these methods to
the methods of others who, I imagined, were better language students -- I came
up with a design for a set of XML-based tools that I thought would help lighten
the sheer "toil factor" of the exercise, letting me concentrate on the reading.
My objective was twofold:
- 1. Make it easy to transcribe my apparatus in a standards-based, non-proprietary electronic form tractable for automated processing;
- 2. Provide this format with a reading interface that would make consulting the text and apparatus easier than the old paper-based, back-and-forth, page-to-notes process I was used to. It should be lightweight and easy to use.
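To make the first objective concrete, the following minimal Python sketch shows what "tractable for automated processing" can buy. Both paper-style apparatus texts, the running vocabulary and the translation, fall out of a single marked-up source. The element names (`s`, `orig`, `trans`, `term`) and the `gloss` attribute are hypothetical stand-ins and are not the actual Scholia DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding (NOT the actual Scholia DTD): each <s> pairs an
# <orig> sentence with its <trans>; <term gloss="..."> flags a glossed word.
SAMPLE = """<text>
  <s><orig>Der <term gloss="bird">Vogel</term> sang.</orig>
     <trans>The bird sang.</trans></s>
  <s><orig>Die <term gloss="blackbird">Amsel</term> schwieg.</orig>
     <trans>The blackbird was silent.</trans></s>
</text>"""


def apparatus(xml_source: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Derive both apparatus texts from one XML source."""
    root = ET.fromstring(xml_source)
    # Running vocabulary: glossed terms in document order.
    vocabulary = [(t.text, t.get("gloss")) for t in root.iter("term")]
    # Translation: one entry per sentence, in document order.
    translation = [s.findtext("trans", "").strip() for s in root.iter("s")]
    return vocabulary, translation


vocab, trans = apparatus(SAMPLE)
for word, gloss in vocab:
    print(f"{word}: {gloss}")
print(" ".join(trans))
```

Because the translation is nested with the sentence it renders, the pairing is carried by the structure itself rather than by hand-typed references.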
The prototype's reading interface works as follows:
- A native-language version of the text under study (in the prototype, Robert Musil's German-language short story "Die Amsel") appears in a panel on the left of the screen. Certain terms in the text are highlighted in a different color.
- Roving the mouse over a highlighted ("flagged") word or phrase and letting it rest for a second pops up a small "tool tip" giving an English translation or short gloss.
- Also, if you click on the text, a running translation (in English) of the sentence or clause you have clicked on appears in the right-hand panel of the screen.
- The entire running translation (right-hand panel) may be "toggled" on and off by clicking on it. Roving your mouse over it also provides visual cues lining up the sentences or clauses in the original with the translation.
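These behaviors need only a few hooks in the generated HTML. The sketch below uses Python's ElementTree in place of the XSLT stylesheet the project actually uses, and it rests on assumptions of its own: the same hypothetical source elements as a stand-in for the real DTD, a `title` attribute (which browsers display as a tool tip) carrying each gloss, and generated `id` pairs (`o1`/`t1`, `o2`/`t2`, and so on) that a client-side DOM script, not shown, could use to reveal or highlight the matching translation.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup (stand-in elements, not the actual Scholia DTD).
SAMPLE = """<text>
  <s><orig>Der <term gloss="bird">Vogel</term> sang.</orig>
     <trans>The bird sang.</trans></s>
  <s><orig>Die <term gloss="blackbird">Amsel</term> schwieg.</orig>
     <trans>The blackbird was silent.</trans></s>
</text>"""


def to_html(xml_source: str) -> str:
    """Build a two-panel HTML body: original on the left, translation on the right."""
    root = ET.fromstring(xml_source)
    left = ET.Element("div", {"class": "original"})
    right = ET.Element("div", {"class": "translation"})
    for n, s in enumerate(root.iter("s"), start=1):
        orig = s.find("orig")
        # Sentence in the original; id o<n> pairs it with translation t<n>.
        sentence = ET.SubElement(left, "span", {"id": f"o{n}", "class": "sentence"})
        sentence.text = orig.text
        sentence.tail = " "
        for term in orig:  # assumes <orig> contains only <term> children
            flagged = ET.SubElement(
                sentence, "span", {"class": "flagged", "title": term.get("gloss", "")}
            )
            flagged.text = term.text
            flagged.tail = term.tail
        # Matching translation entry, to be shown or hidden by client-side script.
        entry = ET.SubElement(right, "div", {"id": f"t{n}", "class": "trans"})
        entry.text = s.findtext("trans", "").strip()
    body = ET.Element("body")
    body.extend([left, right])
    return ET.tostring(body, encoding="unicode")


print(to_html(SAMPLE))
```

The ids are generated from sentence position and are never typed by the author, so the link between an original sentence and its translation cannot drift out of sync.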
Along with these two objectives, I had several further goals:
- Base the encoding on a TEI-compatible format. Starting with TEI gives me a head start with stylesheets and tool configuration, while potentially easing interchange (and helping other users who may already know TEI).
- Demonstrate the utility of W3C DOM interface scripting on the client beyond simply adding gloss, jazz or "dancing bologna" to a web page. Here it supports an application with a real-world use (the study of literature in a foreign language), at least for one user (me), and hints at others, since the process model and even the tagging could fairly readily be adapted to other uses.
- Demonstrate some practical advantages of a two-tier publishing model. The complex and cumbersome HTML presentation code does not have to be written by hand, but instead is generated out of an XML source whose design is optimized for authoring. Novice learners of XML markup should be able to create the texts. Teams of students and scholars can break up the work of creating the texts, and yet share access to them.
- Demonstrate how XML-based transformation technologies enable all these benefits while providing for a measure of interoperability with other uses and forms of electronic text (especially, in this case, TEI texts); and yet illuminate the tradeoffs and design challenges that this objective poses.
The system consists of:
- A formal document model (TEI-derived) taking the form of a custom XML DTD along with a related SGML DTD written in TEI P3-compatible form. (The SGML DTD is a close superset of the XML DTD, which is considered to be normative.) Either DTD can be used for structural validation, which in itself is enough to assure referential integrity of cross-references in the output (thus cross-references do not need to be encoded explicitly).
- A small set of extra-DTD parameters for "driving" and/or configuring the function of the (DOM/DHTML) output, described in related documentation, and capable of being validated with an XSLT stylesheet.
- One or more XML documents, such as my prototype "Amsel", conforming to the models and extra-DTD constraints.
- An XSLT stylesheet that can be used to transform an XML Scholia document into HTML, for viewing in a DOM-compliant web browser. (The conversion can be run either in batch mode, allowing a server to deliver HTML, or dynamically on a client machine.)
- An HTML page, whose design is determined by the stylesheet, providing the source text with the reading interface. This is what readers (students of the text) will work with.
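DTD validation covers structure. Some rules a DTD cannot state, and the design assigns these to an XSLT-based check. For example, a DTD can require that a term carry a gloss attribute, but it cannot require that the gloss be non-blank. The following Python sketch stands in for such a check. The project's actual extra-DTD parameters are described in its own documentation and are not reproduced here. The two rules below and the element names are hypothetical examples of the kind of constraint involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second <term> breaks a rule a DTD cannot express.
SAMPLE = """<text>
  <s><orig>Der <term gloss="bird">Vogel</term> sang.</orig>
     <trans>The bird sang.</trans></s>
  <s><orig>Die <term gloss=" ">Amsel</term> schwieg.</orig>
     <trans>The blackbird was silent.</trans></s>
</text>"""


def extra_dtd_errors(xml_source: str) -> list[str]:
    """Report violations of (hypothetical) rules that a DTD cannot state.

    A DTD can require a gloss attribute on <term> and a <trans> in each
    <s>, but not that their values are non-blank.
    """
    root = ET.fromstring(xml_source)
    errors = []
    for n, s in enumerate(root.iter("s"), start=1):
        if not (s.findtext("trans") or "").strip():
            errors.append(f"sentence {n}: empty translation")
        for term in s.iter("term"):
            if not (term.get("gloss") or "").strip():
                errors.append(f"sentence {n}: blank gloss for {term.text!r}")
    return errors


for message in extra_dtd_errors(SAMPLE):
    print(message)
```

A check like this runs after ordinary DTD validation and reports problems rather than rejecting the document, so an author can fix them one at a time.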
Possible future developments include:
- Stylesheets to create other reading versions, for example annotated or parallel texts in print;
- A transformation that would enable Scholia-compliant texts to be written out in strict TEI interchange format;
- A transformation that would "scrub" any arbitrary TEI text into a form suitable for marking up with new scholia;
- A supplementary transformation that would enable the conversion of "repository" or "interchange" versions into the scripted HTML reading version (or other reading/browsing versions);
- Transformations that would filter Scholia-compliant texts into other forms, for example to facilitate web- or email-based collaboration in writing them;
- A "unified" interface for composition and reading, potentially web-based, for example using forms or editing capabilities of modern browsers and the latest DOM standards.
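In its simplest form, the "scrub" transformation listed above might reduce each paragraph of an existing TEI text to plain running text and wrap it in an empty scholia skeleton. This Python sketch assumes un-namespaced TEI P3/P4-style input (root element `TEI.2`). The set of elements treated as existing apparatus to discard (`note`, `bibl`) and the output element names are hypothetical choices. A real tool would also need to split paragraphs into sentences.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice: elements treated as existing apparatus and dropped.
DROP = {"note", "bibl"}

SAMPLE = """<TEI.2><text><body>
  <p>Der Vogel<note>ein Tier</note> sang im <hi rend="it">Hof</hi>.</p>
  <p>Die Amsel schwieg.</p>
</body></text></TEI.2>"""


def plain_text(elem: ET.Element) -> str:
    """Concatenate text inside elem, skipping DROP elements but keeping their tails."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in DROP:
            parts.append(plain_text(child))  # unwrap inline markup such as <hi>
        parts.append(child.tail or "")  # text after a child belongs to elem
    return "".join(parts)


def scrub(tei_source: str) -> list[str]:
    """Reduce each <p> to whitespace-normalized plain text."""
    root = ET.fromstring(tei_source)
    return [" ".join(plain_text(p).split()) for p in root.iter("p")]


def scholia_skeleton(tei_source: str) -> str:
    """Wrap each scrubbed paragraph in a hypothetical <s><orig/><trans/></s> shell."""
    text = ET.Element("text")
    for para in scrub(tei_source):
        s = ET.SubElement(text, "s")
        ET.SubElement(s, "orig").text = para
        ET.SubElement(s, "trans")  # left empty for the annotator to fill in
    return ET.tostring(text, encoding="unicode")


print(scholia_skeleton(SAMPLE))
```

This sketch discards presentational markup such as `<hi>` along with the notes. That suits a fresh reading edition but loses information a repository version might want to keep.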
The presentation will include:
- A concise review of project goals, non-goals and requirements;
- A demonstration of the Scholia reading interface (in the Netscape 6.2 and IE5.5 web browsers);
- A demonstration of an authoring interface (using SoftQuad's XMetaL) along with, if time permits, an alternative "low end" authoring interface (using a plain text editor);
- A presentation of the enhancements or "layer" on top of TEI that provides the authoring document model, with discussion of the design principles applied and the decisions made, along with the rationales for those decisions in light of the requirements;
- Any attention to the "guts" of the DOM scripting and XSLT transformations that time may allow.