“A broadcast architecture for distributed text tools”
Steven J. DeRose
Computer Center, University of Illinois at Chicago
C. M. Sperberg-McQueen
Computer Center, University of Illinois at Chicago
cmsmcq@acm.org
Adequate textual analysis software is difficult to create. Scholarly users
have special requirements seldom met by commercial packages, e.g. lexical,
syntactic, and statistical analysis; special layouts for interlinear texts;
synchronized scrolling of multiple translations or editions; and flexible
tools for searching and for organizing search results and making latent
patterns visible. Disparities in document formats and in levels of tagging and
meta-information have long made it difficult to share text software. And the cost
of software development frequently exceeds the resources available for
humanities computing infrastructure.
Thanks to SGML, XML, the TEI, and even HTML, we are now closer to having a
uniform way to exchange information about documents and their structures.
And thanks to other existing and emergent standards, it is now possible to
specify a simple architecture that can help organize a modular system, into
which a variety of analysis, display, and other tools can be plugged. This
would allow independent development, maintenance, and use of far more tools
than could ever be handled with a monolithic approach.
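As a minimal sketch of what "plugging in" a tool might mean, assuming Python and a deliberately simplified corpus (a lowercased list of whitespace-separated tokens rather than SGML- or TEI-encoded text), each tool below implements one small hypothetical interface (`Tool.render`) and is registered with a hypothetical `Host` independently of the others. The names `Corpus`, `Tool`, `Host`, `WordCount`, and `TypeCount` are illustrative assumptions, not part of any standard or of the architecture as the authors specify it.

```python
from typing import Protocol


class Corpus:
    """Hypothetical shared corpus, simplified to lowercased whitespace tokens."""

    def __init__(self, text: str) -> None:
        self.tokens = text.lower().split()


class Tool(Protocol):
    """Hypothetical plug-in interface shared by every tool."""

    name: str

    def render(self, corpus: Corpus) -> str: ...


class WordCount:
    name = "word-count"

    def render(self, corpus: Corpus) -> str:
        return f"{len(corpus.tokens)} tokens"


class TypeCount:
    name = "type-count"

    def render(self, corpus: Corpus) -> str:
        return f"{len(set(corpus.tokens))} types"


class Host:
    """Hypothetical host; tools are plugged in independently of one another."""

    def __init__(self, corpus: Corpus) -> None:
        self.corpus = corpus
        self.tools: dict[str, Tool] = {}

    def plug_in(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def run_all(self) -> dict[str, str]:
        # Each tool reads the same corpus; none calls or depends on another.
        return {name: tool.render(self.corpus) for name, tool in self.tools.items()}


if __name__ == "__main__":
    host = Host(Corpus("The cat saw the dog"))
    host.plug_in(WordCount())
    host.plug_in(TypeCount())
    print(host.run_all())  # {'word-count': '5 tokens', 'type-count': '4 types'}
```

In this sketch, adding a tool means writing one more class with a `render` method; nothing already registered has to change, which is the contrast with a monolithic design drawn above.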
A simple scenario
Consider a user viewing a large collection of texts (perhaps all the literary works of a single author or period) using several tools:
- a fully formatted view;
- a word list, from which searches may be issued;
- a Key Word in Context (KWIC) view;
- an interlinear view with grammatical, thematic, or other information displayed in association with text portions.
This scenario suggests several observations:
1. Almost all scholarly analysis tools can be construed as "views" of an underlying corpus.
2. Little communication is required between the views. Each must have efficient access to the underlying data, but individual views need only communicate terse information (e.g. the new focal word-type) to the others when they change state.
3. When one view changes state, other views can respond by changing their own state; the first view need not control the others. This allows the user to have some KWIC or text views that respond to new selections in the word list, and others that are unaffected. Thus our architecture decentralizes inter-view control: any view can respond to others, but no view is controlled by another.
4. A view's response to events elsewhere may be simple (e.g. scrolling) or arbitrarily complex (e.g. recalculating a statistical description of the text).
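The observations above can be sketched as a publish/subscribe broadcast, assuming Python and in-process views; a real system would presumably pass such messages between separately developed programs. A hypothetical `Bus` carries one terse message (the new focal word-type) on a named topic, and each view decides for itself whether to listen, so the announcing word list controls nothing. `Bus`, `WordListView`, `KwicView`, `FrequencyView`, and the topic name `"focal-word"` are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable


class Bus:
    """Hypothetical broadcast channel; views announce state changes here."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def listen(self, topic: str, callback: Callable[[str], None]) -> None:
        self._listeners[topic].append(callback)

    def announce(self, topic: str, value: str) -> None:
        # The announcer neither knows nor controls who responds.
        for callback in self._listeners[topic]:
            callback(value)


class WordListView:
    """Announces a new focal word-type when the user selects one."""

    def __init__(self, tokens: list[str], bus: Bus) -> None:
        self.types = sorted(set(tokens))
        self.bus = bus

    def select(self, word: str) -> None:
        # Broadcast only the terse new state, not instructions to other views.
        self.bus.announce("focal-word", word)


class KwicView:
    """Key Word in Context lines for the current focal word."""

    def __init__(self, tokens: list[str], width: int = 2) -> None:
        self.tokens = tokens
        self.width = width
        self.lines: list[str] = []

    def on_focus(self, word: str) -> None:
        # Rebuild the concordance lines around each occurrence of the new word.
        self.lines = []
        for i, tok in enumerate(self.tokens):
            if tok == word:
                left = " ".join(self.tokens[max(0, i - self.width):i])
                right = " ".join(self.tokens[i + 1:i + 1 + self.width])
                self.lines.append(f"{left} [{tok}] {right}".strip())


class FrequencyView:
    """Toy stand-in for a more costly statistical recalculation."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.share = 0.0

    def on_focus(self, word: str) -> None:
        self.share = self.tokens.count(word) / len(self.tokens)


if __name__ == "__main__":
    tokens = "the cat saw the dog".split()
    bus = Bus()
    linked = KwicView(tokens)
    unlinked = KwicView(tokens)  # never subscribes, so it ignores selections
    stats = FrequencyView(tokens)
    bus.listen("focal-word", linked.on_focus)
    bus.listen("focal-word", stats.on_focus)
    WordListView(tokens, bus).select("the")
    print(linked.lines)    # ['[the] cat saw', 'cat saw [the] dog']
    print(unlinked.lines)  # []
    print(stats.share)     # 0.4
```

Here `linked` updates when the word list announces a selection while `unlinked`, which never subscribed, keeps its state; `FrequencyView` stands in for an arbitrarily complex response. Because control lives in each listener's choice of subscriptions rather than in the announcer, the sketch mirrors the decentralized inter-view control described in observation 3.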