Kirrkirr: Software for Browsing and Visual Exploration of a Structured Warlpiri Dictionary
Christopher D. Manning
Stanford University, USA
This paper discusses the goals, architecture, and usability of Kirrkirr, a
Java-based visualization tool for XML dictionaries, currently being used with a
dictionary for Warlpiri, an Australian Aboriginal language.
While dictionaries on computers are now common, there has been surprisingly
little work on innovative ways of utilising the capabilities of computers for
visualization, hypertext linking and multimedia in order to provide a richer
experience of dictionary content. Most electronic dictionaries present the
search-dominated interface of classic information retrieval (IR) systems, which
are only effective when the user has a clearly specified information need and a
good understanding of the content being searched. The ability to browse often
makes paper dictionaries easier and more pleasant to use than such electronic
dictionaries. Search interfaces are ineffective for information needs such as
exploring a concept. Some work in IR has emphasised the need for new methods of
information access and visualization for browsing document collections (Pirolli
et al. 1996), and we wish to extend such ideas into the domain of dictionaries,
in part because there are indications that current interfaces are unlikely to
have much direct educational benefit for students (Kegl 1995).
Our goal has been to provide a fun dictionary tool that is effective for browsing
and incidental language learning. In particular we attempt to address Sharpe's
(1995) "distinction between information gained and knowledge sought". The speed
of information retrieval that e-dictionaries deliver, and the focused,
decontextualized search results they provide, can frequently lead to the loss of
the memory retention benefits and chances for random learning that manually
searching through paper dictionaries provides.
Within the Australian context, indigenous dictionary structure and usability are
often dictated by professional linguists, while the needs of others (speakers,
semi-speakers, young users, second language learners) are not met. Another major
goal has been to design an interface usable by, and interesting to, young users
and other language learners. From this viewpoint, the low level of literacy in
the region and the inherently captivating nature of computers suggest that an
e-dictionary is potentially more useful than a paper edition. Among other
benefits, we can provide an interface less dependent on good knowledge of
spelling and alphabetical order.
Our dictionary interface initially targeted Warlpiri, a language of Central
Australia, for which there has been an extensive ongoing project for the
compilation of semantically-rich lexical materials (Laughren and Nash 1983, Hale
and Laughren [to appear]). We converted this data from a non-standard format
into a richly-structured XML version (XML 1999). The current version uses ad hoc
indexing of this textual version for efficient access, but we expect to move to
XQL, as this standard matures. Our system is written in Java, using the Swing
API, and runs on all major platforms (Windows, Mac, Unix).
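One simple form of ad hoc indexing is to record, for each headword, where its entry starts and ends in the XML text, so that a single entry can be pulled out without parsing the whole file. The sketch below does this with a regular expression over hypothetical `ENTRY`/`HW` markup. It assumes entries are never nested and headwords are unique (homographs would overwrite one another), and the class `EntryIndex` is illustrative only; it should not be read as Kirrkirr's actual indexing scheme.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical headword index over the raw XML text of the dictionary. */
final class EntryIndex {
    // Assumed markup: each entry is <ENTRY><HW>headword</HW>...</ENTRY>, never nested.
    private static final Pattern ENTRY =
            Pattern.compile("<ENTRY>\\s*<HW>([^<]+)</HW>.*?</ENTRY>", Pattern.DOTALL);

    private final String xml;
    // Headword -> {start, end} character offsets of its entry, kept in sorted order.
    private final Map<String, int[]> spans = new TreeMap<>();

    EntryIndex(String xml) {
        this.xml = xml;
        Matcher m = ENTRY.matcher(xml);
        while (m.find()) {
            // A later entry with the same headword overwrites an earlier one.
            spans.put(m.group(1), new int[] {m.start(), m.end()});
        }
    }

    /** Returns the raw XML of one entry, or null if the headword is not indexed. */
    String lookup(String headword) {
        int[] span = spans.get(headword);
        return span == null ? null : xml.substring(span[0], span[1]);
    }
}
```

Because the index is a `TreeMap`, iterating over its keys also yields an alphabetical word list. A version for a full dictionary would more likely store file offsets than hold the whole text in memory.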
Dictionaries with only plain text behind them can offer little as output beyond
an on-line reflection of a printed page. In contrast, XML allows definition of
the precise semantics of the dictionary content, while leaving its form of
presentation to the user unspecified. We exploit this flexibility in our
application by having the program mediate between the lexical data and the
user. The interface can select which information to show and choose how to
present it, in ways customised to a user's preferences and abilities.
Beyond the definitions of words, users frequently want to know how words relate
to other words, and the patterns in these relationships. Kirrkirr provides a
color-coded network display of semantic links between words, which can be
explored, manipulated and customised interactively by the user (Jansz et al.
1999) using the animated graph-drawing techniques of Eades et al. (1998) and
Huang et al. (1998). In their spring algorithm, words become nodes which are
held apart by a repulsive force, but kept from drifting too far apart by springs
which have a natural length. Unlike most other graph layout algorithms, it
updates the layout iteratively, which means that users can drag nodes across the
screen: the algorithm makes other nodes move out of the way, while words related
to the dragged word are pulled along with it. The detailed semantic markup of the dictionary,
with many kinds of semantic links (such as synonyms, antonyms, hyponyms, and
other forms of relationships) allows us to provide a rich browsing experience.
For example, the ability to display different link types graphically as
different colors solves one of the recurring problems of the present web, with
its one type of link: users have some idea of what type of relationship there is
to another word before clicking. Thinking of the lexicon as a semantic network
with various kinds of links was a leading idea of the WordNet project (Miller et
al. 1993), but the simple text-based computer interface they provide fails to do
justice to the richness of the underlying data. Others have attempted to remedy
this lack (e.g., Plumbdesign 1998), but we feel that our work is better aimed at
providing the kind of simple network display suitable for our users.
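To make the spring idea concrete, here is a minimal sketch of one relaxation step of a generic force-directed layout. It assumes Hooke's-law springs with a single natural length on every link and an inverse-square repulsion between every pair of nodes; it is not the Eades et al. or Huang et al. algorithm itself, and the class name `WordGraphLayout`, the `pinned` flag and all constants are hypothetical. Calling `step()` repeatedly animates the layout, and a node the user is dragging is marked `pinned` so that the forces rearrange everything around it.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal force-directed layout sketch (a generic spring embedder, not the cited algorithm). */
final class WordGraphLayout {
    // Hypothetical constants; a real display would tune these.
    static final double SPRING_K = 0.1;        // spring stiffness
    static final double NATURAL_LENGTH = 50.0; // rest length of every spring, in pixels
    static final double REPULSION = 5000.0;    // strength of pairwise repulsion
    static final double STEP = 0.5;            // fraction of the net force applied per step

    static final class Node {
        final String word;
        double x, y;
        boolean pinned; // true while the user is dragging this node

        Node(String word, double x, double y) {
            this.word = word;
            this.x = x;
            this.y = y;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<int[]> links = new ArrayList<>(); // index pairs of semantically linked words

    /** One relaxation step; calling it repeatedly animates the layout. */
    void step() {
        int n = nodes.size();
        double[] fx = new double[n];
        double[] fy = new double[n];
        // Every pair of nodes repels, with force proportional to 1/d^2.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = nodes.get(i).x - nodes.get(j).x;
                double dy = nodes.get(i).y - nodes.get(j).y;
                double d = Math.max(Math.hypot(dx, dy), 0.01); // avoid division by zero
                double f = REPULSION / (d * d);
                fx[i] += f * dx / d;
                fy[i] += f * dy / d;
                fx[j] -= f * dx / d;
                fy[j] -= f * dy / d;
            }
        }
        // Each link acts as a spring pulling its ends toward the natural length.
        for (int[] link : links) {
            Node a = nodes.get(link[0]);
            Node b = nodes.get(link[1]);
            double dx = b.x - a.x;
            double dy = b.y - a.y;
            double d = Math.max(Math.hypot(dx, dy), 0.01);
            double f = SPRING_K * (d - NATURAL_LENGTH); // > 0 when stretched, < 0 when compressed
            fx[link[0]] += f * dx / d;
            fy[link[0]] += f * dy / d;
            fx[link[1]] -= f * dx / d;
            fy[link[1]] -= f * dy / d;
        }
        // Move every node except one the user is holding.
        for (int i = 0; i < n; i++) {
            Node v = nodes.get(i);
            if (!v.pinned) {
                v.x += STEP * fx[i];
                v.y += STEP * fy[i];
            }
        }
    }
}
```

Because the forces are recomputed on every step, moving a pinned node simply changes the inputs to the next step, which is what lets the rest of the network follow a drag. The all-pairs repulsion loop is quadratic, which is acceptable for the few dozen words shown around one entry.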
To augment the traditional semantic relations in the dictionary, we also provide
linkages derived automatically from collocational analysis (of the limited
amount of online Warlpiri text), and present an interface derived from semantic
domains. Both interfaces address the notion of "terminology sets": words
that belong together, a notion which seems particularly salient for native
speakers (Goddard and Thieberger 1997). We discuss the determination of
collocational bonds, using the method of Dunning (1993), and the limitations of
what we can do with the data available.
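Dunning's (1993) log-likelihood ratio can be computed from a 2x2 table of co-occurrence counts. The sketch below uses the form G^2 = 2 * sum O ln(O / E), expanded into x ln x terms over the cells, row totals, column totals and grand total. How the counts are collected from the Warlpiri texts (tokenisation, window size) is not specified here, and the class and method names are hypothetical.

```java
/** Dunning's (1993) log-likelihood ratio for a 2x2 table of bigram counts. */
final class Collocation {
    /** x ln x, using the convention 0 ln 0 = 0. */
    private static double xLogX(double x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /**
     * k11 = count of (w1, w2); k12 = count of (w1, not w2);
     * k21 = count of (not w1, w2); k22 = count of all other bigrams.
     * Returns G^2 = 2 * sum O ln(O / E), rearranged into x ln x terms.
     */
    static double logLikelihood(long k11, long k12, long k21, long k22) {
        double n = k11 + k12 + k21 + k22;
        double cells = xLogX(k11) + xLogX(k12) + xLogX(k21) + xLogX(k22);
        double rows = xLogX(k11 + k12) + xLogX(k21 + k22);
        double cols = xLogX(k11 + k21) + xLogX(k12 + k22);
        return 2.0 * (cells - rows - cols + xLogX(n));
    }
}
```

A table whose cells match the independence expectation, such as `logLikelihood(10, 10, 10, 10)`, scores 0, and larger values indicate stronger association. Under the chi-squared approximation with one degree of freedom, a value above about 3.84 corresponds to p < 0.05. Dunning's statistic was proposed because it behaves better than Pearson's chi-squared on rare events, which matters when, as here, most word pairs occur only a handful of times.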
Formatted dictionary entries, displayed using HTML, are produced from the
underlying XML by the use of XSL stylesheets (XSL 1999). These provide
conventional hypertext for navigating between entries, in particular providing a
color-coding of different kinds of semantic relationships between words which is
consistent with that in the network display. A variety of XSL stylesheets are
provided, which can give different formatting to the dictionary content
appropriate to different users. For instance, abbreviations for parts of speech,
other grammatical notes, and detailed decompositional definitions can be
confusing for most Aboriginal users (Corris et al. 1999), and a stylesheet can
present just the desired information in large, easy-to-read type.
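The XML-to-HTML step can be illustrated with Java's standard XSLT API (`javax.xml.transform`). The sketch below applies a "learner" stylesheet that shows only the headword and gloss in large type and never selects the part-of-speech field. The element names `ENTRY`, `HW`, `POS` and `GL`, the sample entry and the stylesheet are hypothetical; the real dictionary schema and Kirrkirr's own stylesheets are considerably richer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Renders an XML entry to HTML with an XSL stylesheet (hypothetical schema and stylesheet). */
final class EntryRenderer {
    // A simplified entry: headword, part of speech, gloss.
    static final String ENTRY_XML =
            "<ENTRY><HW>kurdu</HW><POS>N</POS><GL>child</GL></ENTRY>";

    // A "learner" view: headword and gloss only, in large type; POS is never selected.
    static final String LEARNER_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\"/>"
            + "<xsl:template match=\"ENTRY\">"
            + "<p style=\"font-size:24pt\"><b><xsl:value-of select=\"HW\"/></b>: "
            + "<xsl:value-of select=\"GL\"/></p>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the entry and returns the resulting HTML. */
    static String render(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter html = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(html));
        return html.toString();
    }
}
```

Swapping in a different stylesheet string, for example one that also prints the part of speech and colors cross-references by link type, changes the presentation without touching the entry itself, which is the separation of content and form described above.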
In addition to the above, the dictionary incorporates multimedia (the user can
hear words and see appropriate pictures) and a conventional search interface.
The dictionary provides a user-friendly console where search results can be
sorted and manipulated. As well as standard keyword search, which can optionally
be restricted to appearance within a specified XML entity, the system provides
two features targeted towards two principal groups of users. Linguists often
want to search for particular sound patterns (such as certain types of consonant
clusters), and so the system allows regular expression matching for such expert
users. On the other hand, the limited literacy level of many potential users
means that they will have particular difficulty looking up words. In part this is
because the phonetic orthography of Warlpiri does not closely match the
(rather arcane) spelling rules of English, on which their literacy skills are
usually based. To alleviate this problem, we have
implemented a "fuzzy spelling" algorithm which attempts to find the intended
word by using rules which capture common mistakes, sound confusions and
alternative spellings.
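Both search extensions can be sketched in a few lines of Java. `regexSearch` uses `java.util.regex` for expert queries; the example pattern looks for a nasal followed by a stop, treating rn, ny, ng and rt as digraphs. `fuzzySearch` reduces both the query and each headword to a key with a small list of rewrite rules and compares the keys. The rules (oo as u, ee as i, b as p, and g as k except in ng), the sample word list and the class name are hypothetical illustrations, not Kirrkirr's actual rule set.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch of expert (regex) and novice (fuzzy) headword search. */
final class HeadwordSearch {
    /** Expert search: headwords containing a match for the regular expression. */
    static List<String> regexSearch(List<String> headwords, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return headwords.stream()
                .filter(w -> pattern.matcher(w).find())
                .collect(Collectors.toList());
    }

    // Hypothetical rewrite rules (regex, replacement), applied in order to both
    // the query and each headword. The lookbehind keeps the digraph "ng" intact.
    private static final String[][] RULES = {
        {"oo", "u"},
        {"ee", "i"},
        {"b", "p"},
        {"(?<!n)g", "k"},
    };

    /** Reduces a spelling to a key that collapses the confusions covered by RULES. */
    static String normalize(String spelling) {
        String key = spelling.toLowerCase(Locale.ROOT);
        for (String[] rule : RULES) {
            key = key.replaceAll(rule[0], rule[1]);
        }
        return key;
    }

    /** Novice search: headwords whose key equals the query's key. */
    static List<String> fuzzySearch(List<String> headwords, String query) {
        String key = normalize(query);
        return headwords.stream()
                .filter(w -> normalize(w).equals(key))
                .collect(Collectors.toList());
    }
}
```

With the sample list `kurdu, ngapa, karnta, maliki, wati, kirrkirr`, the pattern `(rn|ny|ng|n|m)(rt|p|t|k|j)` finds only `karnta`, while the fuzzy queries `koordoo`, `ngaba` and `Maleeki` find `kurdu`, `ngapa` and `maliki` respectively. Rewriting both sides to a shared key keeps lookup to a simple equality test; ranking near misses would need an additional similarity measure.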
We have performed some preliminary trialling of the dictionary through visits by
Mim Corris to Yuendumu and Willowra, and Jane Simpson to Lajamanu. This has
involved completing dictionary tasks and observing use by primary and
lower secondary students and trainee Warlpiri literacy workers, as well as
gathering comments from teachers and other adults. In general, reactions have been quite
enthusiastic, and the dictionary does appear to succeed in creating and
maintaining interest. We have received suggestions on how to make it a better
basis for classroom activities, which we hope to incorporate in future versions.
The range of areas addressed in this work is unusual in electronic dictionary
research, which often treats storage, processing and visualisation/teaching as
unrelated problems. Despite some significant
research into the construction of lexical databases that go beyond the confined
dimensions of their paper ancestors, there has been little attempt at seeing
this work through to benefiting people such as language learners, who could
truly gain from a better interface to dictionary information. Additionally, the
range of potential users here is considerably more diverse than encountered in
typical studies of dictionary usability (e.g., Atkins and Varantola 1997). For
instance, issues such as low levels of literacy are rarely touched on. Our
system has attempted to reduce the importance of knowing the written form of a
word before the application can be used, while providing ample opportunities to
learn written forms. Features such as an animated, clearly laid out network of
words and their relationships, multimedia and hypertext aim at making the system
interesting and enjoyable to use. At the same time, features such as advanced
search capabilities and note-taking make the system practical as a reference
tool. The system is designed to be highly customisable by the user, and it is
also highly extensible, allowing new modules to be incorporated with relative
ease. We thus think that it is a good foundation for an electronic dictionary,
and while the focus of this research has been on Warlpiri, this research (and
the software constructed) can be easily applied to other languages.
References
Atkins, B. T. S., and K. Varantola. 1997. “Monitoring dictionary use.” International Journal of Lexicography 10: 1-45.
Corris, M., C. Manning, S. Poetsch, and J. Simpson. 1999. “Using dictionaries of Australian Aboriginal languages.” Paper presented at the Applied Linguistics Association of Australia Annual Congress, Perth.