“Demonstration of TATOE: Text Analysis Tool with Object
Encoding”
Melina Alexa
Integrated Publication and Information Systems Institute, GMD-IPSI
alexa@darmstadt.gmd.de
Lothar Rostek
Integrated Publication and Information Systems Institute, GMD-IPSI
rostek@darmstadt.gmd.de
TATOE, a Text Analysis Tool with Object Encoding, is a support tool for
semi-automated text analysis. It has been designed and implemented at
GMD-IPSI to assist various tasks in multilingual,
data-driven and multi-layered text analysis. TATOE is implemented in the
Smalltalk-based programming environment VisualWorks 2.0 (from ParcPlace).
For the data modeling we have used the Smalltalk Frame Kit (SFK), an
object-oriented modeling tool which offers a spectrum of features to make model
descriptions operational (Fischer and Rostek, in preparation).
Various processes are supported:
1. structuring, compiling, importing and working with one or more text corpora,
2. determining one or more categorization schemata, hierarchically or non-hierarchically structured, to be used for on-text markup,
3. importing or re-using an existing categorization schema, and/or defining and structuring one's own categorization schema,
4. integrating automatic tagging/encoding tools for supplementary annotation,
5. performing on-text annotation according to more than one categorization schema concurrently,
6. flexible viewing of both annotated and non-annotated text segments: selecting and arranging them by different criteria (frequency of occurrence, encoded category types, etc.) and presenting them with meaningful layout styles (fonts, colours, etc.),
7. calculating different statistics on the basis of the text corpora themselves, the encoded text segments and, if available, the hierarchical relations within the categorization schema, e.g. the frequency of occurrence of word types, word tokens or categories, or how these are distributed in the text(s),
8. re-using the encoded information as input to further processing by exporting it in an appropriate format, e.g. SGML.
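To make the annotation and statistics steps above more concrete, the following is a minimal Python sketch, not TATOE's actual Smalltalk/SFK implementation. It assumes a simple token-offset model for encoded segments; the names `Annotation`, `tokenize`, `word_statistics`, `category_frequencies` and `export_segments`, the `seg` element and the example schema and category names are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One encoded text segment (hypothetical model, not TATOE's own)."""
    start: int      # index of the first token in the segment
    end: int        # index one past the last token
    schema: str     # name of the categorization schema
    category: str   # category within that schema


def tokenize(text):
    # Simplification: a token is a lowercased run of letters (any script).
    return re.findall(r"[^\W\d_]+", text.lower())


def word_statistics(tokens):
    # Word tokens are all occurrences; word types are the distinct forms.
    counts = Counter(tokens)
    return {"tokens": len(tokens), "types": len(counts), "frequencies": counts}


def category_frequencies(annotations, schema):
    # Count how often each category of one schema has been assigned.
    return Counter(a.category for a in annotations if a.schema == schema)


def export_segments(tokens, annotations):
    # Emit one SGML-like element per encoded segment, in input order.
    # Assumes schema and category names are plain identifiers; tokens
    # contain letters only, so no markup characters need escaping.
    return [
        '<seg schema="{}" cat="{}">{}</seg>'.format(
            a.schema, a.category, " ".join(tokens[a.start:a.end])
        )
        for a in annotations
    ]


tokens = tokenize("The tool supports text analysis. The analysis is semi-automated.")
annotations = [
    Annotation(0, 2, "syntax", "noun_phrase"),
    Annotation(3, 5, "syntax", "noun_phrase"),
    Annotation(3, 5, "register", "field"),  # same segment, second schema
]
stats = word_statistics(tokens)
```

Because each annotation carries its own schema name, the same segment can be encoded under several schemata at once, and statistics can be computed per schema without the schemata interfering with each other.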
References
Melina Alexa. “Making principled selections: A methodology for register analysis and description for text generation.” Presented at the 22nd International Systemic-Functional Congress, Beijing, China, July 1995.
Dietrich Fischer and Lothar Rostek. SFK: A Smalltalk Frame Kit. Technical report. GMD/Institut fuer Integrierte Publikations- und Informationssysteme, 1996.
Michael A. K. Halliday. An Introduction to Functional Grammar. London: Edward Arnold, 1985.