“TAPoR Tools: Portal text analysis tools and other primitives”
Geoffrey Rockwell
McMaster University
grockwel@mcmaster.ca
Lian Yan
McMaster University
lyan@mcmaster.ca
Stéfan Sinclair
University of Alberta
Stefan.Sinclair@ualberta.ca
This poster will demonstrate a collection of text processing tools designed to work through a portal over the Web. The tools work on plain text, html or xml-encoded e-texts. They make it easy to search electronic texts without installing software, preprocessing the texts, or mastering complex tools.
HOW DO TAPOR.TOOLS (T.TOOLS) WORK?
T.tools are written in Ruby, an object-oriented scripting language like Perl and Python. They can be run either on the command line or as CGI programs from our portal. This means that users need not install or maintain the tools, while advanced users who wish to can download and adapt them. Because the tools are driven through Web forms, the interface can easily be adapted to hide complexity or to suit local needs. Simple search and concordance forms that take advantage of xml markup can be created without changing the tools. Small-scale publishers of electronic texts can offer T.tool Web forms that process their Web-accessible e-texts without installing the software: the forms simply pass the URL of the text in a hidden field to the appropriate tool on our portal for processing. (We will demonstrate the adaptation of these tools to support the Hyperliste project, an online collection of French medieval poetry.)
WHAT ARE PORTAL TOOLS?
A portal is an entry point into a field. In this case, the T.tools provide simple text processing for TAPoR (Text Analysis Portal for Research), a multi-institutional project funded by the Canada Foundation for Innovation to create a portal for text analysis. They are designed as a suite of simple text transformations that will eventually be managed by a portal environment, which will also provide tools for customizing user settings and the interface. At present they can do the following:
- List and count words in a text.
- List and count elements in an xml text.
- List attributes and values in an xml text.
- Extract elements from an xml text.
- Find patterns (words or phrases) in a text.
- Find patterns in specific elements in a text.
- Create a concordance of found patterns or elements.
- Output results in either html for reading or xml for further processing.
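Two of the operations listed above, listing and counting words and building a concordance of found patterns, can be illustrated with a minimal Ruby sketch. This is not the T.tools source: the function names `word_counts` and `concordance`, the `width` parameter, and the choice to treat a word as a run of letters (optionally joined by apostrophes) are assumptions made for illustration.

```ruby
# Hypothetical sketch, not the T.tools source: two plain-text operations.

# List and count words (case-folded), most frequent first; ties are sorted
# alphabetically. A "word" here is a run of letters, optionally joined by
# apostrophes (so "l'amour" stays one token).
def word_counts(text)
  counts = Hash.new(0)
  text.downcase.scan(/[[:alpha:]]+(?:'[[:alpha:]]+)*/) { |word| counts[word] += 1 }
  counts.sort_by { |word, n| [-n, word] }
end

# Build a keyword-in-context concordance: one line per match of +pattern+,
# with up to +width+ characters of context on each side. Whitespace runs
# (including line breaks) are collapsed so each hit fits on one line.
def concordance(text, pattern, width: 30)
  flat = text.gsub(/\s+/, ' ')
  # A string pattern matches whole words only, case-insensitively;
  # a Regexp is used as given.
  regex =
    if pattern.is_a?(Regexp)
      pattern
    else
      /(?<![[:alpha:]])#{Regexp.escape(pattern)}(?![[:alpha:]])/i
    end
  lines = []
  flat.scan(regex) do
    m = Regexp.last_match
    left  = flat[[m.begin(0) - width, 0].max...m.begin(0)].strip
    right = flat[m.end(0), width].strip
    # Right-align the left context so the bracketed hits line up.
    lines << format('%*s [%s] %s', width, left, m[0], right)
  end
  lines
end
```

For example, `concordance(text, 'rose', width: 10)` returns one line per whole-word occurrence of *rose* (so *prose* is skipped), with the hit bracketed and its original capitalization kept. Producing html or xml from these arrays would be a separate formatting step.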
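The form-based workflow described under HOW DO TAPOR.TOOLS (T.TOOLS) WORK?, in which one script runs either on the command line or as a CGI program and receives the text's URL from a hidden form field, might be organized as in the following sketch. The parameter names `url` and `pattern`, the `key=value` command-line convention, and the helper names are hypothetical. Only GET requests are handled, and fetching the text itself (for instance with `Net::HTTP.get`) is left out.

```ruby
require 'uri'

# Hypothetical sketch: a single script usable both from the command line and
# as a CGI program. A web server running a CGI program sets REQUEST_METHOD and
# passes GET parameters in QUERY_STRING; on the command line we accept
# key=value arguments instead. POST bodies (read from stdin) are not handled.

def cgi_request?(env)
  !env['REQUEST_METHOD'].nil?
end

# Collect the tool's parameters (e.g. the hidden "url" field from a
# publisher's form) into a Hash of strings.
def tool_params(env, argv)
  pairs =
    if cgi_request?(env)
      URI.decode_www_form(env.fetch('QUERY_STRING', ''))
    else
      argv.map do |arg|
        key, value = arg.split('=', 2)
        [key, value.to_s]
      end
    end
  pairs.to_h
end

# CGI output must begin with a header block; command-line output need not.
def respond(body, env)
  return body unless cgi_request?(env)

  "Content-Type: text/html; charset=utf-8\r\n\r\n#{body}"
end

# Typical entry point: tool_params(ENV, ARGV), fetch params['url'], run the
# tool, then print respond(html, ENV).
```

Under this arrangement a publisher's page needs only a form whose action points at the tool on the portal, with the e-text's address in a hidden `url` input and a visible field for the search pattern; the same script can still be run as `ruby tool.rb url=... pattern=...` by advanced users.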