XSL - Characteristics, Status, and Potentials for Text Processing Applications in the Humanities
Wendell Piez
Mulberry Technologies, USA
What XSL Is
The Extensible Stylesheet Language (XSL) is a specification currently being finalized (May 2000) by the World Wide Web Consortium (W3C), the vendor consortium that proposes recommendations for web standards including HTML, CSS and now XML and its related technologies. XSL's immediate purpose is to support various kinds of presentation of arbitrarily marked-up documents in XML format. In an XSL system, any well-formed XML document could be formatted for print, displayed in hypertext (including on the web), or presented in other media, more easily and more effectively than is currently the case, and in a standards-based way. In a networked environment, processing documents for display on screen could happen on either server or client.
To support its task of presenting XML (that is, applying to an arbitrary tag set a formatting description for a user interface such as screen or printer), XSL evidently has to provide for granular access to markup structures, so as to be able, for example, to derive tables of contents, text for running heads, indexes, and other common presentational expressions of underlying document architecture. In the course of working out XSL it became increasingly clear that, as is often the case with data processing problems, this problem is addressed more easily and more powerfully when it is treated as a special case of a more general capability, namely the "transformation" of one markup structure into another. Accordingly, XSL is formally divided into two parts:
- "XSL Transformations" (XSLT): on a standalone basis, provides a language to describe many of the kinds of rearrangement and filtering of markup structures that a reasonably powerful XML presentation language requires.
- "XSL Formatting Objects" (XSLFO): provides a vocabulary for describing, in a standard and abstract way, formatting of text for visual display in print or on screen (and possibly for alternative media presentation).
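As an illustration of the kind of derivation mentioned above (a table of contents drawn from document structure), here is a minimal sketch. It is not taken from any existing project: the element names (article, section, title), the class name TocSketch and the sample text are all assumptions made for the example. It runs an XSLT 1.0 stylesheet through the standard Java javax.xml.transform API and emits the section titles as a numbered list.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TocSketch {
    // Hypothetical source document; the tag set is an assumption.
    static final String SAMPLE =
        "<article>"
        + "<section><title>What XSL Is</title><p>First section.</p></section>"
        + "<section><title>XSLT Capabilities</title><p>Second section.</p></section>"
        + "</article>";

    // XSLT 1.0: select every section title and emit one numbered list item each.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>",
        "  <xsl:template match='/'>",
        "    <ul><xsl:apply-templates select='//section/title'/></ul>",
        "  </xsl:template>",
        "  <xsl:template match='title'>",
        "    <li><xsl:value-of select='position()'/>. <xsl:value-of select='.'/></li>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies STYLESHEET to the given XML string and returns the serialized result.
    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(SAMPLE));
    }
}
```

The stylesheet never mentions character positions or line numbers; it addresses the titles purely through their place in the markup, which is the granular access to structure described above.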
XSLT's Capabilities
-Presentational XSLT:
XSLT is already used to convert XML into HTML. In this, it is a ready alternative to a scripting approach (Perl, Omnimark etc.) or to the ISO standard DSSSL, and easier to learn than either. It also compares favorably in price: tools for XSLT conversions are free.
-Analytic XSLT:
XSL processing is dependent on markup in the source text for navigation as opposed to (say) character offsets or line numbers. While very good at presenting information encoded in markup, it is not good at recognizing or construing implicit information such as character patterns. It does no tokenizing, hence cannot recognize "word" boundaries. By default, string processing and matching in XSLT is case-sensitive, and cannot readily be configured otherwise.
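The case-sensitivity noted above can be seen in a small sketch. The element names (lg, l), the class name CaseFoldSketch and the sample lines are assumptions made for the example. The stylesheet counts lines containing "rose" twice: once with a plain XPath contains(), and once after folding case by hand with translate(). The second form is a manual workaround limited to the letters listed in the two variables, not a configuration option, which is consistent with the limitation described.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CaseFoldSketch {
    // Hypothetical verse lines; the tag set is an assumption.
    static final String SAMPLE =
        "<lg>"
        + "<l>Rose is a rose is a rose</l>"
        + "<l>A ROSE by any other name</l>"
        + "<l>no flowers here</l>"
        + "</lg>";

    // XSLT 1.0: compare a case-sensitive match with a hand-folded (ASCII only) match.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:variable name='upper'>ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>",
        "  <xsl:variable name='lower'>abcdefghijklmnopqrstuvwxyz</xsl:variable>",
        "  <xsl:variable name='needle'>rose</xsl:variable>",
        "  <xsl:template match='/'>",
        "    <xsl:text>exact=</xsl:text>",
        "    <xsl:value-of select='count(//l[contains(., $needle)])'/>",
        "    <xsl:text> folded=</xsl:text>",
        "    <xsl:value-of select='count(//l[contains(translate(., $upper, $lower), $needle)])'/>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies STYLESHEET to the given XML string and returns the text result.
    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(SAMPLE));
    }
}
```

On the sample, the plain match finds only the first line, while the folded match also finds the line in capitals. Note that both matches operate on substrings, not words: nothing here recognizes word boundaries.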
- 1. It leverages investments made in markup: Many repositories have XML texts, or texts readily convertible into XML. These are all ready for XSL processing, and can be enhanced to support more sophisticated processing.
- 2. It produces "publishable" results as a natural work product: Since the end result of an XSL transformation can be HTML or an XML format ready for further processing, it is easy to generate results in a form that can be displayed as is.
- 3. An investment in XSL is worth making for other reasons: Since XSLT processors are so inexpensive (free), the real investment is the time needed to learn the language. XSLT is so portable and versatile that it repays this investment in expertise fairly quickly.
- 4. It can be combined with other methods: An XSLT stylesheet can also be used to prepare XML texts for other kinds of work. For example, a stylesheet can generate COCOA encoding from XML, which can then be used with TACT or another tool that takes advantage of COCOA markup of events in a text stream (such as chapter breaks or shifts in narrative voice). [An XSL stylesheet that creates COCOA markup from an XML TEI source can be demonstrated.]
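To make the last point concrete, here is a minimal sketch of XML-to-COCOA conversion. It is not the stylesheet referred to in the bracketed note. The sample is a simplified, un-namespaced TEI-like fragment, and the choice of the COCOA category letter C for chapters, the class name CocoaSketch and the sample text are all assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CocoaSketch {
    // Hypothetical TEI-like source: chapters are div elements numbered by @n.
    static final String SAMPLE =
        "<text><body>"
        + "<div type='chapter' n='1'><p>First chapter text.</p></div>"
        + "<div type='chapter' n='2'><p>Second chapter text.</p></div>"
        + "</body></text>";

    // XSLT 1.0 with text output: each div becomes a COCOA-style tag, each p a line.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:strip-space elements='*'/>",
        "  <xsl:template match='div'>",
        "    <xsl:text>&lt;C </xsl:text>",
        "    <xsl:value-of select='@n'/>",
        "    <xsl:text>&gt;&#10;</xsl:text>",
        "    <xsl:apply-templates select='p'/>",
        "  </xsl:template>",
        "  <xsl:template match='p'>",
        "    <xsl:value-of select='normalize-space(.)'/>",
        "    <xsl:text>&#10;</xsl:text>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies STYLESHEET to the given XML string and returns the text result.
    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(run(SAMPLE));
    }
}
```

Each chapter break in the markup becomes an in-stream tag of the form <C 1> followed by the chapter's text, one paragraph per line. A real conversion would need further categories (for speakers, narrative voice and so on) chosen to suit the receiving tool.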
Role Of XSL/XSLT In The Future
- Possibilities for XSL extension:
The XSL specification also allows for its own extension. Extension functions, in Java or in a scripting language, could be made available to an XSL processor. Tokenizing functions, sophisticated string processing and matching, and database-integration services (for retrieving data such as morphological variants or checking values against an authority list) could all be addressable, given a good API, from within XSL stylesheets. It is unlikely, however, that such extensions (at least, those especially suited to the types of analysis academic humanists are interested in) would be developed in the private sector, although they would not be without profitable application there. Academic researchers, with a clear focus on their own functional requirements, will have to lead the way.
-An XSL browser as "analytical engine":
XSL's potentials in these respects suggest that it could play a role in the markup-aware "analytical engine" that many of us keep envisioning (cf. the ELTA initiative). An XML browser that supported XSL stylesheets could be integrated with an editing environment allowing on-the-fly emendation of the stylesheets, and/or the extension functions they call. Stylesheets and function libraries could be pulled "off the shelf," or written especially to address local problems and questions. Specialized functions would have the capability of integrating XSL's presentation/analytical capabilities with other tools such as databases or network applications. Not only would such a system be very versatile; research results produced in it could also take the form of ready-made publishable material, in HTML or any other markup-based form. Since it would basically be an XML web browser, it could also be readily networked, especially as concerns the XML source text (the text under analysis), which could be located anywhere on the Internet. Analytical stylesheets in XSL would be portable and applicable to any text that conformed to the same (sufficiently constrained) document model.
Present Advantages [as of the end of 1999]
-XSL tools are freely available:
As of this writing, free XSLT processors are available in Java, and are not difficult to set up and run. Learning the stylesheet language itself is the biggest barrier to entry, and there are free and inexpensive resources for this as well.
-XSL is easy to get going with:
By design, XSL is a declarative language, abstracted at a fairly high level. As a result, it is not difficult to learn, at least for most ordinary operations, and is very portable (making it easier to learn from others' work).
Present Disadvantages [as of the end of 1999]
-XSL is somewhat arcane:
Although the rudiments of XSL are not difficult, some users take to it less easily than others. It is a "functional" and "declarative" language, unlike most scripting languages, so expertise in other computer languages is not readily applicable to it. Naïve users seem to have less trouble learning it than experts. The model of the text on which it operates, the "document tree," leverages document markup in a very simple and powerful way, but it is not a self-evident approach for developers used to looking at text as a stream of characters.
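The difference between the tree model and a character stream can be seen in a small sketch. The element names (p, hi, note), the class name TreeRuleSketch and the sample text are assumptions made for the example. The stylesheet consists of two declarative rules and no loops: one rule copies every node it meets, and a second, more specific rule says that note elements produce nothing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TreeRuleSketch {
    // Hypothetical paragraph with inline markup and an editorial note.
    static final String SAMPLE =
        "<p>Text <hi rend='italic'>with</hi> markup<note>editorial aside</note>.</p>";

    // XSLT 1.0: an identity rule plus an empty rule that filters out notes.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>",
        "  <xsl:template match='@*|node()'>",
        "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
        "  </xsl:template>",
        "  <xsl:template match='note'/>",
        "</xsl:stylesheet>");

    // Applies STYLESHEET to the given XML string and returns the serialized result.
    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(SAMPLE));
    }
}
```

The note disappears from the result wherever it occurs in the tree, without the stylesheet ever scanning for its start and end tags. A developer thinking in terms of a character stream would probably reach for pattern matching here, which is the habit the tree model asks one to set aside.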
-XSL processing is XML-based; requires well-formed XML to start:
Obviously, XSL requires an XML text to operate on. Either this is a problem, or it isn't.
-Tools are rudimentary (although improving):
Strong support for internationalization, for example, is envisioned by the specification but not yet widely implemented in interfaces or tools.
As mentioned above, it is unlikely that the private sector would, on
its own initiative, develop function libraries that would provide
for all the kinds of functions wanted by scholars in the Humanities.
(Some, like support for sorting texts in major European and Asian
languages, can be hoped for, although not necessarily for free.)
Conclusions
-What XSL will be good for:
Presentation, filtering and rearrangement, and markup-based processing such as indexing supported by markup. Some kinds of validation. Especially when extended or combined with other methods, XSL will also be capable of supporting sophisticated analytical functions on text marked up in XML.
-What the emergence of XSL tells us about our markup projects:
- 1. The up-front investment in the text (editorial work) remains the most difficult, interesting and important phase of work. Much or most further processing "down stream," and the types of processing possible, are directly dependent on the features of the text represented through its markup.
- 2. Investments in valid SGML/XML formats are demonstrating their resilience through readiness for new applications.