“GODDAG: A Data Structure for Overlapping
Hierarchies”
C.
M.
Sperberg-McQueen
Computer Center University of Illinois at
Chicago
cmsmcq@acm.org
Claus
Huitfeldt
Wittgenstein Archives
University of Bergen
Claus.Huitfeldt@hit.uib.no
The success of SGML is based in part on the natural way in which SGML
documents can be regarded as linearizations of trees using a notation
essentially similar to labeled bracketing, and on the natural way in which
the tree so represented can be interpreted as a parse tree or an abstract
syntax tree for the grammar defined in the documents' document type
definition (DTD). The close ties among grammar, tree structure, and
linearization give SGML an intellectual coherence missing from some other
markup schemes.
When it comes to overlapping structures, however, these notions cannot be
applied with the same naturalness.
Therefore, a major challenge to any attempt to provide better support for
markup of overlap is to provide not only a convenient file format for
recording overlap, but also a notation for expressing constraints on
documents with overlap, and plausible data structures for representing
documents with overlap. In this paper we address the third of these
problems.
The GODDAG Structure
Consider the following simple example of a document with overlapping structures:<s/<a/ John <b/ likes /a> Mary /b>/s> The sentence John likes Mary is tagged as a sentence (s). The words John likes are tagged as an a, and the words likes Mary as a b. Because the a and the b overlap, the document has no convenient representation in SGML, and no SGML-style tree can be drawn for it. However, we can draw the containment relations among s, a, b, and the words of the text in a natural way. The result is a graph like this: The graph resembles a tree, but differs from a tree in that multiple parent nodes can contain the same child. The multiple parents of such a child are the elements which overlap in the original document; the child is the scope of the overlap. For our purposes, overlap is simply multiple parentage. Graphs like the one just shown may be constructed for any document with overlapping structures. If a document has no overlaps, the graph reduces to a tree. If a document does have overlap, the corresponding graph will preserve as many of the characteristics of the document tree as possible. Because the structure described is an acyclic directed graph in which nodes have ordered descendants, we refer to it as a General Ordered-Descendant Directed Acyclic Graph (GODDAG).