Abstract
Is hierarchical XML apt for the encoding of complex manuscript materials? Some
scholars have argued that texts are non-hierarchical entities and that XML
therefore is inadequate. This paper argues that the nature of text is such that
it supports both hierarchical and non-hierarchical representations. The paper
distinguishes (1) texts from documents and document carriers, (2) writing from
“texting”, (3) actions that can be performed by one agent only from actions that
require at least two agents to come about (“shared actions”), (4) finite
actions from potentially infinitely ongoing actions. Texts are described as
potentially infinitely ongoing shared actions which are co-produced by author
and reader agents. This makes texts into entities that are more akin to events
than to objects or properties, and shows, moreover, that texts are dependent on
human understanding and thus mind-dependent entities. One consequence of this
is that text encoding needs to be recognized as an act participating in texting,
which in turn makes hierarchical XML as apt a markup as non-hierarchical markup
for “text representation”, or rather, for texting. The
encoding practices of the Bergen Wittgenstein Archives (WAB) serve as the main
touchstone for my discussion.
1. Introduction
Amongst the many theoretical questions about text, there is a philosophical, or
more specifically, an ontological question. The most general form of this
question is perhaps ‘What
is text?’. In Digital Humanities, the
issue has partly centred on the question whether texts are hierarchical
or rather non-hierarchical structures. Examples of this discussion include the
statement that “text is best represented as an ordered
hierarchy of content objects (OHCO), because that is what text really
is” [DeRose et al. 1990, 3], or, in opposition to it, the
statement that “humanists are trying to represent what
they all agree are non-hierarchical structures” [Schmidt 2010, 344]. Conflicting lessons for text
encoding have been drawn from these two opposed approaches to the question. Some
have concluded that the hierarchical markup grammar XML can be regarded as
adequate for text encoding, because that is in their view what texts basically
are, viz., hierarchical entities. Others reject XML and embedded markup more
generally as inadequate, precisely on the basis of the view that texts have, or
at least can have, non-hierarchical structures. In this paper, I want to argue
that both conclusions have been drawn prematurely due to an erroneous approach
to the ontological question about text.
I shall start by presenting a brief example of philosophical authorship from
the last century. A reflection on the editorial history of this example and of
other writings by the same author will lead us into editorial
philology and, in particular, digital editorial philology. It is in
this digital context that the question about the ontological nature of text and
its consequences for text encoding have been asked most forcefully. I shall
attempt to demonstrate that a philosophical reflection on the hermeneutical
nature of our text practices not only helps us to better understand the question
about the ontology of texts, but also dispels the idea that the nature of text
as such, i.e. independently of our text practices, dictates either
hierarchical or non-hierarchical markup.
2. Writing
In July 1931, a philosopher in Cambridge reads Augustine’s Confessiones. Augustine’s account of how he learned to speak as a
child makes a strong impression on him. Our philosopher reads the account at a
time when he is struggling with theoretical questions about language and
meaning. He is therefore very sensitive to anything that even remotely deals
with these things. Augustine’s description seems generally fair and
representative of how we think language acquisition works. But our philosopher
gets puzzled about a few sentences. Perhaps he draws a line in the margin of the
book, to highlight the passages that he finds perplexing. Later, then, he notes
down his thoughts in a notebook, recording what he believes was right and what
he believes was wrong with the account. Later still he returns to these notes,
and develops his ideas into a longer discussion. He develops an entire argument
around Augustine’s account. He regards his discussion of Augustine as a way of
becoming clearer about his own thought concerning linguistic meaning, and about
the role that humans play in establishing the relation between
words and objects. The exact intentions behind Augustine’s original account are
of less importance to him now.
He has the discussion of Augustine, together with many other notes and remarks,
typed. The resulting typescript he then cuts into paper slips. The slips are of
varying sizes: some contain one remark or even only part of a remark, others
contain a series of remarks or even an entire page. He collects the slips together
with cuttings from other typescripts. Next, he reorganizes the contents of his
collection. He inserts additional sheets, with handwritten titles for chapters
and subchapters. Soon he has this new arrangement of his remarks typed again,
thereby producing a large new typescript. It contains more than
4,000 remarks — he calls them “Bemerkungen”. The Bemerkungen
are typically separated from each other by one or more blank lines. The
typescript looks much like an advanced book manuscript. But soon our philosopher
starts to make changes, namely adding, deleting, rearranging and revising
remarks and sentences in it. In many places, he adds alternative phrasings. Some
parts of the typescript he goes through more than once, making changes in
pencil, black ink and red ink. The number of changes and revisions keeps growing.
The changes now also begin to extend into parallel notebooks and
other writing books. Entire new sections are added, some in the margins of the
typed pages, others on the typescript’s verso pages, and yet others in separate
notebooks.
About a year later he begins considering the idea of making his discussion of
Augustine the beginning of a new book in philosophy, to appear in a parallel
German-English edition. About fifteen years earlier he had published the Logisch-philosophische Abhandlung; this book had given
him some status. He now produces a concise summary of his argument about
Augustine, making it the beginning of a discussion about small and well-defined
samples of language use. These he calls “language
games” (Sprachspiele), and he intends to
make the idea of language games the backbone of his entire new book project.
Eventually, after ten more years’ hard work of producing many new
Bemerkungen and revising, rearranging, adding and
deleting, he has yet another typescript produced that looks ready for the press.
The typescript even includes a title (Philosophische Untersuchungen), a motto and a preface. However, in the remaining
five years of his life, our philosopher can never bring himself to finish the
work for publication. In 1953, two years after his death, his friends finally
edit and publish it with a parallel English translation.
[1]
3. Scholarship
So far I have done nothing but portray a real example of philosophical authorship
from the twentieth century. Note that I did not use the word “text” even
once. I could have used the word, but I need not have used it. In some places, I
could have said “text” instead of “remark” or, in others, instead of
“discussion” or “argument” or “book” or “work”. But in
all these cases it would have been replaceable. Our author may
sometimes have asked himself: Which text should I choose here? Will I ever
finish my text? Is my text good enough? Etc. However, again, the occurrences of
the expression “text” in such questions are replaceable with words such as
“manuscript”, “book”, “phrasing”, “sequence” or
“version”. Regarding the notion of text, our real-life example did not
seem to pose any special theoretical difficulties. Most importantly, the
ontological question of what text is clearly need
not have bothered our author. For the author, the notion of text
need not be problematic at all. An author may just write, delete, rearrange,
rewrite, compose, and so on. Neither did I have to be bothered by
the notion when telling the story of the example. So, if there is a specific
philosophical, ontological issue about text, where does it come in?
Considering the further development of our philosopher’s story may help to find
the answer to our question. Let us first try to locate, with the help of our
narrative, the points at which text can become a theoretical issue of
any sort. As this particular tale goes, before he dies, our
philosopher appoints three friends who are to manage the publication of his
writings. The three find themselves confronted with a huge mass of pages (which
they first have to collect from different places), some handwritten, some typed,
some bound in notebooks, some on loose sheets, some in orderly dossiers. This is
now standardly referred to as our philosopher’s “Nachlass”. For some of the
books and pieces that they decide to edit from this Nachlass, they are able to use neat enough typescripts — for most
of the publications, however, they have to make selections and combinations on
both large and small scales, and need to do some substantial editing. They have
to decide what to choose for publication; which version to use; how to arrange
it; how many and which of our author’s variant phrasings to include; whether to
also use variants added in other manuscripts; whether to obey all his
instructions or only those that they find conducive; whether always to omit what
our philosopher himself had deleted; whether to stick to at least some of his
idiosyncratic style and punctuation; whether, and how much, to expand on his
elliptic references to either his own ideas or also the ideas and works of
others; how much to bother the reader with information about the character of
the original Nachlass source; etc. etc.
If not for our author, text now seems to have become an issue at least for his
editors, or for any editors of a Nachlass such as
Wittgenstein’s. In the processes of editorial decision-making, such editors will
often refer to precisely this thing, the text, and find themselves confronted
with issues of so-called “textual criticism”. We can imagine
them discussing and debating these issues both amongst themselves and with the
users of their editions. Both the editors and their critics will argue for their
respective positions by reference to what they call “the text”; and this
invocation of the text, while the word itself often seems to refer to different
things for the different sides, always seems to lend their respective
standpoints and arguments strength and significance. Surely, for many of the
arising disputes, the expression “text” will again be replaceable by some
other words, e.g. “source”. In several cases, however, the expression
clearly carries something which is not contained in those other expressions,
something like the marker of a norm or standard, or of the right interpretation,
and “the text” is precisely the expression to be used.
Let us complete the story with some perceptions and questions from the readers’
side. The Wittgenstein readers asked: Have we received all the text, or are
parts missing? Is the text displayed in the correct sequence? Does it contain
transcription errors? Does the edition perhaps mislead me into adopting a wrong
interpretation? Have I been given the right text? Have I been given the text as
it was intended by the author? To what extent is the text authorized by its
author Ludwig Wittgenstein? Does the edited text correspond to the original? The
“textual” situation in the Wittgenstein Nachlass itself is often far from clear. The edited text could be
something that physically never existed before, or no longer existed — thus, was
something that had to be (re)constructed. Or, the editors came at different
times to different conclusions about what “the text” was referring to, and
for a few items different editions, as also different translations, were
produced. Readers would again ask: Which is the text / translation I should use
for my interpretation?
Now, it is true that these questions and issues bring us closer to theoretical
discourse about text. But none of them necessarily brings us to the
ontological question about
what text is. Moreover,
there are disciplines that not only treat these questions and issues, but also
provide answers and solutions to them. I think here in particular of editorial
philology, and, of course, especially digital editorial philology. In the
following, I will first stress that digital editorial philology provides
solutions to the above-mentioned issues and questions. But in this context we
will notice that the very same disciplines that provide the solutions, also in
fact seem to give rise to our ontological question about text.
[2]
4. Digital scholarship
Methods of textual criticism have been developed for many purposes, including for
finding solutions to exactly the kind of issues and questions brought up in the
previous section. Twentieth-century textual criticism has improved these methods
further through the application of digital techniques. For instance, while the
practice of producing editions comprising both facsimiles and transcriptions,
ranging from ultra-diplomatic to so-called “student” versions, already
existed in the pre-digital age, the introduction of digital techniques has made
producing such editions easier, cheaper and more efficient. But the digital
medium has not only provided improved ways of implementing solutions that had
already existed before — it has furthermore brought
new solutions
and possibilities. XML-based user-steered or “interactive dynamic presentation”
[Pichler and Bruvik 2014, 181] of online text archives is entirely new,
that is, a genuine achievement of digital editorial philology, and it offers
something that had not been possible before.
Many of the innovations are due to the discipline of text encoding ([TEI 2007];
[Hockey 2000, ch.3]). Text encoding
enables us to deal with editorial challenges such as the issues from the
previous section by, first, separating representation or transcription matters
from presentation matters, and, second, serving the different interests we might
have in editing a source by explicitly addressing them through different groups
of codes ([Huitfeldt 1994]; [Pichler 1995]). For
example, while one group of codes may record a manuscript’s chronological
sequence, another one can take care of the physical sequence, and a third one of
a specific sequence in content. Subsequently, the three encodings can be invoked
independently of each other, or also in various combinations, just as required
by individual users’ research needs. While pre-digital book editing, if it had
not been for certain material restrictions, could have delivered some of the
same possibilities, it could never have delivered the same degree of
interaction and
transparency which characterizes
digital editorial philology that is based on text encoding. In traditional
editing, the editor and publisher decide how the source is presented, while the
user mostly tends to remain in a purely passive role. The typical user of
traditional editions merely receives what experts have prepared for her and is
rarely in a position to adequately verify the edition received. In contrast to
this, with digital editing and publishing driven by text encoding, users are no
longer dependent on the decisions made by editors, accepting them, so to speak,
purely in good faith. Instead, they are now able to check editorial decisions and,
moreover, with interactive dynamic presentation tools, to complement the
experts’ editing by producing alternative filterings and presentations of the
source materials. Editorial philology today can satisfactorily address most of
the issues about text brought up in the previous section. We can now make
available all versions of a work, all variants of alternative phrasings, all
editorial interpretations of a passage: in principle, all the options between which
editors previously had to choose because of material restrictions. The user of the
editions will still have questions: “Which of the many versions made available
to me is the one I shall use?” But this was a question also for our
Cambridge author
himself.
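The idea of separate groups of codes for physical, chronological and content sequences, invoked independently or in combination, can be made concrete with a small sketch. It is written in Python purely for illustration and assumes a hypothetical data model: the class `Remark`, its fields and the function `view` are invented here and do not describe the codes actually used by WAB or any other archive, which record such information in markup.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Remark:
    """A remark ("Bemerkung") carrying three hypothetical sequence codes."""
    ident: str
    physical: int       # position on the document carrier (page order)
    chronological: int  # order of writing, as reconstructed by an editor
    content: int        # place in a thematic arrangement chosen by an editor

# Hypothetical sample data, not taken from any real edition.
REMARKS = [
    Remark("r1", physical=1, chronological=2, content=1),
    Remark("r2", physical=2, chronological=3, content=3),
    Remark("r3", physical=3, chronological=1, content=2),
]

def view(remarks: List[Remark], order_by: str,
         keep: Callable[[Remark], bool] = lambda r: True) -> List[str]:
    """Select remarks with `keep`, then order them by one sequence code."""
    selected = [r for r in remarks if keep(r)]
    return [r.ident for r in sorted(selected, key=lambda r: getattr(r, order_by))]

print(view(REMARKS, "physical"))       # ['r1', 'r2', 'r3']
print(view(REMARKS, "chronological"))  # ['r3', 'r1', 'r2']
print(view(REMARKS, "content"))        # ['r1', 'r3', 'r2']
# A combination: only remarks from page 2 onwards, in order of writing.
print(view(REMARKS, "chronological", keep=lambda r: r.physical >= 2))  # ['r3', 'r2']
```

Nothing in the data structure privileges one of the three orderings; which view is called up is left to the user's research interest.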
These achievements of digital editorial philology have become possible through
text encoding. At the same time, it is also
precisely scholars of text encoding who have forcefully
embarked on the ontological question “What is text?”.
5. Philosophy
Hierarchical vs. non-hierarchical representation
It appears that it is exactly digital editorial philology with text encoding
at its heart which has motivated the emergence, or at least the notable
reinforcement, of what I have called the ontological question about text. It
is particularly the question whether hierarchical text encoding grammars
such as XML are adequate for the transcription of manuscript source
materials that has caused considerable controversy.
[3] Opponents of hierarchical encoding often justify
their position by invoking a non-hierarchical conception of the nature of
text: it is the belief that texts are non-hierarchical which leads them to
conclude that hierarchical encoding or markup cannot be the correct method.
Paradigmatic cases they appeal to include complex manuscript materials
which, on their view, are fundamentally characterized by non-hierarchy or at
least by multiple structures which overlap with each other. Our philosopher’s
Nachlass could be regarded as such a case. Against this kind of argument, in turn, proponents of
hierarchical markup grammars, though they grant that overlap and multiple
hierarchies exist, have argued in favour of adopting precisely the opposite
conception of the nature of text, namely a chiefly hierarchical one. Thus,
the fundamental issue is no longer one about “Which is the right
text?”, but concerns the ontological nature of text.
But is the view that text is a hierarchical object [DeRose et al. 1990] or, in opposition to it, the view that it is
a non-hierarchical object [Schmidt 2010] justified? And if
either of the two is justified, does this lend argumentative support to a
hierarchical or a non-hierarchical approach in text encoding? In answering
this question more fully I would have to address at least the following two
sets of questions. First, can the general assumption that
texts are either hierarchical or non-hierarchical place any demands on the
structure of any particular markup system? Does the fact that a particular
object of encoding is hierarchical entail the demand that the encoding
itself be hierarchical or, if it is non-hierarchical, that the encoding be
non-hierarchical? Against the view that it does, one could argue that we
ordinarily accept that three-dimensional entities are represented in
two-dimensional structures. Similarly, we make use of hierarchical
taxonomies for domains that in fact can be regarded as non-hierarchical;
and, whilst being fully aware of the general vagueness, context-sensitivity,
ambiguity etc. of ordinary language, we nevertheless take advantage of exact
grammars, logics, strictly organized thesauri or computational ontologies
for their analysis and processing. What, then, is it that makes it
unacceptable to use hierarchical markup-languages for non-hierarchical
sources, or non-hierarchical markup-languages for hierarchical sources?
Secondly, are the assumptions that texts are either hierarchical or
non-hierarchical objects themselves justified? On what grounds, and in what
sense, can it be said that the nature of text is either of a hierarchical or
a non-hierarchical structure?
Document carriers — Documents — Texts
In this paper, I focus directly on the second set of questions, but will
provide at least a partial answer also to the first set of questions. Now,
to answer the question whether texts are hierarchical entities, we should
first try to find out what sort of entities texts could be on a
general level. This is after all also what Renear and
others wanted: To answer the question what text (really) is.
But this ontological question, in turn, should first bring us back to the
issue of writing. What is writing? It seems a safe thing to say
that writing is an action, and as such it should be possible to describe it
in terms of action theory. This implies the application of concepts such as
“agent”, “basic action”, “action result”, and others. I
would like to suggest the following characteristics of writing:
- First: Writing is, at least in terms of its physical movements, a
basic action [Danto 1963, 435f] and thus not caused
by other actions.
- Second: Writing produces a finite action result, the written. The
written is writing’s intended result; we call it document.
- Third: Writing does not need more than one agent.
It seems important to appreciate the fact that producing documents, writing,
is not the same as producing texts, and thus, to distinguish the action of
producing documents from the action of producing texts. One important
difference is that producing texts is producing documents with meaning, as
we normally do when we write, or also furnishing documents with meaning, as
we do when we read with understanding. Writing on the other hand does not
need to produce meaningful documents and can also be
performed by machines. Reading as such can equally be performed by machines
(namely “reading machines”), but not reading
with
understanding.
[4] I would now like to introduce for the rest of
this paper the technical term “texting” for the action of producing
texts. Let us look at some more differences between writing and texting in
terms of action theory:
- First, texting is not a basic action but is co-caused by two other
actions, writing and reading. (Or: If you look at the matter as one of
spoken communication, the two actions that co-cause texting are speaking
and hearing.)
- Secondly (and consequently), while the action of writing can be
performed by only one agent, it then seems clear that
texting is performed by more than one agent. One agent is
the author, another is the understanding reader (naturally, the author
and the reader can coincide in one and the same person). Consequently,
texting is not, as writing can be, under the sole control of the
author. Rather, texting evolves through actions that are shared
among a multitude of agents. Therefore, when attempting to adequately
describe texting, it is vital to include not only the author agent, but
also the reader agent.
- Third, while writing produces a finite and rather stable result
(namely documents), texting does not; rather, it produces an unstable and
potentially continuously ongoing, endless and open-ended result. Writing
has a clearly determinable beginning and end in time. Texting can have a
clearly determinable beginning in time, coinciding with the beginning of
the action of writing with understanding, but it does not have a clearly
determinable end. Now, ontologically speaking, what sort of entities
exactly are the results of texting, namely
texts?
If we start from a widely accepted tripartite division of what exists into
objects, properties and events, it seems to make perfect sense to think of
written documents, the products of writing, as
objects.
Equally, it seems to make perfect sense to conceive of the carriers of
written documents (paper, trees, stone, parchment etc.) as objects. More
specifically, documents and document carriers are
concrete,
material objects. But does the same hold true of texts? Very
often the expression “text” is used to mean the same as
“document”. However, it is important to note that “text” often
also denotes something very different from a document, and that the
conditions of identity in the case of text in this sense are not the same as
the conditions of identity for documents. This applies for example when we
say “The work exists in many drafts and different versions” (one text,
many documents), or to any ambiguous sentence, e.g. “John went to the
bank,” as well as cases of homonymy and polysemy (one document, many
texts). Texts in this sense clearly cannot be concrete objects.
[5] Some have suggested that texts
are abstract objects (e.g. Renear in [Hockey et al. 1999]; [Huitfeldt et al. 2012]). But there are also some factors which
speak against this view, whether text is conceived as an abstract object in the
sense of a type or as an abstract object in the sense of an
immaterial object.
[6] Consequently, though both “document” and “text”
are nouns, and many nouns denote objects, it may be that “text” does
not denote an object, or that, to speak with Wittgenstein, the “surface
grammar” of “text” misleads us into believing that it denotes an
object [Wittgenstein 2009, §664].
Some of the arguments which speak against the view that texts are some kind
of abstract object, are the same arguments which actually support the view
that texts may be
events. To classify texts as events rather
than as abstract objects or a property will at first seem a strange thing to
say, but this is merely because we are used to thinking of texts in analogy to
documents, or even document
carriers: manuscripts, books,
sheets of paper, computer screens etc. which all belong to the domain of
objects rather than events. One of the aspects which speak
in favour of the event view is that a text at no single (non-durative) point
in time seems to be present in its entirety, which is a
characteristic of events [Kanzian 2015, 897]. A
consequence of the event view of text is that the locus of a text is
temporally and spatially distributed: As any event’s locus is the locus of
its
bearers, so must then also a text’s locus be the locus of
its bearers. The text bearers cannot however only be books or computer
screens; these, considered by themselves, are document rather than text
bearers. If the event view of text is correct, then not only the document
itself must be regarded as a text bearer, but also the author and the
understanding reader. Thus, the
text event will need to be seen
as taking place exactly in the geographically and chronologically dispersed
interplay between authors, documents and readers. This fits very well with
our observation above, namely that texts are shared among and co-produced by
authors and readers. One advantage of the event view of texts seems to be
that, ontologically speaking, it demands no more than the following
rather uncontroversial entities: as bearers of the event, the
concrete object document, the concrete object author, and the concrete
object reader, and as event proper the action of (understanding)
reading.
This implies that it not only makes sense to conceive of texts as events, but
indeed as events of a special kind, namely
actions. Thus, texts
not only seem to be produced by actions; they seem themselves to be
actions. Within the group of actions, texts can then further be
characterized by being actions which are co-produced by authors and readers,
thus
shared actions.
[7]
In the last couple of paragraphs I have proposed a way of looking at the
ontological nature of text which recognizes text as event rather than
object, and within the category of event as action, and within the category
of action, as shared action. But independent of whether the reader wants to
follow me in my proposal to conceive of texts as actions that are
co-produced by authors and readers, or would rather conceive of texts as
abstract objects, or as properties of some kind — the reader will still be
able to go along with me in the view that text is something which cannot
exist without being sustained by an act of reading with understanding. A
text that loses the understanding reader will fall back on pure document
level and cease to exist
as a text. This aspect of the relation
between document and text can be compared to the relation between music
score and music: There is no music unless the music score is played (played
at least in one’s mind). Naturally, the document can continue to exist even
when the text ceases or pauses its existence. But the
text is
for its existence mind-dependent on the reader agent. A paused text can
resume its existence as soon as the document is processed again in its
significatory potential — in short: read with understanding by a
reader.
[8] This position,
I hope, should not be controversial, at least if one agrees with the
principle that signs have meaning because they are furnished with meaning by
humans, and that reading with understanding is thus meaning- and structure-constituting
rather than merely meaning- and structure-depicting, a principle
treasured by hermeneutics [Gadamer 1960].
[9] However, the view that reading a document with
understanding is constitutive of the meaning of this document then
also has consequences for our conception of what is going on in text
encoding.
Text encoding
Text encoding can record data about the document carrier, the document as
well as the text. Saying that the source is a notebook or a typescript or
that it is written in ink or pencil, pertains to the first; recording which
words it contains or which letters are deleted and which are added, pertains
to the second; talking about the document’s meaning and stating that there
are implicit references and allusions to a work by another author in the
document, pertains to the third.
[10] On whatever level text encoding moves, it will
always also record data about the encoder’s engagement with the source. This
becomes particularly clear where it aims at recording the
text
and thus moves on the third level. However, already on the level of
recording data about the document carrier, the encoding attributes structure
to the source rather than simply depicting a pre-existing structure (D.R.
Raymond in [Biggs and Huitfeldt 1997, 358]). In the
language of the above suggested event conception of text one could say that
the encoder becomes herself inevitably one of the
bearers of
the text.
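The second of these levels, recording which words are deleted and which are added, can be illustrated with a small Python sketch. It assumes a simplified fragment that borrows the TEI element names `del` and `add` but omits the TEI namespace; the sentence and the function `reading` are hypothetical. A single encoding yields two readings, and which one is produced depends on a choice made when processing it. Even the prior decision to tag a stretch of writing as `del` is, in line with the argument above, an interpretive act of the encoder.

```python
import xml.etree.ElementTree as ET

# Hypothetical revised sentence; TEI namespace omitted for brevity.
FRAGMENT = "<s>Words have <del>fixed meanings</del><add>uses</add> in the language game.</s>"

def reading(elem: ET.Element, skip: str) -> str:
    """Return the text of `elem`, leaving out the content of `skip` elements.

    A child's tail text (the text after its closing tag) belongs to the
    parent, so it is kept even when the child itself is skipped.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(reading(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(FRAGMENT)
print(reading(root, skip="add"))  # Words have fixed meanings in the language game.
print(reading(root, skip="del"))  # Words have uses in the language game.
```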
What are then the implications of our philosophical investigation for our
question whether hierarchical or rather
non-hierarchical markup
is appropriate for the encoding of texts? I think the main implication is,
to make a long story short, that both are equally appropriate. For,
following the present argument, what we encode are as much our own
signifying text actions as the source (the source “as such”, as one is
tempted to say). Transcription is, in Sahle’s words, “a protocol of perception, mapping and interpretation”
[Sahle 2015]. Whether the text itself will be hierarchical or
non-hierarchical will therefore depend on
us as encoders.
Therefore, both the position holding that markup is to be hierarchical
because text itself is hierarchical and the opposed view, can be seen to be
in one sense correct, but wrong in another. Both seem to draw their
consequences for text encoding from a basis that is, at least ontologically,
unfounded. They make it sound as though the question were essentially
a matter of finding out which is the right representation of a pre-given
structure of text. But “hierarchical” or “non-hierarchical”
describe aspects of our active engagement with the source and therefore
concern the nature of our own
actions rather than the nature of
independent entities. “An ‘OHCO structure’ is”, as Dino Buzzetti says,
“not a model of the text, but a possible model of its expression”
[Buzzetti 2002, 71]. The OHCO view of text
could thus be rephrased as: “Text is a hierarchical ordering
of content objects”. According to Desmond Schmidt, complex manuscript
variant structures pose overwhelming challenges for hierarchical markup, and
consequently form a primary case for the
non-hierarchical
approach (as also for non-embedded markup). However, text variants are, on
the background of the argument proposed here, not independent entities that
put insurmountable constraints on our mapping acts either. What makes up a
text variant is namely already co-constituted by our reading and mapping of
the source. With Wittgenstein we could thus say that
both sides
of the debate mix sign talk and symbol talk, and that the primary field of
text encoding belongs to the realm of symbols rather than that of signs. A
symbol is the sign
with meaning: the sign as
symbolized
[
Wittgenstein 1963, 3.32]. Whether to encode a source in
hierarchical or non-hierarchical ways is a question of how to map —
symbolize — the signs of the source.
What could, or rather, what should then lead us to encode
hierarchically rather than non-hierarchically, or the other way around? In
the end, it can only be our scholarly interests and needs. If we are
interested in encoding document structures, it may be important to record,
through non-hierarchical encoding or even standoff markup, what we regard as
overlapping structures, e.g. the points where sentence or paragraph units on
the one hand cross page units on the other. If we are interested in encoding
the sequence of (as such, genetically linear) writing acts, a markup system
that permits recording the points where the manifestations of these writing
acts cross may equally be the thing to choose. But even in these
cases, following one of the TEI’s recommendations for handling overlap
within hierarchical XML may be just as appropriate.
[11]
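To make the last point concrete, here is a minimal Python sketch of one technique described in the TEI Guidelines for non-hierarchical structures: one structure (paragraphs, `<p>`) is encoded as containers, while the competing structure (pages) is marked by empty milestone elements (`<pb/>`, “page beginning”), so that a paragraph can cross a page break without any overlapping tags. The sample fragment is invented, the TEI namespace is omitted for brevity, and the helpers `paragraphs` and `pages` are hypothetical illustrations, not part of any TEI or WAB tool. Both views are recovered from one and the same hierarchical document.

```python
import xml.etree.ElementTree as ET

# Invented TEI-style fragment (TEI namespace omitted for brevity).
# Paragraphs are containers; page boundaries are empty <pb/> milestones,
# so the first paragraph crosses a page break without overlapping tags.
SAMPLE = """<div>
<pb n="1"/>
<p>A remark that begins on the first page <pb n="2"/>and ends on the second.</p>
<p>A second remark on page two.</p>
</div>"""


def normalize_space(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def paragraphs(root):
    """Paragraph view: read directly off the <p> containers."""
    return [normalize_space("".join(p.itertext())) for p in root.iter("p")]


def pages(root):
    """Page view: rebuilt from the <pb/> milestones, cutting across <p>."""
    result = {}
    current = None  # number of the page we are currently on

    def add(text):
        if current is not None and text:
            result[current] = result.get(current, "") + text

    def visit(elem):
        nonlocal current
        if elem.tag == "pb":
            current = elem.get("n")  # a new page begins here
        add(elem.text)
        for child in elem:
            visit(child)
            add(child.tail)  # text following a child, still inside elem

    visit(root)
    return {n: normalize_space(t) for n, t in result.items()}


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(paragraphs(root))
    print(pages(root))
```

Which structure is encoded as containers and which as milestones is the encoder’s decision; reversing the roles (pages as containers, paragraph boundaries as milestones) would serve a document-oriented interest equally well.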
In any case, it seems problematic to hold that it is the nature of the text,
as something independent of us, that requires overlap markup. It is rather the
nature of our representation of the source that requires
hierarchical or non-hierarchical markup, and thus something under our
control, not the source’s.
If this were not true, and if it were consequently not true that we can
adequately transcribe complex primary sources in hierarchical XML, it would
be quite mysterious how so many projects manage to encode and edit intricate
and multifaceted, so-called overlapping and non-hierarchical handwritten
materials with hierarchical XML. They do so effectively, meeting
the (still evolving) standards for digital scholarly editions. One
example is the editorial work on the Wittgenstein
Nachlass by the Wittgenstein Archives at the University of
Bergen (WAB). WAB’s XML transcriptions aim to contain an
accurate graphemic record of every single letter that Wittgenstein wrote in
the Nachlass, and of the writing acts by which it was
produced or to which it was subjected. This information is converted into
“diplomatic” HTML outputs which, in short, represent the
source at the level of its letters and of the author’s writing acts. At the
same time, our XML transcriptions also make it possible to produce
“linearized” and “normalized” versions, as well as yet other, strongly
user-steered outputs via “interactive dynamic presentation”
[Pichler and Bruvik 2014, 181] in the spirit of
Web 2.0. A characteristic of the twenty-thousand-page Wittgenstein
Nachlass is its abundance of text variants, some of them rather
complicated. Each of the approximately 65,000 instances of variance is encoded
in WAB’s XML not only at the letter level but also at the word level,
which in turn makes outputs in diplomatic, linearized, normalized and other
formats possible. It is XML that makes all this possible. However, at
least in my view, nothing in the source requires
us to choose hierarchical XML over a non-hierarchical approach to
achieve all this, or a non-hierarchical approach over hierarchical
XML.
[12]
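The general idea of deriving several outputs from a single transcription can be illustrated with a small Python sketch. It is an assumption-laden illustration, not WAB’s actual schema or conversion pipeline: the sample sentence is invented, it uses TEI-style `<del>` (deletion) and `<add>` (addition) elements, and the hypothetical function `render` produces either a “diplomatic” string that marks each writing act or a “normalized” string that keeps only the final reading. The bracket notation `[-x-]` and `{+x+}` is likewise an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence (not from the Nachlass) using TEI-style
# <del> (deletion) and <add> (addition); NOT WAB's actual schema.
SAMPLE = "<s>The remark was <del>first</del><add>later</add> revised.</s>"

MODES = ("diplomatic", "normalized")


def render(elem, mode):
    """Render an element tree in one of two hypothetical output modes.

    diplomatic: keep every writing act, marking deletions as [-x-]
                and additions as {+x+}
    normalized: keep only the final reading, without any marks
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Collect the element's own text, its rendered children and their tails.
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, mode))
        parts.append(child.tail or "")  # tail text belongs to the parent context
    inner = "".join(parts)
    if elem.tag == "del":
        return f"[-{inner}-]" if mode == "diplomatic" else ""
    if elem.tag == "add":
        return f"{{+{inner}+}}" if mode == "diplomatic" else inner
    return inner


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for mode in MODES:
        print(f"{mode}: {render(root, mode)}")
```

Projects of this kind typically implement such transformations in XSLT, as the WAB tools mentioned below do; Python is used here only to keep the example self-contained.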
6. Conclusion
The final version of the comment on Augustine’s account of language acquisition
that our Cambridge philosopher, Ludwig Wittgenstein, eventually arrived at
includes the following passage:
In this picture of
language we find the roots of the following idea: Every word has a
meaning. This meaning is correlated with the word. It is the object for
which the word stands.
[Wittgenstein 2009, §1]
An analogous observation
can be made about the debate on hierarchical vs. non-hierarchical markup. The
way in which this debate is largely conducted suggests that the central issue
concerns the accurate representation of some mind- and action-independent
reality. It is assumed that, if texts are hierarchical, the correct depiction
must be hierarchical; if they are non-hierarchical, the correct depiction must
be non-hierarchical. According to this picture, text encoding is an act of
correlating codes with objects and structures
in and of themselves.
But any text action, including text encoding, is a creative symbolizing action
and thus already belongs to the realm of symbols. This is nothing out of the
ordinary; it is simply what meaningfully engaging with the world looks like on an
everyday basis; it is what each of us does all the time, without running into any
theoretical difficulties. Moreover, although one sometimes hears that the need
to escape relativism and to produce encodings and editions that will benefit
others (including future generations) requires strictly avoiding
interpretation in encoding, it must be said that the view proposed here
does not lend any support to relativism.
Rather than worrying about relativism, we simply have to ensure (and continually
work to ensure!) that there is sufficient agreement in our
interpretations. Successful communication does not depend on there being
non-interpreted facts, but on there being shared interpretations (or rather,
more generally, shared understandings). The TEI substantially helps with
that.
As we are now in a position to appreciate, the issues raised in Section 3 are not
to be regarded as pre-given. Rather, as much as they concern the sources to be
edited, studied, translated etc., they equally concern ourselves: as authors,
editors, readers and scholars, with our preferences, intentions, and the
purposes of our actions. A simple question such as “Should the edition follow
the physical, chronological or content order of what was written?” is as much
about what we want to do with the source as about the source “as
such”. Questions of this kind call for engaged action.
Websites such as WAB’s “Interactive Dynamic Presentation” platform bring
this aspect to the fore and make explicit, at least by way of example, that
actions are required. Users of the WAB site can use XML
transcriptions and XSLT tools as a basis for creating texts according to their
own editorial choices. The resulting texts will be shared actions, co-produced
by at least the following agents: Ludwig Wittgenstein, WAB’s transcribers and
editors, the software authors, and the interacting users. The ways in which we
talk and argue about text manifest that texts originate in and are carried by
understanding and acting human subjects. Texts are mappings of signs onto
symbols. Thus, when discussing which of the texts emerging from a rich and
complex Nachlass to choose, or what to identify as a
“work” in it, etc., we are discussing, first, how best to map the
significatory potential of this Nachlass onto symbols
and, second, which of the symbolizations to prefer. If it is true
that texts are actions, then it lies in the nature of text-talk that
it can be evaluative and normative, for it lies in the nature of talk about
actions that it can be evaluative and normative. With the later Wittgenstein, we
might say that scholarly talk about “text” typically exhibits a
normative grammar. This explains why the editorial issues
described in Section 3 are indeed issues.
In this paper, I have tried to show that the debate about hierarchical vs.
non-hierarchical markup can be resolved by reflecting on the “depth grammar”
[Wittgenstein 2009, §664] or “logical grammar”
[Wittgenstein 1963, 3.325] of “text”. Owing to the specific
ontological nature of texts, this grammar is categorially different
from the grammar of “document”. Texts, in the sense in which they
differ from documents, are ontologically difficult entities and may, as I
have tried to argue here, not be objects. Writing alone produces documents,
not texts. It does, however, seem to be a fact that texts depend for their
existence on human understanding, and that it is the meaning- and
structure-constituting aspects of document understanding that at the same time
place texts under
our command and responsibility. Therefore, text
encoding is no passive depiction but co-constitutes its subject: it never
records the mind-independent state of the source alone; rather, it always also
records its own acts of recording, its specific representation of the source.
Naturally, this also applies to WAB’s own XML transcriptions of the Wittgenstein
Nachlass: they are not understanding-free
depictions of the source, but already the results of acts of
understanding. As I have tried to explain, however, the point that texts, and
transcriptions too, result from acts of understanding need not involve
any sort of unwanted relativism. The fact that it is we, as understanding
subjects, who decide on the structure of texts in turn explains why XML can be,
as indeed it is, such a successful markup system even for the encoding of
complex manuscript materials; it could not be so if an independent
hierarchical or non-hierarchical structure of the source decided the
success or failure of our encoding. It is only when these central points are
neglected that the debate about hierarchical vs. non-hierarchical markup can
arise in the first place.
[13]
Works Cited
Augustinus 1975 Augustinus, A. De Dialectica. Trans. B. D. Jackson, from the text
newly edited by J. Pinborg. Synthese Historical Library, no. 16. Reidel,
Dordrecht (1975).
Augustinus 2013 Augustinus, A. Confessions. Trans. Fr Benignus O'Rourke O.S.A,
foreword by M. Laird. DLT Books, London (2013).
Biggs and Huitfeldt 1997 Biggs, M. and
Huitfeldt, C. “Philosophy and Electronic Publishing”,
The Monist. Interactive Issue, 80/3 (1997):
348-367.
Buzzetti 2002 Buzzetti, D. “Digital Representation and the Text Model”. New
Literary History, 33/1 (2002): 61-88.
Danto 1963 Danto, Arthur C. “What We Can Do”, Journal of Philosophy,
60 (1963): 435-445.
DeRose et al. 1990 DeRose, St.J., Durand,
D.G., Mylonas, E. and Renear, A. “What Is Text,
Really?”, Journal of Computing in Higher
Education, 1/2 (1990): 3-26.
Gabler 2012 Gabler, H.W. “Wider die Autorzentriertheit in der Edition”, Jahrbuch des Freien Deutschen Hochstifts (2012): 316-342.
Gadamer 1960 Gadamer, H.-G. Wahrheit und Methode. J.C.B. Mohr, Tübingen (1960).
Hintikka 1991 Hintikka, J. “An impatient man and his papers”, Synthese, 87/2 (1991): 183-201.
Hockey 2000 Hockey, S. Electronic Texts in the Humanities: Principles and Practice. Oxford
(2000).
Hockey et al. 1999 Hockey, S., Renear, A. and
McGann, J.J. “Panel: What is text? A debate on the
philosophical and epistemological nature of text in the light of humanities
computing research”. In
Annual joint meeting of
the Association for Computers and the Humanities (ACH) and the Association
for Literary and Linguistic Computing (ALLC), University of
Virginia, Charlottesville (1999):
http://www2.iath.virginia.edu/ach-allc.99/proceedings/hockey-renear2.html.
Huitfeldt 1994 Huitfeldt, C. “Multi-Dimensional Texts in a One-Dimensional Medium”,
Computers and the Humanities, 28 (1994):
235-241.
IFLA 1998 International Federation of Library
Associations (IFLA). Functional Requirements for
Bibliographic Records: Final Report. UBCIM Publications-New Series
19, München (1998).
Kanzian 2015 Kanzian, Chr. “Kunstwerke als Artefakte”, Metafísica:
Problemas Contemporâneos, 71 (2015): 895-912.
Pichler 1995 Pichler, A. “Transcriptions, Texts and Interpretation”. In K.S. Johannessen and
T. Nordenstam (eds.), Culture and Value. Beiträge des 18.
Internationalen Wittgenstein Symposiums. 13.-20. August 1995,
Kirchberg am Wechsel (1995), 690-695.
Pichler and Bruvik 2014 Pichler, A. and
Bruvik, T.M. “Digital Critical Editing: Separating Encoding
from Presentation”. In D. Apollon, C. Bélisle and Ph. Régnier (eds.),
Digital Critical Editions, Urbana Champaign
(2014), 179-202.
Renear and Dubin 2007 Renear, A. and Dubin,
D. “Three of the Four FRBR Group 1 Entity Types are Roles,
not Types”. In A. Grove (ed.), Proceedings of
the 70th Annual Meeting of the American Society for Information Science and
Technology (ASIST), Milwaukee (2007).
Sahle 2013 Sahle, P.
Digitale
Editionsformen. Zum Umgang mit der Überlieferung unter den Bedingungen des
Medienwandels. BoD, Norderstedt (2013):
http://kups.ub.uni-koeln.de/5353/.
Sahle 2015 Sahle, P. “Traditions of Scholarly Editing and the Media Shift”. Presentation
at International Seminar on Digital Humanities: Scholarly Editing and the Media
Shift – Procedures and Theory. Verona (2015), 8.9.2015
Schmidt 2010 Schmidt, D. “The
inadequacy of embedded markup for cultural heritage texts”, Literary & Linguistic Computing, 25/3 (2010):
337-356.
Schmidt 2012 Schmidt, D. “The
Role of Markup in the Digital Humanities”, Historical Social Research / Historische Sozialforschung, 37/3
(2012): 125-146.
TEI 2007 Text Encoding Initiative Consortium.
TEI: P5 Guidelines for Electronic Text Encoding and
Interchange. 3.0.0, March 29, 2016:
http://www.tei-c.org/Guidelines/P5/ (2007).
Wittgenstein 1963 Wittgenstein, Ludwig.
Tractatus Logico-Philosophicus. Translated by
D. F. Pears and B. F. McGuinness. Routledge and Kegan Paul, London (1963).
Wittgenstein 2009 Wittgenstein, Ludwig.
Philosophical Investigations / Philosophische
Untersuchungen. Ed. by P. M. S. Hacker & Joachim Schulte,
transl. by G. E. M. Anscombe, P. M. S. Hacker & Joachim Schulte.
Wiley-Blackwell, Oxford (2009).
Wittgenstein 2015 Wittgenstein, Ludwig.
Wittgenstein Source Bergen Nachlass Edition
(BNE). Edited by the Wittgenstein Archives at the University of
Bergen under the direction of Alois Pichler. In: Wittgenstein Source (2009-)
[wittgensteinsource.org]. WAB, Bergen (2015-).
Wittgenstein 2016 Wittgenstein, Ludwig.
Interactive Dynamic Presentation (IDP) of Ludwig
Wittgenstein's philosophical Nachlass [http://wittgensteinonline.no/]. Ed. by the Wittgenstein
Archives at the University of Bergen under the direction of Alois Pichler. WAB,
Bergen (2016).
Wright 1982 Wright, G. H. von. “The Wittgenstein Papers”. In G. H. von Wright, Wittgenstein. Oxford (1982), 35-62.