“English literature, electronic text and computer analysis: An impossible combination?”
Claire Warwick
Department of Information Studies, University of Sheffield
c.warwick@sheffield.ac.uk
In 1991 Corns found that, despite the potential usefulness of computational text
analysis techniques in the study of English literature, very little work that
showed any evidence of their use had been published in the field. He hoped that
this was due to a lack of knowledge on the part of more traditional literary
professionals. Such knowledge is now more widespread, and electronic texts and
analysis tools are easier to find and use. However, applying the same method of
quantitative analysis to the research output of selected journals suggests that
computational analysis of English literary texts is no more common now than it
was eight years ago. This paper will suggest reasons for this, and argue that
the discontinuity between the way that machines and humans read prevents the
more widespread use of electronic texts by literary scholars.
Electronic text is still basically defined in terms of its content (Renear). Thus
the tools at our disposal for analysing electronic literary text work in terms
of information extraction (e.g. how many times does a word occur, and in what
collocations?). Even if the text is encoded, the searches we can perform are
more complex versions of a content model (e.g. how many times is Hamlet the
speaker of the word 'Ophelia', as opposed to the reverse?). Computational and
corpus linguists have been able to produce a great deal of valuable work based
on this sort of data, yet to date very little has emerged from applying
computer analysis of electronic text in the field of English literature.
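As a minimal illustration of the two kinds of query just described, the Python sketch below performs a plain content-model search (word frequency and collocation) and a search over encoded text that attributes words to speakers. It is a sketch under stated assumptions, not any particular analysis tool: the function names are hypothetical, and the markup assumes a simplified, namespace-free imitation of TEI-style `<sp>`, `<speaker>` and `<l>` elements. In both cases the program only extracts and counts content.

```python
import re
from collections import Counter
import xml.etree.ElementTree as ET

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(text):
    """Content-model query: how many times does each word occur?"""
    return Counter(tokenize(text))

def collocates(text, node, window=2):
    """Count the words occurring within `window` tokens either side of `node`."""
    tokens = tokenize(text)
    found = Counter()
    for i, tok in enumerate(tokens):
        if tok == node.lower():
            start = max(0, i - window)
            for j in range(start, min(len(tokens), i + window + 1)):
                if j != i:
                    found[tokens[j]] += 1
    return found

def speaker_mentions(xml_text, speaker, word):
    """Encoded-text query: how often does `speaker` utter `word`?"""
    root = ET.fromstring(xml_text)
    total = 0
    for sp in root.iter("sp"):
        # Only count speeches whose <speaker> label matches exactly.
        if (sp.findtext("speaker") or "").strip() != speaker:
            continue
        # Count within the verse lines, not the speaker label itself.
        spoken = " ".join("".join(line.itertext()) for line in sp.iter("l"))
        total += word_counts(spoken)[word.lower()]
    return total

# Abridged lines from Hamlet, Act 3 scene 1, in hypothetical TEI-like markup.
SAMPLE = """<text>
  <sp><speaker>Hamlet</speaker><l>Soft you now, the fair Ophelia!</l></sp>
  <sp><speaker>Ophelia</speaker><l>Good my lord, how does your honour</l>
      <l>for this many a day?</l></sp>
</text>"""

print(speaker_mentions(SAMPLE, "Hamlet", "Ophelia"))  # 1
print(speaker_mentions(SAMPLE, "Ophelia", "Hamlet"))  # 0
```

However elaborate the markup, the answer is still a count; nothing in the query touches how the line is read.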
Researchers who are interested in tracking cultural or historical patterns in
large amounts of data, or in charting textual variants, may find computational
techniques of great use. However, most scholars still believe that the core
activity of the literary critic, in whatever language, is critical analysis and
close reading. Although we have not fully understood what we do when we read a
literary text, we know that we do not simply collect quantitative data. Reading
conflates the activities of information retrieval (how many times does x
occur?); text analysis, when we examine the significance of the data (having
found out how many times a word occurs in a given writer, is that frequency
different from that of any of his contemporaries, and if so, does it matter to
me?); and the identification of emotional effects (I notice that a character
tends to be presented in a particular way, and this determines how I as reader
perceive that character and the action in which they are involved). Therefore,
while critics may use quantitative data to support further analysis, 'close
reading' itself is much harder to define. What we do know is that it involves
intangible concepts such as sensibility, originality and creativity, and is
predicated upon things that are nuanced and unprovable. These characteristics
can be comprehended by humans, but they are much more difficult to adapt to the
right-or-wrong, on-or-off world of logical hierarchies that are ideal for
computer analysis. Furthermore, unlike linguists, literary scholars often do not
need large quantities of information in order to come to their judgements,
which, they admit, need not be absolute or objective. Humanists do not
necessarily expect that a problem can be solved once and for all, nor that their
findings must be incontestable (Watisboone).
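The text-analysis step mentioned above, comparing a word's frequency in one writer with its frequency in that writer's contemporaries, is the part of reading a computer can most readily assist with. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, toy strings stand in for real corpora, and the rate per 10,000 tokens is an assumed normalisation so that texts of different lengths can be compared. It can report that two rates differ; whether the difference matters remains a question for the critic.

```python
import re

def rate_per_10k(text, word):
    """Occurrences of `word` per 10,000 tokens, so texts of different length compare."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 10_000 * tokens.count(word.lower()) / len(tokens)

def compare_with_contemporaries(author_text, contemporary_texts, word):
    """Return the author's rate and the mean rate across the contemporaries."""
    author_rate = rate_per_10k(author_text, word)
    rates = [rate_per_10k(t, word) for t in contemporary_texts]
    mean_rate = sum(rates) / len(rates) if rates else 0.0
    return author_rate, mean_rate

# Toy strings standing in for one author's works and two contemporaries.
author, mean = compare_with_contemporaries(
    "love love", ["love hate", "hate hate"], "love")
print(author, mean)  # 10000.0 2500.0
```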
To make any profitable use of computer techniques of analysis, humans must also
be able to define exactly the problem under investigation, the nature of the
data, and why the results are significant. This is something that many English
literature scholars find difficult, and this may reflect the nature of the
subject as much as the competence of the researcher. Text encoders might suggest
that the text under analysis is insufficiently analysed and marked up for the
user's purpose. Perhaps, therefore, scholars should spend some time marking up
their texts. But what should they mark up? Even if they could define the sort of
literary nuances that they are looking for, or translate them into an encoding
system, would this really be a good use of time? The text would have to be so
heavily marked up that the critic might as well just read it anyway.
Should h