The systematic study and analysis of literature dates back to the beginnings of literary "text production"; even the earliest forms of oral literature were practiced in a context of descriptive and prescriptive aesthetics. With
the rise of written literature emerged a canon of rules that could be applied to text in order to evaluate its adherence to
poetic norms and values, and very soon quantitative and qualitative methods of text analysis were applied in textual exegesis.
But the analysis of literature is traditionally seen as a subjective procedure. Objectivity, based on empirical evidence,
does not seem to figure prominently in studies that elucidate meaning from literary texts. In most studies, however, some
kind of exemplary textual sampling does take place, and scholars occasionally arrive at value judgments that are based on
the observation of frequent occurrences or the absence of certain textual features. The exact number of occurrences and/or their distribution in long texts is difficult to establish, because literary texts, in particular novels, make a thorough
analysis of every single word or sentence almost impossible. Empirical evidence that is truly representative for the whole
text is extremely difficult to come by, and mainstream literary scholarship has come to accept this limitation as a given:
A simultaneous possession by the reader of all the words and images of Middlemarch, À la recherche du temps perdu, or Ulysses may be posited as an ideal, but such an ideal manifestly cannot be realized. It is impossible to hold so many details in
the mind at once.
(Miller 1968: 23)
The first computer-assisted studies of literature of the 1960s and 1970s used the potential of electronic media for precisely
these purposes – the identification of strings and patterns in electronic texts. Word lists and concordances, initially printed as books but
later made available in electronic format, too, helped scholars come to terms with all occurrences of observable textual features.
The "many details", the complete sets of textual data of some few works of literature, suddenly became available to every scholar. It was no
longer acceptable, as John Burrows pointed out, to ignore the potential of electronic media and to continue with textual criticism
based on small sets of examples only, as was common usage in traditional literary criticism: "It is a truth not generally acknowledged that, in most discussions of works of English fiction, we proceed as if a third,
two-fifths, a half of our material were not really there" (Burrows 1987: 1).
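The word lists and keyword-in-context concordances mentioned above are straightforward to reproduce today. The following is a minimal sketch in Python; the crude tokenizer and the Austen sample are illustrative choices, not a reconstruction of any historical concordance program:

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(text):
    """Frequency-sorted word list of the kind early concordance tools printed."""
    return Counter(tokenize(text)).most_common()

def kwic(text, keyword, width=3):
    """Keyword-in-context lines: each occurrence with `width` words on either side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

sample = ("It is a truth universally acknowledged, that a single man "
          "in possession of a good fortune, must be in want of a wife.")
print(word_list(sample)[:3])
print(kwic(sample, "a"))
```

Even this toy version makes Burrows's point concrete: every occurrence of a feature is retrieved, not just the examples a reader happens to remember.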
Literary computing was seen to remedy this shortcoming, and it has provided substantial insights into some questions of style
and literary theory. Most studies on patterns and themes that were published in the last twenty years question concepts of
text and method, and by investigating literature with the help of a powerful tool these studies situate themselves in a context
of meta-discourse: the question of "method" remains at the heart of most electronic analysis of literature.
Seen as a mere tool without any inherent analytical power of its own, the computer in literary studies enhances the critic's
powers of memory electronically, thereby providing a complete database of findings that meet all predefined patterns or search
criteria. As error-prone manual sampling becomes obsolete, textual analysis as well as the ensuing interpretation of a text
as a whole can be based on a complete survey of all passages that promise results, no matter how long the text is. Comparative
approaches spanning large literary corpora have become possible, and the proliferation of primary texts in electronic form
has significantly enlarged the corpus available for such analysis. In order to be successful, literary computing needs
to use techniques and procedures commonly associated with the natural sciences and fuse them with humanities research, thereby
bringing into contact the Two Cultures: "What we need is a principled use of technology and criticism to form a new kind of literary study absolutely comfortable with
scientific methods yet completely suffused with the values of the humanities" (Potter 1989: xxix).
The history of literary computing, however, shows that only a limited number of textual phenomena can be analyzed profitably
in the context of quantitative and qualitative computer-based analyses of style. These phenomena have to be linked to some
surface features that can be identified by electronic means, usually by some form of pattern matching. Computers are exceptionally
well suited for this kind of analysis, but only human intuition and insight, in combination with the raw computing power of
machines programmed to act as highly specialized electronic tools, can make some texts or textual problems accessible to scholars.
As Susan Hockey writes:
In the most useful studies, researchers have used the computer to find features of interest and then examined these instances
individually, discarding those that are not relevant, and perhaps refining the search terms in order to find more instances.
They have also situated their project within the broader sphere of criticism on their author or text, and reflected critically
on the methodology used to interpret the results.
(Hockey 2000: 84)
The methodological implications of such approaches to literary texts accommodate computer-based and computer-assisted studies
within the theoretical framework of literary-linguistic stylistics. In this context, texts are seen as aesthetic constructs
that achieve a certain effect (on the reader) by stylistic features on the surface structure of the literary text. These features
are sometimes minute details that the reader does not normally recognize individually but that nevertheless influence the
overall impression of the text. The presence or absence of such features can only be traced efficiently by electronic means,
and while the reader may be left with a feeling of having been manipulated by the text without really knowing how, the computer
can work out distribution patterns that may help understand how a particular effect is achieved. "[U]nexpectedly high or low frequencies or occurrences of a feature or some atypical tendency of co-occurrence are, in their
very unexpectedness or atypicality, noteworthy", Michael Toolan maintains, but then continues that "[e]laborate statistical computations are unlikely to be illuminating in these matters of subtle textual effect" (1990: 71). This view, frequently expounded by scholars who see literary computing more critically, points to one of the
central shortcomings of the discipline: in order to be acknowledged by mainstream criticism, computer-based literary studies
need to clarify that the computer is a tool used to specific ends in the initial phases of literary analysis. No final
result, let alone an "interpretation" of a text, can be obtained by computing power alone; human interpretation is indispensable to arrive at meaningful results.
And in particular the aim of the investigation needs to be clarified; every "computation into criticism", to use Burrows's term, has to provide results that transcend the narrow confines of stylo-statistical exercises.
As for studies of punctuation, sentence length, word length, vocabulary distribution curves, etc., the numbers have been crunched
for about twenty years now. It is clearly established that the distribution of such features is not random, or normal in the
statistical sense. The extent of such variance from the models has been measured with great precision. But since no one ever
claimed that a literary text was a random phenomenon, or a statistically normal distribution, it is difficult to see the point
of the exercise.
(Fortier 1991: 193)
Statistics, in conjunction with quantifiable data and a (supposedly) positivistic attitude toward textual phenomena, have
contributed to the image of computer-based literary analysis as a "difficult" or "marginal" pursuit. And in the context of a shift away from close reading toward a more theory-oriented approach to literary texts,
new models of textuality seemed to suggest that literary computing was occupied with fixed meanings that could be elucidated
by counting words and phrases. This image was further enhanced by references to this procedure in literature itself, as David
Lodge shows in his novel Small World:
"What's the use? Let's show him, Josh." And he passed the canister to the other guy, who takes out a spool of tape and fits it on to one of the machines. "Come over here", says Dempsey, and sits me down in front of a kind of typewriter with a TV screen attached. "With that tape", he said, "we can request the computer to supply us with any information we like about your ideolect." "Come again?" I said. "Your own special, distinctive, unique way of using the English language. What's your favourite word?" "My favourite word. I don't have one." "Oh yes you do!" he said. "The word you use most frequently."
The simplistic view of computer-based studies as mere "counting words" has colored the reception of much subsequent work. Contrary to received opinion, studies of literature
that use electronic means are mostly concerned with questions of theory and method. In particular, the notion of what constitutes
a "text", and how a given theory of text therefore influences the procedures of analysis and interpretation, forms the basis of every such study.
A literary text, interpreted as an aesthetic construct that achieves a certain effect through the distribution of words and
images, works on various levels. Without highly elaborate thematic – and therefore by definition interpretative – markup, only surface features of texts can be analyzed. These surface features are read as significant in that they influence
the reader's understanding and interpretation of the text. This has a number of theoretical implications: if a literary text
carries meaning that can be detected by a method of close reading, then computer-assisted studies have to be seen as a practical
extension of the theories of text that assume that "a" meaning, trapped in certain words and images and only waiting to be elicited by the informed reader, exists in literature.
By focusing primarily on empirical textual data, computer studies of literature tend to treat text in a way that some literary
critics see as a reapplication of dated theoretical models:
One might argue that the computer is simply amplifying the critic's powers of perception and recall in concert with conventional
perspectives. This is true, and some applications of the concept can be viewed as a lateral extension of Formalism, New Criticism,
Structuralism, and so forth.
(Smith 1989: 14)
Most studies, both quantitative and qualitative, published in the context of literary humanities computing after powerful
desktop computers became available, tend to prioritize empirical data, either in the form of automatically extracted stylistic
features, or as encoded thematic units that are then quantified, mapped, and interpreted.
Most suitable for this kind of literary analysis are studies of repeated structures in texts. These are usually characters,
syllables, words, or phrases that reappear throughout a text or a collection of texts. These repetitions are frequently recognized
by readers as structural devices that help segment a text, or link passages in texts. Chapters, characters, locations, thematic
units, etc., may thus be connected, parallels can be established, and a systematic study of textual properties, such as echoes,
contributes substantially to the understanding of the intricate setup of (literary) texts. This type of analysis is closely
linked to theoretical models of intertextuality used in non-computer-based literary studies, and here the impact of electronic
procedures is felt most acutely. Repetitions and echoes can be traced throughout a text in a consistent fashion; it takes,
however, a sound theoretical model that allows one, first to identify, and then to isolate common formal properties of these
textual units. The criterion of reliability and verifiability of results and findings is all-important in studies of repeated
structures, and maps of distribution and significant presences and absences of textual features are used as the basis for
a more detailed analysis. In this area computer-assisted approaches have contributed substantially to the understanding of literary
texts, and electronic studies of literary texts have provided empirical evidence for the analysis of a broad range of intertextual phenomena.
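A "map of distribution" of the kind described above can be sketched by slicing the token stream into equal segments and counting a target feature in each. The segment count and the single-word feature are simplifying assumptions; actual studies typically track phrases, collocations, or weighted measures:

```python
import re

def distribution_map(text, keyword, segments=10):
    """Occurrences of `keyword` in each of `segments` equal token slices:
    a crude distribution map across the length of a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    size = max(1, len(tokens) // segments)
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == keyword:
            # Clamp the final slice so leftover tokens land in the last segment.
            counts[min(i // size, segments - 1)] += 1
    return counts

def sparkline(counts):
    """Render the map as a row of bars for quick visual inspection."""
    return " ".join("#" * c if c else "." for c in counts)

text = "sea sea gull sea land hill land land moor land"
print(distribution_map(text, "land", segments=5))
print(sparkline(distribution_map(text, "land", segments=5)))
```

Significant presences and absences show up as clusters and gaps in such a map, which can then guide the closer reading the surrounding passage calls for.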
The methodological problems connected with this kind of approach feature prominently in nearly all electronic studies of literary
texts: how does traditional criticism deal with formal properties of text, and where do electronic studies deviate from and/or enhance established techniques? Most computer-assisted studies of literature published in the last twenty years examine
their own theoretical position and the impact of formal(ized) procedures on literary studies very critically. Nearly all come
to the conclusion that rigorous procedures of textual analysis are greatly enhanced by electronic means, and that the basis
for scholarly work with literary texts in areas that can be formalized is best provided by studies that compile textual evidence
on an empirical basis.
The concept of rigorous testing, ideally unbiased by personal preferences or interpretation by the critic, relies on the assumption
that textual properties can be identified and isolated by automatic means. If automatic procedures cannot be applied, stringent
procedures for the preparation of texts have to be designed. It has been, and still is, one of the particular strengths of
most electronic studies of literature that the criteria used in the process of analysis are situated in a theoretical model
of textuality that is based on a critical examination of the role of the critic and the specific properties of the text.
These textual properties often need to be set off against the rest of the text, and here markup as the most obvious form of
"external intervention" plays a leading role. The importance of markup for literary studies of electronic texts cannot be overestimated, because
the ambiguity of meaning in literature requires at least some interpretative process by the critic even prior to the analysis
proper. Words as discrete strings of characters, sentences, lines, and paragraphs serve as "natural" but by no means value-free textual segments. Any other instance of disambiguation in the form of thematic markup is a direct
result of a critic's reading of a text, which by definition influences the course of the analysis. As many computer-based
studies have shown, laying open one's criteria for encoding certain textual features is of prime importance to any procedure
that aspires to produce quantifiable results. The empirical nature of the data extracted from the electronic text and then
submitted to further analysis allows for a far more detailed interpretation that is indeed based on procedures of close reading,
and this "new critical analyse de texte, as well as the more recent concepts of inter-textuality or Riffaterrian micro-contexts can
lead to defensible interpretations only with the addition of the rigour and precision provided by computer analysis" (Fortier 1991: 194).
As the majority of literary critics still seem reluctant to embrace electronic media as a means of scholarly analysis, literary
computing has, right from the very beginning, never really made an impact on mainstream scholarship. Electronic scholarly
editions, by contrast, are readily accepted in the academic community, and they are rightly seen as indispensable tools
for both teaching and research. But even the proliferation of electronic texts, some available with highly elaborate markup,
did not lead to an increasing number of computer-based studies.
This can no longer be attributed to the lack of user-friendly, sophisticated software specifically designed for the analysis
of literature. If early versions of TACT, WordCruncher, OCP, or TUSTEP required considerable computing expertise, modern
versions of these software tools allow for easy-to-use routines that literary scholars without previous exposure to humanities
computing can master. In addition, dedicated scholarly software has become very flexible and allows the user to dictate the
terms of analysis, rather than superimpose certain routines (word lists, concordances, limited pattern matching) that would
prejudice the analysis.
Early computer-based studies suffered greatly from hardware and software constraints, and as a result software tools were
developed that addressed the specific requirements of scholarly computing. Although these tools proved remarkably effective
and efficient given that the hardware available for humanities computing was rather slow and basic, it still took considerable
expertise to prepare texts and convert them into machine-readable form. As no standardized form of encoding existed
until the Text Encoding Initiative (TEI) was formed, most scholars adopted some system of markup that reflected their particular
research needs. While many later studies settled on COCOA tags for markup, even these encodings
had to be adjusted to the specific software requirements
of the programs used for the analysis. Accessing the results of computer-assisted studies in the form of printouts was equally
cumbersome, and any statistical evaluation that extended the range of predefined options of standard software would have to
be designed specifically for every individual application. Visualization, the plotting of graphs, or the formatting of tables
required considerable expertise and expensive equipment and was thus mostly unavailable to one-person projects.
In the light of these technical difficulties it seemed that once hardware limitations no longer existed and the computing
infrastructure was up to the demands of scholarly computing, the electronic analysis of literature would become a major field
of research. Methodological problems addressed in studies that wanted to but could not, for technical reasons, attempt more
demanding tasks that required large sets of data, access to a multitude of different texts and enough computing power to scan
long texts for strings, for example, seemed a direct result of technical limitations.
But the three basic requirements that have been seen, since the 1960s and 1970s, as imperative for eventually putting literary computing on the map of mainstream scholarship
have now been met:
• virtually unlimited access to high-quality electronic texts;
• sophisticated software that lets the user define the terms of analysis rather than vice versa;
• powerful computing equipment that supplies unlimited computing power and storage capacity.
Despite impressive advances in both hardware and software development, and although electronic texts with markup based on
the TEI guidelines have become available on the net, literary computing still remains a marginal pursuit. Scholarly results
are presented at international conferences organized by the Association for Literary and Linguistic Computing (ALLC) and the
Association for Computers and the Humanities (ACH) that are designed to inform humanists with a background in the discipline.
The results are published in scholarly journals (L&LC, Literary and Linguistic Computing, and CHum, Computers and the Humanities) but rarely make an impact on mainstream scholarship. This dilemma has been commented on repeatedly: Thomas Corns, Rosanne
Potter, Mark Olsen, and Paul Fortier show that even the most sophisticated electronic studies of canonical works of literature
failed to be seen as contributions to the discourse of literary theory and method. Computer-based literary criticism has not
"escaped from the ghetto of specialist periodicals to the mainstream of literary periodicals", Corns writes, and continues that the "tables and graphs and scattergrams and word lists that are so characteristic of computer-based investigation are entirely
absent from mainstream periodicals" (Corns 1991: 127).
One reason for this, apart from a general aversion to all things electronic in traditional literary criticism, is described
by Jerome McGann as the notion of relevance, because
the general field of humanities education and scholarship will not take the use of digital technology seriously until one
demonstrates how its tools improve the ways we explore and explain aesthetic works – until, that is, they expand our interpretational procedures.
(McGann 2001: xii)
It is important that computer-assisted studies position themselves in the field of recent scholarship, take up the theoretical
issues of text and textuality, and convey to the field of non-experts that the results merit closer inspection. Computers
are not used for the sake of using new tools, but computers can supplement the critic's work with information that would normally
be unavailable to a human reader. Speed, accuracy, unlimited memory, and the instantaneous access to virtually all textual
features constitute the strength of the electronic tool. By tapping into the ever-growing pool of knowledge bases and by linking
texts in ways that allow them to be used as huge repositories of textual material to draw on, traditional literary criticism
can profit substantially from the knowledge and expertise accumulated in the search for a more rigorous analysis of literature
as practiced in computer-based studies.
By looking at the history of literary computing, however, one cannot fail to see that most contributions add significant insight
only within a very narrow spectrum of literary analysis – in the area of stylistic studies that focus on textual features. The input of computing in these studies is limited to the
preparation and preparatory analysis of the material under consideration. No immediate result, of course, can be obtained
by the computer, but data are collected that allow for and require further analysis and interpretation by the researcher.
The results, however, are impressive. Numerous studies of individual, and collections of, texts show that empirical evidence
can be used productively for literary analysis. The history of literary computing shows that the field itself is changing.
Stylo-statistical studies of isolated textual phenomena have become more common, even if the computing aspect does not always
figure prominently. More and more scholars use electronic texts and techniques designed for computing purposes, but the resulting
studies are embedded in the respective areas of traditional research. The methods, tools, and techniques have thus begun to
influence literary criticism indirectly.
Right from the very beginning, humanities computing has always maintained its multi-dimensional character as far as literary
genre, socio-cultural context, and historic-geographical provenance of literary texts are concerned. Studies have focused on
poetry, drama, and narrative from antiquity to the present day. Although an emphasis on literature in English can be observed,
texts in other languages have also been analyzed. The strength of studies of literature carried out with the help of the computer lies in the variety of approaches
used to come to terms with heterogeneous textual objects, and in the multitude of theoretical backgrounds and models of literature
brought to bear on studies that share as a common denominator neither a single technique nor one "school of thought", but the application of a common tool.
Discussions of literary theory, textuality, and the interdisciplinary nature of computer-assisted literary analysis feature
prominently in modern studies. In this respect, mainstream literary criticism is most open to contributions from a field that
is, by its very nature, acutely aware of its own theoretical position. In the future, however, the discourse of meta-criticism
may be fused with innovative approaches to literary texts. As Jerome McGann points out:
A new level of computer-assisted textual analysis may be achieved through programs that randomly but systematically deform
the texts they search and that submit those deformations to human consideration. Computers are no more able to "decode" rich imaginative texts than human beings are. What they can be made to do, however, is expose textual features that lie outside
the usual purview of human readers.
(McGann 2001: 190–1)
References for Further Reading
Ball, C. N. (1994). Automated Text Analysis: Cautionary Tales. Literary and Linguistic Computing 9: 293–302.
Burrows, J. F. (1987). A Computation into Criticism. A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Oxford University Press.
Burrows, J. F. (1992). Computers and the Study of Literature. In C. S. Butler (ed.), Computers and Written Texts (pp. 167–204). Oxford: Blackwell.
Busa, R. (1992). Half a Century of Literary Computing: Towards a "New" Philology. Literary and Linguistic Computing 7: 69–73.
Corns, T. N. (1991). Computers in the Humanities: Methods and Applications in the Study of English Literature. Literary and Linguistic Computing 6: 127–30.
Feldmann, D., F.-W. Neumann, and T. Rommel, (eds.) (1997). Anglistik im Internet. Proceedings of the 1996 Erfurt Conference on Computing in the Humanities. Heidelberg: Carl Winter.
Finneran, R. J., (ed.) (1996). The Literary Text in the Digital Age. Ann Arbor: University of Michigan Press.
Fortier, P. A. (1991). Theory, Methods and Applications: Some Examples in French Literature. Literary and Linguistic Computing 6: 192–6.
Fortier, P. A., (ed.) (1993–4). A New Direction for Literary Studies? Computers and the Humanities 27 (special double issue).
Hockey, S. (1980). A Guide to Computer Applications in the Humanities. London: Duckworth.
Hockey, S. (2000). Electronic Texts in the Humanities. Principles and Practice. Oxford: Oxford University Press.
Landow, G. P. and P. Delany, (eds.) (1993). The Digital Word: Text-Based Computing in the Humanities. Cambridge, MA: MIT Press.
McGann, J. (2001). Radiant Textuality: Literature After the World Wide Web. New York: Palgrave.
Miall, D. S., (ed.) (1990). Humanities and the Computer: New Directions. Oxford: Oxford University Press.
Miller, J. H. (1968). Three Problems of Fictional Form: First-person Narration in David Copperfield and Huckleberry Finn. In R. H. Pearce (ed.), Experience in the Novel: Selected Papers from the English Institute (pp. 21–48). New York: Columbia University Press.
Opas, L. L. and T. Rommel, (eds.) (1995). New Approaches to Computer Applications in Literary Studies. Literary and Linguistic Computing 10: 4.
Ott, W. (1978). Metrische Analysen zu Vergil, Bucolica. Tübingen: Niemeyer.
Potter, R. G., (ed.) (1989). Literary Computing and Literary Criticism: Theoretical and Practical Essays on Theme and Rhetoric. Philadelphia: University of Pennsylvania Press.
Renear, A. (1997). Out of Praxis: Three (Meta) Theories of Textuality. In K. Sutherland (ed.), Electronic Textuality: Investigations in Method and Theory (pp. 107–26). Oxford: Oxford University Press.
Robey, D. (1999). Counting Syllables in the Divine Comedy: A Computer Analysis. Modern Language Review 94: 61–86.
Rommel, T. (1995). "And Trace It in This Poem Every Line." Methoden und Verfahren computerunterstützter Textanalyse am Beispiel von Lord Byrons Don Juan. Tübingen: Narr.
Smedt, K. et al., (eds.) (1999). Computing in Humanities Education: A European Perspective. ACO*HUM Report. Bergen: University of Bergen HIT Center.
Smith, J. B. (1989). Computer Criticism. In R. G. Potter (ed.), Literary Computing and Literary Criticism: Theoretical and Practical Essays on Theme and Rhetoric (pp. 13–44). Philadelphia: University of Pennsylvania Press.
Sutherland, K., (ed.) (1997). Electronic Textuality: Investigations in Method and Theory. Oxford: Oxford University Press.
Toolan, M. (1990). The Stylistics of Fiction: A Literary-Linguistic Approach. London: Routledge.