David L. Hoover
Quantitative Analysis and Literary Studies
History, Goals, and Theoretical Foundation
Modern quantitative studies of literature begin about 1850, with periods of intense activity in the 1930s and the 1980s. Fortunately, several excellent overviews discuss earlier work in the context of computers and literary studies (Burrows 1992a), stylometry (Holmes 1998), and authorship attribution (Holmes 1994; Love 2002). We can thus concentrate here on recent advances, driven primarily by the huge growth in the availability of electronic texts, increasingly sophisticated statistical techniques, and the advent of much more powerful computers that have produced much more accurate and persuasive analyses.
Quantitative approaches to literature represent elements or characteristics of literary texts numerically, applying the powerful, accurate, and widely accepted methods of mathematics to measurement, classification, and analysis. They work best in the service of more traditional literary research, but recent and current work often necessarily concentrates much of its effort on the development of new and improved methodologies. The availability of large numbers of electronic literary texts and huge natural language corpora has increased the attractiveness of quantitative approaches as innovative ways of "reading" amounts of text that would overwhelm traditional modes of reading. They also provide access to kinds of information that are not available even in principle without them. Quantitative approaches are most naturally associated with questions of authorship and style, but they can also be used to investigate larger interpretive issues like plot, theme, genre, period, tone, and modality.
A concrete example will suggest some of the benefits of quantitative analysis. In To the Lighthouse, Virginia Woolf describes a vacation house that has been closed up for the winter:
Nothing it seemed could break that image, corrupt that innocence, or disturb the swaying mantle of silence … Once only a board sprang on the landing; once in the middle of the night with a roar, with a rupture, as after centuries of quiescence, a rock rends itself from the mountain and hurtles crashing into the valley, one fold of the shawl loosened and swung to and fro.
A critic struck by the comparison of a rock hurtling into a valley with a shawl loosening and swinging might also be interested in the apparent self-agency of the rock, the board, and the shawl, and might want to investigate Woolf's use of inanimate objects where animates are expected, a type of personification. The critic would normally deploy a series of supporting examples selected from the novel through careful reading, perhaps including some striking examples like these:
all round the table, beginning with Andrew in the middle, like a fire leaping from tuft to tuft of furze, her children laughed
It was as if the water floated off and set sailing thoughts which had grown stagnant on dry land, and gave to their bodies even some sort of physical relief.
And now in the heat of summer the wind sent its spies about the house again.
The list might be expanded with similar examples involving body parts:
His hands clasped themselves over his capacious paunch, his eyes blinked, as if he would have liked to reply kindly to these blandishments
Indeed she had been keeping guard over the dish of fruit … hoping that nobody would touch it until, oh, what a pity that they should do it — a hand reached out, took a pear, and spoilt the whole thing.
For how could one express in words these emotions of the body? … It was one's body feeling, not one's mind.
Most readers will agree that Woolf's personifications are striking, but their literary functions seem quite varied. In the first, the comparison of laughter and fire seems an apt and vivid way of characterizing the spontaneous, variable, and contagious outbreak of humor, while the personification of the hand in the fifth example, by removing the agency, focuses our attention on the fruit basket still life.
A careful enough reading can examine all the uses of inanimate objects and body parts in the novel. If the goal is merely to point out the personifications or to categorize them, there may be little gain in quantifying the analysis, though categorizing the personifications would seem peculiar without any indication of the frequencies of the various categories. Examples are rarely significant, however, unless they are either unusual or characteristic of the novel or the author — otherwise why analyze them? And the unusual and the characteristic must be validated by counting and comparison: the bare claim that Woolf uses a great deal of personification is without value and nearly meaningless unless it is quantified. In rare cases the quantification can be implicit: no mathematical demonstration is necessary to show that a novel without the word "the" is unusual, but Woolf's use of inanimate subjects is another matter. A single remarkable use of personification can certainly be significant and noteworthy, but most stylistic and interpretive observations rest upon patterns, and, therefore, upon repetition. Basing an argument about To the Lighthouse on the prevalence of personification, then, requires counting those personifications and at least a rough comparison of their frequency with some kind of norm or reference point. Finding dozens of odd inanimate subjects in To the Lighthouse and only a few in other modernist novels of roughly the same length might be sufficient.
Readers who know Woolf's novel well may doubt the centrality of personification to its interpretation: in To the Lighthouse, the personification seems to be an aesthetic literary device rather than an important and integral stylistic characteristic. The same cannot be said of The Inheritors (Golding 1955). In that strange novel, the extreme prevalence of body parts and inanimate objects as agents and subjects of verbs of motion (and even verbs of perception) is central to Golding's creation of the imagined Neanderthal world-view of the text (see Hoover 1999 for discussion). Many stylistic and interpretive patterns, however, are far more pervasive or far more subtle, and they require more sophisticated, more powerful, and more explicit quantification.
Almost any item, feature, or characteristic of a text that can be reliably identified can be counted, and most of them have been counted. Decisions about what to count can be obvious, problematic, or extremely difficult, and poor initial choices can lead to wasted effort and worthless results. Even careful planning leaves room for surprises, fortunately often of the happy sort that call for further or different quantification. The frequencies of various letters of the alphabet and punctuation marks, though not of obvious literary interest, have been used successfully in authorship attribution, as have letter n-grams (short sequences of letters). Words themselves, as the smallest clearly meaningful units, are the most frequently counted items, and syntactic categories (noun, verb, infinitive, superlative) are also often of interest, as are word n-grams (sequences) and collocations (words that occur near each other). Thematic or semantic categories (angry words, words related to time), while more difficult to count, have the advantage of being clearly relevant to interpretation, and automated semantic analysis may reduce the effort involved. Phrases, clauses, syntactic patterns, and sentences have often been counted, as have sequences or subcategories of them (prepositional phrases, subordinate clauses, passive sentences). Many of the items listed above are also used as measures of the lengths of other items: word length in characters, sentence or clause length in letters or words, text length in words, sentences, paragraphs, and so forth. 
Nonlinguistic textual divisions ranging from small units like lines and couplets to larger structural units like paragraphs, stanzas, scenes, acts, and chapters can also sometimes be usefully counted, as can literary categories like narrators and characters (including subcategories like first-person and third-person narrators, and characters divided by age, ethnicity, nationality, class, and gender), and plot elements (marriages, deaths, journeys, subplots).
The most obvious place to count whatever is counted is a single literary text that is of interest, as with the example from Woolf above. The need for some kind of comparative norm suggests that counting more than one text will often be required and the nature of the research will dictate the appropriate comparison text. In some cases, other texts by the same author will be selected, or contemporary authors, or a natural language corpus. In other cases, genres, periods, or parts of texts may be the appropriate focus. Counting may be limited to the dialogue or narration of a text, to one or more speakers or narrators, or to specific passages.
In the simplest quantifications, the numbers are merely presented and interpreted or offered as evidence that further investigation is likely to be productive. A critic interested in how writers differ in their vocabularies may find the raw counts of the numbers of different words (word types) in the first 50,000-word sections of a group of novels worth studying. In the first section of Sinclair Lewis's Main Street, for example, about 8,300 different words appear, but only 4,400 in Faulkner's Light in August, where the localization of the story may make the huge difference seem comprehensible. The 5,200 different words in the first section of James's The Ambassadors and the 6,600 in London's The Sea Wolf will require different explanations, and few readers would predict that Main Street has an exceptionally large vocabulary or Light in August an exceptionally small one.
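A count of this kind is simple to reproduce. The following is a minimal sketch, assuming a plain-text novel and a deliberately crude definition of a word (lowercased runs of letters); the function name and the tokenizing rule are illustrative assumptions, and different choices about hyphens, apostrophes, or capitalization would change the totals somewhat.

```python
import re

def count_types(text, limit=50_000):
    """Return (tokens, types) for the first `limit` word tokens of `text`."""
    # Crude tokenizer (an assumption): lowercase runs of letters,
    # keeping internal apostrophes so that "woolf's" stays one word.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    section = tokens[:limit]
    return len(section), len(set(section))

# Toy input; a real study would read the full text of each novel.
sample = "The rock rends itself from the mountain; the shawl swung to and fro."
print(count_types(sample))  # (13, 11)
```

Holding the section length constant matters because the number of types grows with text length, so raw type counts from sections of different sizes are not comparable.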
Quantification does not end with counting or measurement and presentation, of course, and many different kinds of mathematical operations have been applied to the numbers. Among the simplest of these is comparing frequencies or averages among a group of texts, often using an appropriate statistical test of significance, such as Student's t-test or chi-square, to gauge the likelihood that the observed difference could have arisen by chance. Authorship or style cannot reasonably be analyzed if the differences observed are likely to occur without the author's intervention. Fortunately, the patterns found in literary texts are often so obviously significant that no statistical testing is required, but it is easy to overestimate the oddity of a pattern, and statistical tests help to avoid untenable claims.
The standard deviation (roughly, the average difference, in either direction, of all frequencies from the mean), which measures how widely scattered the values are, and the z-score, which measures the distance of any given value from the mean in standard deviations, are often valuable for questions of textual difference. For example, in a corpus of 46 Victorian novels by six authors, the average rate of occurrence per 10,000 words is about 11 for "upon" and 63 for "on." In Silas Marner the frequencies are 4 "upon," 72 "on," and in Vanity Fair 17 "upon" and 50 "on," so that the difference between these two novels seems more extreme for "upon" than for "on." The standard deviations for the two words tell a different story: "upon" is quite variable in these six authors, with a standard deviation of about 9 words per 10,000 (not far below its average frequency of 11), but "on" is distributed much more evenly, with a standard deviation of about 15 (less than one-fourth its average frequency). Thus the frequencies of these words in the two novels are well within a single standard deviation from the mean, with z-scores between -0.84 and 0.71. Because they differ less than the average difference from the mean, frequencies in this range are quite likely to occur by chance, though the combination of the differences between the two words is suggestive.
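The arithmetic behind these z-scores can be sketched in a few lines. The example below uses only the rounded means and standard deviations quoted above, so its results (from about -0.87 to 0.67) differ slightly from the reported range of -0.84 to 0.71, which presumably rests on unrounded corpus figures. The variable names are illustrative.

```python
def z_score(value, mean, sd):
    """Distance of `value` from `mean`, measured in standard deviations."""
    return (value - mean) / sd

# (mean, standard deviation) per 10,000 words, rounded as quoted in the text.
corpus_stats = {"upon": (11, 9), "on": (63, 15)}

# Rates per 10,000 words in the two novels, as quoted in the text.
novels = {
    "Silas Marner": {"upon": 4, "on": 72},
    "Vanity Fair": {"upon": 17, "on": 50},
}

z_scores = {
    title: {word: round(z_score(rate, *corpus_stats[word]), 2)
            for word, rate in rates.items()}
    for title, rates in novels.items()
}

for title, scores in z_scores.items():
    print(title, scores)
```

Every value falls within one standard deviation of the mean, which is the point of the paragraph: a raw difference that looks large for "upon" is unremarkable once the word's variability is taken into account.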
Another simple operation is dividing the frequency of an item in one text by its frequency in another, yielding the distinctiveness ratio (DR), a measure of the difference between the texts. Ratios below 0.67 or above 1.5 are normally considered worth investigating. Returning to inanimate subjects in Woolf, even my example of 24 such subjects in one novel and 8 in another gives a distinctiveness ratio of 3. But note that in the example of "upon" above the DR between the novels is greater than 4, so that some care should be taken not to over-interpret a DR when the frequencies of the words vary a great deal. Some measures of vocabulary richness or concentration, such as Yule's characteristic constant K, take into account the frequencies of all the word types in a text and require more complex calculations, as do other measures of vocabulary richness based on probabilistic models.
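Both measures mentioned here are easy to compute. The sketch below gives the distinctiveness ratio as a simple quotient and Yule's K in its usual form, K = 10,000 × (Σ i²·V(i) − N) / N², where N is the number of word tokens and V(i) is the number of types that occur exactly i times. The function names are illustrative, and the ratio assumes that the two frequencies come from texts of the same length or have already been converted to rates.

```python
from collections import Counter

def distinctiveness_ratio(freq_a, freq_b):
    """Frequency of an item in text A divided by its frequency in text B."""
    if freq_b == 0:
        raise ValueError("item is absent from the second text; ratio undefined")
    return freq_a / freq_b

def yules_k(tokens):
    """Yule's characteristic K for a list of word tokens."""
    n = len(tokens)
    type_freqs = Counter(tokens)              # how often each type occurs
    spectrum = Counter(type_freqs.values())   # V(i): number of types occurring i times
    s2 = sum(i * i * v for i, v in spectrum.items())
    return 10_000 * (s2 - n) / (n * n)

print(distinctiveness_ratio(24, 8))   # inanimate subjects example: 3.0
print(distinctiveness_ratio(17, 4))   # "upon" in the two novels: 4.25
print(yules_k("a a b c".split()))     # toy text: 1250.0
```

A text in which no word is ever repeated has K = 0, and K rises as the vocabulary becomes more concentrated in a few heavily repeated words.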
Recent years have seen a trend toward multivariate methods that are especially designed to deal with large amounts of data — methods such as principal components analysis, cluster analysis, discriminant analysis, correspondence analysis, and factor analysis. Statistical programs have made these methods much more practical by performing long sequences of required computations rapidly and without error. Principal components analysis, the most popular of these methods, allows the frequencies of many different items with similar distributions in a group of texts to be combined into a single component. The result is a small number of unrelated measures of textual difference that account for most of the variation in the texts. The first two of these components are typically used to create a scatter plot in which the distance between any two texts is a simple visual measure of their similarity. This technique provides a graphical method of "reading" a large number of frequencies at once, and is much easier to interpret than the list of frequencies themselves.
Delta, a promising new measure of textual difference based on word frequency, has stirred a great deal of interest (Burrows 2002a). Delta is designed to pick the likeliest author of a questioned text from among a relatively large number of possible authors. Burrows begins by recording the frequencies of the most frequent words of a primary set of texts by the possible authors and calculating the mean frequency and standard deviation for each word in this set of texts. He then uses z-scores to compare the difference between the mean and each of the primary authors with the difference between the mean and the questioned text for each of the words. He completes the calculation by averaging, over all the words, the absolute values of the differences between the z-score for the test text and the z-score for each primary-set author to produce Delta, a measure of the difference between the test text and that author. The primary-set author with the smallest Delta is suggested as the author of the test text. A further innovation in Delta is that Burrows expands the set of words analyzed to the 150 most frequent rather than the 30–100 used in earlier work with PCA, and I have shown that further expansion of the list to the 800 or even the 4,000 most frequent words often produces even stronger results on long texts (Hoover 2004a).
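The calculation can be sketched as follows, on one common reading of the procedure: Delta is the mean, over the chosen words, of the absolute difference between the test text's z-score and a candidate author's z-score. The word rates below are invented toy figures, the function name is illustrative, and the use of the sample standard deviation is an assumption (the population form rescales every Delta by the same factor and leaves the ranking unchanged).

```python
from statistics import mean, stdev

def burrows_delta(primary, test, words):
    """primary: {author: {word: rate}}; test: {word: rate}.
    Returns {author: Delta}; the smallest Delta marks the likeliest author."""
    stats = {}
    for w in words:
        rates = [primary[a].get(w, 0.0) for a in primary]
        stats[w] = (mean(rates), stdev(rates))
    # Words with no variation in the primary set cannot be standardized.
    usable = [w for w in words if stats[w][1] > 0]

    def z(rate, w):
        mu, sd = stats[w]
        return (rate - mu) / sd

    return {
        author: mean(abs(z(test.get(w, 0.0), w) - z(freqs.get(w, 0.0), w))
                     for w in usable)
        for author, freqs in primary.items()
    }

# Invented rates per 10,000 words for three hypothetical authors.
primary = {
    "A": {"the": 600, "and": 300, "upon": 4},
    "B": {"the": 550, "and": 350, "upon": 17},
    "C": {"the": 500, "and": 320, "upon": 30},
}
test = {"the": 595, "and": 305, "upon": 5}

deltas = burrows_delta(primary, test, ["the", "and", "upon"])
print(min(deltas, key=deltas.get), deltas)
```

With real data the word list would be the 150 or more most frequent words of the primary set, and the primary set would need at least two authors for the standard deviations to be defined.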
Recently Burrows has introduced two further measures, Zeta and Iota, which concentrate on words of moderate and low frequencies, respectively (2006). For both measures, a word frequency list is created for a sample of text by a primary author, and then the sample is divided into several sections of equal size. The heart of the procedure is to record the number of these sections that contain each of the words and then record which of the words occur in samples by other authors and in texts to be tested for authorship. By sorting the word list on the basis of how many of the primary author's text sections contain each word, Burrows eliminates the very frequent words that occur in most texts and concentrates on different parts of the word frequency spectrum. For Zeta, he retains only moderately frequent words, ones that occur in a subset of the primary author's sections. Where only two poets are being compared, he then further reduces the list of words by removing those that exceed a specific frequency in the works of the second poet. Where many authors are being compared, he removes words that appear in the text samples of most of the other authors. Whether there are two or many authors, the result is a list of words that are moderately frequent in the primary author and very infrequent in the other author(s). For Iota, words are removed that appear in most of the sections by the primary author. For two authors, words are also removed that do not appear in the second author's sample, and, where many authors are being tested, the second step removes words that appear in about half or more of the other authors. Both of these methods are remarkably effective in attributing poems as short as 1,000 words, and the discussion of methodology and substance is rich enough to provoke another wave of interest in authorship attribution and stylometry.
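The core bookkeeping for the two-author version of Zeta can be sketched as follows. This is only an illustration of the sectioning and filtering steps described above, not Burrows's own procedure: the number of sections, the range that counts as "moderately frequent," and the cutoff for the second author are all assumed values, since the description above does not specify them.

```python
from collections import Counter

def split_sections(tokens, n_sections):
    """Divide tokens into equal-sized sections (leftover tokens are dropped)."""
    size = len(tokens) // n_sections
    return [tokens[i * size:(i + 1) * size] for i in range(n_sections)]

def section_counts(sections):
    """For each word, the number of sections in which it appears at least once."""
    counts = Counter()
    for section in sections:
        counts.update(set(section))
    return counts

def zeta_candidates(primary_tokens, other_tokens, n_sections=5,
                    min_sections=2, max_sections=4, max_other_count=0):
    """Words found in some but not all primary sections and rare in the other author.
    All thresholds are assumptions chosen for this toy example."""
    counts = section_counts(split_sections(primary_tokens, n_sections))
    other = Counter(other_tokens)
    return sorted(w for w, c in counts.items()
                  if min_sections <= c <= max_sections
                  and other[w] <= max_other_count)

primary = ("the sea was grey the sea was calm the gull was here "
           "the gull was gone the day was done").split()
other = "the gull was white".split()
print(zeta_candidates(primary, other))  # ['sea']
```

In the toy data "the" and "was" drop out because they occur in every section, the words found in a single section are too rare, and "gull" is removed because the second author also uses it.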
Techniques related to artificial intelligence are also increasingly being applied, including neural networks, machine learning, and data mining (see, for example, Waugh, Adams, and Tweedie 2000). These methods, which require more computing power and expertise than many other methods, are sometimes used in authorship attribution, but more often in forensic than in literary contexts. One reason for this is that they treat authorship attribution as a classification problem, and their results are more difficult to extend to traditional literary questions. Many of these techniques, as well as key word analysis, are well suited to the analysis of content, especially in the context of the huge amounts of text being produced on the World Wide Web.
Many kinds of studies of literary texts use quantitative methods. Quantitative thematic analysis can trace the growth, decay, or development of vocabulary within a thematic domain, or study how authors differ in their expressions of a theme (see Fortier 2002). Many empirical studies of literature translate readers' judgments into numerical scales to study literary response using techniques borrowed from the social sciences. Metrical analysis, because of the inherent reliance of meter on pattern, is a natural area for quantitative study, though there has been less research in this area than one might have expected.
A growing area of research is in the study of manuscript relationships, where techniques designed for the study of genetic relationship among organisms have been ingeniously and fruitfully applied to the study of the manuscripts of Chaucer's Canterbury Tales (see, for example, Spencer et al. 2003). These studies take literally the metaphor of genetic relationships among manuscripts, treating differences among them as if they were differences in DNA sequences. The huge amount of data involved in some manuscript traditions invites and practically requires sophisticated statistical techniques, whether those of evolutionary biology or other multivariate techniques. Genre and period definition and classification also benefit from quantitative approaches, especially factor analysis and other multivariate techniques.
Authorship attribution and statistical stylistics (or stylometry), currently two of the most important areas of quantitative analysis of literature, deserve a fuller treatment. They share many basic assumptions and methods, though some techniques that are effective in distinguishing authors may have no clear interpretive value. A discussion of authorship attribution in the present context necessarily forces a distinction between forensic and literary authorship attribution that is sometimes without a difference. Determining who wrote a text generally requires much the same methodology whether the text is a ransom note, a threatening letter, a legal opinion, The Federalist Papers, a contemporary political novel like Joe Klein's Primary Colors, an anonymous eighteenth-century verse satire, or a play by Shakespeare.
Yet two differences between forensic and literary attribution must be kept in mind. First, in many forensic contexts, the identity of the person who produced the language of the text may be irrelevant, while the identity of the person responsible for sending it may be crucial. A kidnapper may force a victim to write a ransom note, and a manifesto may be cribbed from a series of websites; determining these facts may or may not help to solve the crime. Second, the text in a forensic problem typically has little intrinsic value and becomes irrelevant once the attribution is made and the crime solved. In the case of literary attribution, however, and preeminently for Shakespeare's plays, the aesthetic and contested cultural value of the texts lies at the heart of the problem. One consequence of these differences is that literary attribution is often only a first step, so that methods easily turned to stylistic or interpretive purposes tend to be favored.
Only when external evidence fails is it reasonable to apply quantitative methods, and the presence or absence of a closed set of possible authors and differences in the size and number of documents available for analysis are usually more significant than the kind of text involved. Here I will concentrate on reasonably tractable kinds of literary authorship problems in which the questioned text is of a reasonable size and similar texts by the claimant authors are available.
Authorship attribution has often been based on a single variable like word length, sentence length, or vocabulary richness, and some researchers continue to achieve good results using a small number of such variables. Most current research, however, has turned to more robust multivariate methods such as principal components analysis and cluster analysis, often combining the results of more than one method to solve a problem. In his excellent overview of computers and the study of literature, Burrows (1992a) uses principal components analysis (PCA) of the fifty most frequent words to argue against the possibility that Lady Vane had a hand in the "Memoirs of a Lady of Quality" which Smollett includes in his Peregrine Pickle. In other work, he shows that the seventy-five most frequent words can successfully distinguish 4,000-word sections of novels by the Brontë sisters and that the twenty most frequent words can distinguish 500-word sections of letters by Scott and Byron (Burrows 1992b). Even more remarkable, when statistical tests are used to select the words that most effectively discriminate between Scott and Byron, the ten most effective of these do an excellent job of separating the works of Scott and Byron even across several genres. A good recent example of this methodology, using a careful approach that also takes into account traditional methods, persuasively adds additional shore journalism to Stephen Crane's small oeuvre (Holmes et al. 2001). PCA of the fifty most frequent words of the texts shows that Crane's fiction can be distinguished from Conrad's, that his fiction can be distinguished from his shore journalism and New York City journalism, and that these two kinds of journalism are different from his war journalism. The same method shows that Crane's shore and New York journalism are different from that of his brother Townley and two other contemporary journalists.
Both PCA and cluster analysis strongly suggest that seventeen pieces of previously unattributed shore journalism (known to be by one of the brothers) are Stephen's rather than Townley's.
Authorship attribution based on n-grams, sequences of various numbers of letters or words, has become increasingly popular, sometimes performing better than word frequency alone, especially on small texts. In a wide-ranging and provocative article, Clement and Sharp (2003) show that both letter and word n-grams perform marginally better than methods based on words. For these experiments, the frequencies of the various items in the known documents are transformed into probabilities of randomly extracting them from the text, and the test document is assigned to the author whose training set maximizes the probability of generating the text. Besides presenting many different methods and varieties of results, Clement and Sharp raise important questions about the relationship between content and style, the effects of text size, and apparently random differences that alter the accuracy of analyses. Although letter n-grams lack any transparent relationship to the meaning or style of a text, and are unlikely to be attractive to researchers who are interested in broader literary questions, word n-grams are likely to become increasingly popular because they may both improve accuracy and allow the critic to focus on meaningful word groups.
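The general idea of assigning a text to the author whose training sample makes it most probable can be sketched briefly. The example below is not Clement and Sharp's procedure: it uses word bigrams, estimates probabilities from raw counts with add-one smoothing (an assumption, needed so that unseen n-grams do not produce a probability of zero), and works on invented toy samples. The same functions accept a string in place of a word list, which yields letter n-grams.

```python
import math
from collections import Counter

def ngrams(seq, n):
    """All overlapping n-grams of a sequence (a word list, or a string for letters)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def log_probability(test_items, train_counts, vocab_size):
    """Log probability of the test n-grams under add-one-smoothed training counts."""
    total = sum(train_counts.values())
    return sum(math.log((train_counts[g] + 1) / (total + vocab_size))
               for g in test_items)

def attribute(test_tokens, training, n=2):
    """Return (likeliest author, {author: log probability})."""
    models = {author: Counter(ngrams(tokens, n))
              for author, tokens in training.items()}
    test_items = ngrams(test_tokens, n)
    vocab = set(test_items).union(*models.values())
    scores = {author: log_probability(test_items, model, len(vocab))
              for author, model in models.items()}
    return max(scores, key=scores.get), scores

# Invented training samples for two hypothetical authors.
training = {
    "A": "the sea was grey and the sea was calm".split(),
    "B": "upon the hill there stood a house upon the moor".split(),
}
author, scores = attribute("the sea was calm".split(), training)
print(author, scores)
```

On samples this small the result is trivially driven by shared content words, which is one form of the content-versus-style problem that the article raises.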
Four Exemplary Studies
Statistical stylistics or stylometry is the broadest of the areas in which quantitative analysis intersects with literary study, and it might be said to subsume parts or all of the applications just discussed. Its central concerns are closest to those of literary studies in general, with a special emphasis on the patterns that comprise style and how those patterns are related to issues of interpretation, meaning, and aesthetics. Rather than surveying or describing various kinds of stylometric studies, I will focus in more detail on four recent articles that exemplify some of the most central concerns and methods while treating important literary problems and questions.
"Cicero, Sigonio, and Burrows" (Forsyth, Holmes, and Tse 1999) is about authorship, but it also treats issues of chronology and genre. It applies methods first proven on English to classical Latin and neo-Latin (inflected languages) and examines not only words, but also word length (in syllables) and some information about transitions between words of different lengths. These variables are analyzed using PCA, cluster analysis, and discriminant analysis, and the authors combine careful analysis with useful methodological observations. The central question asked is whether it is likely that the Consolatio Ciceronis, which was edited and published by Sigonio in 1583, is really the lost work known to have been written by Cicero about 45 bc and existing only as fragments quoted in other works, or, as was suggested shortly after its publication, a forgery by Sigonio himself. Can authorship attribution methods distinguish "between Cicero and Ciceronianism" (Forsyth, Holmes, and Tse 1999: 378)?
After collecting more than 300,000 words of classical and neo-Latin by eleven authors and dividing them into 70 sample texts, the authors use PCA based on the 46 most frequent function words to show that Cicero's oratory is distinct from his prose: as has often been noted, genre effects sometimes overwhelm authorship effects. The same method distinguishes Cicero well from six other classical authors, as does cluster analysis, and both produce slightly weaker but still broadly accurate results when Sigonio is tested against the other sixteenth-century authors.
Turning to Sigonio, Cicero, and the Consolatio, the authors use stepwise discriminant analysis of known texts by Sigonio and Cicero to determine the words that are most effective in distinguishing the two authors. This technique is especially appropriate in cases like this one where some samples belonging to distinct groups are available. It identifies a small group of discriminators that are quite effective in distinguishing Sigonio and Cicero: they classify only two Ciceronian texts as by Sigonio and attribute both sections of the Consolatio to him. Discriminant analysis is also used to discover variables that distinguish effectively between classical and neo-Latin, discriminators that classify the Consolatio as neo-Latin. Adding information about word length and syllable transition improves the accuracy of the analyses and more firmly identifies the Consolatio as neo-Latin. Finally, discriminant analysis also shows that the Consolatio is enough like Sigonio's other work to suggest that he is its author.
The authorship of this disputed work is inherently significant, and this article does an exceptionally clear job of describing the literary and cultural situation in which the authorship question is to be asked. The careful division of the problem into subproblems provides clarity, and the variety and methodological sophistication of the analyses both strengthen the case against Cicero as the author and serve as a guide to future work.
In "Jonsonian Chronology and A Tale of a Tub," one of several careful and important studies, Hugh Craig also uses discriminant analysis, but he applies it to a very different literary problem (1999). His central question is the position of Ben Jonson's A Tale of a Tub in the chronology of his work, and thus the context in which the play is read. Whether "it is a late work of pastiche or is in origins an early, naively conventional one" (230–31) has important implications for its significance and its interpretation.
The existence of several datable early, middle, and late comedies allows Craig to set up a discriminant analysis based on the 58 most frequent function words, more heavily weighting those words that discriminate best among the three periods. When the plays are divided into 2,000-word segments, those segments separate clearly into clusters, with very little overlap. Craig's methodology, like the PCA analysis popularized by Burrows, allows a scatter plot of the play segments to be compared with a scatter plot of the variables that produced it, and this in turn allows him to discuss the words and their stylistic and chronological implications.
Analyzing segments of A Tale of a Tub in the same way shows that its segments are very widely dispersed: some appear among the earliest segments, some in the middle, some among the late segments, and one outside all three of these clusters. Though this is consistent with an early play later revised, Craig wisely tests other late plays, showing that the scatter is not an artifact of the analysis and that A Tale of a Tub is much more widely scattered than the others. The sectioning of the plays also allows a discussion of plot and content in relationship to the scattered segments.
Next, the play is repeatedly re-segmented at 100-word intervals and subjected to the same analysis to pinpoint abrupt changes in style, which are discussed with reference to the boundaries between acts and scenes. Finally, rolling segments of A Tale of a Tub are compared with those of other plays, showing that the fluctuations in A Tale of a Tub are much more extreme. Although no firm conclusions can be reached, this careful, innovative, and thorough analysis strongly suggests an early play reworked by Jonson near the end of his career. Craig's frequent and insightful return to the text and to questions of serious literary significance marks this as a model stylometric analysis.
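Rolling segmentation of this kind is straightforward to set up. The sketch below assumes 2,000-word windows that advance 100 words at a time, which is one plausible reading of the description above, and it tracks the rate of a single word where Craig applies a full discriminant analysis to each segment. The function names and the toy data are illustrative.

```python
def rolling_segments(tokens, size=2000, step=100):
    """Overlapping windows of `size` tokens, each starting `step` tokens later."""
    return [tokens[start:start + size]
            for start in range(0, len(tokens) - size + 1, step)]

def rate_per_10k(segment, word):
    """Occurrences of `word` per 10,000 tokens of the segment."""
    return 10_000 * segment.count(word) / len(segment)

# Toy text with an abrupt change halfway through, using small windows.
tokens = ["x"] * 6 + ["upon"] * 6
rates = [rate_per_10k(seg, "upon")
         for seg in rolling_segments(tokens, size=4, step=2)]
print(rates)  # [0.0, 0.0, 5000.0, 10000.0, 10000.0]
```

Because neighbouring windows share most of their words, a sharp change between adjacent values points to a narrow stretch of text, which can then be compared with act and scene boundaries.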
In taking up "Charles Brockden Brown: Quantitative Analysis and Literary Interpretation" (Stewart 2003) we shift genres, continents, centuries, and focus, but retain a strong relationship between quantitative analysis and more traditional literary questions. Rather than authorship or chronology, Stewart focuses on the styles of narration in two novels by Charles Brockden Brown (1771–1810), Wieland or The Transformation and the unfinished Memoirs of Carwin, the Biloquist. He investigates whether Brown successfully creates distinct narrative voices for the four narrators of Wieland and a consistent voice for Carwin, who is both one of those narrators and also the narrator of the unfinished Memoirs of Carwin.
Burrows used PCA successfully in distinguishing the dialogue of Jane Austen's various characters in his classic Computation into Criticism (1987), treating the characters as if they were literally the "authors" of their own speech. One interesting question is whether or not a writer of more modest gifts can successfully create distinct and consistent narrative styles. However important Brown may be to the origins of American literature, he is no Jane Austen. I have investigated a similar question regarding Hannah Webster Foster's 1797 American epistolary novel, The Coquette (Hoover et al. forthcoming), and, in a discussion of Nineteen Eighty-Four, The Inheritors, and The Picture of Dorian Gray, have suggested revisions to the standard methodology that may improve analyses of parts of texts by a single author (2003).
Stewart uses both PCA and cluster analysis and bases them not only on the frequencies of the 30 most frequent words, but also on the frequencies of various punctuation marks, and on the frequencies of words, sentences, and paragraphs of different lengths. He shows that the chapters of Wieland narrated by Clara, Pleyel, and Theodore are generally quite distinct from the chapter narrated by the villainous Carwin, which clusters with his chapters from Memoirs of Carwin. The analysis reveals an anomaly that provides the impetus for a discussion of more traditional literary concerns: both Pleyel's chapter of Wieland and the final chapter of that novel, which is narrated by Clara, cluster with Memoirs of Carwin and Carwin's chapter of Wieland. (In an article in a similar spirit, McKenna and Antonia (2001) probe the differences among interior monologue, dialogue, and narrative in Joyce's Ulysses, arguing that multivariate analysis of Gerty McDowell's language can contribute to the interpretation of form, meaning, and ideology in that complex and difficult novel.)
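A rough sketch may help to show what kind of data such an analysis starts from. It builds, for each chapter, a vector of rates for the most frequent words and for a handful of punctuation marks, and then finds each chapter's nearest neighbour by Manhattan distance. This is a deliberately simple stand-in and not Stewart's method: it performs neither PCA nor cluster analysis, it omits the length-based features, the list of punctuation marks is assumed, and the four "chapters" are invented placeholder sentences rather than text from Brown's novels.

```python
import re
from collections import Counter

# Assumed set of punctuation marks; the study's actual list is not given here.
PUNCTUATION = [",", ";", ":", ".", "!", "?"]


def tokenize(text):
    """Lowercase word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def feature_table(chapters, n_words=30):
    """Return (feature names, {chapter: vector}) of rates per 1,000 word tokens.

    Features are the n_words most frequent words across all chapters,
    followed by the punctuation marks.
    """
    tokens = {name: tokenize(text) for name, text in chapters.items()}
    overall = Counter()
    for toks in tokens.values():
        overall.update(toks)
    top_words = [w for w, _ in overall.most_common(n_words)]

    table = {}
    for name, text in chapters.items():
        counts = Counter(tokens[name])
        total = len(tokens[name]) or 1  # avoid division by zero for empty text
        word_part = [1000 * counts[w] / total for w in top_words]
        punct_part = [1000 * text.count(p) / total for p in PUNCTUATION]
        table[name] = word_part + punct_part
    return top_words + PUNCTUATION, table


def manhattan(u, v):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_neighbours(table):
    """For each chapter, the name of the closest other chapter."""
    return {
        name: min((other for other in table if other != name),
                  key=lambda other: manhattan(table[name], table[other]))
        for name in table
    }


# Invented placeholder "chapters" for two hypothetical narrators, A and B.
toy_chapters = {
    "A1": "the sea, the sea; and the wind.",
    "A2": "the wind, the sea; and the rain.",
    "B1": "I said no! I said no! Why? Why not?",
    "B2": "I said yes! Why? I said why not!",
}
features, table = feature_table(toy_chapters)
neighbours = nearest_neighbours(table)
```

In a real study the same table would be passed to a PCA or clustering routine, and a chapter whose nearest neighbours belong to a different narrator would be the kind of anomaly that invites interpretation.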
There is no space here to do justice to the subtlety of Stewart's integration of these quantitatively anomalous results into the larger critical debate surrounding the interpretation of the early American novel, but he produces some very suggestive reasons for the similarity of the voices of Pleyel and Carwin, including their long years spent in Europe — quite significant in an early American novel — and the fact that both want to dominate and possess Clara. Stewart also suggests connections between Clara's narration of the final chapter of Wieland, a kind of "happy ending" postscript written from Europe after her marriage to Pleyel, and critical views of Brown as a skeptic about the American experiment. He also makes intriguing suggestions of a connection between Carwin's ventriloquism and the similarity between Clara's voice and his own in the final chapter of Wieland. This study not only uses statistics effectively to provide insight into important questions of interpretation, but also "suggests that traditional critical interpretation has a real bearing on how we understand the meaning of those statistics" (138). There is, of course, always the danger of arguing a specious excuse for a real anomaly after the fact, but that is the nature of interpretation, and such speciousness often leads to its own correction.
I conclude this selection of exemplary articles with "The Englishing of Juvenal" (Burrows 2002b), which contrasts in topic and methodology with the three articles just discussed, but continues their serious engagement with traditional literary concerns. Its focus is on translation and style, with a twist of authorship and chronology, and the method is Delta analysis, described above. (For another very interesting look at translation and authorship attribution, see Rybicki 2006.) To a database of more than half a million words of English Restoration poetry, Burrows adds fifteen translations of Juvenal's tenth satire dating from 1646 to 1967. When Delta is used to attribute the translations to their authors, it is not very successful. D'Urfey is ranked first and Dryden second as the author of Dryden's translation; Johnson is strongly identified as the author of his translation, but Vaughan and Shadwell both rank well down the list of possible authors of theirs. Tests involving other translations give similar spotty results, suggesting that some authors effectively suppress their own styles and others do not; for his other translations, for example, Dryden often appears far down the list of likely authors.
Characteristically, Burrows goes on to a second analysis. This time, rather than asking who is the likeliest author of each translation, this analysis focuses on the authors, asking which of the fifteen Juvenal translations is most like the original work of each translator. In three of the four tests, the results are correct; in the fourth Dryden's comes in a very close second to Higden's as the most similar to the work of Dryden. These impressive results on a very difficult problem show that Delta is capable of capturing subtle authorial markers that persist even when submerged beneath the style of a translation. Another interesting fact is that D'Urfey, who ranks first as author of Dryden's Juvenal X, appears as the most likely author of five of the fifteen translations and as second or third most likely of eight others. Burrows shows that this is not the phenomenon often seen in authorship studies, where the lowest common denominator is, by default, the likeliest author when the true author is not present. The phenomenon is limited to the translations of Juvenal, suggesting that there are real similarities in style between D'Urfey's English and Juvenal's Latin.
Burrows then alters Delta slightly by using the averages of the word frequencies in all of the translations as the test text, treating this average text as a model of Juvenalism, but also retaining the average frequencies as the means against which Delta is calculated. (For other, more extensive alterations to Delta that I have suggested, see Hoover 2004b.) This naturally results in a complete set of zero z-scores for the test text and the mean, but it allows all fifteen translations to be measured against the "model." Shadwell's translation is the most similar to the mean and Johnson's the most different, even more different than the twentieth-century translations and the prose translations. Burrows concludes by using the differences between Johnson and the model to illuminate some of the important characteristics of Johnson's style, noting that Dryden and Johnson lie at opposite ends of a spectrum from versatility to consistency, a spectrum that all students of style would do well to remember. The emphasis on comparison in this article and the telling applications of statistical methods are particularly valuable. The concluding comments about the contrast between close reading and computer analysis emphasize the use of the computer to enhance and extend our ability to "read" literary texts in new ways:
The close reader sees things in a text — single moments and large amorphous movements — to which computer programs give no easy access. The computer, on the other hand, reveals hidden patterns and enables us to marshal hosts of instances too numerous for our unassisted powers. Even in the common case where we do not have fifteen versions of one original to bring into comparison, these principles hold good. (Burrows 2002b: 696)
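The altered form of Delta discussed above, in which the average of all the translations serves both as the test text and as the source of the means, can be sketched in a few lines. The sketch assumes the usual description of Delta as the mean absolute difference between z-scores; because the model text's z-scores are all zero, each translation's score reduces to the mean of the absolute values of its own z-scores. The function name delta_from_model, the use of the population standard deviation, and the toy frequencies are assumptions made for illustration and are not Burrows's figures.

```python
from statistics import mean, pstdev


def delta_from_model(freqs):
    """Delta of each text from the 'average text' of the whole set.

    freqs maps a text name to a list of relative word frequencies,
    with the same word order in every list.
    """
    names = list(freqs)
    n_words = len(freqs[names[0]])
    columns = [[freqs[name][j] for name in names] for j in range(n_words)]
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumed: population standard deviation

    deltas = {}
    for name in names:
        # The model's z-score is 0 for every word, so |z_text - z_model| = |z_text|.
        z_abs = [abs(freqs[name][j] - means[j]) / sds[j]
                 for j in range(n_words) if sds[j] > 0]
        deltas[name] = mean(z_abs) if z_abs else 0.0
    return deltas


# Invented frequencies (per 100 words) of two words in three hypothetical texts.
toy_freqs = {"X": [4.0, 1.0], "Y": [5.0, 2.0], "Z": [9.0, 3.0]}
toy_deltas = delta_from_model(toy_freqs)
closest = min(toy_deltas, key=toy_deltas.get)
farthest = max(toy_deltas, key=toy_deltas.get)
```

In the toy data, text Y lies nearest the average and text Z farthest from it, which is the kind of ranking that places one translation closest to the model and another farthest from it.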
A Small Demonstration: Zeta and Iota and Twentieth-Century Poetry
Given the rapid developments in this field, a small demonstration of the potential of Burrows's newest measures of textual difference, Zeta and Iota, seems appropriate. I began with some Delta tests on very different data, samples of poetry by forty twentieth-century poets, using large samples by twenty-six poets as the primary set and thirty-nine long poems as the secondary set. Twenty-five of these were by poets in the primary set and fourteen by other poets (these poems by primary authors were removed from their main samples). Delta is very accurate on these texts, correctly identifying the authors of all but three of the long poems by members of the primary set. I then took the few errors that occurred and looked for circumstances where a single author erroneously ranks first as the author of one poem and also ranks among the likeliest authors of another poem by the same poet. Among the most similar of the poets in my study using these criteria are Wallace Stevens, Archibald MacLeish, and T. S. Eliot. Burrows's tests of Waller and Marvell using Zeta and Iota were based on main sets of about 13,000 and 20,000 words, and my set for Eliot is about the same size; the sets for Stevens and MacLeish are much larger, more than 70,000 words. The individual poems to be tested are roughly the same size as those Burrows tested, about 2,000 to 6,000 words.
The results of tests of MacLeish against Stevens using both Zeta and Iota were impressive. The new measures had no difficulty distinguishing the two poets, whether MacLeish or Stevens formed the primary sample. There is space here to discuss only the results of Zeta, which is based on the middle of the word frequency spectrum — words that have largely been ignored in earlier studies. Zeta is even more effective in distinguishing MacLeish and Stevens than it was in distinguishing Waller and Marvell, with the lowest Zeta for the primary author typically twice as large as that for the second author. As Burrows found, the poems by another author, here Eliot, sometimes narrowly outscore some of those of the primary author. Although this may seem disconcerting, it actually suggests that Zeta is narrowly and appropriately tuned to the difference between the authors being tested.
On this set of texts, I found that much more stringent stipulations than those used by Burrows produced some fascinating results: the 26 words that are found in all five sections of MacLeish's sample but do not occur in the Stevens sample seem to be good potential MacLeish authorship markers. Their total frequency is 321 in the five MacLeish sections and 40 in the two individual long MacLeish poems, but only 2 in the Stevens sample and his two long poems combined. Relaxing the restriction to retain words if they appear in more than three, more than two, or more than one section gradually reduces the amount of difference between the poems by MacLeish and Stevens, though all of these analyses are completely accurate. Applying the same stipulations with the Stevens sample as the primary set leaves 40 words. With a total frequency of 545 in the Stevens sample and 60 in Stevens's two long poems, but only 3 in the MacLeish sample and his two long poems combined, these in turn seem to be good potential Stevens authorship markers.
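The selection rule described above can be sketched as a small filter: keep a word if it appears in at least a given number of sections of the primary sample and occurs no more than a given number of times in the other poet's sample. This is not Burrows's Zeta itself, only the stringent word-selection step as described here; the parameter names min_sections and max_other, the tokenizer, and the toy texts are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def marker_words(primary_sections, other_text, min_sections=None, max_other=0):
    """Words found in at least min_sections primary sections and occurring
    at most max_other times in the other author's text."""
    if min_sections is None:
        min_sections = len(primary_sections)  # strictest setting: every section
    section_counts = [Counter(tokenize(s)) for s in primary_sections]
    other_counts = Counter(tokenize(other_text))

    # Number of primary sections in which each word appears at least once.
    presence = Counter()
    for counts in section_counts:
        presence.update(counts.keys())

    totals = sum(section_counts, Counter())
    markers = [w for w, n in presence.items()
               if n >= min_sections and other_counts[w] <= max_other]
    # Most frequent markers first, ties broken alphabetically.
    return sorted(markers, key=lambda w: (-totals[w], w))


# Invented placeholder texts, not lines from either poet.
toy_sections = ["the rope and the fog", "a rope in fog", "fog on the rope, the sea"]
toy_other = "the sea and the idea of the sea"
strict = marker_words(toy_sections, toy_other)
relaxed = marker_words(toy_sections, toy_other, min_sections=1)
```

Lowering min_sections corresponds to the relaxation described above, and it admits more words at the cost of a weaker contrast between the two samples.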
Selecting words that occur in all of the sections of the primary sample seems to violate Burrows's intention of avoiding the 30–150 most frequent words that are so often used in other methods, but the stipulation of a maximum frequency of 3 in Stevens accomplishes this in any case. For example, "answered" is the most frequent of the 26 potential MacLeish markers, but it ranks only 323rd among the most frequent words in the MacLeish samples, and "reality" is the most frequent of the 40 potential Stevens markers, but it ranks only 184th among the most frequent words in the Stevens samples. Both are thus beyond the range normally used in tests of frequent words. The 26 MacLeish words range in rank from 323 to 1,422 and the 40 Stevens words from 184 to 1,378, placing all of them well within the range of words that I normally now include in Delta analyses and placing most of them within the range I normally include in cluster analyses. The presence of powerful discriminators like these may help to explain why expanding the size of the word list that is analyzed so often increases the accuracy of the results.
Finally, a glance at the 26 MacLeish words and the 40 Stevens words suggests that Zeta and Iota may provide useful ways of focusing our attention on interesting words:
Ubiquitous Stevens — Rare MacLeish
reality, except, centre, element, colors, solitude, possible, ideas, hymns, essential, imagined, nothingness, crown, inhuman, motions, regard, sovereign, chaos, genius, glittering, lesser, singular, alike, archaic, luminous, phrases, casual, voluble, universal, autumnal, café, inner, reads, vivid, clearest, deeply, minor, perfection, relation, immaculate
Ubiquitous MacLeish — Rare Stevens
answered, knees, hope, ways, steep, pride, signs, lead, hurt, sea's, sons, vanish, wife, earth's, lifted, they're, swing, valleys, fog, inland, catch, dragging, ragged, rope, strung, bark
Besides the obviously greater length and abstractness of the Stevens words, especially the nouns, the Stevens list is saturated with adjectives, while the MacLeish list has very few adjectives and proportionally more verbs and concrete nouns. A search for some of the most frequent of these marker words in each poet's work yields an interesting pair of short poems: MacLeish's "'Dover Beach' — A Note to that Poem" and Stevens's "From the Packet of Anacharsis." Forms of no fewer than 7 of MacLeish's 26 marker words appear in his short poem (215 tokens, 123 types), including the 3 italicized in the original (vanish, steep, and lift) in the following brief passage:
… It's a fine and a
Wild smother to vanish in: pulling down —
Tripping with outward ebb the urgent inward.
Speaking alone for myself it's the steep hill and the
Toppling lift of the young men I am toward now…
Forms of 6 of Stevens's 40 marker words appear in his even shorter poem (144 tokens, 91 types), including the 3 italicized in the brief passage below (internal ellipsis present in the original):
And Bloom would see what Puvis did, protest
And speak of the floridest reality…
In the punctual centre of all circles white
Stands truly. The circles nearest to it share
One of the most difficult challenges for quantitative analyses of literature is preventing the huge numbers of items being analyzed from overwhelming our ability to see the results in insightful ways. By reducing the numbers of words to be examined and selecting sets of words that are particularly characteristic of the authors, Zeta and Iota seem likely to prove very useful for literary analysis as well as authorship attribution, however they may develop once they have been further tested and refined. (Zeta and Iota sometimes produce anomalous results in tests including many authors; Burrows suggests [personal communication] that they are better reserved for head-to-head comparisons.)
The Impact, Significance, and Future Prospects for Quantitative Analysis in Literary Studies
As has often been noted, quantitative analysis has not had much impact on traditional literary studies. Its practitioners bear some of the responsibility for this lack of impact because all too often quantitative studies fail to address problems of real literary significance, ignore the subject-specific background, or concentrate too heavily on technology or software. The theoretical climate in literary studies over the past few decades is also partly responsible for the lack of impact, as literary theory has led critics to turn their attention away from the text and toward its social, cultural, economic, and political contexts, and to distrust any approach that suggests a scientific or "objective" methodology. There are, however, signs of progress on both these fronts. The recent increased interest in archives within literary criticism will almost necessarily lead to the introduction of quantitative methods to help critics cope with the huge amount of electronic text now becoming available. Some quantitative studies have also begun to appear in mainstream literary journals, a sure sign of their growing acceptance. The increasing frequency of collaborations between literary scholars and practitioners of quantitative methods of many kinds also promises to produce more research that strikes an appropriate balance between good methodology and significant results. Prospects for the emergence of quantitative approaches as a respected, if not central, branch of literary studies seem bright.
References
Burrows, J. F. (1987). Computation into Criticism. Oxford: Clarendon Press.
Burrows, J. F. (1992a). "Not Unless You Ask Nicely: The Interpretative Nexus between Analysis and Information." LLC 7: 91–109.
Burrows, J. F. (1992b). "Computers and the Study of Literature." In C. S. Butler (Ed.). Computers and Written Texts. Oxford: Blackwell, pp. 167–204.
Burrows, J. F. (2002a). "'Delta': a Measure of Stylistic Difference and a Guide to Likely Authorship." LLC 17: 267–87.
Burrows, J. F. (2002b). "The Englishing of Juvenal: Computational Stylistics and Translated Texts." Style 36: 677–99.
Burrows, J. F. (2006). "All the Way Through: Testing for Authorship in Different Frequency Strata." LLC 22: 27–47.
Clement, R., and D. Sharp (2003). "Ngram and Bayesian Classification of Documents." LLC 18: 423–47.
Craig, H. (1999). "Jonsonian Chronology and the Styles of A Tale of a Tub." In M. Butler (Ed.). Re-Presenting Ben Jonson: Text, History, Performance. Houndmills: Macmillan, pp. 210–32.
Forsyth, R. S., D. Holmes, and E. Tse (1999). "Cicero, Sigonio, and Burrows: Investigating the Authenticity of the Consolatio." LLC 14: 375–400.
Fortier, P. (2002). "Prototype Effect vs. Rarity Effect in Literary Style." In M. Louwerse and W. van Peer (Eds.). Thematics: Interdisciplinary Studies. Amsterdam: Benjamins, pp. 397–405.
Golding, W. (1955). The Inheritors. London: Faber & Faber.
Holmes, D. (1994). "Authorship Attribution." Computers and the Humanities 28: 87–106.
Holmes, D. (1998). "The Evolution of Stylometry in Humanities Scholarship." LLC 13: 111–17.
Holmes, D. I., M. Robertson, and R. Paez (2001). "Stephen Crane and the New York Tribune: A Case Study in Traditional and Non-Traditional Authorship Attribution." Computers and the Humanities 35.3: 315–31.
Hoover, D. L. (1999). Language and Style in The Inheritors. Lanham, MD: University Press of America.
Hoover, D. L. (2003). "Multivariate Analysis and the Study of Style Variation." LLC 18: 341–60.
Hoover, D. L. (2004a). "Testing Burrows's Delta." LLC 19: 453–75.
Hoover, D. L. (2004b). "Delta Prime?" LLC 19: 477–95.
Hoover, D. L., J. Culpeper, B. Louw, and M. Wynne (forthcoming). Approaches to Corpus Stylistics. London: Routledge.
Love, H. (2002). Attributing Authorship: An Introduction. Cambridge: Cambridge University Press.
McKenna, C. W. F., and A. Antonia (2001). "The Statistical Analysis of Style: Reflections on Form, Meaning, and Ideology in the 'Nausicaa' Episode of Ulysses." LLC 16: 353–73.
Rybicki, J. (2006). "Burrowing into Translation: Character Idiolects in Henryk Sienkiewicz's Trilogy and its Two English Translations." LLC 21: 91–103.
Spencer, M., B. Bordalejo, P. Robinson, and C. J. Howe (2003). "How Reliable is a Stemma? An Analysis of Chaucer's Miller's Tale." LLC 18: 407–22.
Stewart, L. (2003). "Charles Brockden Brown: Quantitative Analysis and Literary Interpretation." LLC 18: 129–38.
Waugh, S., A. Adams, and F. Tweedie (2000). "Computational Stylistics using Artificial Neural Networks." LLC 15: 187–98.