Ted Underwood is Professor of English and Information Sciences at the University of Illinois, Urbana-Champaign. He is the author of two books about literary history, including most recently
It has recently become common to describe all empirical approaches to literature as subfields of digital humanities. This essay argues that distant reading has a largely distinct genealogy stretching back many decades before the advent of the internet – a genealogy that is not for the most part centrally concerned with computers. It would be better to understand this field as a conversation between literary studies and social science, initiated by scholars like Raymond Williams and Janice Radway, and moving slowly toward an explicitly experimental method. Candor about the social-scientific dimension of distant reading is needed now, in order to refocus a research agenda that can drift into diffuse exploration of digital tools. Clarity on this topic might also reduce miscommunication between distant readers and digital humanists.
Distinguishing the genealogies of distant reading and digital humanities
Over the last decade or so, it has become common to describe all empirical approaches to literary history as subfields of digital humanities. At first, I didn’t take this conflation seriously; I thought it was journalistic shorthand for a history that scholars understood to be more complex. Writing in
currently proliferating under the broad rubric of ‘digital humanities’
distant reading and
digital humanities had been coined ten years earlier, in different academic communities, to describe different kinds of research. Digital technology hadn’t even played a central role in early examples of distant reading. But why quibble? No one expects a short newspaper article to give a full history of academic trends.
More recently, however, I have noticed that scholars themselves are beginning to
narrate intellectual history in the same way: treating all quantitative or empirical
approaches to literary history as aspects of a digital turn in the discipline. In
Amy Earhart’s genealogy of [T]he digital work I have traced in
the first part of this book has been largely representational, with technology
primarily used to create idealized or better versions than would be possible in
print. Current trends in digital literary studies, and the larger digital
humanities, appear to be moving away from representational concerns and toward
interpretive functions as contemporary digital scholars, such as Stephen Ramsay,
Franco Moretti, Matthew Jockers, Geoffrey Rockwell, and others, are using
technology to devolve, manipulate, and reform the literary text.
It may be correct to say that interpretive questions are a relatively late
development in
There is nothing wrong with writing a history of food in America, and also nothing
wrong with Earhart’s decision to focus on a particular critical tradition initiated
by the advent of the web. As long as readers remember that many ingredients of this
history have longer backstories elsewhere, no one will be misled. But of course,
backstories do get forgotten with the passage of time, and new generations learn to
associate pizza mainly with Chicago or New York. Today it is common to see distant
reading and the sociology of literature folded into discussions of Big Data research in digital
humanities
This essay will turn the calendar back to the middle of the twentieth century, in
order to tease apart intellectual traditions that have begun to be conflated. In
particular, I want to emphasize that distant reading is not a new trend, defined by
digital technology or by contemporary obsession with the word
Integrating experimental inquiry in the humanities poses rhetorical and social challenges that are quite distinct from the challenges of integrating digital media. It seems desirable — even likely — that distant readers and digital humanists will coexist productively. But that compatibility cannot be taken for granted, as if the two projects were self-evidently versions of the same thing. They are not, and the institutional forms of their coexistence still need to be negotiated.
Distant reading
Large-scale literary history is far from a new idea. Vernacular literary study entered nineteenth-century universities as an already-ambitious project that sought to trace the parallel development of literature, language, and society across a thousand years. It was only in the twentieth century that literary scholarship began to restrict itself paradigmatically to the close reading of single texts. If we take a long view of disciplinary history, recent research on large digital libraries is just one expression of a much broader trend, beginning around the middle of the twentieth century, that has tended to reinstate the original historical ambitions of literary scholarship.
But that would be a very long view: it doesn’t do much to help us understand
current scholarly debate. For that, we need a tighter frame — a frame that can
characterize the goals that have energized empirical approaches to literary
history over the last half-century or so, without reducing them to an expression
of twenty-first-century technology. This essay will provide an account on that
intermediate scale. The frame I have chosen to use is the phrase sociology of literature
Cultural analytics
could be an equally valid choice, if we wanted
to include disciplines other than literary studies. In short, like most
historical phenomena, the trend I am describing is composed of multiple
overlapping impulses. There is more than one right way to describe it.
I have chosen mining
or analysis.
On the other hand, it does have one
significant disadvantage: it is often understood to imply a recent origin story
that would prevent us from crediting any work done in the previous century. I
will need to complicate that story in the pages that follow. It is true that
Franco Moretti coined
Moretti’s turn-of-the-century works were important, not because they invented the
idea of macroscopic literary inquiry, but because they galvanized an existing
project by infusing it with a new sense of possibility and a new polemical
rationale. I will have more to say about his contribution, but this essay will
mostly take aim at a larger target — a critical tradition, emerging in the later
twentieth century, that would include things originally called
This premise is general enough to have cropped up many times, so the tradition I am describing will lack crisp boundaries. Many traditional works of literary history pause at the outset to construct an informal sample of, say, Gothic novels. To the extent that those studies separate the construction of the sample from the process of historical inference, I would say they are approximating distant reading. Since versions of this approach to literature can be traced back to the nineteenth century, it would be pointless to go looking for a moment of origin. The emergence of distant reading was not contained in any eureka moment when a literary scholar decided to try social-scientific methods. It emerged rather through a long sequence of attempts, which gradually transformed casual historiographic practices into an explicitly experimental method.
A longer study might follow this story down many different paths. Marxist literary theory has been one crucial influence; Raymond Williams might deserve a chapter of his own. The books he wrote around 1960 laid a theoretical foundation that still underpins much contemporary research — for instance, by insisting that literary culture is never a unified object, but rather a palimpsest of emergent and residual formations, transformed retrospectively by processes of selection. After reading Williams, it becomes hard to imagine that there could ever be a single definition of literary exemplarity, or a single correct sample of the literary past. In
nobody really knows the nineteenth-century novel; nobody has read, or could have read, all its examples, over the whole range from printed volumes to penny serials
A full account of the emergence of distant reading might also spend a chapter on
book history. Book historians have been compelled to explicitly define samples,
since libraries don’t cover the full range of practices they study. Book
historians have also pushed literary history to define its object of study more
concretely — separating processes of production, for instance, from circulation
and reading practices. But these parts of the story are already well known
This book became a monument of feminist scholarship by challenging the widespread
premise that popular literature simply transmitted ideology. In Radway’s view,
critics had too quickly extrapolated their own interpretive practices to other
readers. A critic may pick up a popular romance, for instance, identify the
gender norms that seem implicit in the plot, and conclude that the effect of the
book is to reinforce those norms. But how much does this tell us about the
actual experience of romance readers? What aspects of the stories do they value?
What role do the books play in their lives? Studying a community of women linked
by a particular bookstore, Radway concluded that readers have more control over
the meaning of stories than critics assume. Romances seemed to function in
practice as a declaration of independence from the pressure of these readers’ responsibilities as
wives and mothers, even when the gender roles represented in the narrative were
traditional. Many subsequent arguments about the active agency of reception in
fan culture are indebted to Radway’s conclusions.
Literary scholars have been much slower to imitate her methods, which depended on questionnaires, interviews, and numbers.
Radway’s quantitative methods may at first seem remote from familiar examples of distant reading. She doesn’t discuss algorithms. Instead she uses numbers simply to count and compare — in order to ask, for instance, which elements of a romance novel are most valued by readers. Recent examples of distant reading can grow more complex than this. But they can also remain just as simple. Franco Moretti has relied on bibliographies to measure the lifespans of genres; I have quizzed readers about their impressions of elapsed time in ninety novels.
Admittedly, contemporary distant reading is usually based on textual evidence, or
on social evidence about dead people, rather than questionnaires. Distant
readers are certainly concerned with reception
empirical research which aims to
test the validity of … a hypothesis
Radway’s doctorate was in American Studies; she currently teaches in a department of Communication Studies. But other social-scientific traditions are also hovering in the background of
binary oppositions organizing the heroine, the female foil, the hero, and the male foil into a symmetrical structure
Linguistics was not particularly central to Radway’s project, and it may be
worth pausing for a moment to underline this point. Contemporary distant reading
has also been shaped by a different intellectual tradition devoted to
quantitative analysis of linguistic detail. That tradition has made vital
contributions, which I want to acknowledge. But I think linguistics may be
looming a little too large in the foreground of contemporary narratives about
distant reading, so much that it blocks our view of other things. Linguistic
categories are just as important as the social categories Radway explored; it’s
not that I want to champion one subject against the other. Rather, I think we
need to see both influences at once in order to grasp the generality of the
method that organizes this research agenda. Our knowledge about large-scale
literary history isn’t expanding because there was a special magic in linguistic
analysis (or a special moral authority in feminist sociology). The project is
succeeding, rather, because scholars have learned how to test broad
literary-historical hypotheses in a way that resists confirmation bias.
Otherwise it would be very difficult to make progress at this scale. If you’re
working in a domain where you could potentially cite 100,000 different novels as
evidence, confirmation bias will make all generalizations equally true until you
invent some procedure to limit your own freedom of selection. As psychologists
have expressed this: fields with abundant evidence need some way to limit researcher degrees of
freedom
Although Radway’s book was widely celebrated and widely cited in English
departments throughout the 1990s, it was not widely imitated there. As James F.
English has pointed out, literary scholars are traditionally quick to borrow
social scientists’ conclusions, but slow to borrow their methods
clues in detective fiction
This method is very close to Radway’s approach to romance novels: from the sample of twenty texts, to the plan of reading systematically for particular features, to the little plusses and minuses that represent polarity in the diagram. I don’t mean to suggest that Moretti was specifically influenced by
To experiment on the past admittedly stretches the definition of
to protect the hypothesis from
misleading confirmations
scientific,
I don’t mean to imply that we must suddenly adopt all the
mores of chemists, or even psychologists. Imaginative literature matters because
readers enjoy it; criticism would gain nothing if we let meticulous
hypothesis-testing drain all the warmth and flexibility from our writing.
Literary historians who use numbers will have to somehow combine rigor with
simplicity, and prune back a thicket of fiddly details that would be fatal to
our reason for caring about the subject. But within those rhetorical limits,
distant reading can, let us say,
Of course, not everyone will agree with this definition. For many scholars, the
term great unread
Moretti’s insistence on reconstructing a maximally complete archive is also the
part of distant reading that scholars have spent most time debating. Many
critics have pointed out that it is impossible to recover everything an argument about what constitutes
an historically relevant and justifiable sample for analysis
I have been at pains to downplay several aspects of Moretti’s contribution to distant reading that are often seen as definitional: his coinage of the phrase itself, and his emphasis on comprehensive samples that include many non-canonical works. However, I do think Moretti is rightly credited with sparking the twenty-first-century expansion of this research project. To illustrate why, I can’t do better than quote the last paragraph of
Fantastic opportunity, this uncharted expanse of literature, with room for the most varied approaches, and for a truly collective effort, like literary history has never seen. Great chance, great challenge … which calls for a maximum of methodological boldness: since no one knows what knowledge will mean in literary studies ten years from now, our best chance lies in the radical diversity of intellectual positions, and in their completely candid, outspoken competition. Anarchy. Not diplomacy, not compromises, not winks at every powerful academic lobby, not taboos. Anarchy.
Two contributions are vital here. First, the recognition that literary history is
not an exhausted, well-mapped field, but an uncharted expanse,
because we actually know little about its
macroscopic shape. When I say that Moretti galvanized distant reading by
infusing it with a new sense of possibility, this is the primary thing I mean.
But I would also emphasize, secondly, his inference that the diplomatic
reconciliation of conflicting normative claims is less urgent than many literary
scholars assume.
Here we reach a zone of persistent miscommunication between distant readers and
their colleagues. The discipline of literary studies has long organized itself
around prescriptive debates that seek to define the proper concern of a literary
critic. We inherit this polemical emphasis from nineteenth-century criticism,
and it survives today in vigorous arguments that pit history against form,
surface against depth, and critique against appreciation. Scholars rooted in
this tradition understandably want to interpret distant reading as a normative
stance of the same kind. Perhaps distant readers are expressing a principled
opposition to, say, close reading? In that case, the natural next move would be
to dialectically sublate the tension between close and distant. Observers are
often quite willing to offer this sort of compromise solution none of us
really know what’s in there yet.
A confession of ignorance isn’t
something one can meaningfully strike compromises about; it calls for a
different genre of response. Instead of interpreting distant reading as a
normative argument about the discipline, it would be better to judge it simply
by asking whether the blind spot it identified is turning out to contain
anything interesting.
I am of course a biased observer. But personally, I became confident that new
scales of inquiry were paying off in 2012, when Ryan Heuser and Long Le-Khac
published evidence of a massive, steady shift from abstraction to concrete
description in nineteenth-century novels
Up to this point, I have said relatively little about numbers, and nothing at all about computers. I have characterized distant reading as a tradition continuous with earlier forms of macroscopic literary history, distinguished only by an increasingly experimental method, organized by samples and hypotheses that get defined before conclusions are drawn. The interdisciplinary connections that mattered most for this tradition were, until recently, located in the social rather than computational sciences.
However, it is true that this sociological approach to literature has, over the last twenty-five years, fused with a computational tradition. The history of that fusion is complex, and I won’t try to detail it fully here; one could point to Mark Olsen and the ARTFL project at Chicago, or to Matthew Jockers and the Stanford Literary Lab, or to John Unsworth and an archipelago of people involved with the MONK Project. In any case, it is clear that large-scale literary history is now suffused with ideas drawn from corpus linguistics, information retrieval, and machine learning. I don’t intend to downplay the significance of this fusion; it has been the most exciting part of my career, and I’m indebted to everyone I just mentioned.
Nor do I want to suggest that computation was merely a means to achieve an end
that Radway and Moretti had already fully defined. Critics of digital humanities
often assume that computer science ought to remain merely instrumental for
humanists; it should never challenge
our fundamental standards or
procedures
In short, I am not at all motivated to shore up disciplinary boundaries or insist on a strictly internalist history of literary studies. And yet I have to admit that, for me, distant reading remains the name of an approach to literary history rather than a computational method. To be sure, it has multiple genealogies, and roots in many disciplines. But in tracing connections to the past I would still, on the whole, emphasize the thread that runs back through Moretti, Radway, and Williams. My rationale is simple. An approach to literature informed by social science can produce significant historical results by itself — with or without computers. But the converse has not generally turned out to be true. Computational methods, by themselves and without a social scale of inquiry, have not been enough to transform literary history.
We know this, to be quite blunt, because computational methods were applied to literature for thirty years without making a great impact on the discipline. The journal
Computer-aided literature studies have failed to have a significant impact on the field as a whole
how a text achieves its literary effect by examining
subtle semantic or grammatical structures in single texts or the works of individual authors. Computers had turned out to be
very poorly suited to those New Critical questions, and concentration on them had
tended to discourage researchers from using the tool to ask questions to which it is better adapted, the examination of large amounts of simple linguistic features
This was the article that originally pulled me toward distant reading in the
mid-1990s
Moreover, Olsen’s remarks are still a useful warning for scholars working in the
area of overlap between digital humanities and distant reading. Algorithms are
genuinely important; they aren’t merely instrumental. But they also aren’t
sufficient for this project. So far, computation has only made a difference for
literary history in combination with reasonably broad samples aimed at
historical questions. A broad sample does not have to be an exhaustive
collection; it might only amount to a few dozen books. But framing questions
about dozens of books still tends to require a complete rethinking of received
research questions. So I understand why scholars are often tempted to start with
the algorithms instead, hoping that they will produce something interesting when
applied to familiar author-sized questions. Unfortunately, in my experience,
this is false economy. Olsen’s warning has not been superseded by any technical
advance: computers still can’t teach us much about New Criticism. (Maybe
someday, but not quite yet.) Within the sprawling ecumenical community called
Writing in
I’m as guilty as anyone of striking this casual pose. It is often unavoidable. I
have suggested that distant readers aspire to a version of the scientific method
appropriate for a historical discipline. But we are also literary critics, and
critics have an obligation to be interesting. This means that we sometimes have
to tuck methods in an appendix, or make the analytical task look a bit easier
than it truly was.
Unfortunately, social-scientific methodology has not been a central subject of
conversation in digital humanities, or in the forms of distant reading that
cluster under the DH rubric reading
itself contributes to the elision of social
science
That’s why I have written this article — to tease out the elided social-scientific genealogy behind distant reading. There are other threads one could trace. For instance, as I have acknowledged, machine learning is exerting a powerful influence on the contemporary scene. I don’t want to disparage any subfield, but I do want to insist that the genealogy of distant reading should be traced by disentangling its central intellectual impulses, not just by following the zone of overlap between computers and textual study as far back as possible. Roberto Busa’s concordance of Aquinas was a valuable thing, but a concordance of a single author does not constitute an important origin moment for distant reading. If we wanted to trace this tradition back to the middle of the twentieth century, we would need to follow different threads in several different directions. We might end up asking what Raymond Williams was doing with literature in the late 1950s, what Claude Lévi-Strauss was doing at the same time with social anthropology, and what Frank Rosenblatt was doing with the perceptron.
In the twenty-first century, admittedly, these disciplinary stories are tending
to converge and fuse. That creates an exciting challenge, but also a problem for
graduate training. Scholars preparing to work as distant readers probably need
some exposure to programming, social theory, and statistics, as well as fairly
deep knowledge of a literary-historical tradition. Right now, the flexible
interdisciplinary community called
But if these two projects are to coexist under one roof, the differences between
them need candid discussion. Digital humanists don’t necessarily share distant
readers’ admiration for social science. On the contrary, they are often
concerned to defend a boundary between quantitative social science and humane
reflection (see, e.g.,
This article has tried to clarify the commitments that define distant reading. I
have not aimed to produce consensus: I know that many scholars cited here will
disagree with my definition of the field. In particular, I know that many
scholars maintain strong ties to both digital humanities and distant reading,
and I expect people who do both things will resist the conclusion that these are
intellectually distinct projects. Certainly, the projects are at present fused,
in ways that matter deeply to academics as human beings. For instance, job
advertisements usually call for a digital humanist, almost never for a distant
reader. So it is pragmatically unwise for junior scholars to
separate the two terms, and a purely descriptive account of the contemporary
social scene might well fold them together. This essay has separated digital
tools from experimental methods for reasons that are not purely pragmatic or
descriptive. I have tried to ground the separation in a genealogical narrative,
but I would also admit that it has a forward-looking prescriptive purpose.
Over the last fifteen years, as distant readers have seized technological
opportunities, the goals of the project have become diffuse. Often our immediate
goal has really been exploratory: let’s see what can be done with these tools.
The exploration has been fruitful, but I think the field is ready
to move past exploration. Large-scale literary history could now reorganize
itself around clear research questions and rigorously advance our knowledge of
the past. But in order to do that, I believe we need to set fascination with
technology to one side and rediscover the guiding principle of experiment. I
have defended that opinion by pointing to the history of the field, and
especially to the importance of social science in Williams, Radway, and Moretti.
But it is also, in the end, an opinion. This essay promises only
This essay benefited greatly from conversation at the Instant History symposium at Loyola University Chicago, organized by Paul Eggert and Steven Jones in the fall of 2016. Respondents at the symposium included Ian Cornelius, Lydia Craig, Casey Jergenson, and Justin Hastings. The argument was also influenced by conversation with Andrew Goldstone and Eleanor Courtemanche, and it was improved by the editors and reviewers of