Shlomo Argamon is an associate professor of computer science at the Illinois Institute of Technology, where he is the director of the Linguistic Cognition Laboratory. He received his B.Sc. in applied mathematics from Carnegie-Mellon University in 1988, his Ph.D. in computer science from Yale University, where he was a Hertz Foundation Fellow, in 1994, and was a Fulbright Fellow at Bar-Ilan University in Israel from 1994 to 1996. Dr. Argamon's research focuses on the development of computational text analysis techniques, with applications mainly in computational stylistics, authorship attribution, sentiment analysis, and scientometrics.
Charles Cooney works at the ARTFL Project at the University of Chicago, where he also earned a PhD in Comparative Literature. His scholarly work focuses on the relationships between French and American twentieth-century poets and between the two literary cultures.
Russell Horton is a research programmer at The ARTFL Project and the Digital Library Development Center at the University of Chicago, where he received his BA in Linguistics in 2002. He works on machine learning and text analysis software for the humanities.
Mark Olsen is the Assistant Director of the ARTFL Project at the University of Chicago. Mark received his Ph.D. in French history from the University of Ottawa in 1991 and has been involved in digital humanities and computer-aided text analysis since the mid-1980s. His current ambition is to write a biography of the Marquis de Pastoret by candle-light with a quill.
Sterling Stein received his B.Sc. (2003) and M.Sc. (2008) in computer science from the Illinois Institute of Technology, where he was a research assistant in the Linguistic Cognition Laboratory. He is currently a software engineer at Google Inc. in Mountain View, CA.
Robert Voyer recently joined the ranks at Powerset as a computational linguist. Before joining the natural language search world, Robert worked as a research developer for The ARTFL Project at the University of Chicago, where he also earned his MS in Computer Science and BA in Romance Languages.
Machine learning and text mining offer new models for text analysis in the humanities by
searching for meaningful patterns across many hundreds or thousands of documents. In this study,
we apply comparative text mining to a large database of 20th century Black Drama in an effort to
examine linguistic distinctiveness of gender, race, and nationality. We first run tests on the
plays of American versus non-American playwrights using a variety of learning techniques to
classify these works, identifying those which are incorrectly classified and the features which
distinguish the plays. We achieve a significant degree of performance in this
cross-classification task and find features that may provide interpretative insights. Turning
our attention to the question of gendered writing, we classify plays by male and female authors
as well as the male and female characters depicted in these works. We again achieve significant
results which provide a variety of feature lists clearly distinguishing the lexical choices made
by male and female playwrights. While classification tasks such as these are successful and may
be illuminating, they also raise several critical issues. The most successful classifications
for author and character genders were accomplished by normalizing the data in various ways.
Doing so creates a kind of distance from the text as originally composed, which may limit the
interpretive utility of classification tools. Framing the classification tasks as binary oppositions (male/female, etc.) raises the possibility of stereotypical or lowest common denominator results which may gloss over important critical elements and may also
reflect the experimental design. Text mining opens new avenues of textual and literary research
by looking for patterns in large collections of documents, but should be employed with close
attention to its methodological and critical limitations.
Using algorithms to classify dramatic works of black authors by nationality and gender
The Black stage has been an important locus for exploring the evolution of Black identity and self-representation in North America, Africa, and many African diaspora countries. African-American playwrights have examined many of the most contentious issues in American history since emancipation — such as migration, exploitation, racial relations, racial violence, and civil rights activism — while writers outside of the United States have treated similar themes arising from the history of colonialism, slavery, and apartheid. Alexander Street Press (ASP), in collaboration with the ARTFL Project, has developed a database of over 1,200 plays written from the mid-1800s to the present by more than 180 playwrights from North America, as well as English-speaking Africa, the Caribbean, and other nations. The Black Drama collection is remarkable for the wealth of information provided for each play as well as for the authors, characters, and performance histories of these works. While such extensive metadata permits sophisticated search and analysis of the collection, it also provides an environment that lends itself well to experiments in machine learning and text mining.
Using the Black Drama data, we examine the degree to which machine learning can isolate
stylistic or content characteristics of authors and/or characters having particular attributes —
gender, race, and nationality — and the degree to which pairs of author/character attributes
interact. We attempt to discover whether lexical style or content markers can be found that reliably distinguish plays or speeches broken down by a particular characteristic, such as character gender. A positive result would constitute strong evidence for distinctive character voices, in this example male and female, in the sample of plays. Where distinctiveness can be shown, we then seek some characterization of the differences found, in terms of well-defined grammatical or semantic classes.
We find that comparative tools doing supervised learning are quite good at classifying plays by American versus non-American authors. Even the outliers, or misclassified plays, demonstrate that these algorithms are able to identify American writers by language usage with remarkable reliability. We are slightly less successful when trying to distinguish the gender of author and/or character. This gender experiment, however, does reveal differences in the ways male and female authors and characters use language.
Equally important to the relative abilities of individual tools to classify texts, our
experiments have alerted us to potential concerns about data mining as a critical or
interpretive endeavor. Comparative classifiers quite powerfully lump together texts and objects
that they find similar
or dissimilar.
However, as we go forward, we have to try to develop forms of results analysis that do not rely on simple binary opposition. Framing comparative tasks around existing, often binary, categories can lead to results which have a distinctly stereotypical or lowest common denominator feel. Application of these new technologies in humanistic research requires that we understand not only how the tools work, but also that we bring to bear critical evaluation of the questions we ask, the tasks we ask the tools to perform, and the results obtained.
The Black Drama collection developed by Alexander Street Press has, at the time of this work,
over 1,200 plays by 181 primary authors containing 13.3 million words, written from the middle
of the 19th century to the present, including many previously unpublished works. The collection is not a random or statistically representative sample of Black writing in the 20th century. Rather, it reflects editorial decisions and includes key works from
a number of American and non-American artistic movements, such as the Harlem Renaissance, Black
Arts Movement, and Township Theatre, as well as the principal works by many critically acclaimed
authors whom the editors consider, not unreasonably, the most important playwrights. The
database contains 963 works by 128 male playwrights (10.8 million words) and 243 pieces by 53
female playwrights (2.5 million words). The most heavily represented authors in the collection include
Langston Hughes (49 plays), Ed Bullins (47), OyamO (43), and Willis Richardson (41). Plays by
Americans dominate the collection (831 titles), with the remaining 375 titles representing the
works of African and Caribbean authors. The database contains 317,000 speeches by 8,392 male
characters and 192,000 speeches by 4,162 female characters. There are 336,000 speeches by 7,067
black characters and 55,000 by 1,834 white characters with a smattering of speeches by other
racial groups. As would be expected, the predominance of American authors is reflected in the
nationalities of speakers in the plays. 272,000 speeches are by American characters and 71,000
by speakers from a variety of African nations.
Like other Alexander Street Press data sets, the Black Drama collection is remarkable for its
detailed encoding and amount of metadata associated with authors, titles, acts/scenes,
performances, and characters. Of particular interest for this study are the data available for
authors and characters which are stored as stand-off mark-up
data tables. The
character table, for example, contains some 13,360 records with 30 fields including name(s),
race, age, gender, nationality, ethnicity, occupation, sexual orientation, performers, if a real
person, and type. Even more extensive information is available for authors and titles. The
character data are joined to each character speech, giving 562,000 objects that can be queried
by the full range of character attributes.
The ARTFL search system, PhiloLogic, supports full-text searching of the collection against this rich metadata; a simple example is the use of the n-word and its variants. The 5,116 (3.8/10000 words) occurrences of this slur appear in just under 1 in 100 dialogue shifts in the collection and in almost half of all the plays (515). Its extensive use by a substantial majority (119/181) of playwrights in this collection suggests that it has had an important role in the representation of Black experience in the past century. Male playwrights use the n-word twice as frequently as female authors (4.2 vs 2.1/10000 words).
Similarly, male characters overall are depicted using this slur almost twice as frequently as
female characters (9.5 vs 5.0/1000 speeches). However, factoring author gender into the equation
changes the rates somewhat. While male playwrights still represent the genders using it at roughly a 2 to 1 male-to-female ratio (10.3 vs 5.4/1000 speeches), female authors depict female characters using it at a rate more nearly equal to that of males (4.8 male vs 3.5 female/1000 speeches). This leveling of comparative rates may be an artifact of the moderate preference of female authors for representing female characters: just over a third (34.7%) of speeches in plays by male authors are by female characters, while female playwrights allocate slightly more than half (52.1%) of speeches to female characters. Similar gender distinctions are also apparent in the representation of character race. White characters account for 14% of speeches overall (10.5% in plays by female authors). Male authors represent white characters using this racial slur at just under twice the rate of black characters (15.1 vs 8.6/1000 speeches), while female authors represent this distinction at a 5 to 4 ratio (5.4 vs 4.3/1000 speeches).
While illustrative, such micro-studies
based on standard full-text searches
for specific words or patterns can do little more than hint at larger discursive and representational issues, such as differences between male and female writing. We believe that the
new generation of machine learning tools and text data mining approaches have the potential to
reveal more general variations in language use because they look for patterns of differential
language use broken down by various combinations of author and character attributes.
Machine learning and text mining techniques are commonly used to detect patterns in large
numbers of documents. These tools are often used to classify documents based on a training
sample and apply the resulting model to unseen data. A spam filter, for example, is trained on
samples of junk mail and real mail, and then used to classify incoming mail based on the
differences it found in the training set. In our application, we already know the
classifications of interest, such as author gender and race, for the entire collection. Thus, we
apply supervised learning techniques not to predict unknown classes but to measure how reliably the known classes can be recovered from textual features alone, and to identify the features that most strongly distinguish them. The term features in this context refers to the data being used to perform the machine learning task. Each instance, typically a document or part of a document, may have an arbitrary number of features, which may include word, lemma, or word-sequence (n-gram) frequencies as well as other elements of the data which can be computed, such as sentence length or part-of-speech frequencies. Results are evaluated using cross-validation to guard against models that are over-fitted to the training data, which would limit the effectiveness of the classifier in properly handling unseen instances. Over-fitting will also tend to weight relatively unrepresentative features too highly. Cross-validation is performed by subdividing the training data into random groups (often 10), training on some of these groups and evaluating the predictions on the remainder. Two commonly used classifiers figure prominently in what follows: Multinomial Naive Bayesian (MNB) models and support vector machines (SVMs). MNB assigns each feature a probability of belonging to each class; the word viagra in an MNB spam filter, for example, would be assigned a very high probability of being in a spam e-mail. For an unseen instance, the system calculates the probabilities for each feature being a member of a particular class, adding up all of the probabilities to assign one or more classifications to the new instance. SVMs are somewhat more complex, as they attempt to divide training data into maximally separated groups by adjusting feature weights; viagra, for instance, would be assigned a high feature weight in a model underlying a spam filter. In a classification task that results in a high success rate, misclassified documents
draw particular interest because these failures of the model often point to literary or
linguistic elements which distinguish the outliers from the mass of correctly classified
documents.
To support text mining experimentation, we have implemented a set of machine learning
extensions to PhiloLogic, our full text analysis system, called PhiloMine.
The diverse Black Drama corpus is a useful collection for examining the stage throughout the
Anglophone black diaspora, allowing specific focus on the impact of colonialism and
independence, and for comparing African-American plays with works from other cultures. The
collection contains 394 plays by American and 303 plays by non-American playwrights written
during the period we are studying, 1950-2006. In this experiment, we tested the degree to which
we could distinguish between American and non-American plays. To this end, we generated an
excluded feature list of words with common spelling differences — such as
color/colour,
center/centre,
etc. — that would otherwise have had an impact on results. We further excluded words or names that might appear frequently in only a small number of texts, limiting classification to features present in less than 80% but more than 10% of the documents. The resulting feature list numbered approximately 4,200 surface forms of words.
For this preliminary experiment, we achieved accuracy rates ranging from 85% to 92% depending on the algorithm selected. Specific classifiers yielded slightly different accuracy rates, partly because they weight features and function differently. Using the parameters described above with PhiloMine, the Multinomial Naive Bayesian (MNB) classifier generally had high rates of success distinguishing American and non-American plays, 88.8 percent correct with 84.4 percent correct on cross-validation. Other classifiers, including those provided by Weka (Weka 3), achieved similar performance rates for this task.
Splitting the time period at 1984, we found some indication that American and non-American plays might be slightly less distinguishable over the past twenty years. Table One shows the differences in accuracy rates for the earlier period (236/178 American/non-American plays) and the later period (158/125).
The efficacy of this classification task is matched, to some degree, by the less than
startling features most strongly associated with the two bodies of texts. Appendix One shows the
top 200 features most predictive of works by American and non-American playwrights. The American
plays are marked by their use of slang, references to some place names, and orthographical
renderings of speech. The state names suggest a Southern, rural backdrop to many plays, but the
terms "hallway" and "downtown" have a decidedly urban feel. The top features of non-American authors include very few slang words and comparatively few words that reflect spoken language. Many, in fact, belong to a traditional social sphere or reveal more formal attitudes toward government, as noted by terms like "crown," "palace," "politicians," and "corruption."
The features assigned the highest
probabilities in this classification task strike the casual observer as both expected and
reasonable.
While an extended discussion of the features isolated in this experiment is beyond the scope of this paper, a couple of observations are warranted. First, the number of features needed for a successful classification task is surprisingly small. We selected the top 30 features from the lists for American and non-American playwrights (a standard PhiloMine function):
Based on this tiny feature set, we achieved
similar cross-validated classification performance: MNB 90%; Weka Bayes 85.5%; and Weka SMO
88.8%. While effective on a technical level, few critics would find this list to be a sufficient
way to characterize the differences between American and non-American Black drama. The second
observation is that different classifiers appear to assign weight to remarkably different features. For example, the Weka Naive Bayesian classifier generates a single list of features that looks very little like those above. One sees no place names, no "speech" words, and few terms that stand out on their own as necessarily African, traditional, urban, or Southern:
The Weka SMO classifier identifies features that are more comparable to the list from the MNB. The 20 most heavily weighted features of each corpus were, for American authors:
and for non-American authors:
While all three of these algorithms produced strong results classifying authors as American and non-American, the different features they rely on raise questions about their usefulness for literary scholars studying text collections.
In the case of these three tests, the MNB classifier's feature sets are the easiest to grasp
immediately. They point to differences in speech patterns between Americans and non-Americans,
different social structures, and different environments. The feature set of the Weka Naive
Bayesian classifier makes very little intuitive sense. Why is it that the word "thinks," either by itself or in conjunction with other terms, most effectively distinguishes a play as American or non-American? To a lesser degree, the Weka SMO classifier's
feature set suffers from the same liability. These last classifiers do not give the easy access
into the plays that the MNB does. The user can look at the feature set of that classifier and,
almost immediately, begin thinking about issues like the migration of black Americans from the
rural South to Northern cities. The drawback to the immediacy of this feature set is that it
also has a stereotypical
feel. Plays by non-American black authors, following a
simple-minded reading of the MNB result set, might be thought to take place in villages where
chiefs and elders buy wives with goats and cattle. In this sense, the features tend to obscure
any nuance in either corpus because the algorithm itself tends to be very selective as it
generates a model.
Beyond feature lists, an examination of incorrectly classified documents demonstrates the utility of comparative classification. In most PhiloMine tasks, the system will identify documents that are not correctly classified. As might be expected, some non-American subgroups were classified better than others. Thus, British, Australian, and Canadian authors were more frequently misclassified in this task than African or Caribbean playwrights. Inspecting the outliers was instructive. For example, eight plays by Kia Corthron were consistently misclassified as American. Upon further review, the problem arose not from the algorithm, but from the metadata we provided it. The editors incorrectly identified this American playwright as Canadian. Because of the lexical features it found, the classifier decided that her plays, in fact, should not be grouped with those of the non-American authors, as the metadata would have had them, but with the American authors. This misclassification was actually a correct classification that alerted us to an error in the metadata. Other outliers also bear closer inspection, which may raise further critical questions. The plays of Montserratian Edgar White, for example, were correctly classified except for one, which is set in rural/jungle West Africa.
Comparative text mining on the nationality of Black playwrights in the second half of the 20th century shows clear distinctions between American and non-American authors. The algorithms consistently achieve high levels of performance and generate distinguishing feature lists that are of potential interest in helping to characterize topics, themes, and styles that mark each group. Finally, the outliers or incorrectly classified authors and plays may lead to further critical scrutiny, allowing for variations within the general classification.
It may be objected that our classification task itself is a trivial case, attempting to confirm a distinction that is all too obvious to be of significant literary interest. A task like this, however, provides a good test case for verifying how well classifiers are able to function. We were able to check the accuracy of our results easily through the bibliographic data. In the case of the outlier, Corthron, a web search confirmed that she is, in fact, American. Of course, features that reveal certain linguistic and dramatic differences between the American and non-American plays might not be particularly surprising. Authors tend to place characters in localities and use idiomatic language they themselves know. For example, an American playwright is more likely to set a play in Mississippi and depict characters using a Southern vernacular. However, the features are often able to bring out further, less obvious stylistic and creative choices in operation across text corpora. It is beyond the scope of this paper to address just what these could be. A scholar with a deeper knowledge of this body of works could potentially examine the feature sets and find lexical patterns that point to larger tendencies. In the end, the algorithms can only rely on and propose what they find to be statistically significant. The scholar, as always, must decide the meaning of the results.
Classification tasks based on the gender of authors, characters, and a combination of the two provide a more challenging test than classification by author nationality. Previous work suggests that gender classification tasks are somewhat less accurate and that the feature sets they generate are less obviously distinctive.
The Black Drama collection contains 573 (82%) plays by male and 124 (18%) by female
playwrights written between 1950-2006. An initial classification using our default feature set
selection is deceptively accurate, with MNB returning 79.4% cross-validated accuracy and Weka
SMO 85.1% cross-validated. The Weka confusion matrix indicates that a significant majority of male authors are correctly classified while performance on female authors is less than 50%. These accuracy rates are deceptive because a classifier that simply guessed "male" for every document would be correct 82% of the time. Therefore, to determine whether the system was in fact finding a gender difference, we had to balance corpus sizes. PhiloMine supports a standard corpus balancing option that randomly selects documents from the larger sample until it has found a number equal to the smaller class. Since this is a random selection, effectively comparing all of the female plays against a different set of male documents on each run, one needs to perform this task numerous times to determine an average accuracy rate. Using the Weka NB and SMO functions five times each, cross-validated results indicate a fair degree of success:
The classification tasks to this point have looked at entire plays with no attempt to control for expected skewing factors, such as stage directions, which would not be expected to show as strongly gendered a writing style, and characterization, where authors depict characters of the other gender. Controlling for such obvious skewing influences, particularly as a follow-up to more generic experiments, tends to provide more coherent results, but at the cost of creating composite documents which do not reflect the structure and organization of the plays as individual works.
For this experiment, we rebuilt the Black Drama collection under PhiloLogic/PhiloMine to behave as a database of some 13,000 distinct objects (characters) containing 8.9 million words. We eliminated stage directions, cast lists, and other such apparatus containing some 4.5 million words, while combining individual speeches of each character into one object for faster processing. This changed the unit of analysis from documents to composite character speeches with associated metadata, including author attributes. For the period 1950-2006, there are 4,228 characters by male authors and 865 characters by female playwrights with more than 200 words. For the same period, there are 3,226 male characters and 1,742 female characters.
Classifying all the characters by author gender using the PhiloMine Bayesian function resulted in 75.5% cross-validated accuracy. This general result was confirmed on 5 runs using the random document balancing function (865 characters), with cross-validated accuracy rates of 72.8%, 72.1%, 71.0%, 72.2%, and 71.6%. Deliberately shuffling the character instances of randomly balanced characters returned results approximating the expected 50% accuracy: 47.6%, 51.2%, and 50.8%. Similar results were obtained for classifications by character gender. Overall, for all of the characters, MNB accurately classified 73.2% of the instances. Five runs of randomly balanced characters (1,742) resulted in accuracy rates of 72.5%, 71.1%, 72.3%, 72.2%, and 71.4%. Random falsification tests again approximated the expected 50% accuracy: 48.4%, 48.1%, and 50.4%. Classification of gender of author and character on composite character objects thus showed modest improvement in accuracy. Further, we found that classifying on more major characters (total words greater than 1,000 and 2,000) again resulted in modest increases in accuracy.
This experiment suggests that, as we balance our authors and characters more rigorously,
essentially testing on a more and more abstract representation of the texts, our success rates
improve. We first extracted all speeches with character gender attributes from the corpus,
splitting them into tokenized word frequency vectors for all authors, all characters, male
authors, female authors, male characters, and female characters. For each of these, we used
SVM-Light to perform classification experiments.
We then equalized a test sample for class by discarding instances in the majority classes until we had a balanced set. As part of this process, we further corrected for number of words in each character class by selecting character instances to balance the word frequencies overall. As shown in Table Four, this test sample produced a balanced dataset by number of instances and number of average words spoken by each character.
Machine learning systems are clearly able to identify gender of playwrights and their characters with impressive accuracy. We have found that on raw running texts, the systems can reliably identify author gender at rates between the high 60s and mid-70s percent accuracy. And, as the plays are processed in various ways to eliminate potential skewing factors — such as unbalanced subsets, extraneous textual data (e.g. stage directions), and differences in raw word counts — classification performance increases. This increase in performance comes, however, at the cost of increasing the distance from the texts themselves.
Given the ability to classify author and character genders, we will now return to the texts
briefly to examine the features most characteristic of gendered writing in Black Drama. Appendix
Two shows the top 200 features as measured by Bayesian probability ratios, broken down by male
and female playwrights without respect to character gender. The features suggest a rather
traditional set of gender distinctions. Male authors tend to focus on legal/criminal issues
(officer, gang, pistol, jail, etc.); numerous obscenities and slurs (bullshit, nigger(s),
goddamn, fuck, shit); music (band, drum, leader, drums, spiritual, player, jazz); and money
(dollars, price, cents, cash, etc.). Female playwrights of this period tend to privilege issues
concerning family/home (child, stories, hug, mama, girls, birth); emotive states (smiling,
imagine, memories, memory, happiness, happy); descriptions (handsome, lovely, grace, cute,
courage, loving, ugly); and women (herself, girls, she, female, lady, women, her). The representation of traditional gender roles is most notable in the characterizations by non-American male authors. As shown in Appendix Three, male characters are given very clear
public roles, with the top eight words by probability ratios being chief, order, government, lead, power, position, country, and land, while the female characters are limited to the private realm (husband, dress, shame, marry, doctor, married, please, parents).
Gender characterization among American authors writing between 1950-2006 (Appendices 4 and 5)
provides evidence that men and women depict characters slightly differently. The feature sets
are rather similar: both contain numbers (men generally do counting) and share terms as varied
as sir, american, power, bitch, country, and killed. But the male list is noticeably coarser,
with more profanity, the term nigger(s),
and more references to law enforcement
and violence. With only a few exceptions, the female character feature lists have basically the
same domestic tenor. In contrast to male characters, female characters apparently use very
little profanity and seem to be much less involved in struggles with public authorities. Of
course, these lists only reveal generalities. The features are differential frequencies. There
might in fact be foul-mouthed female characters in the corpus who are active in the public
sphere. But those characters would probably be exceptions. The degree to which these lists
reveal true differences among black American male and female authors is a matter for discussion.
The important thing is that the mining algorithm gives fuel to the discussion and serves as a
starting point for closer textual study.
This same character gender classification test on non-American authors yields feature sets
suggesting even more disparate depictions of the sexes than among American authors. Appendices 6
and 7 show that, for authors of both sexes, male characters inhabit the public sphere, their
discourse deals with leadership, and they are more likely to use grammatically precise terms
like "which" and "whom."
Female characters' language, again,
primarily centers on domestic concerns. Distinctions between male and female authors' use of
language in the depiction of their characters are few. One of the striking differences, however,
is that only the male characters written by female authors in this data set use scatological
language at a significant rate. And comparing the results from the American and non-American
tests highlights the different concerns for these characters who inhabit different cultures.
Classification of texts by gender of playwrights, characters, and the combination of the two
is a more difficult test than by nationality. Results range from accuracy in the high 60s percent for plays as they are found to the mid-80s for carefully constructed samples extracted from the texts. It is also clear that the features used to construct classifier models might be of
interest to literary researchers in that they identify themes and language use that characterize
gender distinctions. Of course, men talk of wives more than women do, and only women tend to call other women "hussies," so it is hardly surprising that male and female authors/characters speak of different things in somewhat different ways. The features suggest, however, that we are finding lowest common denominators which distinguish male from female, but which may also privilege particular stereotypes. The unhappy relationship of
Black American men with the criminal justice system or the importance of family matters to women
are both certainly themes raised in these plays. The experimental design itself, using
classifiers to detect patterns of word usage which most distinguish the genders, may bring to
the forefront literary and linguistic elements which play a relatively minor role in the texts
themselves.
We have found that, although algorithms can in fact detect differences in lexical usage to a striking degree and output feature sets that characterize differences between corpora of data, human scholars must still do the work of scrutinizing results and, more importantly, decide how best to develop these tools for humanities research. Fundamentally, automatic classifiers deal with general features and common traits. In contrast, in recent decades, literary criticism has focused on the peripheries, often looking at the ways understudied works by authors from underrepresented groups work within a larger cultural context. Literary critics have tried to nuance general understanding about what the mainstream is and how it works. As we mentioned above, a danger in framing comparative tasks based on binary oppositions is that doing so can produce simplistic or stereotypical results. Furthermore, given the power of classifiers, we might always be able to prove or detect some binary opposition between two groups of texts. And so the task before us, if we want to develop tools to aid literary criticism, is to try in some way to respond to the values driving scholarship currently and, as those values change, continue to take them into account. We must also keep in mind that measures of success and procedure are often different for computer scientists and literary critics. For example, using only 60 total feature terms, 30 per corpus, we can classify the Black Drama texts as American or non-American with approximately 90% accuracy. Distinguishing differences on such a small number of words is an impressive technical feat, to be sure. But to a literary scholar, such a circumscribed way of thinking about creative work may not be terribly fruitful. As we go forward, we will have to try to bridge gaps such as these. Our success in this endeavor will, of course, depend upon close collaboration between those building the data mining tools and those who will finally use them.