Spring 2009
Special Cluster: Done
Editor: Matthew G. Kirschenbaum
Large-Scale Humanities Computing Projects: Snakes Eating Tails, or Every End is a New Beginning?
William A. Kretzschmar, Jr., University of Georgia
The word finish can mean two things that have quite different implications for large-scale humanities computing projects: "to bring to completion; to make or perform completely; to complete" and also "to perfect finally or in detail; to put the final and completing touches to (a thing)." The word finish is just not part of the deal for the Linguistic Atlas Project in either sense. However, granting agencies must ask "what do you want money for this time?" and, from this viewpoint, the Atlas Project consists of a series of particular tasks or experiments, each one of which is capable of being finished in both senses of the word. This paper discusses the reality of funding, deadlines, and deliverables, as they relate to the sequence of tasks that make up the larger Atlas Project. There are no once-and-done, permanent solutions. The largest humanities computing projects require continuing care and maintenance, and the best way forward is to create some sort of stable institutional setting for large projects that will provide continuity and baseline resources for the work.
It’s For Sale, So It Must Be Finished: Digital Projects in the Scholarly Publishing World
David Sewell, University of Virginia Press
Since the early 1990s, theorizing in the digital humanities has often celebrated
open-endedness and incompletion as inherent qualities of digital work. But a scholarly publisher
undertaking preparation and sale of digital objects cannot altogether dispense with traditional
notions of deadlines and completion if those publications are to enter the dual marketplaces of
peer review and institutional purchase. The Electronic Imprint of the University of Virginia
Press was funded in 2001 with the goal of bringing born-digital scholarly projects under the
aegis of the same review and marketing system that applies to books. In this article I describe
how we defined the criteria for done-ness in creating two very different projects, a
born-digital edition of Herman Melville’s Typee manuscript and a conversion of
the letterpress Papers of George Washington into a digital edition. Our
experience suggests that it is possible to categorize different genres of digital creations
based on the extent to which intrinsic criteria for “done-ness” can be applied to them, and that
decisions about completeness are always subject to extrinsic factors as well, such as budgetary
constraints and the pressures created by competition and the evolution of standards.
Published Yet Never Done: The Tension Between Projection and Completion in Digital Humanities Research
Susan Brown, University of Guelph; Patricia Clements, University of Alberta; Isobel Grundy, University of Alberta; Stan Ruecker, University of Alberta; Jeffery Antoniuk, University of Alberta; Sharon Balazs, University of Alberta
The case of the Orlando Project offers a useful interrogation of concepts like completion and finality, as they emerge in the arena of electronic publication. The idea of doneness circulates
discursively within a complex and evolving scholarly ecology where new
modes of digital publication are changing our conceptions of textuality,
at the same time that models of publication, funding, and archiving are
rapidly changing. Within this ecology, it is instrumental and indeed valuable
to consider particular tasks and stages done, even as the capacities of
digital media push against a sense of finality. However, careful
interrogation of aims and ends is required to think through the relation
of a digital project to completion, whether modular, provisional, or of
the project as a whole.
Special Cluster: Data Mining
Editor: Mark Olsen
Vive la Différence! Text Mining Gender Difference in French Literature
Shlomo Argamon, Linguistic Cognition Lab, Dept. of Computer Science, Illinois Institute of Technology; Jean-Baptiste Goulain, Linguistic Cognition Lab, Dept. of Computer Science, Illinois Institute of Technology; Russell Horton, Digital Library Development Center, University of Chicago; Mark Olsen, ARTFL Project, University of Chicago
In this study, a corpus of 300 male-authored and 300 female-authored French literary and historical texts is classified for author gender using the Support Vector Machine (SVM) implementation SVMLight, achieving up to 90% classification accuracy. The sets of words that were most useful in distinguishing male and female writing are extracted from the support vectors. The results reinforce previous findings from statistical analyses of the same corpus, and exhibit remarkable cross-linguistic parallels with the results garnered from SVM models trained in gender classification on selections from the British National Corpus. It is found that female authors use personal pronouns and negative polarity items at a much higher rate than their male counterparts, and male authors demonstrate a strong preference for determiners and numerical quantifiers. Among the words that characterize male or female writing consistently over the time period spanned by the corpus, a number of cohesive semantic groups are identified. Male authors, for example, use religious terminology rooted in the church, while female authors use secular language to discuss spirituality. Such differences would take an enormous human effort to discover by a close reading of such a large corpus, but once identified through text mining, they frame intriguing questions which scholars may address using traditional critical analysis methods.
Gender, Race, and Nationality in Black Drama, 1950-2006: Mining Differences in Language Use in Authors and their Characters
Shlomo Argamon, Linguistic Cognition Lab, Dept. of Computer Science, Illinois Institute of Technology, Chicago; Charles Cooney, ARTFL Project, University of Chicago; Russell Horton, Digital Library Development Center, University of Chicago; Mark Olsen, ARTFL Project, University of Chicago; Sterling Stein, Linguistic Cognition Lab, Dept. of Computer Science, Illinois Institute of Technology, Chicago; Robert Voyer, Powerset
Machine learning and text mining offer new models for text analysis in the humanities by
searching for meaningful patterns across many hundreds or thousands of documents. In this study,
we apply comparative text mining to a large database of 20th century Black Drama in an effort to
examine linguistic distinctiveness of gender, race, and nationality. We first run tests on the
plays of American versus non-American playwrights using a variety of learning techniques to
classify these works, identifying those which are incorrectly classified and the features which
distinguish the plays. We achieve a significant degree of performance in this
cross-classification task and find features that may provide interpretative insights. Turning
our attention to the question of gendered writing, we classify plays by male and female authors
as well as the male and female characters depicted in these works. We again achieve significant
results which provide a variety of feature lists clearly distinguishing the lexical choices made
by male and female playwrights. While classification tasks such as these are successful and may
be illuminating, they also raise several critical issues. The most successful classifications
for author and character genders were accomplished by normalizing the data in various ways.
Doing so creates a kind of distance from the text as originally composed, which may limit the
interpretive utility of classification tools. By framing the classification tasks as binary
oppositions (male/female, etc.), the possibility arises of stereotypical or lowest common
denominator results which may gloss over important critical elements, and may also
reflect the experimental design. Text mining opens new avenues of textual and literary research
by looking for patterns in large collections of documents, but should be employed with close
attention to its methodological and critical limitations.
Mining Eighteenth Century Ontologies: Machine Learning and Knowledge Classification in the Encyclopédie
Russell Horton, Digital Library Development Center, University of Chicago; Robert Morrissey, University of Chicago; Mark Olsen, ARTFL Project, University of Chicago; Glenn Roe, ARTFL Project, University of Chicago; Robert Voyer, Powerset
The Encyclopédie of Denis Diderot and Jean le Rond d'Alembert was one of the most important and revolutionary intellectual products of the French Enlightenment. Mobilizing many of the great – and the not-so-great – philosophes of the 18th century, the Encyclopédie was a massive reference work for the arts and sciences, which sought to organize and transmit the totality of human knowledge while at the same time serving as a vehicle for critical thinking. In its digital form, it is a highly structured corpus; some 55,000 of its 77,000 articles were labeled with classes of knowledge by the editors, making it a perfect sandbox for experiments with supervised learning algorithms. In this study, we train a Naive Bayesian classifier on the labeled articles and use this model to determine class membership for the remaining articles. This model is then used to make binary comparisons between labeled texts from different classes in an effort to extract the most important features in terms of class distinction. Re-applying the model to the original classified articles leads us to question our previous assumptions about the consistency and coherency of the ontology developed by the Encyclopedists. Finally, by applying this model to another corpus from 18th century France, the Journal de Trévoux, or Mémoires pour l'Histoire des Sciences & des Beaux-Arts, new light is shed on the domain of Literature as it was understood and defined by 18th century writers.
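For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the authors' implementation: the function names, the toy class labels, and the toy English tokens are all hypothetical, chosen only to illustrate how a model trained on labeled articles can assign a class to an unlabeled one.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per class and tokens per class.

    labeled_docs: iterable of (class_label, list_of_tokens) pairs.
    Returns (class_counts, word_counts, vocab).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for labeled articles.
labeled = [
    ("Geography", ["river", "city", "province"]),
    ("Geography", ["city", "kingdom", "river"]),
    ("Medicine", ["fever", "remedy", "blood"]),
    ("Medicine", ["blood", "disease", "remedy"]),
]
model = train_naive_bayes(labeled)
unlabeled_article = ["river", "kingdom"]
predicted = classify(unlabeled_article, model)
```

Running `classify` on the labeled training documents themselves, and comparing the predicted class with the editors' label, is one simple way to approximate the re-application step the abstract describes.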
Articles
Communitizing Electronic Literature
Scott Rettberg, The University of Bergen Dept. of Literary, Linguistic, and Aesthetic Studies
Electronic literature is an important evolving field of artistic
practice and literary study. It is a sector of digital humanities
focused specifically on born-digital literary artifacts, rather than
on using the computer and the network to redistribute, analyze, or
recontextualize artifacts of print culture. Works of electronic
literature appeal to configurative reading practices. The field of
electronic literature is based on a gift economy and is developing a
network-based literary culture built on the collaborative practices of
a globally distributed community of artists, writers, and scholars.
This article situates the development of the field of electronic
literature within academe and discusses some of the institutional
challenges currently confronting the field, as well as its potential
for further development.
Teaching and Learning from the U.S. South in Global Contexts: A Case Study of Southern Spaces and SouthComb
Sarah Toton, Emory University; Stacey Martin, Emory University
This paper examines the internet journal Southern Spaces, launched in February 2004, and the online learning community SouthComb, started in 2006. We examine the development of these online tools, exploring pedagogical implications as well as the tools and avenues they bring to the fields of Southern Studies, American Studies and scholarly communication online. We also explore the potential uses for these resources as well as their efforts to elucidate a broader understanding of the U.S. South in regional, national and global contexts.
Designing Choreographies for the New Economy of Attention
Eric Gordon, Emerson College; David Bogen, Rhode Island School of Design
The nature of the academic lecture has changed with the introduction of wi-fi and cellular technologies. Interacting with personal screens during a lecture or other live event has become commonplace and, as a result, the economy of attention that defines these situations has changed. Is it possible to pay attention when sending a text message or surfing the web? For that matter, does distraction always detract from the learning that takes place in these environments? In this article, we ask questions concerning the texture and shape of this emerging economy of attention. We do not take a position on the efficiency of new technologies for delivering educational content or their efficacy in competing for users’ time and attention. Instead, we argue that the emerging social media provide new methods for choreographing attention in line with the performative conventions of any given situation. Rather than banning laptops and phones from the lecture hall and the classroom, we aim to ask what precisely they have on offer for these settings understood as performative sites, as well as for a culture that equates individual attentional behavior with intellectual and moral aptitude.
Reviews
A Review of Matthew Kirschenbaum, Mechanisms: New Media and the Forensic Imagination. Cambridge, MA and London, UK: MIT University Press, 2008
Johanna Drucker, University of California, Los Angeles