Tanya Clement is an Assistant Professor in the School of Information at the University of Texas at Austin. She has a PhD in English Literature and Language and an MFA in fiction. Her primary area of research is the role of scholarly information infrastructure as it impacts academic research libraries and digital collections, research tools and (re)sources in the context of future applications, humanities informatics, and humanities data curation. Her research is informed by theories of knowledge representation, information theory, mark-up theory, social text theory, and theories of information visualization. She has published pieces on digital humanities and digital literacies in several books, and on digital scholarly editing, text mining, and modernist literature.
David Tcheng works as a Research Scientist for the Illinois Informatics Institute (I3) at the University of Illinois at Urbana-Champaign (UIUC). David received a BS from Illinois State University and is currently pursuing a Ph.D. in Informatics at UIUC. David is a machine learning (ML) specialist and has applied ML to many difficult real-world problems in domains ranging from art to science, involving media types including sound, images, and symbolic sequences. Prior to I3, David worked for many years with NCSA and co-founded the Automated Learning Group. Backed by venture funding from I-Ventures, David took a one-year hiatus from UIUC to start a music analysis and recommendation company called One Llama Media, Inc.
Loretta Auvil works at the Illinois Informatics Institute (I3) at the University of Illinois at Urbana Champaign. She received a MS in Computer Science from Virginia Tech and a BS in Applied Mathematics and Computer Science from Alderson-Broaddus College. She has worked with a diverse set of application drivers to integrate machine learning and information visualization techniques to solve the needs of research partners. She has led software development and research projects for many years. Prior to working for I3, she spent many years at NCSA on machine learning and information visualization projects and several years creating tools for visualizing performance data of parallel computer programs at Rome Laboratory and Oak Ridge National Laboratory.
Boris Capitanu is a Research Programmer working in the Illinois Informatics Institute at the University of Illinois at Urbana Champaign. Boris holds a B.S. and M.S. in Computer Science from University of Illinois at Urbana-Champaign. His research interests include data mining, machine learning, and educational technologies. Boris is currently working on the SEASR project creating software platforms for the advancement of scholarly research.
Megan Monroe is a Ph.D. student in the Computer Science Department at the University of Maryland. She currently works in the Human-Computer Interaction Lab (HCIL) on Professor Ben Shneiderman's medical visualization team. Her focus is in temporal event search and analysis.
Computational literary analytics such as frequency trends, collocation, topic modeling, and network analysis have relied on rapid, large-scale analysis of the word or strings of words. This essay shows that there are many other features of literary texts by which humanists make meaning other than the word, such as prosody and sound, and that computational methods allow us to do what has historically been a more difficult kind of analysis: trying to understand how literary texts make meaning with these features. This essay discusses a case study that uses theories of knowledge representation and research on phonetic and prosodic symbolism to develop analytics and visualizations that help readers discover aural and prosodic patterns in literary texts. To this end, this paper has two parts: (I) We describe the theories of knowledge representation and research into phonetic and prosodic symbolism that underpin the logics and ontologies of aurality incorporated in our project. This basic theory of aurality is reflected in our use of OpenMary, a text-to-speech application tool for extracting aural features; in the flow we coordinated to pre-process texts in SEASR's Meandre, a data flow environment; in the instance-based predictive modeling procedure that we developed for the project; and in ProseVis, the visualization tool we developed for reading the results.
Literary meaning from aurality
Humanities data, for which cultural institutions such as libraries and museums are becoming progressively more responsible, is like all data: increasing exponentially. Many scholars have responded to this expanded access by augmenting their fields of study with theories and practices that correspond to methodologies of advanced computational analysis. The very popular Digging into Data challenge is a testament to the wide array of perspectives and methodologies digital projects can encompass. In particular, the first (2009) and second (2011) rounds of awards include projects that are using machine learning and visualization to provide new methods of discovery. Some analyze image files; others analyze large amounts of music information.

At a time when digital humanities scholars are enthusiastic about large-scale analysis, this essay considers features of text other than the word to analyze literary texts: specifically, those features that comprise sound, including parts-of-speech, accent, phoneme, stress, tone, and phrase units. To this end, this discussion includes a case study that uses theories of knowledge representation and research on phonetic and prosodic symbolism to develop analytics and visualizations that help readers of literary texts negotiate large data sets and interpret aural and prosodic patterns in text.

In this piece, we describe how computational analysis, predictive modeling, and visualization facilitated our discovery process in three texts by Gertrude Stein: the word portraits of Matisse and Picasso and the longer Tender Buttons. We pre-processed these texts with a flow we coordinated in SEASR's Meandre, a data flow environment.
Theories of knowledge representation can facilitate our ability to express how we
are modeling sound in a computational environment. Before defining what we mean by
the logics and ontologies of aurality,
however, it is useful to discuss
why these definitions are necessary at all. John F. Sowa writes in his seminal book on computational foundations that theories of knowledge representation are particularly useful for anyone whose job is to analyze "knowledge about the real world and map it to a computable form." Knowledge representation, he writes, is "the application of logic and ontology to the task of constructing computable models for some domain." Logic and ontology here are measures not of truth but rather of how we think about the world in a certain context: the pure form, and the content that is expressed in that form. Without logic, knowledge representation is vague, "with no criteria for determining whether statements are redundant or contradictory"; similarly, without ontology (or a clear sense of what the content represents), Sowa writes, the terms and symbols are "ill-defined, confused, and confusing." Without a well-defined logic and ontology for our model of sound, it is difficult for literary scholars to read or understand the results of any computational analytics we apply to that model.
In their seminal article, Davis and his co-authors argue that a knowledge representation "can best be understood in terms of five distinct roles it plays":
- A knowledge representation is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it.
- It is a set of ontological commitments, i.e., an answer to the question: In what terms should I think about the world?
- It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends.
- It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences.
- It is a medium of human expression, i.e., a language in which we say things about the world.
After defining the first and second roles of knowledge representation in more detail in the first part of this piece, we discuss the third and fourth roles together in the second part. Finally, part three takes up the fifth role and offers a more comprehensive discussion of how all five roles are at play within our readings of texts written by Gertrude Stein.
The first role of knowledge representation, as Davis describes it, is "most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it." We call this surrogate an aurality. Rather than the orality of text, a testament to the history of oral cultures, Charles Bernstein focuses on the aurality of text, which he calls "the sounding of the writing," since orality "has an emphasis on breath, voice, and speech."
Ultimately, understanding and defining sonic phenomena is a subjective practice. In a recent special issue on sound studies, the editors describe sound as "something not obviously divisible." "Objects as sonic phenomena are points of diffusion that in listening we attempt to gather," they write. Most significant for this discussion, they articulate the work of sonic interpretation as this "work of gathering" in "an effort to unify and make cohere." As Walter Ong puts it:
Written texts all have to be related somehow, directly or indirectly, to the world of sound, the natural habitat of language, to yield their meanings. ‘Reading’ a text means converting it to sound, aloud or in the imagination, syllable-by-syllable in slow reading or sketchily in the rapid reading common to high-technology cultures.
While Ong essentializes the relationship between written texts and sound in terms of his study of literacy in oral cultures, Charles Bernstein points back to the difficult work of identifying the osmotic relationship between sound and meaning when interpreting poetry, arguing that "[t]he relation of sound to meaning is something like the relation of the soul (or mind) to the body. They are aspects of each other, neither prior, neither independent." Gertrude Stein herself describes such "gatherings" of sound and how that work influenced her creation of literary texts: "I had the habit," she writes, "of conceiving myself as completely talking and listening, listening was talking and talking was listening and in so doing I conceived what I at the time called the rhythm of anybody's personality."
Interpreting or representing sound is also a subjective practice, however. Dwight Bolinger notes that this subjective work of gathering is a "best guess" based on what can be considered divisible and measurable syntactical units. "In the total absence of all phonological and visual cues," he writes, "the psychological tendency to impose an accent is so strong that it will be done as a best guess from the syntax." When we "sound out" a written word, we make best guesses for those sounds based on the possibilities of sound represented by the structural features of a word within its syntactical context, including parts-of-speech, the position of a word in a phrase (e.g., consecutive verbs or nouns), sentence type (e.g., a declaration or a question), and information structure (e.g., given and inferable information in a dependent clause is frequently de-accented). This discussion emphasizes the fact that identifying sound surrogates that represent a literary text's aurality remains subjective, especially within a computational system like ours that relies on best guesses for gathering syntactical units.
Davis's second role for knowledge representation is as "a set of ontological commitments," i.e., "an answer to the question: In what terms should I think about the world?" The commitments are "in effect a strong pair of glasses that determine what we can see, bringing some part of the world into sharp focus, at the expense of blurring other parts."
The debate concerning whether or not sound contributes to how we interpret written texts has a long history. Roland Barthes, for example, distinguishes between the pheno-song, which covers "all the phenomena, all the features which belong to the structure of the language being sung, the rules of the genre, the coded form of the melisma, the composer's idiolect, the style of the interpretation: in short everything in the performance which is in the service of communication, representation, expression"; and the geno-song, "the volume of the singing and speaking voice, the space where significations germinate."
In order to create aural surrogates with computational modeling we include textual
features of sound that research has shown correspond to how we interpret texts.
Reuven Tsur, for example, works backwards from long-held beliefs about specific meanings of sound in poetry to arrive at rules that can support or refute these meanings. We are also interested in those features of text that create possibilities for interpretation. For instance, Tsur notes that a reader's sense
that sounds make meaning is an abstract or impressionistic regard; he seeks to use "phonetic and phonological generalizations" in an attempt, "based on widespread beliefs concerning the 'aesthetic' quality of speech sounds," to accomplish two aims: (1) to legitimate impressions of sound as an integral part of criticism and (2) to define rules that harken toward scientific impartiality or empirical analysis and "claim back the largest possible areas of criticism from arbitrary impressionism." In making such choices, we commit ourselves to the "aspects of the world we believe to be relevant."
In particular, we adhere to the idea that all meaning making is an act of abstraction that depends not only on the objects of study (words) but also on the perspective of the person studying them. Tsur identifies four types of sound-meaning relations: (a) Onomatopoeia; (b) Expressive Sounds; (c) Focusing Sound Patterns; and (d) Neutral Sound Patterns. An expressive sound is "[a] sound combination [that] is grasped as expressive of the tone, mood, or some general quality of meaning," in which "an abstraction from the sound pattern (i.e., some kind of tone or 'quality' of the sounds) is parallel to an abstraction from the meaning of the words (tone, mood etc.)." Tsur compares a line by Poe ("And the silken, sad, uncertain rustling of each purple curtain"), which uses onomatopoeia, with a quatrain by Shakespeare that begins "When to the sessions of sweet, silent thought…," which uses expressive sounds. Both of these examples use the sibilants /s/ and /š/ to evoke, respectively, "noisy potential" and "hushing potential." Tsur also rejects the assumption that "all speech sounds are equal," since for readers of poetry "it is difficult to escape the feeling that some speech sounds are more equal … more musical, more emotional, or more beautiful than others."
Sound also contributes to meaning through prosodic and phonetic elements. Prosody has been defined by linguists as comprising intonation, stress, and rhythm, which convey linguistic meaning through phrasing and prominence. The definition of intonation has been expanded to include not only accents and stress but also symbolic meaning, since intonation is generally used to refer to "the overall landscape, the wider ups and downs that show greater or lesser degrees of excitement, boredom, curiosity, positiveness, etc."
The above theories of aurality and research in phonetic and prosodic symbolism undergird the choices we have made in developing a technical, computational infrastructure for analyzing the sound of literary texts. Shifting our attention to two more of Davis's roles of knowledge representation, knowledge representation as "a fragmentary theory of intelligent reasoning" and as "a medium for pragmatically efficient computation," this section discusses three essential parts of the infrastructure that represents the sound of text in our project. First, we consider our decision to use OpenMary, a text-to-speech application tool that extracts aural features from literary texts; next, we discuss the data flow we developed in SEASR's data flow environment (Meandre) to produce a representation of the data for modeling, as well as the predictive modeling procedure we implemented to analyze patterns across these extracted features; and finally, we introduce ProseVis, the visualization tool we developed for reading the results.
In this project, we use OpenMary, a text-to-speech application tool, to extract aural features. When we first compared the words in our texts to the CMU Pronouncing Dictionary, many "unknown" words were returned. That is, many of the words that Stein used in her lexicon, though common words, were returned as "unknown" in the results (such as "insensibility," "meekness," "well-meaning," and "slinks"). OpenMary's recommendation, on the other hand, incorporates a "best guess" model in any given prosodic situation; that is, it is based on an algorithm, a set of stringent rules that draws on the kind of research that Tsur, Bolinger, and others have mapped for how we make meaning with sound, which includes part-of-speech, accent, phoneme, stress, tone, the position of a word in a phrase (e.g., consecutive verbs or multiple nouns), sentence type (e.g., a declaration or a question), and information structure (e.g., given and inferable information in a dependent clause is frequently de-accented).
The documentation explains OpenMary’s system for Natural Language Processing
(NLP): In a first NLP step, part of speech
labelling [sic] and shallow parsing (chunking) is performed. Then, a lexicon
lookup is performed in the pronounciation [sic] lexicon; unknown tokens are
morphologically decomposed and phonemised by grapheme to phoneme (letter to
sound) rules. Independently from the lexicon lookup, symbols for the
intonation and phrase structure are assigned by rule, using punctuation,
part of speech info, and the local syntactic info provided by the chunker.
Finally, postlexical phonological rules are applied, modifying the phone
symbols and/or the intonation symbols as a function of their
context.
Further intelligent reasoning is reflected in OpenMary's folksonomic technique for representing words that are not in the CMU Pronouncing Dictionary lexicon; this technique involves generating a lexicon of known pronunciations from the most common words in Wikipedia and allowing developers to enter new words manually ("Adding support for a new language to MARY TTS"). OpenMary will make a "best guess" at words that are not part of the CMU lexicon because its rule set or algorithm can morphologically decompose unknown tokens and phonemise them by grapheme-to-phoneme rules.
As a byproduct of this process, OpenMary outputs a representation of the sound of
text in XML that reflects a set of possibilities for speech that are important
indicators of how the text could potentially be read aloud by a reader.
Specifically, OpenMary accepts text input and creates an XML document (MaryXML) as output with attributes like those shown in Figure 1. This example represents the phrase "A kind in glass and a cousin, a spectacle and nothing strange" from Gertrude Stein's Tender Buttons.
As shown above, sentences (<s>) are broken into prosodic units and phrases (<prosody> and <phrase>), which are, in turn, broken into words or tokens (<t>). These word elements hold the attributes that mark accent, part of speech (pos), and ph, the phonetic spellings (transcribed in SAMPA) broken into what we refer to as sounds, separated by "-", with an apostrophe (') preceding stressed syllables. Other information is included at the phrase level, such as tone and breakindex.
In the accent attribute, "L" indicates a low pitch and "H" indicates a high pitch; a character followed by "*" gives the pitch of the stressed syllable; pitches preceding and following the stressed pitch are separated from the stressed pitch by "+"; and "!" represents a downstep onto the following pitch. The g2p_method attribute indicates how the phonetics were found. The breakindex attribute indicates the type of boundary: 2 = a potential boundary location; 3 = an intermediate phrase break; 4 = an intra-sentential phrase break; 5 = a sentence-final boundary; 6 = a paragraph-final boundary. The tone attribute indicates the tone ending the previous phrase.
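The MaryXML structure described above can be traversed with any standard XML parser. The following sketch is illustrative only: the inline fragment is hypothetical (real MaryXML output is richer), and only the element and attribute names discussed above (<s>, <phrase>, <t>, accent, pos, ph) are drawn from the description.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical MaryXML fragment modeled on the structure
# described above: sentences <s> contain phrases, which contain tokens <t>
# whose attributes carry accent, part of speech (pos), and phonetics (ph).
SAMPLE = """
<maryxml>
  <s>
    <phrase>
      <t accent="H*" pos="DT" ph="' @">A</t>
      <t accent="!H*" pos="NN" ph="' k aI n d">kind</t>
    </phrase>
  </s>
</maryxml>
"""

def extract_tokens(maryxml_string):
    """Return one row per token: (word, pos, accent, list of sounds).

    Sounds are taken from the ph attribute; an apostrophe marks a
    stressed syllable, and hyphens (or spaces, in this sketch) separate
    the individual sounds."""
    root = ET.fromstring(maryxml_string)
    rows = []
    for t in root.iter("t"):
        ph = t.get("ph", "")
        sounds = [s for s in ph.replace("-", " ").split() if s != "'"]
        rows.append((t.text, t.get("pos"), t.get("accent"), sounds))
    return rows

rows = extract_tokens(SAMPLE)
```

Each resulting row corresponds to one line of the tabular representation discussed in the next section.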
The SEASR (Software Environment for the Advancement of Scholarly Research) team at
the University of Illinois at Urbana-Champaign has been working on creating a
computational environment in which users who are interested in analyzing large
data sets can develop data flows that push these data sets through various textual
analytics and visualizations.
The ability to explore a text’s aurality was not represented within SEASR until we
added a Meandre component to use OpenMary (shown as the green box module in Figure
2). Meandre components were used to segment the book into smaller chunks of text
before passing it to OpenMary for feature extraction, because sending large
amounts of text to OpenMary created memory problems associated with processing the
complete document. Consequently, the flow processes each document in our
collection through the OpenMary web service at a paragraph level. Meandre is also
used to create a tabular representation of the data (see Figure 3). The features
represented from the MaryXML are part of speech, accent, phoneme, stress, tone,
and break index, because research shows that these features have a significant
impact on how we make meaning with sound.
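Outside of Meandre, the paragraph-level chunking step can be sketched in plain Python. This is a simplified stand-in, not the Meandre component itself, and the analyze callback is a hypothetical placeholder for the call to the OpenMary web service.

```python
def split_paragraphs(document):
    """Split a document on blank lines, mirroring the paragraph-level
    chunking our flow performs before calling the OpenMary service.

    Sending an entire document at once caused memory problems, so each
    paragraph is processed independently."""
    paragraphs = []
    for block in document.split("\n\n"):
        block = " ".join(block.split())  # normalize internal whitespace
        if block:
            paragraphs.append(block)
    return paragraphs

def process_document(document, analyze):
    """Run a per-paragraph analysis function (e.g., a call to the
    OpenMary web service) and collect the results in document order."""
    return [analyze(p) for p in split_paragraphs(document)]

chunks = split_paragraphs("A kind in glass.\n\nA spectacle and nothing strange.\n")
```

The design choice here simply bounds the size of each request; the results are concatenated afterward so the tabular representation still covers the whole document.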
Once the features for aurality were extracted for a collection of documents, we wanted to compare the aurality between the documents and identify the documents that had similar prosody patterns. This comparison was framed as a predictive problem, where we used the features from one document to predict similar documents. We developed an instance-based, machine-learning algorithm for the predictive analysis that can be broken into the following steps:
Our hypothesis is that several books in our collection have similar prosody patterns and should therefore "sound" more alike.
Figure 4 shows the process we follow to create examples for machine learning, starting with the OpenMary output and its transformation to a database table in Meandre. Next, we use our predictive analysis algorithms to derive a symbol from the OpenMary output at the sound level (i.e., from each row of the tabular data). This symbol is an id that represents a unique combination of the extracted features for that sound. These steps reflect the "intelligent reasoning" and "pragmatically efficient computation" aspects of knowledge representation that Davis et al. have identified.
For our collection, the size of the phrase window is fourteen, so the set of input features is the fourteen symbol ids for the given phrase. In total, there are 1,434,588 phrase windows of fourteen symbols from nine books.
Finally, we added the class
attribute, which is an id for the book in
which the phrase window exists. The class attribute (the book) is the
attribute that we predict.
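The example-construction steps above can be sketched as follows. This is a simplified illustration in Python, not the Meandre implementation; the fourteen-symbol window size comes from the description above, while the toy feature tuples in the test data are assumptions.

```python
def build_examples(rows_by_book, window_size=14):
    """Build (window, book_id) training examples.

    Each row of the tabular OpenMary output (a tuple of features such as
    part of speech, accent, phoneme, stress, tone, and break index) is
    first mapped to a symbol id representing that unique feature
    combination; sliding windows of symbol ids then become the inputs,
    and the book id is the class attribute to predict."""
    symbol_ids = {}   # unique feature combination -> symbol id
    examples = []
    for book_id, rows in rows_by_book.items():
        symbols = [symbol_ids.setdefault(tuple(r), len(symbol_ids))
                   for r in rows]
        for i in range(len(symbols) - window_size + 1):
            examples.append((tuple(symbols[i:i + window_size]), book_id))
    return examples, symbol_ids
```

With the full collection this yields the 1,434,588 fourteen-symbol windows described above; the toy call in the test below uses a window size of two only to keep the illustration small.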
For the predictive analysis we use an instance-based approach, which is based on learning algorithms that systematically compare new problem instances with training instances. In this project, we use a full, leave-one-out cross-validation. That is, for each prediction, the phrase window is compared to each phrase window from all other books.
To predict in which book a given phrase window exists, that phrase window is compared to all phrase windows from all other books by computing a distance function, and each neighbor's contribution is weighted according to a parameter p, the "distance weighting power." When p is set to a maximum, only the single nearest neighbor is used to make predictions; when p is set to a minimum, all memorized examples contribute equally. For any given problem there is a "sweet spot" where the highest accuracy is achieved. This optimal parameter setting is different for each variation of the problem and is also affected by the number of training examples.
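A minimal version of this instance-based, leave-one-out prediction might look like the following sketch. The Hamming distance between symbol windows and the exact weighting formula are assumptions for illustration only; as described above, the actual value of p is tuned per problem.

```python
def hamming(w1, w2):
    """Count of positions where two symbol windows differ (an assumed
    distance function, used here for illustration)."""
    return sum(a != b for a, b in zip(w1, w2))

def predict_book(window, source_book, examples, p=2.0):
    """Predict the book for one phrase window, leave-one-out style.

    The window is compared only to windows from *other* books, and each
    neighbor votes with weight 1 / (distance + 1) ** p.  Raising p
    concentrates the vote on the nearest neighbors; lowering it toward
    zero lets all memorized examples contribute almost equally."""
    votes = {}
    for other_window, book in examples:
        if book == source_book:
            continue  # leave-one-out: never compare to the source book
        weight = 1.0 / (hamming(window, other_window) + 1) ** p
        votes[book] = votes.get(book, 0.0) + weight
    return max(votes, key=votes.get)
```

Aggregating these per-window predictions over all windows of a book gives the book-to-book similarity scores visualized in Figure 13.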
We describe these extensive processes to show that intelligent reasoning and pragmatically efficient computation require extensive amounts of processing power. As such, these are not experiments that can be run on a home computer. Changing the way we analyze text (moving away from the grapheme and towards the phoneme) is complicated by the need to collaborate across disciplinary realms (such as an English Department or School of Information collaborating with a Supercomputing Center and Visualization Lab). Further, the results produced by these processes comprise another set of large amounts of data that must be made comprehensible to readers or scholars interested in analyzing sound patterns in text.
An essential aspect of this project is ProseVis, a visualization tool we developed to allow a reader to map the features extracted from OpenMary and the predictive classification data onto the words in contexts with which readers are familiar.
Using the data produced by Meandre, ProseVis highlights features of a text. Figure 5, for example, shows two short prose pieces by Gertrude Stein called word portraits, her studies of Matisse and Picasso. Stein's portraits "do not tell a story"; each is "a picture that exists for and in itself," using "objects landscapes and people" without being "deeply conscious of these things as a subject."
When visualizing the text at the sound level, we encountered three primary issues: (1) the set of unique sounds in a given text is too large to assign each one an easily discernible color in the display; (2) when doing a string-based comparison of one complete sound to another, it is not possible to detect subtler, and potentially critical, similarities that form patterns such as alliteration and rhyming; and (3) sounds do not map neatly onto syllables. Consider, for example, the words "Struggle," "Skeene," "Stay," and "Stayed." In all four of these words (which accounted for 67 instances across all files), the leading "S" is separated into its own sound. The reason for this breakdown is that sounds in the OpenMary data are broken into parts that correspond to how words are spoken, not necessarily according to syllables. Each sound can instead be treated as a set of components, such as a leading consonant, a vowel sound, and an ending consonant; some sounds lack one or more of these components, and the absence of any such component will be assigned a color as well.
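The component breakdown can be sketched as follows. This is a simplification: the actual segmentation of SAMPA phonetic strings in ProseVis may differ, and the vowel character set below is an assumption for illustration.

```python
VOWEL_CHARS = set("aeiouAEIOU@")  # crude vowel test, assumed for this sketch

def split_sound(sound):
    """Split one sound into (leading consonant, vowel, ending consonant).

    Any component may be absent (returned as ""), mirroring how the
    visualization assigns a color even to a missing component."""
    i = 0
    while i < len(sound) and sound[i] not in VOWEL_CHARS:
        i += 1  # consume the leading consonant cluster
    j = i
    while j < len(sound) and sound[j] in VOWEL_CHARS:
        j += 1  # consume the vowel portion
    return sound[:i], sound[i:j], sound[j:]
```

Coloring by one component at a time is what lets patterns like the shared leading "S" in "Struggle," "Skeene," "Stay," and "Stayed" surface visually.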
This breakdown provides the reader with a finer-grained level of analysis as well as a simplified coloring scheme. As a result, if readers choose to color the visualization by sound, they have the additional option of coloring by the full sound or by a component of sound, such as a leading or ending consonant or a vowel sound. Figure 7, Figure 8, and Figure 9 show these alternate views. Further, a reader can render all the words as phonetic spellings (sound), as parts-of-speech (POS), or with the underlying text removed altogether (leaving just color).
Finally, the reader can choose which of the colors would override the others. This is explained in further detail below.
As discussed, one of Gertrude Stein's early modes of experimentation was to create word portraits in the modernist mode. At the same time, she sensed an immediate connection between the acts of speech (talking and listening) and her work creating portraits of people in words. Derrida minimizes the distinction between writing and speech or voice (and therefore sound) by showing how both are perceived. Yet, as critics of this position note, "Derrida obscures a distinction between written and spoken language that a discussion of poetics cannot do without. Poststructuralism's demonstration of the difference writing makes must therefore be set in relation to the difference sound makes." For Derrida, "[i]n order to function, that is, to be readable, a signature must have a repeatable, iterable, imitable form; it must be able to be detached from the present and singular intention of its production." In our project, the "signature" or sound of text is the iterable, repeatable data that is produced by computational analysis.
Further, we can imagine this imitable form as a layer of data (a "reading" or another "text") that we use as an overlay on the originary text, a means or a lens to read the literary text differently. This new perspective on
Stein’s texts is not only important for understanding her creative work; it is
important for reconsidering what we have learned not to consider. For instance, Craig
Monk argues that Gertrude Stein lost favor with Eugene Jolas, the founding editor of transition, who called upon "[t]he literary creator" or writer "to disintegrate the primal matter of words imposed on him by textbooks and dictionaries" (Introduction). Joyce, argues Monk, epitomized Jolas's revolution with his neologisms and portmanteau words, while the "little household words so dear to Sherwood Anderson never impressed [Jolas]" ([unpublished autobiography], 201 qtd. in Monk 32). As a result, while Jolas would publish much of Joyce's work, including a serialization of his "Work in Progress" (which subsequently became Finnegans Wake), "it was only as Jolas's preference for the verbal in poetry began to emerge clearly that the discussion of the visual analogies used often to describe Stein's works might be read, in hindsight, as implicitly derogatory."
In fact, the idea that James Joyce's mode of experimentation incorporated elements from music while Stein's works, in contrast, reflected influences from the visual arts has been debunked, explored, and complicated by too many scholars to rehearse again in this space. But the fact remains that, as a culture, we are not far removed from the situation in which Stein herself "began to wonder at about this time just what one saw when one looked at anything really looked at anything. Did one see sound, and what was the relation between color and sound, did it make itself by description by a word that meant it or did it make itself by a word in itself."
The last part of this discussion is primarily an exploration, using ProseVis and the data from OpenMary and Meandre's processes, of reading sound patterns in Gertrude Stein's portraits. It reflects Davis's fifth role for knowledge representation: "a medium of human expression, i.e., a language in which we say things about the world."
The relative success of Stein's methods for creating the rhythm of a character is evident in the responses of scholars. Wendy Salkind argues that with her portraits Stein registers her "disenchant[ment] with Matisse and his painting" and a sustained "belief in the genius of Picasso." In particular, Salkind notes the ways in which sounds and rhythms work to create these readings:

We can hear that adulation and disappointment in the phrase repetitions she uses in both pieces. She writes about the effort of creating art, the struggle to be constantly working, to be consistently expressing something, and to find greatness among your followers… When the Picasso description above is spoken aloud, the repetition of the "w" sound continuously brings your lips forward, as if in a kiss. The monosyllabic sentence flows and arrives on the emphasis of the double syllable resolution of the final word. Although also monosyllabic, the Matisse description is pedestrian, lacking fluidity. When spoken, her words describing him don't feel nearly as good in your mouth.
These same patterns are evident in the ProseVis visualization in Figure 7, in which the beginning consonant sound "w" of words like "was," "one," and "were" is represented in red. Clearly, there are fewer concentrated patterns in the Matisse portrait, even though its "w" sounds amount to 7.5% of its 2129 total sounds, actually more than the Picasso portrait's "w" sounds (4.5% of 1246 sounds). The visualization suggests that rather than the volume of sounds, Salkind's reading may have more to do with the close repetition of the "w" sounds in the Picasso portrait in contrast to the more dispersed "w" sounds used in the Matisse portrait. Coloring the accent data (see Figure 11 and Figure 12), we see dark blue areas, indicating high pitch or accented words, that are more prevalent in one portrait than in the other.
Other comparative patterns are clear as well. In Figure 5, in which full sounds are represented and each vertical line is a phrase, there is an inversion of patterns between the two pieces. In the Picasso piece, phrases begin with specific referents such as "This one," while phrases about Matisse begin with more general referents such as "Some," as in "Some of a few." Conversely, while the Picasso phrases move through vaguer terms such as "something" and end again with the specific reference to him in "one," the Matisse phrases begin generally ("Some"), get more specific in the middle of the phrase (referring to "he"), and end with vague terms referring to an abstract "thing." These patterns of expression are highlighted in this visualization because they are emphasized by certain sounds. The prevalent sounds in the Picasso piece are the "o" and "m" and "n" sounds in "some" and "one"; with the "om" sound, your throat remains open. The prevalent Matisse sounds are ones one would make by beginning with closed lips and ending with widened or more opened lips, such as the "i" and "e" sounds in "thing" and "he"; in this case the reader is closing off the breath, squeezing it with her mouth. One could argue that Stein's play with sounds shows how she represents these artists: the "fluidity" upon which Salkind remarks in the Picasso portrait; the pedestrian, less fluid quality she hears in the Matisse.
For our predictive modeling study, we compared the sounds of Gertrude Stein's Tender Buttons with those of other texts. Murphy argues that Tender Buttons "takes the form of domestic guides to living: cookbooks, housekeeping guides, books of etiquette, guides to entertaining, maxims of interior design, fashion advice" and that Stein "exploits the vocabulary, syntax, rhythms, and cadences of conventional women's prose and talk" to "[explain] her own idiosyncratic domestic arrangement by using and displacing the authoritative discourse of the conventional woman's world."
Toklas, of course, collected recipes, and she later published two cookbooks, The Alice B. Toklas Cookbook (1954) and Aromas of Past and Present (1958). Through Toklas then, at least, Stein was familiar with the genre of the cookbook or recipe collection and would appropriately adopt and parody that genre in writing of their growing intimacy. Significantly, Toklas's name as alas appears repeatedly in Cooking as well as elsewhere in Tender Buttons.
It is immediately clear from a simple frequency analysis that the word alas only appears in the one
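The kind of frequency check described here is straightforward to sketch. In the snippet below, the section texts are short hypothetical stand-ins (one line is drawn from the Stein passage quoted later in this section); the real analysis would of course run over the full texts.

```python
from collections import Counter
import re

def word_frequencies(text):
    """Lowercase the text and count each word token."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Hypothetical stand-ins for two sections of Tender Buttons.
sections = {
    "Cooking": "alas a dirty third alas a dirty third alas a dirty bird",
    "Objects": "a carafe that is a blind glass",
}

# How often does "alas" occur in each section?
counts = {name: word_frequencies(text)["alas"] for name, text in sections.items()}
print(counts)  # {'Cooking': 3, 'Objects': 0}
```

A check like this only establishes where a word clusters; the prosodic comparison described next goes further by modeling how the surrounding text sounds.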
To focus the machine learning on this hypothesis, we chose nine texts for comparison and only used features that research has shown reflect prosody such as part of speech, accent, stress, tone, and break index. The nine texts we chose were
First, we defined a prediction problem for machine learning to solve: Predict from which book the window of prosody features comes. Figure 13 visualizes the results of our predictive analysis. The analysis results show that machine learning makes the same similarity judgment that Murphy had made: Tender Buttons is most similar to the cookbook genre.
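The shape of this prediction problem can be illustrated with a deliberately simplified sketch. The feature vectors below are invented stand-ins for the part-of-speech, accent, stress, tone, and break-index features named above, and the classifier is a bare nearest-centroid model, not the machine learning system actually used in the study:

```python
# Sketch of the prediction task: given a window of prosody features,
# predict which book it comes from. All numbers here are hypothetical.

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def predict(window, centroids):
    """Return the book whose centroid is closest (Euclidean) to the window."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda book: dist(window, centroids[book]))

# Toy training windows: [accent, stress, tone, break-index] per window.
training = {
    "Tender Buttons": [[0.9, 0.8, 0.2, 0.1], [0.8, 0.9, 0.3, 0.2]],
    "NECB":           [[0.2, 0.1, 0.8, 0.9], [0.3, 0.2, 0.9, 0.8]],
}
centroids = {book: centroid(ws) for book, ws in training.items()}

print(predict([0.85, 0.85, 0.25, 0.15], centroids))  # Tender Buttons
```

The point of the sketch is only the framing: each windowed stretch of text becomes a feature vector, and "similarity" between books is whatever makes the classifier assign a window to the wrong (or right) source.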
Using the ProseVis interface, we can see within the context of the text where these associations have been made. Figure 14 shows
the comparison texts turned on, including NECB, with phrases highlighted in one shade (in the line "alas the back shape of mussle") and in yellow (in the line "alas a dirty third alas a dirty third"). The darker shades indicate a higher probability that the sound is more like NECB.
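The idea of shading by probability can be mimicked with a simple color ramp. The mapping below is our own guess at such a scheme for illustration, not ProseVis's actual implementation: white for probability zero, the fully saturated base color for probability one.

```python
def probability_to_shade(p, base_rgb):
    """Darken from white toward base_rgb as probability p rises (0..1).
    Illustrative only; not the actual ProseVis color mapping."""
    return tuple(round(255 - (255 - c) * p) for c in base_rgb)

RED = (255, 0, 0)
print(probability_to_shade(0.0, RED))  # (255, 255, 255) -- white
print(probability_to_shade(1.0, RED))  # (255, 0, 0) -- saturated red
print(probability_to_shade(0.5, RED))  # (255, 128, 128) -- mid shade
```

Whatever the exact ramp, the reading practice is the same: darker cells mark stretches of text the model judges more confidently similar to the comparison text.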
The story the visualizations tell is two-fold. First, these visualizations are useful in allowing us to test or generate hypotheses about prosody and sound in the texts. Figure 16 shows us that the section surrounding alas is, in fact, more like NECB: alas has the rhythm of the cookbook.
Second, other questions and hypotheses may be raised concerning how the algorithm
and ProseVis work together to generate these visualizations. These latter kinds of
questions can be considered in terms of the data sets and the documentation we are
providing as well as in respect to articles such as this one. In other words, the
goal is not accurate text identification using prosody features, but rather to
test hypotheses that consider the sound and prosodic similarities of texts. Part of what interests us in the digital humanities is the mistakes we perceive the computer to make and what these errors reveal about the algorithms we are using to gauge the significance of textual features. One benefit to
scholarship represented in this research is determining where the model breaks
down and where the ontology must be tweaked. For example, currently, the machine
learning system is not being tuned to produce the most accurate classifications:
using more context such as a larger window size (i.e., a larger number of phonemes
to consider as part of a window) increases the classification accuracy
dramatically.
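The windowing idea itself, giving the classifier a longer run of phonemes as context, can be sketched in a few lines. The phoneme labels below are illustrative ARPAbet-style symbols, not output from the actual pipeline:

```python
def phoneme_windows(phonemes, size):
    """All contiguous windows of `size` phonemes from a sequence.
    A larger size gives the classifier more context per example."""
    return [tuple(phonemes[i:i + size]) for i in range(len(phonemes) - size + 1)]

# Illustrative phoneme sequence for "alas a" (ARPAbet-style, hypothetical).
seq = ["AH", "L", "AE", "S", "AH"]
print(phoneme_windows(seq, 2))  # 4 overlapping 2-phoneme windows
print(phoneme_windows(seq, 4))  # 2 overlapping 4-phoneme windows
```

The trade-off the passage describes follows directly: wider windows carry more distinguishing context (raising accuracy) but yield fewer, less local examples.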
Previously, digital humanities scholars have also used phonetic symbolic research to
create tools that mark and analyze sound in poetry. For instance, Marc Plamondon
created AnalysePoems to analyze the Representative Poetry Online (RPO) website
(http://rpo.library.utoronto.ca). Plamondon’s goal with AnalysePoems was to automate the identification of the
dominant metrical pattern of a poem and to describe some basic elements of the
structure of the poem such as the number of syllables per line and whether all
lines are of the same syllabic length or whether there are variations in the
syllabic length of the lines in an identifiable pattern. AnalysePoems does not attempt to capture the "reality" of a spoken poem, a feat that is impossible because of the ephemeral elements of performance that comprise a poetry reading. Instead, AnalysePoems is "built on the prosodic philosophy that a full scansion of a poem…"
Another tool that was built to examine how the phonetic/phonological structure of a poem may contribute to its meaning and emotional power is Pattern-Finder, which is premised on the idea that feature-patterning is the driving force in the "music" of the poetry. Pattern-Finder analyzes features of data such as parts of speech,
syllables, stress, and groups of vowel and consonant sounds. Like the creators of
Pattern-Finder, we are also interested in allowing readers to make groupings of
consonant sounds that include plosives, fricatives, and affricates and groupings of
vowels that include those formed in the front or the back of the mouth. Phonetic
symbolic research and the creation of these tools demonstrate a precedent for
facilitating readings that use these features to analyze text for meaning.
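Groupings of this kind amount to simple lookup tables. The inventory below is partial and uses standard phonetic classes over ARPAbet-style symbols; it is an illustration, not Pattern-Finder's or ProseVis's actual tables:

```python
# Partial, illustrative groupings of consonants by manner of articulation
# and vowels by place (front vs. back of the mouth).
CONSONANT_CLASSES = {
    "plosive":   {"P", "B", "T", "D", "K", "G"},
    "fricative": {"F", "V", "TH", "DH", "S", "Z", "SH", "ZH", "HH"},
    "affricate": {"CH", "JH"},
}
VOWEL_CLASSES = {
    "front": {"IY", "IH", "EH", "AE"},
    "back":  {"UW", "UH", "AO", "AA"},
}

def classify(phoneme):
    """Return the class name for a phoneme, or None if it is unlisted."""
    for table in (CONSONANT_CLASSES, VOWEL_CLASSES):
        for name, members in table.items():
            if phoneme in members:
                return name
    return None

print(classify("CH"))  # affricate
print(classify("IY"))  # front
```

With such a table, a reader-configurable grouping is just a matter of editing the sets: patterns can then be counted and visualized at the class level (all plosives, all front vowels) rather than phoneme by phoneme.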
Practically speaking, our system for predictive modeling and the ProseVis tool are in the early stages of development, but we are encouraged by the results so far. We predicted that
At the same time, while usability studies are still a future goal in the project,
developing the algorithm and the ProseVis interface has already been productive in
terms of interrogating the efficacy of our underlying theories of aurality. Our work
is a new and promising approach to comparing texts based on prosody, but what is
equally promising is that we are ultimately basing our ontology for creating a
machine learning algorithm on an underlying logic of potential and inexact sounds as
they are anticipated in text. Further, the success of the comparison of sounds between texts is based on the extent to which the computer is "confused" about these possibilities. The fact that this is a "best guess" methodology, which
stems from theories in artificial intelligence and knowledge representation and is
based on potentials and probabilities, suggests that the algorithm and the tool
incorporate and function within a space that invites hypothesis generation and
discoveries in the sound of text.
'A thing not beginning or ending': Using Digital Tools to Distant-Read Gertrude Stein's 'Melanctha' and African-American Musical Traditions
Sounds Convey Meaning: The Implications of Phonetic Symbolism for Brand Name Construction