One
In order to begin to write this essay, I set out to make some appropriate use of what
I have come to think of as “writing to be
found”. Originally I had thought that this would be by way of simply
beginning to write, embarking on my usual process of writing while checking,
periodically, to see whether the sequences of words that I was in the midst of
composing were still “found” in the corpus and
then at what point they became “not yet
found”.
[1] How many
words would I have to add, composing my syntagmatic sequences, before they were not
found in the corpus of language to which the Google search engine gives me access,
before they were, perhaps, original sequences? How difficult would I find it to
produce unfound sequences? Would I be able to continue to write as I usually write
once I was aware that, at some perhaps unanticipated moment, the words I write are
suddenly penetrating and constituting the domain of sequences that are not yet found
in our largest, most accessible corpus of written English?
There have proven to be many questions raised by any and all of my attempts to engage
with these processes and their contexts. Moreover, I remain convinced that many of
these processes may be productive of significance and affect, to an extent that will
allow aesthetic, not only critical, practices some purchase.
This way of working with language is enabled by unprecedented, convenient and
articulable access to the network, a world of language, a media-constituted diegesis,
that is still “powered” – as the contemporary technologically
inflected usage would have it – by text, by encoded representations of inscription,
in what we usually call writing. The net is still largely composed from all the
privileged instantiations of our languages’ singular materialities that we, as
irrepressible language-makers, have so far written to be found.
By which I mean to make it clearer that, when I write with these processes, I’m both
writing and writing with Google.
[2] Is Google my collaborator? Does Google
become the space within which I write? I want to make it clear that I don’t consider
myself necessarily to be writing in the space of the network nor collaborating
(directly) with other artists. At this point, I also want to make it clear that I do
not consider myself to be using Google, not, at least, in the usual way that Google
is used for gathering instances of language by search. I’m not refashioning myself as
a Flarfist.
[3] I’m not casting a faux-puerile, post-everything, absurdist net
over the net using the net, gathering glittering detritus, spectacular disjuncture,
in endless anti-syntactic listlings. I’m not composing searches in order to find the
language for what I’m making. I’ve got my language already, one way or another. I
just want to know whether it’s found or it isn’t. The Flarf-poetic approach is –
although this is only a small part of Flarf – a détournement of the affordances that
Google offers us as a portal to text on the network. My “writing to be found”, on the other hand, is in itself a way of
writing that is shaped by the way that Google is shaped, by the way in which Google
curves the space of the network. And Google does also, in a sense, write with me:
constraining, directing, guiding, and, especially, punctuating my writing.
It occurs to me, broadening the scope of these experiments’ relevance, that poetic
writing for programmable and network media seems to have been captivated by the
affordances of new media and questions of whether or not and, if so, how certain
novel, advanced, media-constituted properties and methods of literary objects require
us to reassess and reconfigure the literary itself. What if we shift our attention
decidedly to practices, processes, procedures – towards ways of writing and ways of
reading rather than dwelling on either textual artifacts themselves (even when
considered as time-based literary objects) or the concepts underpinning such objects
as artifacts? What else can we do, given that we must now write on, for, and with the
net which is itself no object but a seething mass of manifold processes? Google
itself signals the significance of process since Google both is and is not the net.
Google is not the inscription that forms the matter of the net. Google is merely
(almost) every process (not everything) that makes it possible
for us to find and touch and consume what was always already there in front of us.
When you collaborate you are more or less obliged to get to know your collaborator.
Getting to know Google better, in a practical sense, as a collaborator, is one of the
most interesting results to emerge from even the relatively simple and preliminary
processes that have been set in train.
This is probably the moment to introduce some details of the procedures with which I
am writing. First, a classical epithet via Montaigne in John Florio’s translation, “The Philosopher Chrisippus was wont
to foist-in amongst his bookes, not only whole sentences and other long-long
discourses, but whole bookes of other Authors, as in one, he brought Euripides
his Medea. And Apollodorus was wont to say of him, that if one should draw from
out his bookes what he had stoln from others, his paper would remaine blanke.
Whereas Epicurus cleane contrarie to him in three hundred volumes he left
behind him, had not made use of one allegation.”
[Montaigne 1910]
Process: Write into the Google search field with text delimited by quote marks until
the sequence of words is not found. Record this sequence. Delete words from the
beginning of the sequence until the sequence is found. Then add more words to the end
of the sequence until it is not found. Repeat. Each line of the resultant text
(although not necessarily the last line) will comprise a sequence of words that is
“not yet found”. At the time of composition
these lineated sequences of words had not yet been indexed by Google and were thus,
in a certain (formal) sense, original:
-
“If I write, quoting,”
-
“I write, quoting, ‘And”
-
“write, quoting, ‘And the”
-
“quoting, ‘And the earth”
-
“‘And the earth was without form and void; and darkness was upon the face
of the deep,’ these words”
-
“upon the face of the deep,’ these words will”
-
“deep,’ these words will be found”
-
“these words will be found. Perhaps”
-
“will be found. Perhaps they will now”
-
“Perhaps they will now always be found”
-
“will now always be found. I”
-
“always be found. I write”
-
“be found. I write, in part”
-
“I write, in part, in the hope that what”
-
“in the hope that what I write will be found.”
with Google, Sat Oct 3, 2009, completed 2:04am EST.
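The process stated above can be sketched in code. This is a minimal sketch under stated assumptions, not a record of how the lines above were actually produced: `make_is_found` is a hypothetical stand-in for a quoted Google search, simulated here by exact word-sequence matching against a small local string, and words are split naively on whitespace.

```python
def make_is_found(corpus_text):
    """Return a predicate simulating a quoted search (assumption: a local corpus).

    A sequence counts as 'found' when the exact run of words occurs in corpus_text.
    """
    corpus_words = corpus_text.split()

    def is_found(phrase_words):
        n = len(phrase_words)
        if n == 0:
            return True
        return any(
            corpus_words[i:i + n] == phrase_words
            for i in range(len(corpus_words) - n + 1)
        )

    return is_found


def lineate_not_yet_found(text, is_found):
    """Break text into overlapping lines, each a sequence that is not yet found."""
    words = text.split()
    lines = []
    start = 0
    recorded_end = 0
    for end in range(1, len(words) + 1):
        window = words[start:end]
        if is_found(window):
            continue  # still found: keep adding words to the end
        lines.append(" ".join(window))  # record the not-yet-found sequence
        recorded_end = end
        # delete words from the beginning until what remains is found again
        while start < end and not is_found(words[start:end]):
            start += 1
    # a trailing sequence that is still found becomes the last line
    if recorded_end < len(words):
        lines.append(" ".join(words[start:]))
    return lines


is_found = make_is_found("in the beginning was the word and the word was found")
print(lineate_not_yet_found("and the word was lost and the word", is_found))
```

As in the stated process, every line except possibly the last is unfound in the simulated corpus, and each new line begins with the found remainder of the line before it.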
I was induced to explore this way of writing by the remarks of a philosopher and
cognitive scientist, Ron Chrisley, at a workshop on Neuroesthetics.
[4] In discussing robotic perception, he was making some use of the concept of
the “edge of chaos”. I understood this phrase
loosely as referring to a threshold of information processing, the point at which an
artificial cognizer can no longer assimilate – typically by compression or by rule
formulation – the information that comprises its inputs. Somehow, to me, this
suggested or rhymed with that moment in our now common encounters with search engines
when what we are looking for is not yet found, when it could still be anything,
because, as yet, it is nothing to the corpus. It isn’t there. It isn’t in any way
predictable. It’s still maximal, raw information in Shannon-Weaver’s sense – the edge
of chaos that we are about to make, literally, readable.
Since I have some practical experience with Markov models for text generation, I also
pretend to recognize this as a closely related phenomenon.
[5] If we think of Google as giving us access to a vast Markov model, I
believe I am right in saying that as I build up my sequences of words delimited by
quotes and test them after adding each word, I am testing the model’s ability to
find me an n-gram where n is equal to the number of words in my sequence.
Non-zero results mean that there are probabilities to play with. Not only is it the
case that other people before me have produced instances of this sequence of words,
but an n-gram model, constructed from the Google corpus, would also have some chance
of generating my search phrase. However, once I’ve reached an unfound sequence, the
model breaks down. I’m at the edge, and I may also, perhaps, be about to extend, by
some minuscule amount, the readable, the unchaotic territory of the textual, perhaps
even that of the literary. I’m about to write, and to add my own writing to the
corpus.
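The relation between a quoted search and an n-gram model can be illustrated with a toy corpus. The sketch below is an illustration only: the corpus string is invented, the function names are hypothetical, and the probability is a plain relative-frequency estimate, which is an assumption about how such a model might be built rather than a description of anything Google does.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def first_unfound_length(phrase, corpus):
    """Smallest n at which the first n words of phrase have a zero count, else None."""
    corpus_words = corpus.split()
    phrase_words = phrase.split()
    for n in range(1, len(phrase_words) + 1):
        if ngram_counts(corpus_words, n)[tuple(phrase_words[:n])] == 0:
            return n
    return None


def next_word_probability(context, word, corpus):
    """Relative frequency with which word follows context in the corpus."""
    corpus_words = corpus.split()
    context_words = tuple(context.split())
    n = len(context_words)
    context_count = ngram_counts(corpus_words, n)[context_words]
    if context_count == 0:
        return 0.0  # the model has nothing to say about an unfound context
    extended = ngram_counts(corpus_words, n + 1)[context_words + (word,)]
    return extended / context_count


toy_corpus = (
    "the purpose of this writing is to address the question "
    "and the purpose of this essay is to address an audience"
)
print(first_unfound_length("the purpose of this writing is to address an", toy_corpus))
print(next_word_probability("to address", "an", toy_corpus))
```

A non-zero count corresponds to a found sequence and to a non-zero chance of generation; the first zero count marks the point at which the model breaks down.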
And then suddenly it gets interesting. I was just writing, and now I’m writing with
Google and beginning to wonder what that means. Google is where we search for
language and for forms of all kinds that are made from language, including aesthetic
forms. It’s become our default portal to the default corpus. It is not yet all
writing but we feel that we are close to the historical moment when the extraordinary
possibility – Ted Nelson’s Docuverse – has become an actuality for, at least, a major
portion of the existing textual corpus of writing in English. Already, I wager, we
type our searches into Google expecting that it will find anything and everything
that we might expect to be found in the world of letters, of conventionally inscribed
textuality. What do I mean by that? I mean at least all of those sequences of words
that have been written by authors who are known to us. All of the writing that is
known, all of the writing that will have been found. And much besides.
-
“The purpose of this writing is to address”
-
“an edge of chaos.”
-
“Specifically, the point or points”
-
“in sequences of words that”
-
“delimit phrases”
-
“found to be unique in our”
-
“most accessible corpus.”
with Google, Sat Oct 3, 2009, completed 10:27am EST.
The two singularly lineated sentences above are made with a slightly different
process, a retreat from the not yet found sequence – at the time this was, for
example, “The purpose of this writing is to address
an” – to the longest sequence that was still found in the accessible Google
corpus. Although the sentences are original to me they are expressed in phrases that
can be shown to be plagiarized from the corpus. They have all already been
written.
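This second process, the retreat to the longest sequence that is still found, might be sketched as follows. The helper `found_in` is a hypothetical stand-in for a quoted search, simulated against an invented local string, and the choice to emit a single unfound word as a line of its own is an assumption of the sketch.

```python
def found_in(corpus):
    """Hypothetical quoted search: exact word-sequence match in a local string."""
    padded = " " + " ".join(corpus.split()) + " "
    return lambda phrase_words: (" " + " ".join(phrase_words) + " ") in padded


def lineate_longest_found(text, is_found):
    """Break text into consecutive lines, each the longest sequence still found."""
    words = text.split()
    lines = []
    start = 0
    while start < len(words):
        end = start + 1
        # extend the line for as long as the longer sequence is still found
        while end < len(words) and is_found(words[start:end + 1]):
            end += 1
        lines.append(" ".join(words[start:end]))
        start = end
    return lines


is_found = found_in(
    "the purpose of this writing is to address the reader at an edge of chaos"
)
for line in lineate_longest_found(
    "the purpose of this writing is to address an edge of chaos", is_found
):
    print(line)
```

Unlike the first process, the lines here do not overlap: each begins where the previous found sequence ended.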
For we do seem to be addressing something like the palpable, objective edge of
authorial originality. “The purpose of this writing
is to address” was always unoriginal before I set out. When I wrote, “The purpose of this writing is to address an”,
the indefinite article made me an author.
Those of us who are educators will be aware of the way that Google and other search
engines are used as simple detectors of student plagiarism. Type the suspected
sentence into Google and it is very likely to find the source from which it may have
been copied. Writing to be found with Google reveals, however, the singular, perhaps
unprecedented nature of its, Google’s, co-authorial authority. By definition Google
changes shape. As we’ve said before, it’s a process. By providing access Google seems
to
be the corpus of reference while remaining a protean manifold of
processes that continually reconfigure themselves while crawling over
our networked body of language (the actual corpus), even unto the
edge of chaos, finding new readable things and indexing them relentlessly and
swiftly, remarkably swiftly. Less than three hours after I’d posted my not-yet-found
texts to the netpoetics blog, they were suddenly found.
[6] Thus, taking the same text and putting
it through the same
procedure produced an entirely different text and a new measure (or textual
visualization) of my originality.
Returning to my first process, with the supply text just quoted, for example:
-
“The purpose of this writing is to address an”
-
“is to address an edge of”
-
“address an edge of chaos.”
completed with Google at 9:17 EST on Oct 1, 2009, became:
-
“The purpose of this writing is to address an edge”
-
“is to address an edge of chaos.”
a little over two hours later at 11:30 on the same day. (By the way,
although the second iteration of the process reduces the number of unfound sequences
in this initial extract, for the entire supply text the second iteration actually
increased the total number of unfound sequences from 17 to 21.)
This potential for iteration was not only expected but it was something with which I
desired to experiment, using it to produce a series of texts, evolving over time in
relation to the findableness of their constituent sequences of words.
But imagine my surprise when I tried the procedure again and found it regenerating
the earlier version. My new, original writing was no longer found. I could see it
there in the corpus (at netpoetics) but as far as Google, the “index of reference”, was concerned it was, apparently, no
longer there. I could not yet have produced it. Uncanny. But easily explained by my
arbitrary access, at the first instance of checking, to Google servers that had
already published the indexing of their busy spiders. Later, I had been less lucky:
my client must have connected to other servers (I have no obvious control over this)
onto which the new indexes had not yet propagated. Google had temporarily denied my
originality, my authority. It had changed the shape of my authorial persona. I wasn’t
writing with it. It was writing with me, against me, withholding what I thought I had
inscribed.
Two
Why hadn’t I considered this before? Why don’t we think of it now, and then more
often? As a culture, we are in the seemingly ineluctable process of handing over the
digitization and indexing of our entire surviving published textual legacy to Google,
in order for them to include that part of it which they have not already indexed. I,
we, have no idea how they are going to index our literature or how their indexing of
it might change over time. On the other hand there is considerable evidence of
uncertainty and inconsistency.
[7]
I should of course mention in passing that there are already and will likely remain
some checks and balances to Google. So far, the other internet search engines have
access to most of the same corpus, and they do not index this corpus in the same way.
[8] Without huge investment we could all write and set up our very own search
engines. Nonetheless it is remarkable the degree to which Google has become, as I
say, initially the search engine of reference and now in some sense the reference of
reference. This is so obvious to us that it has become banal to point out that
whatever Google is, it may be the most remarkable and significant agency for cultural
change on the planet.
Of course, the scholars amongst us (and within us) will defer. We cannot rely on
anything that the folksonomic internet provides, although relying, admittedly
“by default”, is exactly what all of us having access actually
do. Neither can we defer from Google in the same way that we defer from Wikipedia, on
the basis of what it “contains”. Google is not Wikipedia and, in a
sense, it does not contain anything. Practically and in other critical senses, it
stands between us and Wikipedia while also providing – in so far as it indexes all
the writing that can be found – much of the material from which Wikipedia is built.
Wikipedia is something that arose contemporaneously with the Googlization of
everything but is more a symptom than a cause. Whatever Google is, is a problem that
remains to be addressed, and written with.
Here is one brief working statement of what Google is becoming or what it may already
be: Google is the preferred or default agency to which our existing
institutions of cultural production and critique delegate the symbolic processing
of our inscribed material culture in exchange for unprecedented access to the
results of that symbolic processing. I am, of course, bracketing all the
important questions concerning what exactly is handed over to Google for processing,
how this is done, who owns it, and where it is – all of which are irreversibly
complicated by the fact that any answers will be radically different
“before” and “after” these processes that
were already in train long “before” any actual exchanges – such as
agreements to digitize libraries – were made explicit, let alone regulated in any
publicly agreed and articulated manner.
Let’s say it again in more polemical terms. We hand over our culture to Google in
exchange for unprecedented and free access to that culture. We do this all but
unconscious of the fact that it will be Google that defines what
“unprecedented” and “free” ultimately
imply.
[9] As yet, we hardly seem to acknowledge the fact
that this agreement means that it is Google that reflects our culture back to us.
They design the mirror, the device, the dispositif, as the French would put it. They
offer a promise of “free” access in many senses of that word
including zero cost to the end-using inquirer and close to zero cost to the
institutions that supply the inscribed material culture that Google swallows and
digests. But Google does not (some might here add “any longer”) conceal
the fact that this free access does come at a cost, another type of cost, one that is
also a culture-(in)forming cost: Google will process all (or nearly all) this data in
order to sell a “highly-cultivated” positioning of advertisements.
The deal can’t go ahead without this underlying engine of commerce and
commercialization. In a sense, Google is the predominant global corporation a major
proportion of whose capital is literally cultural capital. Now, what was already a
huge backing investment is being freely augmented by the traditional investors in
this market of culture, the universities in particular. Bizarrely, these
institutional investors are not asking for shares in the business, or rights to vote
on the board. All they seem to want is to have what they
already had,
but processed, indexed, reformed and reflected back to them, to us, in, as I say, a
manner that allows many of us unprecedented access.
This is not, primarily, an essay about Google, and the situation was and is far more
complicated than this polemical outline suggests. Google did, after all, emerge from
the
popular culture that was born on the internet itself, long (in net
history terms) before institutions began to contribute to this culture to any
significant extent. Thus the initial cultural capital that Google amassed may be seen
as fairly won, and the access that Google provided to a suddenly vast,
ever-accumulating resource was truly unprecedented, rendering the culture of the net
useable, manageable, findable, beyond all expectation.
[10] We learned quickly that “unprecedented access” meant that
Google was better than any other agency at managing the “more than ever
before” of everything that is digitally inscribed, the exponential
increase in information. But now this simple, if overwhelming, quantitative fact is
all that we and our institutions know with any surety. We know that
Google will deal with the scale of it all, and manage it all better, and give more of
it back to us,
[11] but we may never know, unless we ask or demand, exactly how they do this or how
they will or will not do this in some speculative future when they have already
disposed of the problems of processing it all, displacing it all, continually
rendering it back to us through manifold devices with post-human artificial intelligences.
[12]
Three
So now all my writing to be found has been recast in the light of this shared,
would-be universal engagement or struggle with Google to retrieve or reform culture.
And immediately, as in the work of writing digital media that underlies these
remarks, I return to specifics with a heightened awareness of their potential
significance, especially as critique of these relations.
For example, in the course of investigating writing to be found, it occurred to me
that any material that is quoted in a text from a well-known, and therefore much
indexed, source will emerge very differently in the procedures outlined above. It
seems that in what may be standard original composition, you can expect sequences of
words that you are writing to be found to be unique after about five words, depending
on diction. However, arbitrarily long sequences of words recalled or quoted from many
texts, like the English Bible in one of the standard translations, will already and
will always be found ... by Google. The conceptualist in you might want to test this
to some absurd aesthetic extreme, typing all of Genesis into the Google search box
delimited by quotes and discovering thousands of hits. I didn’t get this far although
I made attempts with lengthy sequences until I noticed, in light grey type, the
legend:
[13]
“what” (and subsequent words) was ignored
because we limit queries to 32 words.
I hadn’t noticed or been aware of
this limitation before. And I am still unsure about when and how it was instituted.
How long had this been a Google limitation? Who decided it was needed and why? Why 32
words? It’s clearly not surprising that this limitation exists. The point here is
that it gets in the way of using or, in my case, writing
with Google in
the way I believed would be interesting and might lead to further aesthetic or
critical cultural production. What if I wanted to continue with what I had hoped and
planned to do? Google’s got indexes to my language, my culture. Even if they might
not reasonably be expected to give me all the tools I might need or want to explore
this material, why should they constrain or reform the tools that they do appear to
give me in ways that seem to me to be arbitrary or, at least, unrelated to my own
concerns? These questions are already important but not as important as they will
become. When Google indexes all books, which institutions will keep track of when and
why they change their search algorithms, let alone endeavor to influence Google’s
decisions in such matters?
[14]
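The effect of the 32-word limit on a quoted query can be made concrete. The sketch below only models the behaviour reported in the legend quoted above; the function names are hypothetical, and testing a long passage through successive 32-word windows is offered as one possible approach, not something the search engine provides. Finding every window is a necessary but not a sufficient sign that the whole passage is found.

```python
MAX_QUERY_WORDS = 32  # the limit reported in the legend quoted above


def truncate_query(phrase, limit=MAX_QUERY_WORDS):
    """Return the words that would be searched and the words that would be ignored."""
    words = phrase.split()
    return " ".join(words[:limit]), words[limit:]


def windows(phrase, limit=MAX_QUERY_WORDS):
    """Split a long passage into overlapping sequences of at most `limit` words."""
    words = phrase.split()
    if len(words) <= limit:
        return [" ".join(words)]
    return [" ".join(words[i:i + limit]) for i in range(len(words) - limit + 1)]


passage = " ".join(f"word{i}" for i in range(40))
searched, ignored = truncate_query(passage)
print(len(searched.split()), "words searched;", len(ignored), "ignored")
print(len(windows(passage)), "windows needed to cover the passage")
```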
Never mind, for my immediate purposes at least. Conceptually, I can imagine what the
search results would have been for absurdly long sequences from famous texts and how,
using writing to be found procedures for lineation, texts that quoted or plagiarized
such material (let’s say, writing to be found punctuating certain texts of Kathy
Acker or Pierre Menard’s
Quixote or Kent Johnson’s
Day),
[15] would be chopped up where they are “original”
and then bulge out where they incorporated what is already found, as the “If I write, quoting ...” example above
demonstrates. (Menard’s
Quixote would be all
“bulge”.)
I say “never mind”, but remain disturbed. A productive engagement had been
interrupted by a (ro)bot from Porlock and now this seems as if it will be
characteristic of writing and working with Google, re-energizing the
Anglo-Saxon origins of that preposition. In fact, of course, it is a function of
encoded properties and methods that are designed to reassert, where and whenever
necessary, the underlying purpose of the Google engine, which is, as we recall, to
dispose of culture and propose advertisements based on this disposal. Google asserts:
“You don’t need more than 32 words in your queries in order to determine what
you want and what interests you. Making something that requires longer searches
will simply skew our data and make it harder for us to know what you want.”
Despite Google’s assertion, I keep searching. Now my collaborator, Daniel C. Howe,
and I keep searching. We’ve already, like many others, come up against another
important limit. If you search too much or too fast (even manually I found), then
Google’s engine thinks you might be a process (as
it is) and that you
might be making automated queries. This produces the same threat to Google’s
underlying purpose, the threat of skewed analytical data. However, to us it seems as
if we are simply retrieving access to our own linguistic culture. Usually, we are
simply mining the corpus that Google makes accessible – in an unprecedented manner –
for “natural language data”. In writing to be found, I seek out
the chaotic edge of what is being written and is soon to be found by myself and
others, the edge of what literary culture acknowledges to be attributable
authorship.
[16] Isn’t this a legitimate engagement
with what Google promises us? Shouldn’t these admittedly or purportedly poetic
queries be accepted as a part of the culture with which they also engage?
As a matter of fact we continue to write programs that generate automated queries and
it is strange that Google – itself a vast conglomeration of processes – rejects them
as such. Shouldn’t Google be prepared to pass judgment as to whether a process is an
innocent cultural address to its services rather than assume that any automated
inquiry is an attempt to undermine or deflect it from its prime, commercial
objective?
[17] Returning to a concrete example that engages related concerns with poetics
and the author function, I realized that, using the Google search query’s
“not” prefix (a minus sign), I might search for sequences of words from
well-known texts (delimited by quote marks) that would be found in the corpus but in
places where they were not associated with their well-known
“authors”. I used this negatively qualified version of the
procedure described above, testing successively longer sequences and aiming to find
the longest sequences that also satisfied the essential condition of
not
being attributed to the famous author. This produces a text that, paradoxically, is
collaged from phrases that are quoted from arbitrary internet unknowns but which,
when linked together, will compose a famous text. Before supplying an actual example,
I want simply to point out that the program I write to undertake this entirely
legitimate essay in conceptual poetics generates a large number of test searches even
for a brief text and it will find itself frequently blocked by Google’s suspicion of
and ultimate denial of my own process’s high cultural intentions.
[18]
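The negatively qualified procedure might be sketched as follows. The query form, a quoted phrase followed by a minus-prefixed term, is the one described above; everything else is an assumption made for illustration: the search is simulated over a handful of invented local documents, a document counts as attributed if it mentions the excluded name anywhere, and the function names are hypothetical. This is not the program referred to in the text.

```python
def build_query(phrase_words, excluded_term):
    """Quoted phrase plus a minus-prefixed term, as typed into the search field."""
    return f'"{" ".join(phrase_words)}" -{excluded_term}'


def make_unattributed_search(documents):
    """Simulated search: phrase occurs in a document that never names the author."""
    normalized = [" " + " ".join(doc.lower().split()) + " " for doc in documents]

    def search(phrase_words, excluded_term):
        phrase = " " + " ".join(phrase_words).lower() + " "
        return any(
            phrase in text and excluded_term.lower() not in text
            for text in normalized
        )

    return search


def collage(text, excluded_term, search):
    """Longest successive sequences that are found but not attributed."""
    words = text.split()
    lines, start = [], 0
    while start < len(words):
        end = start + 1
        while end < len(words) and search(words[start:end + 1], excluded_term):
            end += 1
        lines.append(" ".join(words[start:end]))
        start = end
    return lines


docs = [
    "Beckett wrote my tongue comes out again lolls in the mud",
    "on holiday my tongue comes out when I concentrate",
    "the dog again lolls in the shade",
]
search = make_unattributed_search(docs)
print(build_query(["my", "tongue", "comes", "out"], "Beckett"))
print(collage("my tongue comes out again lolls", "Beckett", search))
```

Each call to `search` corresponds to one test query, which suggests how even a brief text can generate a large number of searches.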
This is Beckett, three fragments from
How It Is which
also correspond to the final part of a short prose work he originally published in
French as
L’Image. But it is also possible to assert that it is
not Beckett but rather something that I have written together
with Google, where we have conspired to calculate a maximal syntagmatic association
with Beckett’s texts while ensuring that these sequences are attributable to others,
often many others, and we do this in a manner that can be established by a
contemporary form of citation. It is a relatively nice problem to consider whether
this text infringes copyright. I might claim, for example, that it is not copied,
that it’s not even the same text, especially given that I have transcribed it with
quotation marks around the phrases. A copyright expert might assert that it was
created by a mechanical process, that it is the product of a procedural but regular
form of transcription and is, therefore, a copy, to which I would have to reply that
a great deal of personal thought and significant indeterminate and unmediated human
labor also went into its making. The piece certainly challenges the Beckett estate’s
moral rights in respect of the text’s integrity and its association with the author’s
name. In US law these rights are not established. In any case, I may both justly
claim fair use, and also perversely propose that my first-cited example was actually
derived from the following entirely original collage composed from fragments found to
have been written on the internet:
[19]
“a moment still”
“animals still then”
“April morning in the”
“blue and white of sky”
“been none for a long time now”
“blue I stay”
“empty a few”
“there no more”
“mud it’s over”
“my tongue comes out”
“thirst the tongue”
“goes out no more”
“goes in the mouth”
“again lolls”
“closes it must be a”
“straight line now it’s”
“the hand opens and closes”
“that helps me it’s”
“going let it go I”
“realize I’m still smiling”
“in the mud i stay”
“it’s done I’ve had the”
“image the scene is”
“there way off on”
“the right in the mud”
“over it’s done I’ve had”
“the image”
“there’s no sense in that now”
Clearly a lot more could and will be done with the procedures of writing to be found
including with this latter variation in which one rediscovers how much of what has
been written has already been written. Google makes all of this possible and Google
also stands in the way of these unanticipated essays. One very significant reason to
continue to work in this way is precisely to reveal how Google and other similar
agencies will reform what they pretend to enable, and how our existing institutions
that support writing as a cultural practice will relate to the profound reformations
that must ensue.
Four
The “writing readers” within a major collaborative project in
digitally mediated literary art are underpinned by the critical, contemporary,
quietly hacktivist natural language processing and research initiated in “writing to be found”. The
Readers Project incorporates “writing with
Google”, and it also proposes performative reading as, perhaps, exemplary
of how we may write in this, our future. The collaboration, with Daniel C. Howe,
produces literary objects that have an extensive computational dimension and will,
typically, be realized as screen-based or projected works, for both private viewing
and reading, and more public exposure in installations with distributed multi-media
and/or mobile displays. As such, they are, in the relatively small world of writing
digital media, examples of a variety of work whose real-world instantiations take
some place either in the screen real estate of net-based or personal computer-based
art, or in the mediated gallery space of digital art. Even the computational aspects
of this work have become amenable to critical attention in these days of codework,
expressive processing and/or critical code studies.
However – and this may not be the best news for an already over-extended critical
community examining aesthetic objects that have still to prove themselves in any
wider cultural forum – crucial reading strategies that are already encapsulated in
our projects, in our quasi-autonomous readers, are derived from precisely the kind of
“writing with Google” that I have outlined
above. In other words, one of the more interesting dimensions of these readers is
that they are, in significant measure, the result of natural language research and
processing undertaken in, arguably, a socio-politically implicated dialogue with our
predominant new devices of cultural reflection and disposition. Of course, the
readers also have other inclinations and ambitions (apart from any jostling entry
into the world of digital art). They may simply wish to offer themselves to
open-minded literary critical readings such as are often applied to the literary
avant-garde. You can read them as poetry or as a poetics. What I am suggesting,
however, is that they may also be read for the way that both they and their making
read and write with newly mediated culture, with Google in this instance.
This is a final point, a vector for both literary poesis in digital media and for its
critical reception, but I must conclude the point with its illustration. Here are
three readers from the project, moving through and “reading”, in
some sense, an underlying text, a prose poem of my own, “Misspelt
Landings”.
[20] There is a mesostic reader that finds
and highlights words containing letters (which it capitalizes as it finds them) in a
phrase beginning “READING THROUGH ...”, and
there are two other readers: one that tends rightwards and downwards in the
conventional vectors of human reading while deviating occasionally, and one that
seems to wander while surrounding itself with a halo of erased or faded text. What is
far from obvious is that these readers, all of them, choose their next word to read
(and hence their deviations) on the basis of simple but quite effective research on
the usage of these words in the corpus to which Google gives us access, however
reluctantly. An important aspect of the way this and other pieces from
The Readers Project are deployed is that, for each such
manifold display, the readings of all the live readers are separately broadcast to a
server, a feed to which you may subscribe by accessing a URL with a browser and with
other clients under development. Subscribed to a particular reader, you may read
along with it and see clearly the textual path it has chosen, according to its
particular reading strategies.
In simple terms these readers check the proximate neighboring words of the word they
have just read and they “know” – from the results of their
writers’ struggle with Google – whether or not any or all of those proximate words
will represent likely natural language phrases.
[21] Daniel C. Howe and I are the writers of these readers and we, along with
other coded processes, struggled with Google, sending queries to its
“books” domain to see how many instances of thousands of
three-word phrases had already been inscribed as writing to be found and how
frequently they had been inscribed in the net’s textual corpus, if at all.
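In the simplest terms, a reader of this kind could be sketched as below. This is an illustration under loud assumptions and not the code of The Readers Project: the table of three-word phrase counts is invented, the function names are hypothetical, and the rule of preferring the most frequently attested phrase, with a fallback to the conventional next word, is only one plausible strategy.

```python
def choose_next(previous, current, neighbors, trigram_counts):
    """Pick the neighboring word that most often completes (previous, current, word)."""
    best, best_count = None, 0
    for word in neighbors:
        count = trigram_counts.get((previous, current, word), 0)
        if count > best_count:
            best, best_count = word, count
    return best  # None: no neighbor forms an attested three-word phrase


def read_next(previous, current, default_next, neighbors, trigram_counts):
    """Deviate to an attested neighbor if there is one, else read on conventionally."""
    choice = choose_next(previous, current, neighbors, trigram_counts)
    return choice if choice is not None else default_next


# Invented counts, standing in for results gathered from search queries.
counts = {
    ("the", "edge", "of"): 120,
    ("the", "edge", "is"): 15,
}
print(read_next("the", "edge", "blue", ["is", "of", "blue"], counts))
print(read_next("a", "moment", "still", ["lost", "blue"], counts))
```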
Many of you reading this will understand that this is far from being an entirely
novel approach. However, although our readers may seem to be following a simple
Markov chain, the actual processes and models deployed in The Readers Project
conceal some significant differences from a standard Markov model.
[22] More importantly
and finally, these readers were written with processes that hacked near-live
statistical data out of the Google-indexed internet corpus of all the inscribed
cultural material that can be found. Writers of readers like these could not have
made anything approaching their capabilities until very recently, or not without
huge, institutionally-maintained resources. We were and are able to make these
readers remarkably up-to-the-minute in their model-driven analyses of the texts that
they were written to read. They know what they need to know about the latest writing
to be found on the net in their domain. This knowledge was mined iteratively from the
language that we all gave over and continue to give over to Google and, in so far as
Google was uninterested in or threatened by the queries we needed to make in order to
gather our readers’ simple knowledge, that knowledge is the result of a fascinating
struggle that – for this reader at least – is a model in micro-procedure of the
struggles that we must all undertake as our institutions of culture pass over their
care and disposition to all those strange engines of inquiry that may suddenly reject
our search for writing. They reject our queries for reasons that we may not entirely
comprehend. Not yet and, perhaps, not ever.