David M. Berry is Reader in the School of Media, Film and Music at the University of Sussex and co-Director of the Sussex Humanities Lab. His recent work includes Critical Theory and the Digital, the edited collection Understanding Digital Humanities and the co-edited collection Postdigital Aesthetics: Art, Computation and Design.
Erik Borra (♂) is a researcher at and technical director of the Digital Methods Initiative. He is also a lecturer in the New Media and Digital Culture M.A. program at the University of Amsterdam. His PhD research focuses on the Web as a source of data for social and cultural research, specifically search engine queries, Wikipedia edit histories and social media data.
Anne Helmond is assistant professor in New Media and Digital Culture at
the University of Amsterdam. Her research interests include software
studies, platform studies, infrastructure studies, digital methods, and
web history. In her dissertation she examined the platformization of the
web which entails the extension of social media platforms into the rest
of the web and their drive to make external web data
Jean-Christophe Plantin is Assistant Professor at the London School of Economics and Political Science, department of Media and Communications. His research investigates the implications of big data and visualization technologies for civic participation and social science research. He is the author of
Jill Walker Rettberg is the author of
This is the source
This paper documents the results of an intensive
Mapping the fields of Digital Humanities and Electronic Literature by retrieving similar items from the Amazon API.
In this article we seek to tentatively explore the field of digital humanities
(DH) through the production of particular outputs of knowledge rather than the
tools that are used. In this we are, perhaps, acting counter to the often
remarked processual aspect of DH, that is, that digital humanities focuses not
just on the outputs but also on the processes involved in producing those
outputs, by, for example, creating data sets, digital tools, archives, etc. the computer (or the network
context) is in some way
This article brings two contributions to the field of DH. Firstly, we show the
relevance of the data sprint method for DH inquiry, during which data are
collected and analyzed over a short period of time, offering a mezzo-level of
analysis between small and large datasets (or genre of the ‘flash’ book, written
under a short timeframe, to emerge as a contributor to debates, ideas
and practices in contemporary culture... interventions that go well
beyond a well-written blog-post or tweet, and give some substantive
weight to a discussion or issue...within a range of 20-40,000
words
Due to the data sprint format of this project, several members were
involved in multiple projects at the same time, or were not
co-located. As a consequence, we relied heavily on online collaborative
applications in order to work together remotely. A specific Skype channel was
used for multiple purposes: after the tasks were divided between the various
members, it served as a means to let the others know which step was done. It was
also used to ask questions on a specific task. Furthermore, it served to quickly
transfer lists of ASIN numbers from the seed books, or to transfer .zip files
containing Gephi files resulting from requests to the Amazon API. Collaborative
spreadsheets on Google Drive were a means to jointly write descriptive
tables of seed books, but also to share first results from the graph (e.g. in a
spreadsheet showing Eigenvector Centrality, indegree or outdegree - see below).
Finally, a Dropbox was used to share .gexf (Gephi) files resulting from the
crawling, and to share the graphs after working on the visualisation. Data
sprints are based on reproducibility: the work done needs to be documented and
shared online in order to foster similar work and further developments. In
parallel to the collaborative online applications used to work together, we set
up a website
Secondly, this article demonstrates how digital methods emergent communities of interest on
the WWW by examining purchasing patterns
Kaplan’s typology of data sources in digital humanities scholarship digital culture, as opposed to works based on big cultural datasets (such as the Google Ngram project mentioned above) or offering digital experiences (such as 3D virtual worlds). To
this extent, we follow Borgman smart
data
instead of large and messy big data.
In order to fetch data, we use here a
digital methods approach by repurposing an online device, the Amazon
recommendation system, to see how we can make use of web-native objects such as
recommendations for social and cultural research to deploy the logic of
recommendation cultures
editorial logic
which depends on the subjective
choices of experts
algorithmic logic
(idem)
to retrieve, organize and present relevant information. Our research explores
the algorithmic structure of today’s
informatic culture
In this paper we call the Amazon API for different countries to show the
relationships between different titles using the SimilarityLookup feature, which "returns up to ten products per page that are
similar to one or more items specified in the request", to map the fields of
digital humanities and electronic literature. By focusing on country-specific
versions of Amazon we can visualise national networks of book purchases and
analyse differences and similarities for the fields per country or linguistic
area. The introduction of geo-location technology on the web national webs
which are
demarcated by devices such as search engines that go local
This paper is presented as a partial and tentative means of mapping a field, but
also as a moment in the developing field of digital humanities hermeneutics of screwing
around,
as Ramsay tools, data, and metadata
extend[s] reflection on core
instrumental technologies in cultural and historical directions
In this paper we use a form of social-network analysis that visualises the relationships between the different entities, in this case books, in our networks. As Alan Liu explains,
the premise of social-network analysis is that it is not individuals or groups but the pattern of relations between individuals or groups that is socially significant. Such an approach commonly produces analyses in the form of social-network graphs composed of nodes and connecting edges (also called ties) accompanied by metrics of degree, distance, density, betweenness, centrality, clustering, and so on. The goal is to describe a topology of social relations that allows researchers to understand, for instance, which nodes are pivotal to connections within communities.
One of the interesting outcomes of the practices involved in undertaking this
form of digital method is the "hermeneutics of screwing around" importance
of particular nodes. In other words, adapting the way the data is presented to
view it from different perspectives is a method that lets us surface patterns
and interesting features of the graph.
In our case study we used ten seed books
Similarity is a measurement of similar items purchased, that is, customers
who bought X also bought Y and Z. It is not a measure, for example, of items
viewed, that is, customers who viewed X also viewed Y and Z.
Initially we repeated the request to reach a depth of three,
which includes the results, the subsequent results and the results subsequent to those, to
create a broad overview of the Digital Humanities (see Figure 1). However, for
subsequent analysis we decided to limit our request to the results and the
subsequent results (a depth of two) in order to limit the scope of our dataset
and the complexity of the analysis.
In other words, we fetched a maximum of ten recommendations per book to a depth
of two degrees. This generated a maximum of one hundred book titles for each of
the subject areas of digital humanities and electronic literature
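The crawl described above can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not the code used during the data sprint: `fetch_similar` is a hypothetical stand-in for a SimilarityLookup request, and the toy catalogue `TOY_CATALOGUE` replaces real API responses.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple


def crawl_recommendations(
    seeds: List[str],
    fetch_similar: Callable[[str], List[str]],  # hypothetical API wrapper
    max_depth: int = 2,
    max_per_item: int = 10,
) -> Tuple[Dict[str, int], Set[Tuple[str, str]]]:
    """Breadth-first crawl returning (depth per item, directed edges)."""
    depth: Dict[str, int] = {seed: 0 for seed in seeds}
    edges: Set[Tuple[str, str]] = set()
    queue = deque(seeds)
    while queue:
        item = queue.popleft()
        if depth[item] >= max_depth:
            continue  # items at the depth limit are kept but not expanded
        for similar in fetch_similar(item)[:max_per_item]:
            edges.add((item, similar))  # item recommends similar
            if similar not in depth:
                depth[similar] = depth[item] + 1
                queue.append(similar)
    return depth, edges


# Toy data standing in for API responses (assumption, not real ASINs).
TOY_CATALOGUE = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": ["E"]}
depth, edges = crawl_recommendations(["A"], lambda asin: TOY_CATALOGUE.get(asin, []))
print(depth)          # depth of each book relative to the seed
print(sorted(edges))  # directed recommendation edges
```

In this sketch, a book first reached at the depth limit is recorded as a node, but its own recommendations are not requested.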
The second data phase was organised around a comparative approach in relation to
data requests to the Amazon API for recommendations for the digital humanities
books in each of the following countries: ca, cn, de, es, fr, it, jp, co.uk, com. We requested our data for all ten
countries via the Amazon API, but only retained the four (US, UK, FR, DE) that
returned results for our requests.
These are the subject domain expert similar
books
per item, and then for each of these results the process was repeated. Using
Gephi, the ten data sets were then appended into one master data set
that was used for the generation of the subject area visualisations.
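The appending step can be illustrated outside Gephi as a union of node and edge sets. The sketch below is an assumption about what such a merge amounts to, not a description of Gephi's internals: it supposes that each per-seed data set is a pair of (node labels, directed edges) and that nodes are matched by a shared identifier such as the ASIN. The name `merge_graphs` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]
Graph = Tuple[Dict[str, str], Set[Edge]]  # (node id -> title, directed edges)


def merge_graphs(graphs: Iterable[Graph]) -> Graph:
    """Combine several per-seed graphs into one master graph."""
    labels: Dict[str, str] = {}
    edges: Set[Edge] = set()
    for node_labels, graph_edges in graphs:
        for node_id, title in node_labels.items():
            labels.setdefault(node_id, title)  # keep the first label seen
        edges.update(graph_edges)  # duplicate edges collapse in the set
    return labels, edges


# Two toy per-seed graphs that share the nodes "a1" and "b1".
seed_one: Graph = ({"a1": "Book A", "b1": "Book B"}, {("a1", "b1")})
seed_two: Graph = (
    {"a1": "Book A", "b1": "Book B", "c1": "Book C"},
    {("a1", "b1"), ("b1", "c1")},
)
master_labels, master_edges = merge_graphs([seed_one, seed_two])
print(len(master_labels), len(master_edges))
```

Because books that recur across seeds collapse into a single node, the master set makes overlaps between the seeds' recommendation neighbourhoods visible.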
The selection of seeds was somewhat heuristic, and based on discussion between subject domain experts on the most important books in each field. For the field of electronic literature, the selection took into account frequently referenced books as documented in the
Below are the initial seeds for the subject areas.
The following section describes the steps taken in creating network
visualisations using Gephi from the GEXF result files returned when
requesting the recommendations for the seed books with a depth of two. These
steps describe how a master set was created from ten individual files
(corresponding with the seeds) for each of the queried local Amazons.
Described here is the process for creating the Digital Humanities .co.uk
master set. Any deviations from these settings for creating the
other master sets are noted below.
Append Graph to add it to the first.
title to Label to only show the book title as the node’s label.
Label Adjust layout algorithm to prevent overlap for readability
statistics panel
Average Degree and note the resulting number
Graph Density, check Directed, and note the resulting number
Network Diameter, check Directed, leave other options empty, and note the diameter, average path length, and number of shortest paths (rounded to three decimals)
Digital humanities Amazon.de same but Force Atlas 2 Settings: Scaling 300, Gravity 1
Digital humanities Amazon.fr same but Force Atlas 2 Settings: Scaling 300, Gravity 5
Electronic literature Amazon combined same but Force Atlas 2 Settings: Scaling 300, Gravity 5
Electronic literature Amazon.com same but Force Atlas 2 Settings: Scaling 300, Gravity 5
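The statistics noted in the steps above can also be computed by hand, which clarifies what each number means. The following is a sketch under assumptions rather than a reimplementation of Gephi: it takes average degree as edges divided by nodes, directed density as edges divided by n(n-1), and computes path metrics only over ordered pairs that are actually connected. Tools differ on these conventions, so the figures may not match Gephi's output exactly. It also assumes no self-loops or duplicate edges.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def graph_statistics(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Basic statistics for a directed graph given as (source, target) pairs."""
    successors: Dict[str, Set[str]] = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
        successors.setdefault(target, set())
    n = len(successors)
    m = sum(len(targets) for targets in successors.values())

    # Breadth-first search from every node gives all shortest path lengths.
    path_lengths: List[int] = []
    for start in successors:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in successors[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        path_lengths.extend(d for node, d in dist.items() if node != start)

    return {
        "average_degree": m / n if n else 0.0,
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "diameter": max(path_lengths) if path_lengths else 0,
        "average_path_length": (
            sum(path_lengths) / len(path_lengths) if path_lengths else 0.0
        ),
        "shortest_paths": len(path_lengths),
    }


toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
stats = graph_statistics(toy_edges)
print({key: round(value, 3) for key, value in stats.items()})
```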
This first graph was created by combining the ten master data sets from the ten digital humanities seeds (see Table 1) for Amazon.com with a depth of three and shows the overall Amazon recommendation network for Digital Humanities.
Within the graph visualisation (Figure 1) each node represents a book. It is a directed graph which means that the edges between the source (book A) and the target (book B) are directed (A points to B). In our case the edges represent recommendations so the source (book A) points to the target (recommended book B, C, D etc.). We retrieved recommendations to a depth of three, meaning that our data set includes the seed books (i.e. depth 0), books recommended in relation to the seeds (i.e. depth 1), books recommended in relation to the books in depth 1 (i.e. depth 2), and books recommended in relation to the books in depth 2 (i.e. depth 3).
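Because the edges are directed, each book has two distinct degree counts: its indegree (how many other books recommend it) and its outdegree (how many books it recommends). A short sketch with made-up edges, assuming the graph is available as a list of (source, target) pairs:

```python
from collections import Counter
from typing import Iterable, Tuple


def degree_counts(edges: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Return (indegree, outdegree) counters for a directed edge list."""
    edge_list = list(edges)
    indegree = Counter(target for _, target in edge_list)
    outdegree = Counter(source for source, _ in edge_list)
    return indegree, outdegree


# Toy recommendation edges: the source book points to the recommended book.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
indegree, outdegree = degree_counts(toy_edges)
print(indegree.most_common(1))  # the most frequently recommended book
print(outdegree["A"])           # number of books recommended from A
```

With at most ten recommendations returned per book, outdegree is capped, so indegree is the more informative of the two when asking which books are recommended most often.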
What we see in Figure 1 is the clustering of particular books into more or less distinct genres or disciplinary groups. For example, we see that there are many connections between books on internet and technology, electronic literature and game studies, but fewer between game studies and book studies, or between book studies and speculative realism. A closer reading of this visualization by a topic expert on the Digital Humanities, David M. Berry, revealed that the books cluster around particular fields within the Humanities such as Digital Humanities, Media Studies, Literary Studies and related areas. We identified the following clusters from the visualisations produced: Game Studies, Electronic Literature, Internet and Technology, New Materialism, Digital Humanities, Book Studies, Deleuze Studies and OOO/Speculative Realism. Some books appeared to form bridges between the fairly distinct clusters, such as Hayles’
We also see here how, while some books are densely interconnected leading to the clustering in the graph, others simply lead away from the initial seeds. An example is Bate’s
We also see many connections between the seed books, which suggests that our initial choice of seeds was reasonably representative of the field.
Figures 1 and 2 show the network as generated from Amazon.com with a depth of three. If we look at the networks generated from individual country Amazon stores, we find important differences between the countries. In order to be able to compare the US, UK, French, and German Amazon we limited the crawl depth for recommendations to two. This means that we only looked at similar items suggested for our seed books as well as what was recommended for those similar items.
Feeding the seed books for the digital humanities into the US Amazon (.com) generated a densely interlinked graph showing ninety-five individual books (Figure 3) with all seed books clustered in the middle. We also see that all the books bought by people who bought the seed books are connected to more than one seed book.
Here a group of central digital humanities books have clustered in the center of
the graph. Eight of the seed books are visible here
The first related field is electronic literature on the right of the graph in Figure 3. McGann’s
Ian Bogost’s
Hayles'
The UK Amazon graph shows 155 book titles with seven seeds clustered in the
middle
Figures 6 and 7 show the networks generated by feeding our English-language seed books into the French and German Amazon stores. In both cases, the seed books almost all disappear.
On the graph for Amazon.fr, only two books from the seeds remain:
Humanités numériques
Humanités digitales
The first cluster, with the largest number of nodes (in pink/purple), consists of French sociology and media studies books, mostly written by academics but published by mainstream publishing houses which aim to foster general public debates about technology in society. On the right side, the cluster with blue nodes revolves around two books written by Milad Doueihi, a French-speaking classical historian:
On the graph for Amazon.de, five books from the seeds remain. The various clusters are less thematized than in the other countries, and topical drift appears. The cluster in the center, in green, contains digital humanities literature and media studies books. The cluster in purple, in the bottom center, goes from media studies to philosophy and French theory, which echoes the nearby cluster on the right (in brown nodes) on posthuman theories. The two clusters on the top are related to information science (red nodes on the top left side) and to computer science books, mainly about Turing (green nodes, top right). The last cluster, at the bottom left with blue nodes, contains literary theory and novels.
Unlike the French graph, there are many English language books in the German graph, but interestingly enough the graph introduces German books on digital humanities such as
the technological condition in German it acts as a bridge to related books in English.
Further comparing the various local Amazon domains under study we can see that .com has the most densely interlinked recommendations (see density and average degree in Table 3) while .fr is the sparsest. This of course also reflects how quickly other books are recommended (average path length).
Our analysis suggests that electronic literature is a far less cohesive field than digital humanities is in the USA, at least in so far as printed books can be said to represent the field. Figure 8 shows the network of all books returned for our electronic literature seed books in the four Amazon stores we searched, and you can see that a large number of books were returned for the seed books. The seed nodes, which are marked in yellow, are scattered around the graph rather than clustered in the center as in the US digital humanities graph in Figure 3, and they are not heavily recommended, as is indicated by the small size of most of the seed nodes. The two seed books within electronic literature that Amazon.com notes readers of other books buy are Hayles’
Further comparing the DH and electronic literature graphs from a more numerical point of view (see Table 4), we can see that although the electronic literature graph has fewer nodes, it is actually more densely connected than the DH one (see density and average degree). Looking at the average path length, one can see that it is easier to reach other books given a book related to EL according to Amazon.
The games studies cluster below
At the top of Figure 8, we see links leading away from electronic literature into discussions of the role of the book in a networked society and further away into book history and general discussions of bookmaking and the book business. This shows how one field drifts towards another, but we also see that this section of the graph is not interlinked and that there are few or no connections back into the more centrally placed books that are more closely related to electronic literature.
The upper right of Figure 8 shows a digital humanities cluster very similar to that generated by our digital humanities seeds, while the lower right hand side shows an interesting loosely connected cluster of works on conceptual poetry and writing and on digital poetics. We see Perloff’s
Far out to the right in Figure 8, we see a cluster of books about generative art and code art (Figure 11).
As you can see in Figure 8, this cluster is quite distant from the rest of the network, and is connected to it by a few clear brokers: Bartscherer and Coover’s recent anthology
Despite the fact that electronic literature research is scholarship about creative works of electronic literature, only three works of electronic literature show up in the graph, and these are works published on CD-ROM by Eastgate Systems in the early 1990s. Most electronic literature is published online and is not part of Amazon’s database. We see in Figure 12 that Joyce’s seminal hypertext fiction
We see that people who bought
Studying a field only from the point of view of Amazon book recommendations is clearly not going to tell the whole story about digital humanities or electronic literature. This type of study of the field excludes other book sellers and traditional scholarly resources such as journals and journal articles and also excludes the outputs of digital humanities projects such as archives, TEI projects, websites, tools and code archives, and also works of art and literature which are central to the field of electronic literature. Also, it bears the temporal limitations imposed by the time constraint of the three-day data sprint format. Finally, there are also technical limitations using the Amazon API in our survey of the fields of electronic literature and digital humanities, such as a limit of ten recommendations per book and a maximum of 3600 requests to the API per hour.
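The request limit mentioned above translates into a simple pacing rule: 3600 requests per hour is, on average, one request per second. The sketch below shows one way a crawler could respect such a limit. It is an illustration under assumptions, not the sprint's code: the class name `Throttle` is hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, List, Optional

REQUESTS_PER_HOUR = 3600
MIN_INTERVAL = 3600 / REQUESTS_PER_HOUR  # seconds between requests: 1.0


class Throttle:
    """Ensure at least `min_interval` seconds between successive calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Demonstration with a fake clock, so no real waiting takes place.
fake_time = [0.0]
slept: List[float] = []


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    fake_time[0] += seconds


throttle = Throttle(MIN_INTERVAL, clock=lambda: fake_time[0], sleep=fake_sleep)
throttle.wait()  # first request passes immediately
throttle.wait()  # second request is delayed by the full interval
print(slept)
```

In real use, `throttle.wait()` would be called immediately before each API request, with the default clock and sleep functions.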
Nonetheless, it does provide a new way of looking at the field. The sheer size of the Amazon database allows us to see interesting connections between the digital humanities and adjacent fields. Also, even with these caveats it is notable that the results, broadly speaking, do reflect clusters of what we might think of as fields of study, and the connection between them.
It is also important to realise that while the data does, according to Amazon,
give information about what other books are bought by customers who buy book X,
book sales don’t necessarily mean that the books are read, cited, used, or
influential. In addition, customers who buy digital humanities books also buy
other books on Amazon and if books are frequently bought together they will be
marked as
Our data from the French and German Amazon stores differed slightly from that of the English-language Amazon stores, with fewer results. One reason is that we used the same English-language books as seeds in all the national Amazon stores. While we could have chosen French and German language books, we thought the results yielded were still interesting, and many non-English titles rapidly appeared in these graphs. The seeds introduced similar language-specific items on the topic of digital humanities. This may point us to French and German-specific subfields of the digital humanities in their native language.
Based on these network graphs of different disciplines, we would conclude that they appear to have different styles of communication, at least in terms of the importance of printed books in the field. The field of digital humanities in the USA as viewed through Amazon’s SimilarityLookup is cohesive, with a relatively small number of books that are bought together. In France and Germany, on the other hand, we see that the field is far less well defined, and based on the books that are bought together, it is hard to say clearly what the digital humanities are or are not in these countries. Britain offers an intermediate position, where the digital humanities are understood in a less precise or perhaps broader manner than in the US, and where we see relationships to many more fields.
Electronic literature, as viewed through print books and Amazon’s SimilarityLookup, is a field that is far less cohesive than the digital humanities in the USA, and we see instead that books on electronic literature intermingle with books in related disciplines: new media studies, game studies and the digital humanities chief among them. Game studies in fact comes to the fore in the graph drawn from our electronic literature seeds, and appears to be a field almost as cohesive as the digital humanities in the US.
We mentioned earlier that there is some irony to analysing the digital humanities by looking at the books published in the field rather than at the digital projects and tools developed. But at least in the US, it appears that the field of digital humanities is very clearly defined by its books. Our diagrams would be an excellent starting point for a reading list for a newcomer to the digital humanities. Electronic literature, on the other hand, is not as easily described by this method. Perhaps more of the publications on electronic literature are entirely digital, whether as creative works or as shorter articles in online journals and other online publications, and thus they are not visible to Amazon. Or perhaps electronic literature is more interdisciplinary by nature, and thus people who read books about electronic literature read more broadly rather than focusing on that topic alone.
This research raises further questions that we think merit exploration.
How can a more formal and defensible seeding strategy be developed? In the project, in common with other digital methods projects, a domain subject expert is used to generate initial data sources, seeds, and links. It would be interesting and useful to reduce or eliminate the seeding process such that once the general thematic area is identified, a standard seed generation methodology can be followed. This may still use knowledge elicitation from the subject domain expert, but would be formalised.
How can validation of the API results be implemented, given that the API does not always return identical data for search queries? This is a result of how Amazon handles data fields that contain a super-set in relation to the capacity of the API return values. One method might be multiple requests and a smoothing algorithm to average the results from the API.
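One possible reading of "multiple requests and a smoothing algorithm" is to repeat the same request several times and keep only the items that recur in a sufficient share of the responses. The sketch below illustrates that idea only. The function name `stable_recommendations`, the 50% threshold and the toy responses are all assumptions for illustration, not a method the project implemented.

```python
from collections import Counter
from typing import List, Sequence


def stable_recommendations(
    responses: Sequence[Sequence[str]], min_share: float = 0.5
) -> List[str]:
    """Keep items that appear in at least `min_share` of the responses."""
    counts: Counter = Counter()
    for response in responses:
        counts.update(set(response))  # count each item once per response
    threshold = min_share * len(responses)
    return sorted(item for item, count in counts.items() if count >= threshold)


# Three toy responses to the same request, differing slightly each time.
toy_responses = [["B", "C", "D"], ["B", "C", "E"], ["B", "D", "F"]]
stable = stable_recommendations(toy_responses)
print(stable)
```

Raising `min_share` trades coverage for stability: a value of 1.0 keeps only the items returned every time.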
The ability to create the graphs from the API is extremely powerful and although
we undertook some limited secondary data generation, such as querying the Kindle
Highlights database
As an example of further research we ran some preliminary data requests to the Kindle Highlights database; however, the number of highlights in our texts was very low. Most digital humanists either do not read the digital (or at least, the Kindle) versions of the texts, or they do not highlight their digital versions. Nonetheless, with the growth in e-readers, iPads, tablets, and the like, we can expect this database to be of increasing interest to researchers undertaking projects similar to this one in the future. For example, these are the most popular highlights in Kirschenbaum’s
forensic materiality rests upon the principle of individualization (basic to modern forensic science and criminalistics), the idea that no two things in the physical world are ever exactly alike.
The point is to address the fundamentally social, rather than the solely technical mechanisms of electronic textual transmission, and the role of social networks and network culture as active agents of preservation.
a digital environment is an abstract projection supported and sustained by its capacity to propagate the illusion (or call it a working model) of immaterial behavior.
Each of these sections has limited information associated with it, although it is noticeable that no page number is given – Kindles do not have page numbers as part of the product.
As an exploratory approach to mapping a field or disciplinary area of research, this approach has much to recommend it. It provides a useful entry point for drawing up an initial map of the field and for developing understanding of the way in which books provide a structure for a field's development. Whilst we wish to reiterate the limitations of this approach, particularly in view of the digital nature of the two fields we chose for comparison, digital humanities and electronic literature, and the resultant absences in the data and visualisations that are created, we nonetheless think that, used appropriately, it is well suited to exploratory field-mapping.