Abstract
Classifying and categorizing the activities that comprise “digital
humanities” has been a longstanding area of interest for many
practitioners in this field, fueled by ongoing attempts to define digital
humanities both within the academy and in the public sphere. The emergence of
directories that cross traditional disciplinary boundaries has also spurred
interest in categorization, with the practical goal of helping scholars
identify, for instance, projects that take a similar technical approach, even if
their subject matter is vastly different. This paper tracks the development of
TaDiRAH, the Taxonomy of Digital Research Activities in the Humanities developed
by representatives from DARIAH, the European cyberinfrastructure initiative, and
DiRT, a digital humanities tool directory. TaDiRAH was created specifically to
connect people with information on DiRT and in a DARIAH-DE bibliography, but
with the goal of adoption by other directory-like sites. To ensure that TaDiRAH
would be usable by other projects, the developers opened drafts for public
feedback, a process which fundamentally altered the structure of the taxonomy
and improved it in numerous ways. By actively seeking feedback from the digital
humanities community and reviewing data about how the source taxonomies are
actually used in order to inform term selection, the development of TaDiRAH
provides a model that may benefit other taxonomy efforts.
1. Introduction
Classifying and categorizing the activities that comprise “digital
humanities” has been a longstanding area of interest for many
practitioners in this field, fueled by ongoing attempts to define digital
humanities both within the academy and in the public sphere. The emergence of
directories that cross traditional disciplinary boundaries has also spurred
interest in categorization, with the practical goal of helping scholars
identify, for instance, projects that take a similar technical approach, even if
their subject matter is vastly different. This paper tracks the development of
TaDiRAH, the Taxonomy of Digital Research Activities in the Humanities developed
by representatives from DARIAH, the European digital infrastructure initiative,
and DiRT, a digital humanities tool directory. TaDiRAH was created with the
short term goal of enhancing discoverability of resources in the DiRT directory
and the DARIAH-DE bibliography while also anticipating adoption by other digital
humanities directory-like sites. To ensure that TaDiRAH would be usable by other
projects, the developers opened drafts for public feedback, a process which
fundamentally altered the structure of the taxonomy and improved it in numerous
ways. By actively seeking feedback from the digital humanities community and
reviewing data about how the source taxonomies are actually used in order to
inform term selection, the development of TaDiRAH provides a model that may
benefit other taxonomy efforts.
2. Motivating factors
2.1 DiRT
Since the inception of the DiRT Wiki in 2008, the site has used an ad-hoc
set of categories. In its original form, each category represented a
wiki page where tools were listed. In 2010, a migration to the Drupal
content management system gave each tool its own profile page. However,
even under this new structure, the original categories persisted as a
way to organize the individual profiles. Starting in late 2012, the DiRT
Directory undertook an assessment of its categories, with the goal of
identifying any gaps, phasing out rarely-used terms, and adding new
terms to better reflect the scope and nature of the tools presented in
DiRT. Early investigation into how the categories were being used
produced striking results. Entire classes of commonly-used digital
humanities tools were largely rendered invisible through the lack of an
obviously matching category. For instance, optical character recognition
(OCR) tools were scattered between the categories of “data
conversion”, “transcription”, “data collection” and
“annotation”. The DiRT categories clearly needed revision; when
an opportunity arose to collaborate with DARIAH-DE on a taxonomy that
could be shared across multiple sites with different purposes, doing so
was clearly preferable to undertaking another isolated effort.
2.2 DARIAH-DE “Doing Digital Humanities” bibliography
The other partner in the development of TaDiRAH was a working group in
DARIAH, the European initiative to build a “digital research infrastructure for the arts
and humanities” (see [Romary 2014]). This
working group was concerned with research and education and formed part
of DARIAH-DE, the German contribution to DARIAH. The group aimed to
establish a principled overview of the research methods and procedures
shared by humanities scholars. Presented as prose, such an overview
(published in [Reiche, Becker, Bender, Munson, Schmunk, Schöch]) could help newcomers
understand the field more quickly and could connect established and
digital research by demonstrating their fundamental methodological
similarities. Presented as a taxonomy or ontology, the same overview
would form the basis for discovery systems for bibliographic references,
tutorials or research projects. The first use case for such a taxonomy
was DARIAH-DE's “Doing Digital Humanities” public bibliography on Zotero [1],
which became a test bed for a keyword system that would become one source of
seed terms for TaDiRAH.
3. Digital humanities taxonomies
The taxonomy of digital research methods described here builds on previous
work towards a structured and principled overview of the complex field of
digital humanities. The following three approaches were particularly
influential for the development of TaDiRAH.
Providing an orientation to the field and a means of thinking about it is at
the heart of McCarty and Short's idea of the methodological commons.
Using the metaphor and the tool of “mapping” to represent the complex
“terrain” of the digital humanities, McCarty & Short (2002)
suggest a map which has the “methodological commons” at its center.
They define the methodological commons as “an abstraction for the computational methods that the various
disciplines of application share”, which functions as a space of
encounter between “disciplinary groups” and “areas of learning”.
The research methods are not named, but are represented through broad data
types with which they are associated, such as “narrative text”,
“images”, or “music”. McCarty and Short's data types are
roughly equivalent to TaDiRAH's “Object” category.
With a similar goal of structuring and abstracting from individual research
undertakings, but using a somewhat different approach, John Unsworth
proposed in 2000 a short list of “Scholarly Primitives” of research,
particularly in the humanities. In contrast to McCarty and Short, Unsworth
defines scholarly primitives as “activities [which] are basic to scholarship
across eras and across media” [2]; they are so fundamental that they form
“basic functions common to scholarly activity across disciplines, over time,
and independent of theoretical orientation.” Unsworth's tentative list was the
following: Discovering, Annotating, Comparing, Referring, Sampling,
Illustrating, Representing. Unsworth’s formulation laid the groundwork for
the TaDiRAH team to include both methods and objects in the taxonomy, and
to keep them as separate entities with the potential to enter into multiple
and changing relationships.
Another, very ambitious undertaking, situated at an even higher level of
abstraction, is documented by [Benardou, Constantopoulos, Dallas, and Gavrilis
2010]. Their outline of a “Conceptual Model of Scholarly Research Activities”
is based on an activity model which includes inter-related entities such as
research activities, research goals, methods, procedures, tools, information
objects, and actors. [3] This model does not analyze research
methods in isolation, but contextualizes them in the framework of the entire
research process. From the beginning, TaDiRAH was meant to be designed in a
way that would allow it to be integrated into such larger undertakings.
4. Sources and alignment
TaDiRAH terms were derived from three very different sources: the DiRT
categories, the DARIAH-DE tag set and the arts-humanities.net taxonomy. The
sources were not selected as part of a long process of research and
deliberation but were pre-determined based on the particular use cases that
motivated the work. The task of alignment involved mapping the taxonomies to
one another, while attempting to address differences in structure, scope and
granularity as they arose.
4.1 DiRT categories: tools classed by function
The categories used by the DiRT directory were developed as an ad hoc
classification of digital research tools used in the arts and
humanities, for use on the original DiRT wiki. Despite the positioning
of DiRT as a humanities-oriented “digital research tools”
directory, most of the tools listed in DiRT originated in other domains,
and have non-research applications. The categories form a flat list of
32 tool functions without scope notes, and are accompanied on the site
by a rapidly growing and unwieldy list of uncontrolled tags. As
illustrated in the frequency distribution chart below, three categories
(data analysis, visualization, and annotation) have a significantly
higher number of tools associated with them than the other categories;
these were candidates for division into a series of more granular
categories.
4.2 Arts-humanities.net taxonomy: projects classed by method
Arts-humanities.net was developed by the Centre for e-Research at King’s
College London and had its origins in a 2008 collaboration between the
Information and Computer Technology Guides database and the Arts and
Humanities Research Council ICT Methods Network. Its primary purpose was
to promote and provide access to information on the use of digital tools
and methods for research and teaching in the arts and humanities. In
support of that mission, it was home to a directory of projects, events,
centers, case studies and other information. It was also home to a
methods taxonomy where “methods” refer to “computational methods
used by artists and humanists.”
According to the project’s former website, the taxonomy is based on one
originally developed by the former Arts and Humanities Data Service
(AHDS) for classifying projects. The taxonomy structure is broad and
deep, containing seven top-level methods categories, which include
between eight and twenty-five specific methods each; see below for the
distribution of top-level categories across all content on
arts-humanities.net. All but the top-level categories, which appear to
be used primarily as guide terms to navigate through the taxonomy,
include detailed scope notes. Taxonomies such as this, with complex
hierarchies that are both broad and deep, tend to be faced with
challenges regarding consistent application, updating and maintenance
(see [NISO 2010]).
4.3 DARIAH-DE tag-set: digital humanities literature classed by
research activity
The DARIAH-DE taxonomy was developed specifically for implementation in
the Zotero-based “Doing Digital Humanities”
bibliography. It was informed by the arts-humanities.net methods
taxonomy, Unsworth’s formulation of scholarly primitives, and the idea
of phases in the research lifecycle. The focus was primarily on
research-based activities and objects. It originally consisted of nine
top-level interdisciplinary activities that matched methods from the
arts-humanities.net taxonomy, as well as a second level of forty-nine
methods and a limited list of object types.
4.4 Taxonomy alignment
There are two basic approaches to human-mediated taxonomy development:
top-down and bottom-up. A top-down approach begins at the top level of
the hierarchy, followed by the second level and so on. It starts with
developing the basic structure or framework. In contrast, the bottom-up
approach is informed by an existing collection of content (documents,
objects, datasets, vocabularies, etc.) with the resulting scheme
emerging from that content (see [NISO 2010]). All of the
source vocabularies described above were designed using some combination
of the above approaches. The development of TaDiRAH, which is based
entirely on existing taxonomies, was the product of a bottom-up process.
Adopting a bottom-up approach made the most sense given our limited
resources and pragmatic aims. To begin with a top-down process would
have meant starting over. The bottom-up approach also had the advantage of
being user-centric and was aligned with the principles of user warrant and
literary warrant (see [NISO 2010]), since it leveraged terms already in use.
These terms had been applied to existing content by different classes of
users in the DH community: tool users, developers and practitioners adding
and tagging content in DiRT, and scholars adding and tagging content in the
DH bibliography.
Given the differences in structure, scope and granularity across the
selected sources, the team was faced with a number of challenges
including:
- Making distinctions between goals, methods, and techniques (i.e.
all are related to activities).
- Specific terms that could be mapped to more than one category
(e.g. modeling).
- Terms with several different meanings (e.g. visualization as an
activity or as an object).
- Categories combining multiple concepts which required decisions
about whether something was distinct enough to stand on its own as a
separate category (e.g. storage vs. storage and
dissemination).
- One-to-many relationships between goals/methods and techniques
(e.g. mapping).
Each source taxonomy was created with a slightly different purpose.
Aligning the sources detailed above was an iterative process that
included human-mediated matching, review and discussion. The larger
challenge was to go beyond identifying similarities among the
taxonomies, and review differences in turn, considering the nuanced ways
in which a term or series of terms might be interpreted and applied.
Following some general guidelines [4], the team began with an analysis of
the existing DiRT taxonomy and how it was applied across DiRT content. That
was followed by a series of mappings, starting by mapping DiRT to DARIAH-DE,
identifying and resolving points of poor alignment. It was much more
challenging to map the results of the DiRT/DARIAH alignment to the very
granular arts-humanities.net taxonomy. Fortunately, the latter included detailed
scope notes which facilitated discussions and decision-making throughout
the process. The final step before releasing TaDiRAH for public comment
was adding scope notes to all terms in the final draft.
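The alignment challenges listed above, in particular terms that map to more than one category and one-to-many relationships, can be made visible with a simple crosswalk table. The following is a minimal sketch in Python and not the procedure the team actually used, which was human-mediated. All rows in `CROSSWALK` are invented placeholders for illustration, and the names `CROSSWALK`, `group_targets` and `needs_review` are hypothetical.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source vocabulary, source term, candidate target).
# These rows are illustrative placeholders, not the mappings made by the team.
CROSSWALK = [
    ("DiRT", "data conversion", "Conversion"),
    ("DiRT", "transcription", "Transcription"),
    ("DARIAH-DE", "modeling", "Creation"),
    ("DARIAH-DE", "modeling", "Analysis"),
    ("arts-humanities.net", "mapping", "Spatial Analysis"),
    ("arts-humanities.net", "mapping", "Visualization"),
]


def group_targets(rows):
    """Collect every candidate target proposed for each (source, term) pair."""
    targets = defaultdict(set)
    for source, term, target in rows:
        targets[(source, term)].add(target)
    return targets


def needs_review(rows):
    """Return the source terms that were mapped to more than one target."""
    return {
        key: sorted(candidates)
        for key, candidates in group_targets(rows).items()
        if len(candidates) > 1
    }


if __name__ == "__main__":
    for (source, term), candidates in needs_review(CROSSWALK).items():
        print(f"{source}: '{term}' maps to {candidates}; discuss before merging")
```

A table of this kind only surfaces ambiguous terms; deciding how to resolve them still requires the review and discussion described in this section.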
5. Public review process & revisions
On September 12, 2013, the team sent out a call for feedback on the first
public draft of the taxonomy, via the Humanist Discussion Group. The public
draft [5] was open for comments for a two-week period, during which it
received over 60 comments from individuals outside the project team. In
addition, the team
received multiple emails pointing to published and unpublished work in the
area of digital humanities taxonomies. The comments ranged from suggesting
that scope notes be rephrased to discussions about the best choice among a
set of near-synonyms to use as the label for an overall concept
(“scanning” vs. “imaging” vs. “digitizing” was the
subject of particularly active debate).
Prior to the first round of feedback, specific techniques (such as
“stylometry” and “topic modeling”) occupied the second level
of the taxonomy, as did much broader terms, like “publishing” and
“annotating”. Multiple commenters pointed out this issue, and
recommended that “techniques” be moved to a separate list. This was
preferable to creating a third level of the taxonomy, as it would be
difficult in some cases to unambiguously assign a technique to only one
parent term. Having a separate list for techniques had the added benefit of
better supporting the rapid evolution of new techniques, without requiring
constant revision to the core TaDiRAH terms; like the “objects” list,
“techniques” would be an open list.
Due to an extensive amount of detailed and thoughtful feedback, the process
of revising the taxonomy took almost five months of periodic meetings and
asynchronous work. In early February 2014, the taxonomy team opened a
revised draft for feedback, this time for a week. While there was still
considerable engagement (with over 20 external comments), comments in the
second round were mostly focused on smaller issues of phrasing, and there
were no fundamental challenges to the structure of the taxonomy. Within a
week after the feedback period closed, the taxonomy coordinators were able
to incorporate the proposed changes and release the first public version of
the taxonomy, version 0.5.
6. Current version of TaDiRAH
Realizing that there needed to be some easy way to refer specifically to this
taxonomy, the organizers devised the name “TaDiRAH”, which stands for
“Taxonomy of Digital Research Activities in the Humanities”. The name is also a near-anagram of
DARIAH and DiRT, the organizations responsible for its development. The only
Google result for TaDiRAH at that time indicated that it was the name of a
child’s dragon on the virtual pet website Neopets; in honor of that
creation, the team adopted a silhouette of a dragon as the TaDiRAH logo.
In its current version, TaDiRAH consists of several sets of terms: two closed
sets of so-called Research Activities, one with eight top-level categories
that represent broad research goals, and below that a second
set of more fine-grained research methods. In addition, there
are two open lists, one representing specific research
techniques and one representing research
objects to which methods and goals can be applied.
The goals roughly cover the entire research process — Capture, Creation,
Enrichment, Analysis, Interpretation, Storage and Dissemination. An
additional “meta” category includes activities that transcend all other
categories (e.g. “Assessing” or “Community Building”). Each goal
includes three to seven methods, with the methods section of TaDiRAH
containing 40 terms in all. For example, the research goal of "Capture" can
be achieved using the following methods:
Capture
..Conversion
..Data Recognition
..Discovering
..Gathering
..Imaging
..Recording
..Transcription
There are two separate open lists of terms that can be associated with terms
from the goals/methods section. They include 36 terms representing a wide
range of digital research objects (e.g. “text”, “metadata”,
“manuscript”) as well as 34 terms representing specific research
techniques (e.g. “Topic Modeling”, “Debugging” or
“Gamification”). For example, a tool such as SIGIL, which is used
for creating eBooks, could be tagged with the terms “Creation” (goal),
“Writing” (method), “Encoding” (technique) and “Text”
(object). A tool such as QGIS could be tagged with the terms “Analysis”
(goal), “Spatial Analysis” (method), “Georeferencing” (technique)
and “Maps” (object).
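The structure described in this section (a closed goal/method hierarchy plus two open lists) can be illustrated with a small data model. The sketch below is written in Python under stated assumptions and is not part of TaDiRAH itself: only the "Capture" branch and the terms named in the text are included, so the `GOALS` table is deliberately partial, and the names `Tagging`, `closed_set_errors` and `new_open_terms` are hypothetical.

```python
from dataclasses import dataclass

# Closed sets: goals and their methods. Only "Capture" is complete here; the
# other goals carry just the single method named in the text (partial data).
GOALS = {
    "Capture": ["Conversion", "Data Recognition", "Discovering", "Gathering",
                "Imaging", "Recording", "Transcription"],
    "Creation": ["Writing"],
    "Analysis": ["Spatial Analysis"],
}

# Open lists: a small sample of the techniques and objects named in the text.
TECHNIQUES = {"Topic Modeling", "Debugging", "Gamification", "Encoding",
              "Georeferencing"}
OBJECTS = {"Text", "Metadata", "Manuscript", "Maps"}


@dataclass(frozen=True)
class Tagging:
    """One resource description: a goal, a method, and optional open-list terms."""
    goal: str
    method: str
    techniques: frozenset = frozenset()
    objects: frozenset = frozenset()


def closed_set_errors(tag):
    """Goals and methods are closed sets, so unknown values are errors."""
    errors = []
    if tag.goal not in GOALS:
        errors.append(f"unknown goal: {tag.goal}")
    elif tag.method not in GOALS[tag.goal]:
        errors.append(f"{tag.method} is not a method of {tag.goal}")
    return errors


def new_open_terms(tag):
    """Techniques and objects are open lists, so unknown values are candidates."""
    return (tag.techniques - TECHNIQUES) | (tag.objects - OBJECTS)


# The two tagging examples given in the text.
sigil = Tagging("Creation", "Writing", frozenset({"Encoding"}), frozenset({"Text"}))
qgis = Tagging("Analysis", "Spatial Analysis", frozenset({"Georeferencing"}),
               frozenset({"Maps"}))
```

The design choice mirrored here is that a technique or object is never tied to a single parent term: it is attached to the tagged resource, which is why the open lists can grow without revising the goals and methods.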
7. Early adoption
Apart from DiRT and the DARIAH-DE bibliography that were the initial test
environments for the taxonomy, other initiatives have shown interest in
TaDiRAH or have already begun to apply it to their content.
Applications that emerged from the taxonomy development team’s own work
include the use of TaDiRAH within the DARIAH Teaching Resources Registry
available on the project website [6] and the DHCommons project directory.
DHCommons is still in the process of
developing a new project profile schema that can accommodate the projects
originally stored on arts-humanities.net; TaDiRAH categories will be a core
part of the new profiles once they are deployed. An implementation on the
DARIAH-DE portal (DARIAH Germany’s website) is currently under way. The
“Doing Digital Humanities” bibliography
on Zotero has also adopted the current version of TaDiRAH.
In addition, a variety of projects and initiatives have adopted TaDiRAH for
structuring their data. One example is the draft of a DH Course Registry [7]
hosted by the Dutch CLARIAH initiative in collaboration with DARIAH-EU. In the near
future, each member country will provide an overview of digital humanities
courses that are being offered in that country, including a visualization
and links to each of the programs. A consistent classification built on
TaDiRAH keywords will support a well-interlinked European digital humanities
landscape. There has also been discussion about applying TaDiRAH to the
classification of in-kind contributions within DARIAH-EU.
Individuals working on smaller projects that deal with structured data have
also requested to use TaDiRAH. For these projects, it is especially
attractive to implement a widely-used taxonomy in order to become or remain
visible within the scholarly community. Examples include “Zeitschrift für
Digital Humanities” [8], a digital humanities journal that is currently
being designed in Germany and which is considering using the taxonomy to
classify contributions, as well as the German DHd-Blog [9], which is
considering TaDiRAH as a way of tagging posts.
8. Current and future development
8.1 Dissemination
TaDiRAH is publicly available via GitHub under a CC-BY license. In
addition to the actual taxonomy containing activities, objects, and
techniques, the repository includes information on coordinators, the
initiatives using TaDiRAH, and related presentations and publications.
GitHub also has support for versioning and issue tracking. When users
report issues encountered when implementing TaDiRAH in their own
projects, this will lead to improvements in future versions.
In support of the pragmatic goal of creating a resource that would be
widely available to a distributed community of users and which could be
applied in a variety of contexts, it was important to provide a
standards-based machine-readable version. The taxonomy team used an
instance of TemaTres Vocabulary Server (Ferreyra 2014) hosted by DARIAH-DE
[10]. It produces SKOS Core and makes TaDiRAH available as linked open data.
The SKOS version is available via GitHub [11] and through TemaTres as a
SPARQL endpoint [12].
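To give a sense of what a SKOS representation of a goal and its methods can look like, the following minimal Python sketch writes a few concepts in Turtle syntax. It is an illustration only: the base URI `http://example.org/tadirah/` is a placeholder and not the published namespace, the helper names `slug` and `to_turtle` are hypothetical, and the SKOS file actually exported by TemaTres may be organized differently.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/tadirah/"  # placeholder base URI (assumption)


def slug(label):
    """Turn a label such as 'Data Recognition' into a local name without spaces."""
    return "".join(part.capitalize() for part in label.split())


def to_turtle(goal, methods):
    """Serialize one goal and its methods as SKOS concepts in Turtle syntax."""
    lines = [
        f"@prefix skos: <{SKOS_NS}> .",
        f"@prefix t: <{BASE}> .",
        "",
        f"t:{slug(goal)} a skos:Concept ;",
        f'    skos:prefLabel "{goal}"@en .',
    ]
    for method in methods:
        # Each method points to its goal through skos:broader.
        lines += [
            "",
            f"t:{slug(method)} a skos:Concept ;",
            f'    skos:prefLabel "{method}"@en ;',
            f"    skos:broader t:{slug(goal)} .",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle("Capture", ["Conversion", "Data Recognition", "Imaging"]))
```

Expressing the goal/method hierarchy through `skos:broader` is what allows other directories to consume the same terms as linked open data rather than copying a flat list of labels.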
8.2 Further revisions
The taxonomy team will remain in close contact with the groups
implementing TaDiRAH on the DiRT directory, the DARIAH “Doing Digital Humanities” bibliography, and
DHCommons, in order to identify opportunities to revise TaDiRAH. Growing
interest in the use of TaDiRAH may require a more formal review of best
practices for documentation, sustainability, and governance of public
value vocabularies, particularly those made available in a linked open
data environment. Standards for the latter are still a work in progress
but important discussions have begun among stakeholders in the
information standards community including W3C, NISO, DCMI and others [13].
The taxonomy team intends to make TaDiRAH multilingual, a feature
requested by the community and supported by TemaTres. There have already
been volunteers from several countries to help achieve this. Interactive
visualizations are another potential feature that may help users
navigate the taxonomy. Here, first experiments have been made using SKOS play!
[14], which might be integrated in future versions.
8.3 NeDiMAH
Within the broader DARIAH context, the work on TaDiRAH was originally
viewed as part of the ongoing cooperation between the research and
education working group of DARIAH and NeDiMAH, the Network for Digital
Methods in the Arts and Humanities. NeDiMAH’s goal is to “contribute to the
classification and expression of digital arts and humanities” [15] by
developing a theoretically sound ontology that can classify work in digital
arts and humanities, thereby contributing to its visibility and academic
credibility. [16] Ontology
development has moved forward in collaboration with DARIAH as there was
a considerable overlap and synergy between the two organizations.
Developing this ontology is necessarily more challenging and
time-consuming than developing TaDiRAH, which has a pragmatic
orientation. Nonetheless, the TaDiRAH team intends to work with the
NeDiMAH team to ensure that the taxonomies are interoperable to the
greatest extent possible.
9. Conclusion
TaDiRAH is currently available in version 0.5 in several human-readable as
well as machine-readable forms. It has been implemented in several different
contexts, has been presented at several high-profile venues (including the
Digital Humanities Conference, see [Borek, Dombrowski, Perkins, Schöch 2014],
and the Dublin Core Metadata Initiative Conference, see [Perkins, Dombrowski,
Borek, Schöch 2014]), has been described in dh+lib (see [Dombrowski2014]),
and is likely to be adopted by more projects. What can be learned from
TaDiRAH's brief
history so far is that a bottom-up approach such as the one adopted by
TaDiRAH can work if a small but sufficient number of people can be brought
together to collaborate on a common goal. In the context of a taxonomy
aiming to attain a certain level of consensus from the community of
potential users, it has been essential to quickly publish and publicize
preliminary versions of the taxonomy in order to gain the broadest range of
feedback possible. This seems particularly true in the context of a taxonomy
that prioritizes pragmatic adoption over theoretical soundness. The TaDiRAH
team has sought to learn as much as possible from previous practical
implementations in the same domain of digital research methods in the
humanities, while seeking to address issues arising in projects that
implement TaDiRAH. We strongly believe that the best way to move forward and
learn more about the strengths and possible limitations of TaDiRAH lies in
actual experience with using it in various contexts. In this sense, the
value of TaDiRAH and the true potential of the development approach
undertaken here will increase as more projects adopt it and feed their
experience into future versions of TaDiRAH.
10. Acknowledgements
The authors would like to thank Matt Munson for his work as part of the team
responsible for the initial development of TaDiRAH.
Works Cited
Bedford2013 Bedford, Denise. “Evaluating classification schema and classification decisions”.
Bulletin of the American Society of Information Science and Technology,
December/January 2013, 39, no. 2: 13–21. doi: 10.1002/bult.2013.1720390206
Benardou, Constantopoulos, Dallas, and Gavrilis 2010 Benardou, Agiatis, Panos Constantopoulos, Costis Dallas,
and Dimitris Gavrilis. “A Conceptual Model for Scholarly
Research Activity.”
iConference 2010 Proceedings, 2010, 26–32.
Borek, Dombrowski, Perkins, Schöch 2014 Borek,
Luise; Quinn Dombrowski; Jody Perkins; Christof Schöch. “Scholarly primitives revisited: towards a practical taxonomy of digital
humanities research activities and objects”, short paper, Digital
Humanities Conference 2014, Lausanne, Switzerland, July 7-12, 2014,
http://dharchive.org/paper/DH2014/Paper-504.xml
Hughes, Constantopoulos, Dallas 2015 Lorna
Hughes, Panos Constantopoulos, & Costis Dallas (In print). “Digital Methods in the Humanities: Understanding and
Describing their Use across the Disciplines.” In: S. Schreibman, R.
Siemens, & J. M. Unsworth (ed.). A New Companion to Digital Humanities.
Oxford: Wiley-Blackwell, 2015.
McCarty 2002a McCarty, Willard: “Humanities computing: essential problems, experimental
practice.”
Literary and Linguistic Computing 17, no. 1 (2002):
103-125.
McCarty 2003 McCarty, Willard: “Humanities computing.” Encyclopedia of Library and
Information Science 2, 2003.
NeDiMAH NeDiMAH: Network for Digital Methods in the
Arts and Humanities. Funded by the European Science Foundation (ESF), 2011-2015.
http://www.nedimah.eu/
Perkins, Dombrowski, Borek, Schöch 2014 Perkins,
Jody; Quinn Dombrowski, Luise Borek & Christof Schöch: “Building Bridges to the Future of a Distributed Network: From DiRT
Categories to TaDiRAH, a Methods Taxonomy for Digital Humanities,”
DCMI International Conference, Austin, Texas, October 8-12, 2014.
Reiche, Becker, Bender, Munson, Schmunk, Schöch Reiche, Ruth, Rainer Becker, Michael Bender, Matt Munson, Stefan Schmunk, and
Christof Schöch. Verfahren der Digital Humanities in den Geistes- und
Kulturwissenschaften. DARIAH-DE Working Papers. Göttingen: DARIAH-DE, 2014.
urn:nbn:de:gbv:7-dariah-2014-2-6.