Introduction[1]
Editions, in print or digital, have a prominent place in humanities research as they
make primary sources accessible and are the foundation for scholarly argumentation in
many disciplines. Allowing the unambiguous citation of an edited text is an important
task of a scholarly edition since it makes research processes retraceable and
transparent. This has not changed in the digital age, although the digital
medium poses new challenges for editors and users of scholarly digital editions
(SDE) alike.
The Institute for Documentology and Scholarly Editing (IDE) has developed a checklist
to evaluate scholarly digital editions. Two items of this catalogue deal with the
assessment of the citability of such editions:
- 1.2. Bibliographic identification of the reviewed SDE. An SDE
should be identified in terms similar to traditional bibliographic
descriptions: a title, the responsible editors, other responsible persons and
institutions, the dates of its publication (initial, versions, last
modification), and the address (in terms of web-URL or other naming conventions
like DOI, URN or PURL) should all be evident. Any difficulties extracting such
information from the SDE should be remarked upon in the review. [Sahle et al. 2014, 1.2]
- 4.8. Identification and citation. Are there persistent
identifiers for the objects of the SDE? Which level of the content structure do
they address? Which resolving mechanisms and naming systems are used? Does the
SDE supply citation guidelines? [Sahle et al. 2014, 4.8]
The first quote from the catalogue concerns the possibility of extracting
suitable bibliographic data to cite an SDE correctly. An SDE should be
identifiable in the same manner as a traditional print publication. However,
besides traditional bibliographic data such as the title, editor(s) and date of
publication, the catalogue also names the version or the date the digital object
was last changed and the Internet address as important for citation. Two other crucial
aspects of digital editions are mentioned in the second quote: does the SDE use
persistent identifiers and does it provide citation guidelines for users? Print
editing knew similar concepts: updated versions of a book are published as second,
third and further subsequent editions, the ISBN is recognised as a persistent
identifier and the title page as the location where all the bibliographic information
can be found. In digital editing, however, updating an edition, providing
persistent identifiers and supplying bibliographic data for citation are not handled in a
similar manner and pose new challenges.
Persistent names. URLs are globally unique identifiers that
could be used for persistent addressing of online resources [
Berners-Lee 1998]. However, as the primary function of URLs is to locate
resources, they do not guarantee that the information they point to stays the same
over time or that the entire web page they point to will still be available in two,
five, or ten years. The unreliability of URLs for citation is well known and causes
problems for research in different disciplines [
Klein et al. 2014]. Permalink
and Persistent Identifier (PID) strategies have been developed to ensure resources
can be cited using names that are globally unique and stable over time [
Arnold and Müller 2017]. However, it must be emphasised that the stability of
permalinks and PIDs is primarily an organisational challenge as the assigning
institution must put policies in place for assignment, management, and maintenance of
identifiers and ensure correct resolving [
Shafer et al. 1996]
[
Brugman, 2009]. Consequently, persistence of online resources cannot be
guaranteed without commitment.
Versioning. Closely linked to stable addresses is the question
of how to deal with changes of digital objects over time. A content provider may
change the content of a website at any time and without any explicit indication of
the changes [
Jones et al. 2016]. From a scholarly citation perspective, this
causes several issues, foremost for the user who needs to cite the
specific version of the webpage content that she consulted. This is
particularly true for users of legal texts, whose citations must refer to a
specific version. This is also a relevant issue for scholarly publishing and the
recent publication by Broyles has highlighted the relevance of this topic for the
digital editing community [
Broyles 2020]
[
Bürgermeister (forthcoming)]. Critical editions are the basis of textual
scholarship and constantly changing texts cannot be reliably cited. Wikipedia is an
example of a website with constantly changing content, with every edit of an
article resulting in a new version with a permalink. Wikipedia deals with this issue
in the following way: the most recent authorized revision of an article can be
accessed via a canonical URL that always points to the most recent,
“accepted” version of an article. However, older versions can still be
accessed using their permalinks, a practice that Wikipedia itself strongly
recommends.
[2] When a user accesses an older version, a warning
message appears indicating that newer versions of this article are
available.
[3] In TEI editions, changes and revisions can be documented
using the
<revisionDesc> in the header section of the TEI document
by providing a list of changes made to a document.
[4] The benefit of this approach is that the editor(s) may provide
a descriptive summary of changes, which gives additional information about the
people involved, the parts of the document that were affected, etc., and this can be
as detailed as the editor(s) require it to be. However, a disadvantage is that it is
very impractical to record all changes to a file in the TEI document and consequently
older versions cannot be recovered using the TEI method alone.
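The change log kept in <revisionDesc> can be read programmatically. The following is a minimal sketch in Python, assuming a TEI header whose <revisionDesc> contains <change> elements with the "when" and "who" attributes; the sample document, the editor identifiers and the function name list_changes are invented for illustration and are not taken from any of the surveyed editions.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI document; titles, dates and editor ids are invented.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2020-03-01" who="#ed1">Corrected transcription of paragraph 2.</change>
      <change when="2019-11-15" who="#ed2">Initial encoding.</change>
    </revisionDesc>
  </teiHeader>
</TEI>"""


def list_changes(tei_xml: str) -> list[dict]:
    """Return the <change> entries of a TEI header, oldest first."""
    root = ET.fromstring(tei_xml)
    changes = []
    for change in root.findall(".//tei:revisionDesc/tei:change", TEI_NS):
        changes.append({
            "when": change.get("when"),
            "who": change.get("who"),
            "summary": "".join(change.itertext()).strip(),
        })
    # ISO dates sort correctly as strings; undated entries sort first.
    return sorted(changes, key=lambda c: c["when"] or "")


if __name__ == "__main__":
    for entry in list_changes(SAMPLE_TEI):
        print(entry["when"], entry["who"], entry["summary"])
```

Such a list documents that changes were made and by whom, but, as noted above, it does not make the earlier state of the text itself recoverable.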
Granular citation. A citation should be as precise as possible
to make it easier for a reader to locate and verify the source. In academic papers
and scholarly discussion, it is necessary to cite not only entire works but also
precise sub-entities, such as the pages of a book or the number and verse of a
poem. In the humanities, established granular citation methods are used to cite
chapters, paragraphs, pages and lines of a print publication or manuscript. A
frequently used example for canonical numbers is that of the books, chapters, and
verses of the Bible. These entities can be used to cite passages of the Bible across
different editions and translations which ensures, for instance, that John 3:16
(“For God so loved the world, that he gave...”) can be
found in any Bible version or translation under the same number.
[5] One must distinguish
between semantic numbers, which are content related (e.g. chapters of a book), and
visual numbers, which are object related (e.g. folios and lines of a manuscript) [
Kalvesmaki 2014]. This requirement has not changed for online resources;
however, the implementation of granular citation options is very different in a
digital editing context. From a technical perspective, granular referencing means
referring to individual resources within a website, using a URI, or to a place
within an electronic resource, either by addressing a position (e.g. byte position or
pixel in an image) or the relevant markup structure. For the editor of a digital edition, it
is important to evaluate what kind of granular citation strategy is required. It can
be argued that traditional forms of citation of a source text, such as canonical
numbers, can and should be considered and implemented in a new digital edition.
Citation practice. Several publications have pointed out that
despite the availability of an increasing number of digital editions these are not
being cited by the majority of students and researchers in the humanities who prefer
to cite print publications instead [
Porter 2013]
[
Rosselli Del Turco 2016]
[
Driscoll and Pierazzo 2016, 15]
[
Blaney and Siefring 2017]. Blaney and Siefring’s British History Online and Early
English Books Online survey has shown both that users are concerned about the
stability of digital resources and that they do not know how to cite digital
resources correctly. They suggest that creators of web content can
facilitate the citation of digital resources by providing recommendations on how to
cite a resource and by implementing a solid permalink strategy [
Blaney and Siefring 2017, 46–7]. The relevance of the topic is also highlighted
by an article about a recent survey on the use researchers make of digital editions
and their expectations. The authors report that participants indicated that the
provision of suitable citation information is still a neglected issue [
Franzini et al. 2019]. Consequently, clarity about how and what to cite in an
SDE is vital information that should be communicated to a user.
The following study attempts to explore this topic further. In a survey conducted by
the author of this study, the citation recommendations and permalink strategies of
670 digital editions, published between the 1990s and 2020, have been analysed. The
primary goal of the survey was to collect data about how digital editions ensure
citability of their content by:
- providing citation recommendations
- ensuring the content can be cited with permalinks or PIDs
- implementing a versioning strategy for their data
The quantitative analysis of the collected data will show developments that are,
hopefully, not only representative for this sample, but applicable to the wider SDE
community. In addition to the quantitative analysis, various citation strategies
applied in different editing projects will be illustrated by taking a closer look at
some of the editions in the list. The goal is to identify developments/strategies
that are frequently used and which also appear to have proven value as effective
solutions.
The full list of editions and data on which the analysis, charts and conclusions are
based is provided in a GitHub repository. All charts published in this paper were
produced by the author himself and are based on the compiled dataset. The GitHub
repository also contains the Jupyter notebook used to produce the graph
visualisations.
[6]
Sampling strategy and data collection
Patrick Sahle defines SDE as “scholarly editions that are guided
by a digital paradigm in their theory, method and practice”
[
Sahle 2016]. Following his own definition, Sahle has compiled a
catalogue of 670 editions over the last two decades.
[7] Sahle included several
important properties of the catalogued editions and some of them, such as date,
subject, language, and the category “Edition source period,” were used for
analysis purposes in this research. Figure 1 shows the distribution of these
categories across Sahle’s catalogue. However, at the time of writing this article,
the catalogue included only occasional information about the availability of
permalinks or citation information (for 85 editions). Therefore, all 670 editions had
to be inspected to collect relevant data to assess their citability.
A study of the temporal development was desirable because the author originally
assumed that more recently developed editions show a greater awareness of issues
of citation and more frequently use permalinks or PIDs for their resources. However,
there are two main problems in trying to assign a date to a digital edition. First,
some digital editions do not clearly state a date of publication. In print publishing
it is standard to have a date of publication printed either on the title page or on
its reverse side. Digital editions do not always provide an official publication
date: some provide a period of time when the edition was under development, but
already available as alpha or beta version, and a few digital editions do not publish
any dates or include only vague information about their development. Second, digital
editions may have changed over time (and indeed many did) and their interface and
structure might have looked different ten or fifteen years ago. Consequently, a
“How to cite?” page or a citation info pop-up may have
been added more recently and may not have been present in an earlier release of the
edition. To balance these issues and to facilitate a temporal comparison, the author
divided the time between the 1990s (the period of development of the earliest edition
in our data set) and 2020 into four periods: pre 2006, 2006-2010, 2011-2015, and
2016-2020. Each edition was assigned to one of these four periods depending on the
publication date listed in Sahle’s catalogue. The Internet Archive’s Wayback Machine
was consulted for earlier editions (pre 2016) as it stores early snapshots of many
editions. The URL of the snapshot that was consulted was recorded in the dataset for
reference. By looking at an early snapshot it was often possible to see if a citation
recommendation was already present earlier in the edition or if it was added later.
For several editions no early snapshot was available, and I had to use a more recent
snapshot or create a new one.
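The assignment of editions to the four periods can be sketched as a simple lookup. The following Python sketch only assumes the period labels named above and a publication year per edition; the function name assign_period and the sample years are hypothetical and do not reproduce the code in the published notebook.

```python
from collections import Counter

# Period labels as named in the text; None marks the open lower bound.
PERIODS = [
    ("pre 2006", None, 2005),
    ("2006-2010", 2006, 2010),
    ("2011-2015", 2011, 2015),
    ("2016-2020", 2016, 2020),
]


def assign_period(year: int) -> str:
    """Map a publication year to one of the four survey periods."""
    for label, start, end in PERIODS:
        if (start is None or year >= start) and year <= end:
            return label
    raise ValueError(f"year {year} lies outside the surveyed time span")


if __name__ == "__main__":
    # Invented publication years, for illustration only.
    sample_years = [1998, 2006, 2010, 2013, 2019, 2020]
    print(Counter(assign_period(y) for y in sample_years))
```

Editions without a stated publication year cannot be binned this way and would need a manually assigned date, for example one derived from an early archived snapshot.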
Each edition in Sahle’s catalogue was inspected and the presence of citation
information was evaluated as follows. Data about three categories was collected:
- the presence of a citation recommendation with or without examples of how to
cite the edition or parts of it
- the presence of PIDs or permalinks
- the presence of any version information (of individual objects or the entire
edition)
What made the assessment slightly difficult is that digital editions do not yet have
an established structure. Consequently, the citation information could be found in
different places. However, after looking through several editions, the author
identified the most likely places to look for citation information:
- Some editions have bibliographic data or an example of how to cite the
edition directly on the start page.
- Some editions have their own citation page which can be accessed through a
link in the top navigation.
- Occasionally the footer includes bibliographic data, citation recommendation
or a link to a page with citation information.
- The “About (this edition),”
“Project,”
“Copyrights and permissions” pages sometimes contain
a section explaining how to cite the edition and parts of it.
- The impressum may contain citation information.
- Some older editions provide an entry on how to cite the edition in the FAQ
section.
- Citation information could be available directly with the edited texts,
facsimile images, etc.
- Finally, some editions use a system of
“implicit citation” by providing structural entities within an edition
but no explicit recommendation on how they should be used for citation;
consequently, they were not counted as providing citation recommendations.
This variety of options highlights a core problem for users of SDEs. When moving from
one SDE to another, one cannot be sure to find the citation information in the same
place or implemented in a similar way. As the collected data for this study is based
on the observation of the author, it is possible that hard-to-find permalinks
or citation recommendations were overlooked and are not included in the dataset.
However, this does not affect the results of the study as the goal was to explore
the citability of editions. Citation information that is hard to find and uncertainty
about the availability of permalinks reduce the chance that an edition will be cited
correctly and can, therefore, be considered as “not present” or “not
relevant.”
The author understands a “citation recommendation” as a statement about how an
edition or parts of it should be cited. Most editions that provide such information
label it clearly. The dataset records not only the absence or the presence of such
recommendations, it also includes a distinction between citation recommendations for
the entire edition and recommendations for a more granular level of citation such as
individual letters, individual webpages, images, chapters and paragraphs of a text.
These categories are in concordance with the data collected in Sahle’s catalogue.
Citation recommendations can have different levels of detail. Some may only suggest a
permalink or a URL to be used, while other editions include all necessary bibliographic
data (editor, year of publication, etc.) for a citation and provide citation
examples. Frequently, these citation examples follow a common citation style like MLA
or APA. In addition to citation recommendations and the presence of examples, version
information indicated by an SDE was also recorded. This information is very diverse
and ranges from a simple version number for the entire edition (like an edition
number in a book) to uniquely citable PIDs or permalinks for all older versions of
individual resources. The provision of a date of access is frequently found in
citation recommendations. While the suggestion to include an access date in a
citation shows awareness of the problem of changeability of an online resource, an
access date is an indicator of a research process and usually does not correspond
with an actual change to an online resource [
Broyles 2020, 9].
Consequently, the inclusion of an access date in a citation recommendation has not
been counted as versioning strategy.
Finally, the provision of permalinks or PIDs was recorded. The dataset lists either the
name of the PID system (e.g. URN, DOI), a note that an edition uses permalinks,
or a note that neither was found. Another identifier which is usually used for print
publications, the ISBN, is occasionally found in SDEs too, for instance in the
Jane Austen’s Fiction Manuscripts edition or the
Vespasiano da Bisticci, Lettere. The use of ISBNs is
particularly frequent in earlier editions as they were used, for example, for CD-ROM
publications. Even if ISBNs are usually not listed among the PID systems, they are
globally unique identifiers and I have recorded them in my dataset among PIDs and
permalinks. One edition uses ISBN-A which is a DOI based on an ISBN.
[8] Some
editions use unique and persistent project IDs. For instance, the “How to cite” page of the edition
The
Diplomatic Correspondence of Thomas Bodley, 1585-1597 recommends that the
persistent project IDs of each object should be included in any citation.
[9] This so-called “Transcript ID” will ensure that a resource can
also be found in the future even if “the website is restructured
or the URL changes.” Persistent project IDs have their value for
internal operations in an edition and can be used like canonical numbers to refer to
a resource or parts of it. In the context of a specific domain or edition and
combined with a URL (the edition’s URL) they become globally unique identifiers and
therefore I have counted them as permalinks in the dataset.
Citation recommendations provided by digital editions
The survey data shows a constant increase in the number of editions that provide citation
information for the edition or parts of it (see Figure 2). Interestingly, the
proportion of editions that provide citation information for the entire edition has
changed little: 25.3% (pre 2006), 29.1% (2006-2010), 19.7%
(2011-2015), and 28.8% (2016-2020). However, the number of editions that provide
granular citation options has increased nearly fivefold over the last 20 years.
It seems that providing only a citation option for the entire edition was more
common in the early days of digital editing, while in the past decade editions
have increasingly provided recommendations for the citation of individual objects and
also for sections within an edition. To sum up, more and more SDEs provide
citation recommendations, and the increase is most pronounced for
recommendations on granular citation.
During this research, several strategies for the inclusion of bibliographic
information and citation recommendations have been identified. The first strategy
is a very traditional approach: a title page. A title page, like the ones found in
printed books, is an established means to communicate essential bibliographic
information required by common citation styles. The author initially assumed
that title pages might primarily be relevant in early SDEs that follow the
printed book style and, therefore, recorded the presence of a title page only in
the “Comments” column of the dataset. The analysis of
the data suggests that title pages feature more prominently in SDEs than
originally expected and are present even in more recent editions (see Figure 4).
However, it must be noted that a great number of editions with title pages were
published in series by the same organisations, for instance the Herzog August
Bibliothek (HAB), the Éditions en ligne de l'École des chartes (Élec), and
Romantic Circles (RC). These organisations seem to have developed a template for
SDEs which includes a title page. The presence of a title page does not exclude
the use of other forms of citation recommendations in the same edition. On the
contrary, the HAB uses a standardised citation statement and Romantic Circles
provides a citation page.
Another strategy is to provide a brief statement that exemplifies how to cite the
edition or parts of it. Such citation examples can look very diverse and may be
found in different parts of the SDE. For instance, a citation recommendation for
the entire website may be found on the home page, in the footer section, or on
some subpage of the website – frequently on either the About, Copyright, Imprint,
Permissions, or Project page (see Figure 5). The citation example may follow a
common citation style; what matters is that it provides the bibliographic
data required by the most common citation styles. Sometimes, however,
only a permalink or PID is provided. While this is certainly enough to identify an
online resource and link to it from other online publications, it does not provide
enough information for citing using a traditional citation style.
In addition to a citation example for the entire SDE, granular citation examples
for parts of an edition, for instance, individual web pages, edited objects,
images, or even parts of texts, such as chapters and paragraphs, may be provided.
These text- and object-related citation examples are in most cases displayed
directly below, above, or next to a text or other object, for instance, as a text
box, a cite button, or a link that produces a pop-up with the citation example in
it. Citation information directly at the citable resource may, for instance, be
found in the Jahrrechnungen der Stadt Basel 1535 bis 1610 –
digital which provides a citation example for each transcription (see
Figure 6), or the Edition Humboldt digital and the
edition Alfred Escher-Briefedition which provide
citation statements for their letter transcriptions at the bottom of each page.
The Welsche Gast digital displays a URN, DOI and a
permalink prominently with its online resources and additionally a link that
generates a detailed citation example (see Figure 7).
Some SDE citation examples are placed within a more elaborate citation statement
or citation page. Citation statements or citation pages discuss the rules that
should be applied when citing an edition or parts of it. This can be a brief
statement in the “About” or “User
Guide” section of the website, or an elaborate citation page. A good
example of the first type can be found in the edition
Wilhelmine von Bayreuth: Briefe über ihre Reise nach Frankreich und Italien
1754/1755. The edition provides a “Benutzerhinweis” (user guide) page that contains a section explaining
how the bibliographic information found on the home page (title page) and the URL
should be used to cite the edition.
[10] The following section is then dedicated to the use of
fragment identifiers to cite and address paragraphs within a text. The
above-mentioned edition
The Diplomatic Correspondence of Thomas
Bodley, 1585-1597 has an “Editorial” section
containing a link to a citation page. This page contains basic bibliographic data
for citation and a brief description on how to cite the edition. Instead of giving
an example, it is recommended that the user consults citation guidelines such as
MLA for more information on how to cite electronic resources.
[11] An example of a different “How to
cite” page can be found in the
Samuel Beckett Digital
Manuscript Project.
[12] The
page is under the “About” section of the website and it
contains a list of citation examples for the genetic editions provided by the
project (see Figure 8).
Editions that provide citation pages often have a link in the main navigation page
of the website or they include them in the footer. If citation pages are designed
for a specific edition, they can become very detailed with instructions and/or
examples that illustrate how to cite the edition and its different resources using
popular citation styles. The
Theodor Fontane:
Notizbücher[13] and the
Willa Cather Archive are
examples of such websites. The Fontane notebooks’ citation page lists examples for
the citation of the entire edition, individual notebooks, and parts of the
notebooks. The Willa Cather Archive provides citation examples for letters,
photographs, video clips, articles, audio recordings, speech transcripts, books,
and parts of books. Furthermore, examples are provided in different citation
styles (MLA, APA Style and Chicago/Turabian Style).
[14]
Another type of citation page is used by some publishers that maintain several
digital editions, such as Huygens ING and RC. In this case a generic citation page
is used to communicate basic information on how to cite editions and resources
produced by these publishers. The benefit is that this approach is much easier to
maintain than a separate citation page for each edition. It is also a
sustainable means to provide citation information as long as the page is not
moved. For instance, RC editions have the link to the citation page in the footer
either under “Electronic Citation,” or under “About Romantic Circles” and the citation page was already
in place in 2005.
[15]
An additional way in which an SDE may provide guidance for referencing is by
embedding metadata that can be used by reference management tools such as Zotero,
Citavi or Endnote. For instance, by using an embedded metadata strategy,
standardised metadata (e.g. Dublin Core) can be included in the head section of
the HTML document via so-called “meta-tags.” A better strategy is to include
bibliographic metadata as COinS (ContextObjects in Spans) in the HTML page. COinS
is a citation microformat based on the OpenURL standard which allows the inclusion
of metadata in URLs. As the name COinS suggests, bibliographic metadata is
included in an empty
<span> element directly in the HTML page
[
Fenner et al. 2014]. This strategy is successfully used in online
catalogue platforms such as WorldCat. While the use of COinS and similar
strategies is still the exception in SDEs, the provision of an edition’s
bibliographic metadata in a machine-readable way for reference management tools
will become increasingly important in the future [
Dumont 2018, 125–6].
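To illustrate the COinS approach, the following Python sketch assembles such an empty <span> element for an edition. It is a minimal sketch, assuming the conventional COinS class name "Z3988", the OpenURL context version "Z39.88-2004" and the Dublin Core key/value format; the function name coins_span and the sample title, editor and URL are invented for illustration.

```python
from html import escape
from urllib.parse import quote, urlencode


def coins_span(title: str, creator: str, date: str, url: str) -> str:
    """Build an empty COinS <span> carrying Dublin Core style metadata."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                 # OpenURL context version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),  # Dublin Core KEV format
        ("rft.title", title),
        ("rft.creator", creator),
        ("rft.date", date),
        ("rft.identifier", url),
    ]
    # Percent-encode keys and values, then escape "&" for use in an attribute.
    query = urlencode(fields, quote_via=quote)
    return f'<span class="Z3988" title="{escape(query, quote=True)}"></span>'


if __name__ == "__main__":
    # Hypothetical edition metadata.
    print(coins_span("Example Edition", "Doe, Jane", "2020",
                     "https://example.org/edition"))
```

Because the element is empty, it does not change what a reader sees on the page, while a reference manager that recognises COinS can pick up the metadata from the title attribute.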
Granular referencing strategies
The survey applied a very broad understanding of granular addressing strategies: if
an edition provides recommendations on how to cite anything beyond the edition as a
whole, this has been considered as “granular addressing.” The collected data
shows that before 2010 only a few editions provided the option for granular
citation, while in the last 10 years editions have increasingly provided
recommendations on how to cite their individual parts (see Figures 2 and 3).
However, occasionally SDEs go even further and provide addresses for smaller
citable entities. The “addressing markup” strategy, that is, addressing parts of an
XML or HTML document using a URI and a fragment identifier or
standards such as XPath and XPointer [
Simpson 2009], is the one most
frequently used by content providers to create citable entities within texts in
scholarly digital editions; different examples, such as CTS, are discussed
further below. However, it has to be pointed out that other strategies for
granular referencing are also used; an example would be the addressing of regions
within an image using the International Image Interoperability Framework (IIIF) [
Witt 2018, 229–33]
[
van Zundert 2018].
[16] This strategy is still
not widely present in SDEs, but holds great potential for future research with
images in editions.
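How a region of an image becomes addressable can be sketched with the URL pattern of the IIIF Image API, in which identifier, region, size, rotation, quality and format follow one another as path segments. The Python sketch below assumes version 3 of the Image API (where "max" requests the largest available size; version 2 used "full"); the server address, the identifier and the function name iiif_region_url are hypothetical.

```python
from urllib.parse import quote


def iiif_region_url(base: str, identifier: str, x: int, y: int, w: int, h: int,
                    size: str = "max", rotation: int = 0,
                    quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API URL for a rectangular pixel region."""
    if min(x, y) < 0 or w <= 0 or h <= 0:
        raise ValueError("region needs non-negative offsets and positive width/height")
    region = f"{x},{y},{w},{h}"
    # The identifier is percent-encoded so that slashes do not break the path.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical image server and manuscript page.
    print(iiif_region_url("https://iiif.example.org/image", "ms-1/f12r",
                          100, 200, 300, 400))
```

A URL of this kind can serve as a citable address for a detail of a facsimile, provided the hosting institution keeps the identifier and the image service stable.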
Citable entities in SDEs are usually HTML fragments that have an attribute
“id” and can be addressed using a
fragment identifier
attached to a URL. The publication platform of the Max Weber Stiftung
Perspectivia.net provides a detailed citation recommendation for its editions
including a section on how to cite granular entities such as paragraphs using
fragment identifiers (see Figure 9). The usual workflow is to generate the
fragment identifiers during the transformation process from TEI to HTML. The
Text Encoding Initiative (TEI) standard has become
the main standard used for digital scholarly editing projects worldwide. The
traditional workflow in TEI editions is that edited texts are semantically marked
up and stored in a TEI document for long-term preservation and exchange, but the
primary access point for a user is an HTML, CSS and JavaScript-based user
interface generated using XSLT, XQuery or other technologies that translate
between data (storage) and publication. From the perspective of citability it is
important to note that the structure of the TEI document and the HTML web
presentation can be fundamentally different. In fact, during the transformation
process from TEI to HTML, content may be removed, added, or reorganised [
Flanders et al. 2016, 264]. Text, metadata, markup and fragment
identifiers in the TEI document are not necessarily the same as in the HTML
document. Consequently, citing the text from the TEI document and citing the presentation
of the text in the edition’s user interface are two different things. For the sake
of citability both should be possible. However, as TEI is the format for long-term
preservation of data, it could be argued that ideally a citation should point to
identifiers in the TEI source file. Furthermore, the mapping between TEI and HTML
structures and identifiers for citation needs to be clearly documented and
communicated. This can be done using the references declaration element
(<refsDecl>) in the TEI header. However,
documentation should also be present on the website of the edition. Lastly,
another option to reduce divergence between the TEI and the user interface would
be to create a user interface that reproduces the TEI source document (or parts of
it) in HTML. This is, for instance, realised by the CETEIcean project [
Cayless 2018, 258–62].
[17]
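One way to keep TEI and HTML identifiers aligned is to carry the xml:id of each TEI element over into the id attribute of the generated HTML element, so that the fragment identifier in a citation points to the same entity in both representations. The following Python sketch shows this for paragraphs only; it is an illustration under that assumption and not the transformation used by any of the editions discussed, and the sample text, the page URL and the function name tei_paragraphs_to_html are invented.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical TEI body with identified paragraphs.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p xml:id="p1">First paragraph.</p>
<p xml:id="p2">Second paragraph.</p>
</body></text></TEI>"""


def tei_paragraphs_to_html(tei_xml: str, page_url: str) -> tuple[str, dict]:
    """Render TEI <p> elements as HTML and list their citable URLs."""
    root = ET.fromstring(tei_xml)
    html_parts = []
    citable = {}
    for p in root.iter(f"{{{TEI_NS}}}p"):
        pid = p.get(XML_ID)
        text = escape("".join(p.itertext()).strip())
        if pid is None:
            # Paragraphs without xml:id are rendered but not citable.
            html_parts.append(f"<p>{text}</p>")
            continue
        html_parts.append(f'<p id="{escape(pid, quote=True)}">{text}</p>')
        citable[pid] = f"{page_url}#{pid}"
    return "\n".join(html_parts), citable


if __name__ == "__main__":
    html_out, urls = tei_paragraphs_to_html(
        SAMPLE, "https://example.org/edition/letter1.html")
    print(html_out)
    print(urls)
```

With identifiers reused in this way, the documented mapping between TEI and HTML reduces to the statement that both share the same identifier values.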
In a digital editing context attempts have also been made to use semantically
structured PIDs for granular addressing of sections of a text. One such attempt is
the protocol Canonical Text Services (CTS) which was originally developed for the
referencing of ancient texts. Ancient texts, such as Homer’s hymns, often have
established canonical numbers that have been used for citation for decades or even
centuries [
Bleier 2021]. CTS provides a means to translate these
citations into digital form and makes them human and machine-readable [
Blackwell and Smith 2016]
[
Kalvesmaki 2014, 15–6]
[
Smith 2009]. CTS understands a text as a number of
hierarchically structured entities that can each be addressed by established
canonical citation rules. To identify these entities, URNs are used that follow a
syntax containing canonical names and numbers. Undoubtedly, this is an elegant and
practical solution to create abstract and human-readable persistent identifiers
for texts. However, as Kalvesmaki already pointed out, a fundamental problem is
that the “cts” namespace is not registered with the global namespace
authority IANA [Kalvesmaki 2014, 20] and, therefore, anybody could register and
use the namespace for other purposes. Consequently, Perseus[18] and other projects
must be aware that their URNs are currently not sustainable in the long run.
Perseus itself translates its URNs into URLs. The provision of permalinks based on
the URN syntax would be a suitable workaround until the URN namespace issue is
resolved.
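To illustrate the hierarchical URN syntax and the permalink workaround, the following Python sketch splits a CTS URN into its namespace, work and passage components and appends the URN to a base URL. The example URN follows the pattern commonly used for the Iliad; the host name edition.example.org and both function names are hypothetical, and the sketch does not reproduce the resolver of Perseus or of any other project.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str          # e.g. "greekLit"
    work: str               # text group, work and optional version, dot-separated
    passage: Optional[str]  # canonical citation, e.g. "1.1" (book 1, line 1)


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN of the form urn:cts:NAMESPACE:WORK[:PASSAGE]."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace=parts[2], work=parts[3], passage=passage)


def to_permalink(urn: str, base: str = "https://edition.example.org/text/") -> str:
    """Build a permalink from a URN; the base URL is an assumption."""
    parse_cts_urn(urn)  # validate before building the link
    return base + urn


parsed = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")
print(parsed.namespace, parsed.work, parsed.passage)
print(to_permalink("urn:cts:greekLit:tlg0012.tlg001:1.1"))
```

Because the permalink embeds the URN unchanged, the canonical citation stays human-readable, while persistence depends on the institution that maintains the base URL rather than on the unregistered namespace.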
A project that has a similar strategy is The Classical Latin Texts online
collection by The Packard Humanities Institute (PHI). This project makes numerous
classical texts available and citable using canonical numbers in their
human-readable URLs that follow a similar logic to the CTS URNs.[19] Other projects
have also implemented solutions to represent traditional canonical citations as
digital citations. The following are only a sample of projects that have succeeded
in this process. The dMGH project makes the monumental edition series Monumenta
Germaniae Historica (MGH) available in digital form. The project uses
human-readable permalinks containing the known abbreviations for volumes of the
MGH, and individual page numbers can be included in the URL for citation. This is
an interesting example of the use of object-related visual numbers for
citation.[20] The final example is Peter Robinson’s suggested solution to the
citation of electronic texts, which is also based on URNs. He calls it the
“documents, entities and texts” (DET) scheme and demonstrates the applicability of
the system through an implementation in the Textual Communities environment,
using the Canterbury Tales as an example [Robinson 2017].
Two final points concerning “granular citation” need to be discussed in
brief. The first is that bibliographic data for the paratextual pages of an SDE
is frequently missing. Some editions provide recommendations on how to cite
the edition or parts of it, but it is sometimes not clear how to cite the
introductions, commentaries, bibliographies, technical documentation, the
documentation of the encoding model etc. These are fundamental resources of an
edition and it should be possible for a user to cite them correctly. The second
point is important in digital editing projects that were produced in teams. The
TEI <respStmt> (statement of responsibility) allows recording
of the individuals responsible for the editing of a TEI document and also their
roles. Frequently, only the main editor(s) are mentioned in a citation statement.
Many people apart from the editor(s), however, are often involved in compiling the
transcriptions or collations, while XML developer(s) are responsible for the data
modelling and web developer(s) are responsible for the correct presentation of the
edited text on the edition’s website. The question arises as to whose names should
appear beside those of the main editor(s) in a citation statement. When individual
resources, such as a transcription, are cited, at least the person(s) who did the
main editorial work should have a prominent place in the citation statement.
Similarly, the XML and web developers should be cited when referring to the data
model (as ODD or a prose documentation) or the source code of the website. Listing
every project member in a citation is not practical, as this will always result in
an “et al.” mention and the essential information about responsibilities gets
lost.
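How responsibility statements could feed resource-specific citation statements may be sketched as follows. The Python snippet reads the <respStmt> elements from an invented and heavily simplified TEI header and groups the recorded names by role, so that, for instance, only the transcribers are named when a transcription is cited. All personal names, role labels and the function name are placeholders, not data from a real edition.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, simplified header; a valid TEI header needs further elements.
SAMPLE_HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc><titleStmt>
    <title>Letter 1</title>
    <editor>Jane Example</editor>
    <respStmt><resp>transcription</resp><name>Alex Sample</name></respStmt>
    <respStmt><resp>data modelling</resp><name>Kim Placeholder</name></respStmt>
  </titleStmt></fileDesc>
</teiHeader>"""


def responsibilities(header_source):
    """Return a dict mapping each recorded role to the list of names."""
    root = ET.fromstring(header_source)
    roles = {}
    for stmt in root.iterfind(".//tei:respStmt", NS):
        role = stmt.findtext("tei:resp", default="", namespaces=NS).strip()
        for name in stmt.iterfind("tei:name", NS):
            roles.setdefault(role, []).append((name.text or "").strip())
    return roles


roles = responsibilities(SAMPLE_HEADER)
# Name only the persons responsible for the cited type of resource.
print("Transcription by:", ", ".join(roles.get("transcription", [])))
print("Data model by:", ", ".join(roles.get("data modelling", [])))
```

Selecting contributors by role keeps each citation statement short while preserving the information about who was responsible for the cited resource.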
PIDs and permalinks in digital scholarly editions
PIDs and permalinks are central for citation of SDEs as they are persistent names
for web content and consequently imperative for reference texts that are used for
citation by the scientific community. However, the survey results are slightly
disillusioning (see Figure 10). It is a positive development that the use of
permalinks and PIDs has increased in the last two decades. Yet over 75% of SDEs
still do not provide PIDs or permalinks – or at least do not communicate this
information to their users.
While URNs and DOIs are well-known PIDs and their use is already a sign to the
informed user that these identifiers can be used for citation, URLs have a bad
reputation for being unreliable and have to be explicitly labelled as being
permalinks. Some editions might use their URLs as permalinks, but do not
explicitly state this, or if they do the statement is difficult to find. As
mentioned in the introduction, the maintenance of PIDs and permalinks is key to
their function as persistent names for online resources. Ideally, an institution
with suitable infrastructure and a commitment to preserve and maintain its online
resources should oversee the hosting of an SDE. The same applies to the
management and maintenance of PIDs and permalinks. Long-term preservation and the
permalink strategy are important information for a user as they may increase trust
in the stability of an online resource. Therefore, such a statement should be
placed in a prominent position.
The HAB provides “persistent URLs” (permalinks) for its online resources and
has published a brief but clear commitment on its website to ensure the
persistent availability of its online content via these persistent URLs.[21] The
edition humboldt digital, developed and maintained by the Berlin-Brandenburg
Academy of Sciences and Humanities (BBAW), is another good example of
the use of permalinks in SDEs. The edition includes a permalink statement that
specifies what permalinks in the edition are, together with a commitment by the
BBAW to keep published resources available.[22] The
statement is brief, but it also mentions that the persistent availability of
versions will be ensured via the indicated permalinks. This is a functionality
that is unfortunately rarely seen in SDEs. The edition
Welscher Gast digital is an interesting case in this context, since
for some of its resources the edition provides a DOI and a “Zitierlink”
(citation link), which is a permalink, and for other resources an additional URN.
A user may ask whether all these options should be used for citation or only the
“Zitierlink”. A clear answer comes from the developers of the edition, who
solved this issue elegantly by providing information about the stability of the
permalink in an info lightbox and additionally a “Zitierhinweis” (citation
recommendation with an example) indicating that the DOI should be used for
citation (see Figure 7).
Closely related to the use of permalinks and PIDs is the question of long-term
availability of online resources. Sahle’s catalogue includes several editions that
were available 10 or 20 years ago, but are gone today or have been moved so that
their original URL is no longer functional.[23] Various reasons may explain why an
edition cannot be found at the present time or why it was taken down. In any case,
finding out that links are not working can be a very frustrating experience for
readers. One cannot be sure whether the edition no longer exists or if it has
simply been moved to a different location and no redirect was set up. If an edition
is withdrawn fully from the web, a good solution is to set up a tombstone page,[24]
a special type of landing page, at the original URL providing at least basic
information about the whereabouts of the edition. The Shakespeare Quartos Archive
was taken down recently, on 15 April 2020, “as the technologies which it is built with have reached end-of-life.”
The Bodleian Libraries set up a tombstone page, which they call a “holding page,” at
the original URL of the SDE (see Figure 11). It contains basic bibliographic
information, a screenshot of the edition, a brief description of the project, a
brief statement of why it is no longer available and when it was taken down, and a
link to a 2019 version of it preserved in the Wayback Machine.[25] Even if it is
regrettable that the edition could not be maintained, the provision of the
tombstone page and the link to an archived version is an excellent service. Ideally
one would also expect to have a download option for the edition’s data. Even if the
interface and functionalities are no longer operational, the TEI data is still a
valuable resource for research that can be used independently of the outdated
software for the edition.
Versioning strategies in SDEs
As pointed out in the introduction, a versioning strategy is important for SDEs
as their content can be changed as easily as the content of any other website.
Following strategies from software design, digital editions are also frequently
released at an early stage of development and resources are revised or added at
a later stage. Therefore, resources in digital editions might change over time
and it is important to make those changes transparent to a user. For scholarly
discourse, editions need to provide resources that can be reliably cited not
when they are finished, but as soon as they are released to the public. For
citation and later retrieval, it is imperative that users know what version
they are dealing with. Different strategies are used to indicate versions.
The results of the study have shown that over the past decade the number of
editions with a versioning strategy has increased substantially (see Figure 12).
Unfortunately, however, most SDEs still do not have a versioning strategy
implemented. The reasons for this cannot be of a technical nature alone, as the
provision of version information in its most basic form only requires adding a
version number or a version date. The deeper reasons for the problem are rather
to be found in the way editors perceive their online editions. These are
often considered as being “final” and “static,” scarcely affected by
the changes that may occur once they have been released. Additionally, editors
may believe that minor changes applied to online editions do not matter to the
user. If this is the case, however, this perspective needs to be openly
communicated on an edition’s website. Another possible explanation for the lack
of a solid versioning strategy might be that there is still too little
awareness of both the topic itself and the importance of transparency and
retraceability of changes in a digital editing context. Scholarly arguments
and citations cannot be based on texts that may be modified overnight. The
following examples have been selected to briefly highlight the different
strategies used by editions listed in Sahle’s catalogue to indicate to users
what version they are dealing with. The already mentioned study by Broyles
analyses the versioning strategies of about 30 digital editions [Broyles 2020].
While this study is primarily concerned with
strategies to indicate versions to the user as part of a citation
recommendation, Broyles’s study also explored logged changes in data files and
other forms of content versioning.
Examples of early editions with a versioning strategy and declaration are
The Electronic Beowulf,[26] The Auchinleck Manuscript[27] and The Old Bailey
Proceedings Online.[28] In these cases, the edition as a whole has a version
number, not the individual web resources. This is the same approach used in
traditional print editing: the author or editor of a book or an edition decides
when the release of a revised version is necessary. Citation recommendations
provided by these editions suggest adding the current version number or the date
to any citation. Previous versions may be available in an archive, but none of
the three editions provides permalinks for older versions.
The Online Froissart is a more recent edition which also employs this
strategy, and it provides a very useful detailed list of changes made between
the different versions.[29]
Another versioning strategy is one in which it is not the edition itself that
receives a new version with every update; instead, versions of individual web
resources are stored and made accessible for citation using permalinks or
PIDs. Version control systems (VCS) can be used for this purpose. Every
change to a resource is recorded and generates a new version. In contrast to
the above-mentioned system, the creation of a new version is not triggered by
the author or editor. Every change, no matter how small, generates a new
version. The benefit of this system is that parts of an SDE can be updated
without the need to create a new version of the entire edition. Furthermore,
every old and new version is stored and can be referenced and retrieved from an
archive. A downside, however, is that, if badly executed, this strategy can lead
to the creation of an enormous number of unnecessary versions, for instance for
irrelevant changes, which might confuse users and require more storage
space and maintenance.
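The resource-level strategy can be illustrated with a small Python sketch of a version store in which every actual change to a resource yields a new, numbered version with its own permalink, while a save without changes is ignored. The class, its method names and the URL pattern are assumptions for illustration; a production system would more likely build on an existing version control system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedResource:
    """One citable web resource of an edition, e.g. a single transcription."""
    resource_id: str
    base_url: str = "https://edition.example.org"  # hypothetical host
    versions: List[str] = field(default_factory=list)

    def save(self, text: str) -> str:
        """Record a change and return the permalink of the resulting version."""
        # Store a new version only if the text differs from the latest one,
        # so that saving without changes does not inflate the archive.
        if not self.versions or self.versions[-1] != text:
            self.versions.append(text)
        return self.permalink(len(self.versions))

    def permalink(self, number: int) -> str:
        """Return the stable address of version `number` (counted from 1)."""
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"{self.resource_id} has no version {number}")
        return f"{self.base_url}/{self.resource_id}/v{number}"

    def retrieve(self, number: int) -> str:
        """Return the archived text of version `number` (counted from 1)."""
        self.permalink(number)  # raises KeyError for unknown versions
        return self.versions[number - 1]


letter = VersionedResource("letter1")
first = letter.save("Dear Sir")
unchanged = letter.save("Dear Sir")   # no change, no new version
second = letter.save("Dear Sir,")     # a small change creates version 2
print(first, unchanged, second, sep="\n")
```

Older versions remain retrievable through their permalinks, which is what makes a citation of a superseded text verifiable later on.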
The
Edition Humboldt digital is an example of an
edition which has an archive of older text versions citable via
permalinks.[30] Moreover, if one accesses one of its old versions, a
warning message indicates that a newer version of this transcription is
available and points the user to it. The most recent authorised revision of a
resource can then be accessed via a canonical URL. There is a
significant difference between canonical URLs and the aforementioned
canonical numbers. Canonical URLs were introduced out of a practical need to
optimise search engine results. They are implemented using an HTML
canonical tag, the purpose of which is to specify which URL
should be listed in the search results.[31] A canonical URL should always point
to the newest, accepted version of an online resource. As the canonical URL does
not in any way reflect changes to the online resource, it should not be used for
citation. While this versioning approach is relatively new to the digital
editing community, it has long been in use in projects such as Wikipedia.
Wikipedia is an example of a website where content is constantly
changing and where a new version with a permalink results each time an article
is edited. The project provides canonical URLs for the most recent, accepted
version of an article and permalinks for permanent addressing. Wikis are also
being used as editing platforms by digital edition projects. For instance,
the Social Edition of the Devonshire MS (BL Add. MS 17492) is published via
Wikibooks and uses the same system as Wikipedia for the visibility of permalinks
and versioning information.[32]
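The difference between a canonical URL and a versioned permalink can be made concrete with a short Python sketch that reads the canonical link from the head of an HTML page. The sample page and its URLs are invented, and the parser only covers the simple case of a single link element with rel="canonical".

```python
from html.parser import HTMLParser

# Invented page of an archived version; both URLs are hypothetical.
SAMPLE_PAGE = """<html><head>
<title>Letter 1 (version 1)</title>
<link rel="canonical" href="https://edition.example.org/letter1">
</head><body><p>Dear Sir</p></body></html>"""


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html_source):
    """Return the canonical URL declared in the page, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html_source)
    return parser.canonical


canonical_url = find_canonical(SAMPLE_PAGE)
versioned_permalink = "https://edition.example.org/letter1/v1"  # assumed pattern
print("canonical (latest version, not for citation):", canonical_url)
print("permalink (fixed version, for citation):", versioned_permalink)
```

In this sketch the archived page of version 1 declares the unversioned address as canonical, which is why a citation should record the versioned permalink instead.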
Conclusion
Scholarly editions – both in print and in digital form – are reference works for
scholarly discourse. Consequently, resources provided by scholarly digital editions
(SDE) should be clearly citable and persistently available. This is, however, not
always the case as SDEs are often complex websites with different resources and
traditional citation style guides do not sufficiently cover this type of website.
Furthermore, developers of SDEs often do not provide the necessary bibliographic data
and guidance to cite an edition and its resources correctly. Apart from the
bibliographic data that is also required for the citation of editions in print,
crucial information specific to SDEs has been identified and discussed, including
permalinks, PIDs, versioning strategies, and granular citation options for digital
resources.
As in the digital realm there is no stable physical representative of the edition,
the provision of permalinks or PIDs for the scholarly digital edition and its
resources is crucial. If online resources are updated and changed, it is necessary to
provide stable identifiers for older versions and, ideally, to make them accessible
for users. Furthermore, the possibilities and potential for granular citation in
scholarly digital editions have been addressed by briefly discussing the strategies
used by IIIF, fragment identifiers and the CTS protocol. In this context, it is
important to note that a substantial number of texts used in the humanities have
established titles and numbers for citation. CTS, DET, dMGH, PHI and similar projects
try to translate the logic of traditional citation into a digital format. By this means not
only the edition, but fragments and objects within the edition may be cited in the
digital realm using existing and widely used logical structures of a text.
The survey conducted as part of this research has shown that over the past two
decades, the awareness of these topics has increased among SDE developers. However,
most digital editions still do not sufficiently communicate to the user what can be
cited in an SDE and how to do so safely. Citation information is nevertheless
imperative, as each edition may have a different structure, provide a
different set of online resources, or use different granular citation strategies.
Therefore, not providing clear information on how to cite the edition or partial
resources may decrease the trust in and the citability of a digital edition.
Furthermore, digital editions are produced by teams rather than individuals and the
acknowledgement of the individual responsibilities of team members can and should be
emphasised by citing these persons alongside the main editor(s) when appropriate.
This information needs to be communicated to a user.
A further important point of consideration is the provision of all necessary
bibliographic data for citation. The most practical approach for this would appear to
be the placing of a citation statement for the entire edition in a prominent position
on the home page, and additional citation statements referring to the individual
resources provided by the edition. Permalinks and PIDs should be clearly highlighted
and it must be communicated to the user that these can and should be used for
persistent citation. As the persistent availability of online resources always
depends strongly on the institution maintaining them, a digital edition should
outline its PID/permalink strategy and policy on the website. Furthermore, users
should be informed on how an edition deals with updates and revisions of its
resources. A good place for such a statement would be a citation page or the
“about” page. How changes are logged and how old and new versions can be
cited is vital information. As a matter of fact, very few online editions will remain
unchanged over time. Ideally, a new permalink or PID should be created for every
version.
Currently, there is no standardised way to provide citation information in an SDE.
Various strategies were discussed in this article and it was occasionally pointed out
that some solutions seem to be more frequently used. This might be the first step
towards the standardisation of this very important feature of an SDE. However, as
there are too many different types of editions covering various subjects, further
research is necessary to analyse what kind of citation recommendation suits
different types of editions. Currently, the best advice for editors and developers
alike is to make essential bibliographic data for the citation of an edition
easily accessible to the user (ideally in human- and machine-readable form). It is
essential for a user to know how to cite the individual resources of the edition and
to easily find permalinks/PIDs and information on the versioning strategy of an SDE.
Following these basic rules will increase the citability of an edition and help move
towards more stability for scholarly online resources.