Roman Bleier studied History and Religious Studies at the University of Graz and received a Ph.D. in Digital Arts and Humanities from Trinity College Dublin in 2016. Since 2016 he has worked as a postdoctoral researcher on various digital scholarly editing projects at the Institute Centre for Information Modelling–Austrian Centre for Digital Humanities, University of Graz.
Citing and referencing is an important aspect of scholarship as it makes research
processes retraceable and transparent. Over centuries citation practices have been
optimized for print publications. However, the digital revolution and the increased
scholarly output online provide new challenges. For instance, the physical form of
the book, which was the basis for citation styles in the past, is replaced by different
types of online resources of which only some resemble books. Another issue is that
digital resources on the internet are not as durable as printed books that are
distributed in multiple copies in libraries across the world. For scholarly citation
this is a problem, as a website referenced today might not be available tomorrow or
might have changed its address, which is frequently used for referencing. Editions,
in print or digital, are supposed to be stable reference points for scholarly
discourse, as discussions the scholarly community engages in today should still be
reproducible in 50 or 100 years – as is the case with printed scholarly editions
from the 19th century. Consequently, assurance of long-term availability of resources
and clarity of how and what to cite in a digital edition is vital. That the lack of
providing this kind of assurance and information makes online resources less
attractive for citation has been shown in a recently published survey using as a case
study British History Online and Early English Books Online. The survey assessed if
researchers prefer to cite print or digital resources and the results indicate that
there is still a very strong culture of non-citation of electronic resources
among students and researchers in the humanities. The authors of the study suggest
that creators of web content can help facilitate the citation of digital
resources by providing recommendations on how to cite a resource and by having a
solid permalink strategy broken links
and strategies to provide
persistent identifiers for the citation and access of online content, the results of
a survey of 100 digital scholarly editions will be presented. The survey focuses on
citation recommendations provided and permalink strategies used by digital scholarly
editions. In this context the paper will look at how permalink/PID strategies and
citation recommendations can make digital editions better citable. A closer look will
be taken at citation recommendations of selected editions in order to highlight
possible solutions.
Surveys British History Online and Early English Books Online to examine the role of, and improvements to, permalink/PID strategies and citation recommendations in making digital editions more accessible
Editions, in print or digital, have a prominent place in humanities research as they make primary sources accessible and are the foundation for scholarly argumentation in many disciplines. Allowing the unambiguous citation of an edited text is an important task of a scholarly edition since it makes research processes retraceable and transparent. This has not changed in the digital age, although the digital medium poses new challenges for editors and users of scholarly digital editions (SDE) alike.
The Institute for Documentology and Scholarly Editing (IDE) has developed a checklist
to evaluate scholarly digital editions. Two items of this catalogue deal with the
assessment of the citability of such editions:
The first quote from the catalogue is concerned with the possibility to extract suitable bibliographic data to cite an SDE correctly. It is important that the SDE should be identifiable in the same manner as traditional print publications. However, besides traditional bibliographic data such as the title, editor(s) and date of publication, the version or the date the digital object was last changed and the Internet address are important for citation too. Two other crucial aspects of digital editions are mentioned in the second quote: does the SDE use persistent identifiers and does it provide citation guidelines for users? Print editing knew similar concepts: updated versions of a book are published as second, third and further subsequent editions, the ISBN is recognised as a persistent identifier and the title page as the location where all the bibliographic information can be found. In digital editing, however, updating an edition and providing persistent identifiers and bibliographic data for citation are not handled in a similar manner and pose new challenges.
Persistent names. URLs are globally unique identifiers that
could be used for persistent addressing of online resources
Versioning. Closely linked to stable addresses is the question
of how to deal with changes of digital objects over time. A content provider may
change the content of a website at any time and without any explicit indication of
the changes accepted
version of an article. However, older versions can still be
accessed using their permalink which is also strongly recommended by Wikipedia
itself.canonical URL
see Bleier
(2021).<revisionDesc>
in the header section of the TEI document
by providing a list of changes made to a document.
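To illustrate how a list of changes in the TEI header can make versions transparent, the following minimal sketch reads the `<change>` elements of a `<revisionDesc>` and reports the date of the latest revision. The sample document, its dates and the function names are invented for illustration; only the TEI element names and the `when` attribute come from the TEI Guidelines.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI document, invented for this illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Hypothetical example</p></publicationStmt>
      <sourceDesc><p>Hypothetical example</p></sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2020-07-20" who="#ed1">Corrected transcription of line 4</change>
      <change when="2019-03-02" who="#ed1">First public release</change>
    </revisionDesc>
  </teiHeader>
</TEI>"""


def list_changes(tei_xml: str) -> List[Tuple[str, str]]:
    """Return (date, description) pairs from <revisionDesc>, oldest first."""
    root = ET.fromstring(tei_xml)
    changes = []
    for change in root.iterfind(".//tei:revisionDesc/tei:change", TEI_NS):
        when = change.get("when", "")
        # Collapse whitespace so descriptions read as a single line.
        text = " ".join("".join(change.itertext()).split())
        changes.append((when, text))
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(changes)


def latest_change_date(tei_xml: str) -> Optional[str]:
    """Return the date of the most recent recorded change, if any."""
    changes = list_changes(tei_xml)
    return changes[-1][0] if changes else None
```

An edition could display the result of `latest_change_date` next to a resource as a simple "last changed" statement; whether that is sufficient as a versioning strategy depends on the edition.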
Granular citation. A citation should be as precise as possible
to make it easier for a reader to locate and verify the source. In academic papers
and scholarly discussion, it is necessary to cite not only entire works but also
precise sub-entities, such as the page numbers of a book or the verse and number of a
poem. In the humanities, established granular citation methods are used to cite
chapters, paragraphs, pages and lines of a print publication or manuscript. A
frequently used example of canonical numbers is that of the books, chapters, and
verses of the Bible. These entities can be used to cite passages of the Bible across
different editions and translations, which ensures, for instance, that John 3:16
("For God so loved the world, that he gave...") can be
found in any Bible version or translation under the same number.
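The idea that a canonical number is independent of any particular edition can be sketched in a few lines of code. The parser below is a simplified assumption (it only handles references of the form "Book chapter:verse"), and the two "editions" are toy dictionaries invented for this illustration, not real data sources.

```python
import re
from typing import NamedTuple, Optional


class CanonicalRef(NamedTuple):
    book: str
    chapter: int
    verse: int


# Simplified pattern: optional leading number (e.g. "1 Corinthians"),
# a book name, then "chapter:verse".
_REF = re.compile(
    r"^\s*(?P<book>(?:[1-3]\s+)?[A-Za-z][A-Za-z ]*?)\s+"
    r"(?P<chapter>\d+):(?P<verse>\d+)\s*$"
)


def parse_ref(text: str) -> Optional[CanonicalRef]:
    """Parse a reference such as 'John 3:16'; return None if it does not match."""
    m = _REF.match(text)
    if m is None:
        return None
    book = " ".join(m["book"].split())
    return CanonicalRef(book, int(m["chapter"]), int(m["verse"]))


# Toy stand-ins for two different editions or translations (hypothetical).
EDITION_A = {CanonicalRef("John", 3, 16): "For God so loved the world, that he gave..."}
EDITION_B = {CanonicalRef("John", 3, 16): "Also hat Gott die Welt geliebt..."}


def lookup(ref_text: str, edition: dict) -> Optional[str]:
    """Resolve the same canonical reference against any edition."""
    ref = parse_ref(ref_text)
    return edition.get(ref) if ref else None
```

The same string, `"John 3:16"`, resolves in both toy editions because the key is the canonical number rather than a page or a URL.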
Citation practice. Several publications have pointed out that
despite the availability of an increasing number of digital editions these are not
being cited by the majority of students and researchers in the humanities who prefer
to cite print publications instead
The following study attempts to explore this topic further. In a survey conducted by
the author of this study, the citation recommendations and permalink strategies of
670 digital editions, published between the 1990s and 2020, have been analysed. The
primary goal of the survey was to collect data about how digital editions ensure
citability of their content by:
The quantitative analysis of the collected data will show developments that are, hopefully, not only representative of this sample, but applicable to the wider SDE community. In addition to the quantitative analysis, various citation strategies applied in different editing projects will be illustrated by taking a closer look at some of the editions in the list. The goal is to identify developments and strategies that are frequently used and that also appear to have proven their value as effective solutions.
The full list of editions and data on which the analysis, charts and conclusions are
based is provided in a GitHub repository. All charts published in this paper were
produced by the author himself and are based on the compiled dataset. The GitHub
repository also contains the Jupyter notebook used to produce the graph
visualisations.
Patrick Sahle defines SDE as scholarly editions that are guided
by a digital paradigm in their theory, method and practice
Edition source period,
were used for
analysis purposes in this research. Figure 1 shows the distribution of these
categories across Sahle’s catalogue. However, at the time of writing this article,
the catalogue included only occasional information about the availability of
permalinks or citation information (for 85 editions). Therefore, all 670 editions had
to be inspected to collect relevant data to assess their citability.
Studying the temporal development was of interest because the author's original assumption was that more recently developed editions show a greater awareness of citation issues and more frequently use permalinks or PIDs for their resources. However, there are two main problems in trying to assign a date to a digital edition. First, some digital editions do not clearly state a date of publication. In print publishing it is standard to have a date of publication printed either on the title page or on its reverse side. Digital editions do not always provide an official publication date: some provide a period of time when the edition was under development but already available as an alpha or beta version, and a few digital editions do not publish any dates or include only vague information about their development. Second, digital editions may have changed over time (and indeed many did), and their interface and structure might have looked different ten or fifteen years ago. Consequently, a
Each edition in Sahle’s catalogue was inspected and the presence of citation
information was evaluated as follows. Data about three categories was collected:
What made the assessment slightly difficult is that digital editions do not yet have
an established structure. Consequently, the citation information could be found in
different places. However, after looking through several editions, the author
identified the most likely places to look for citation information:
implicit citation
by providing structural entities within an edition,
but no explicit recommendation on how they should be used for citation and
consequently they were also not counted as providing citation recommendations.
This variety of options highlights a core problem for users of SDEs. When moving from
one SDE to another, one cannot be sure to find the citation information in the same
place or implemented in a similar way. As the collected data for this study is based
on the observation of the author, it might be possible that hard to find permalinks
or citation recommendations were overlooked and are not included in the dataset.
However, this does not impact on the results of the study as the goal was to explore
the citability of editions. Citation information that is hard to find and uncertainty
about the availability of permalinks, reduce the chance that an edition will be cited
correctly and, therefore, can be considered as not present
or not
relevant.
The author understands a citation recommendation as a statement about how an
edition or parts of it should be cited. Most editions that provide such information
label it clearly. The dataset records not only the absence or the presence of such
recommendations, it also includes a distinction between citation recommendations for
the entire edition and recommendations for a more granular level of citation such as
individual letters, individual webpages, images, chapters and paragraphs of a text.
These categories are in concordance with the data collected in Sahle’s catalogue.
Citation recommendations can have different levels of detail. Some may only suggest a
permalink or a URL to be used, while other editions include all necessary bibliographic
data (editor, year of publication, etc.) for a citation and provide citation
examples. Frequently, these citation examples follow a common citation style like MLA
or APA. In addition to citation recommendations and the presence of examples, version
information indicated by an SDE was also recorded. This information is very diverse
and ranges from a simple version number for the entire edition (like an edition
number in a book) to uniquely citable PIDs or permalinks for all older versions of
individual resources. The provision of a date of access is frequently found in
citation recommendations. While the suggestion to include an access date in a
citation shows awareness of the problem of changeability of an online resource, an
access date is an indicator of a research process and usually does not correspond
with an actual change to an online resource.
Finally, the provision of permalinks or PIDs was recorded. The dataset records either the name of the PID system (e.g. URN, DOI), the fact that an edition uses permalinks, or that neither was found. Another identifier which is usually used for print publications, the ISBN, is occasionally found in SDEs too, for instance in the
Transcript ID will ensure that a resource can be found also in the future even if
the website is restructured or the URL changes. Persistent project IDs do have their value for internal operations in an edition and can be used like canonical numbers to refer to a resource or parts of it. In the context of a specific domain or edition and combined with a URL (the edition’s URL) they become globally unique identifiers and therefore I have counted them as permalinks in the dataset.
The survey data shows a constant increase of editions that provide citation information for the edition or parts of it (see Figure 2). Interestingly, the number of editions that provide citation information for the entire edition has not increased much, from 25.3% (pre 2006) to 29.1% (2006-2010) to 19.7% (2011-2015), and 28.8% (2016-2020). However, the number of editions that provide granular citation options has increased nearly five times over the last 20 years. It seems that providing only a citation option for the entire edition was more common in the early times of digital editing, while in the past decade, editions increasingly provide recommendations for the citation of individual objects and also for sections within an edition. To sum up, more and more SDEs provide citation recommendations and especially editions that provide recommendations for granular citation have increased.
During this research, several strategies for the inclusion of bibliographic information and citation recommendations have been identified. The first strategy is a very traditional approach: a title page. A title page, like the ones found in printed books, is an established means to communicate essential bibliographic information required by common citation styles. The author’s initial assumption was that title pages might primarily be relevant in early SDEs that follow the printed book style and, therefore, recorded the presence of a title page only in the
Another strategy is to provide a brief statement that exemplifies how to cite the edition or parts of it. Such citation examples can look very diverse and may be found in different parts of the SDE. For instance, a citation recommendation for the entire website may be found on the home page, in the footer section, or on some subpage of the website – frequently on either the About, Copyright, Imprint, Permissions, or Project page (see Figure 5). The citation example may follow a common citation style; what matters, however, is that the bibliographic data needed to apply the most common citation styles is provided. Sometimes, however, only a permalink or PID is provided. While this is certainly enough to identify an online resource and link to it from other online publications, it does not provide enough information for citing using a traditional citation style.
In addition to a citation example for the entire SDE, granular citation examples for parts of an edition, for instance, individual web pages, edited objects, images, or even parts of texts, such as chapters and paragraphs, may be provided. These text- and object-related citation examples are in most cases displayed directly below, above, or next to a text or other object, for instance, as a text box, a cite button, or a link that produces a pop-up with the citation example in it. Citation information directly at the citable resource may, for instance, be found in the
Some SDE citation examples are placed within a more elaborate citation statement or citation page. Citation statements or citation pages discuss the rule that should be applied when citing an edition or parts of it. This can be a brief statement in the
Editions that provide citation pages often have a link in the main navigation page of the website or they include them in the footer. If citation pages are designed for a specific edition, they can become very detailed with instructions and/or examples that illustrate how to cite the edition and its different resources using popular citation styles. The
Zitationpage: https://fontane-nb.dariah.eu/zitationshinweise.html [20. July 2020].
Another type of citation page is used by some publishers that maintain several digital editions, such as Huygens ING and RC. In this case a generic citation page is used to communicate basic information on how to cite editions and resources produced by these publishers. The benefit is that this approach is much easier to maintain than a separate citation page for each edition. It is also a sustainable means to provide citation information as long as the page is not moved. For instance, RC editions have the link to the citation page in the footer either under
An additional way in which an SDE may provide guidance for referencing is by
embedding metadata that can be used by reference management tools such as Zotero,
Citavi or Endnote. For instance, by using an embedded metadata strategy,
standardised metadata (e.g. Dublin Core) can be included in the head section of
the HTML document via so-called meta-tags.
A better strategy is to include
bibliographic metadata as COinS (ContextObjects in Spans) in the HTML page. COinS
is a citation microformat based on the OpenURL standard which allows the inclusion
of metadata in URLs. As the name COinS suggests, bibliographic metadata is
included in an empty <span> element directly in the HTML page.
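Both embedding strategies can be sketched in a few lines. The following is a minimal illustration, assuming that an edition is described with the OpenURL "book" metadata format and a handful of Dublin Core elements; which format and which fields suit a given SDE is an open choice, and the function names and sample values are hypothetical.

```python
from html import escape
from typing import Dict, List
from urllib.parse import urlencode


def dublin_core_meta(record: Dict[str, str]) -> str:
    """Render Dublin Core elements as <meta> tags for the HTML head."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return "\n".join(lines)


def coins_span(title: str, creators: List[str], date: str, identifier: str) -> str:
    """Render bibliographic metadata as a COinS <span> (OpenURL key/value pairs)."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        # Assumption: the edition is described with the 'book' format.
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", title),
        ("rft.date", date),
        ("rft_id", identifier),
    ]
    pairs += [("rft.au", name) for name in creators]
    # URL-encode the pairs, then escape '&' so the value is a valid HTML attribute.
    query = urlencode(pairs)
    return f'<span class="Z3988" title="{escape(query)}"></span>'
```

For example, `coins_span("A Hypothetical Edition", ["Jane Editor"], "2020", "https://example.org/edition")` yields an empty `<span>` whose `title` attribute carries the metadata, which a reference manager that supports COinS can read from the page.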
The survey had a very wide understanding of granular addressing strategies: if
editions provide recommendations on how to cite anything beyond the edition as a
whole, it has been considered as granular addressing.
The collected data
shows that before 2010 only a few editions provided the option for granular
citation, while in the last 10 years editions are increasingly providing
recommendations on how to cite their individual parts (see Figures 2 and 3).
However, occasionally SDEs go even further and provide addresses for smaller
citable entities. The addressing markup
strategy for addressing parts of an
XML or HTML document using a URI and a
Citable entities in SDEs are usually HTML fragments that have an attribute
id
and can be addressed using a
<refsDecl>
) in the TEI header. However,
documentation should also be present on the website of the edition. Lastly,
another option to reduce divergence between the TEI and the user interface would
be to create a user interface that reproduces the TEI source document (or parts of
it) in HTML. This is, for instance, realised by the CETEIcean project.
In a digital editing context, attempts have also been made to use semantically
structured PIDs for granular addressing of sections of a text. One such attempt is
the protocol Canonical Text Services (CTS) which was originally developed for the
referencing of ancient texts. Ancient texts, such as Homer’s hymns, often have
established canonical numbers that were used for citation for decades or even
centuries cts
namespace is not registered with the global namespace
authority IANA
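As an illustration of what a semantically structured identifier looks like in practice, the sketch below splits a CTS URN into its components. It assumes the commonly documented layout `urn:cts:<namespace>:<textgroup>.<work>.<version>:<passage>` and ignores finer points such as passage ranges and subreferences; the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into namespace, work hierarchy and passage reference."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    # The work component is a dot-separated hierarchy; pad missing levels with None.
    levels = work_component.split(".")
    levels = levels + [None] * (3 - len(levels))
    textgroup, work, version = levels[:3]
    return CtsUrn(namespace, textgroup, work, version, passage)
```

With a URN of the form commonly used for the first line of the Iliad, `parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")` separates the text group and work from the passage `1.1`, so the canonical book and line numbers remain visible in the identifier itself.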
A project that has a similar strategy is
documents, entities and texts (DET) scheme and demonstrates the applicability of the system through an implementation into the Textual Communities environment using the Canterbury Tales as an example
Two final points concerning granular citation need to be discussed in
brief. The first is the bibliographic data for the paratextual pages of an SDE
that is frequently missing. Some editions provide recommendations on how to cite
the edition or parts of it, but it is sometimes not clear how to cite the
introductions, commentaries, bibliographies, technical documentation, the
documentation of the encoding model etc. These are fundamental resources of an
edition and it should be possible for a user to cite them correctly. The second
point is important in digital editing projects that were produced in teams. The
TEI <respStmt> (statement of responsibility) allows recording
of the individuals responsible for the editing of a TEI document and also their
roles. Frequently, only the main editor(s) are mentioned in a citation statement.
Many people apart from the editor(s), however, are often involved in compiling the
transcriptions or collations, while XML developer(s) are responsible for the data
modelling and web developer(s) are responsible for the correct presentation of the
edited text on the edition’s website. The question arises whose name should appear
beside that of the main editor(s) in a citation statement. When individual
resources, such as a transcription, are cited, at least the person(s) who did the
main editorial work should have a prominent place in the citation statement.
Similarly, the XML and web developer should be cited when referring to the data
model (as ODD or a prose documentation) or the source code of the website. To list
every project member in a citation is not practical as this will always result in
an et al. mention and the essential information about responsibilities gets
lost.
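How responsibility statements could feed a resource-specific citation can be sketched as follows. The sample header, the personal names and the output format are invented for this illustration and do not follow any particular citation style; the sketch only shows that the roles recorded in `<respStmt>` are sufficient to name the right people for the resource being cited.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI header, invented for this illustration.
SAMPLE_HEADER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Letter 12</title>
        <editor>Jane Editor</editor>
        <respStmt><resp>Transcription</resp><name>Alex Transcriber</name></respStmt>
        <respStmt><resp>Data modelling</resp><name>Sam Modeller</name></respStmt>
      </titleStmt>
      <publicationStmt><p>Hypothetical example</p></publicationStmt>
      <sourceDesc><p>Hypothetical example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def responsibilities(tei_xml: str) -> Dict[str, List[str]]:
    """Map each role in <respStmt> to the names recorded for it."""
    root = ET.fromstring(tei_xml)
    roles: Dict[str, List[str]] = {}
    for stmt in root.iterfind(".//tei:titleStmt/tei:respStmt", TEI_NS):
        role = stmt.findtext("tei:resp", default="", namespaces=TEI_NS).strip()
        names = [n.text.strip() for n in stmt.iterfind("tei:name", TEI_NS) if n.text]
        roles.setdefault(role, []).extend(names)
    return roles


def cite_resource(tei_xml: str, role: str) -> str:
    """Build a citation naming the people responsible for one kind of work."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS).strip()
    editors = [e.text.strip() for e in root.iterfind(".//tei:titleStmt/tei:editor", TEI_NS) if e.text]
    people = responsibilities(tei_xml).get(role, [])
    parts = [f"{title}."]
    if people:
        parts.append(f"{role}: {', '.join(people)}.")
    if editors:
        parts.append(f"Edited by {', '.join(editors)}.")
    return " ".join(parts)
```

Citing the transcription would then name the transcriber alongside the main editor, while citing the data model would name the person recorded under "Data modelling" instead.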
PIDs and permalinks are central for citation of SDEs as they are persistent names for web content and consequently imperative for reference texts that are used for citation by the scientific community. However, the survey results are slightly disillusioning (see Figure 10). It is a positive development that the use of permalinks and PIDs has increased in the last two decades. Yet over 75% of SDEs still do not provide PIDs or permalinks – or at least do not communicate this information to their users.
While URNs and DOIs are well-known PIDs and their use is already a sign to the informed user that these identifiers can be used for citation, URLs have a bad reputation for being unreliable and have to be explicitly labelled as being permalinks. Some editions might use their URLs as permalinks, but do not explicitly state this, or if they do the statement is difficult to find. As mentioned in the introduction, the maintenance of PIDs and permalinks is key to their function as persistent names for online resources. Ideally, an institution with suitable infrastructure and a commitment to preserve and maintain its online resources should oversee the hosting of an SDE. The same is valid for the management and maintenance of PIDs and permalinks. Long-term preservation and the permalink strategy are important information for a user as they may increase trust in the stability of an online resource. Therefore, such a statement should be placed in a prominent position.
The HAB provides persistent URLs (permalinks) for its online resources and
has a brief but clear commitment published on its website that ensures the
persistent availability of its online content via the persistent URLs.
Zitierlink (citation link), a permalink, and for other resources an additional URN. A user may ask, should all these options be used for citation or only the
Zitierlink? A clear answer comes from the developers of the edition who solved this issue elegantly by providing information about the stability of the permalink in an info lightbox and additionally a
Zitierhinweis (citation recommendation with an example) indicating that the DOI should be used for citation (see Figure 7).
Closely related to the use of permalinks and PIDs is the question of long-term
availability of online resources. Sahle’s catalogue includes several editions that
were available 10 or 20 years ago, but are gone today or have been moved and their
original URL is no longer functional. As of 09/2018 the edition, once at http://archives.forasfeasa.ie seems to be gone. The
wayback-machine has a snapshot of the start page from 23.09.2010. What's
left is an article about the project, published in
http://www.digitale-edition.de/vlet_a-z.html [19. December
2019]. provide a full bibliographic citation,
the DOI in human- and machine-readable format and a statement of unavailability
to outline why the online resource is not available
anymore. See https://support.datacite.org/docs/tombstone-pages [19. July
2021]. as the technologies which it is built with have reached end-of-life.
The Bodleian Libraries set up a tombstone page, which they call a holding page,
at the original URL of the SDE (see Figure 11). It provides
basic bibliographic information, a screenshot of the edition, a brief
description of the project, a brief statement of why it is no longer available and
when it was taken down, and a link to a 2019 version of it preserved in the
Wayback Machine.
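The elements of a tombstone page named above (a full bibliographic citation, the DOI in human- and machine-readable form, and a statement of unavailability) can be assembled into a very small HTML page. The following is a sketch only: the function name, the page layout, the use of a Dublin Core meta tag for the machine-readable DOI and all sample values are assumptions made for illustration.

```python
from html import escape
from typing import Optional


def tombstone_page(citation: str, doi: str, reason: str,
                   removed_on: str, archive_url: Optional[str] = None) -> str:
    """Render a minimal tombstone page for a resource that was taken down."""
    doi_url = f"https://doi.org/{doi}"
    archive = ""
    if archive_url:
        # Optional pointer to a preserved snapshot, e.g. in a web archive.
        archive = (f'<p>Archived copy: <a href="{escape(archive_url)}">'
                   f'{escape(archive_url)}</a></p>')
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Resource no longer available</title>
  <!-- DOI in machine-readable form -->
  <meta name="DC.identifier" content="{escape(doi_url)}">
</head>
<body>
  <h1>This resource is no longer available</h1>
  <p>{escape(citation)}</p>
  <p>DOI: <a href="{escape(doi_url)}">{escape(doi_url)}</a></p>
  <p>Removed on {escape(removed_on)}. {escape(reason)}</p>
  {archive}
</body>
</html>"""
```

Serving such a page at the original URL keeps existing citations resolvable to at least a description of what was once there, which is the function the holding page described above fulfils.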
As pointed out in the introduction, a versioning strategy is important for SDEs as their content can be changed as easily as the content of any other website. Following strategies from software design, digital editions are also frequently released at an early stage of development and resources are revised or added at a later stage. Therefore, resources in digital editions might change over time and it is important to make those changes transparent to a user. For scholarly discourse, editions need to provide resources that can be reliably cited not when they are finished, but as soon as they are released to the public. For citation and later retrieval, it is imperative that users know what version they are dealing with. Different strategies are used to indicate versions.
The results of the study have shown that over the past decade the number of editions with a
versioning strategy has substantially increased (see Figure 12).
Unfortunately, however, most SDEs still do not have a versioning strategy
implemented. The reasons for this cannot be of a technical nature alone, as the
provision of version information in its most basic form only requires adding a
version number or a version date. The deeper reasons are rather to be found in
the way editors perceive their online editions. These are
often considered as being final and static, scarcely affected by
the changes that may occur once they have been released. Additionally, editors
may believe that minor changes applied to online editions do not matter to the
user. If this is the case, however, this perspective needs to be openly
communicated on an edition’s website. Another possible explanation for the lack
of a solid versioning strategy might be that there is still too little
awareness of both the topic itself and the importance of transparency and
retraceability of changes in a digital editing context. Scholarly arguments
and citations cannot be based on texts that may be modified overnight. The
following examples have been selected to briefly highlight the different
strategies used by editions listed in Sahle’s catalogue to indicate to users
what version they are dealing with. The already mentioned study by Broyles
analyses the versioning strategies of about 30 digital editions.
Examples of early editions with a versioning strategy and declaration are
Another versioning strategy is that in which it is not the edition itself that receives a new version with every update, but the versions of individual web resources which are stored and made accessible for citation using permalinks or PIDs. Version control systems (VCS) can be used for this purpose. Every little change to a resource is recorded and generates a new version. In contrast to the above-mentioned system, the creation of a new version is not triggered by the author or editor. Every change, no matter how small, generates a new version. The benefit of this system is that parts of an SDE can be updated without the need to create a new version of the entire edition. Furthermore, every old and new version is stored and can be referenced and retrieved from an archive. A downside, however, is that, if badly executed, this strategy can lead to the creation of an enormous number of unnecessary versions, for instance for irrelevant changes, which might confuse users and require more storage space and maintenance.
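The per-resource approach can be illustrated with a small in-memory sketch. It is not a version control system and not the implementation of any edition discussed here: the class name, the `?version=` URL pattern and the example URL are assumptions. It shows two properties mentioned above: every stored version keeps its own citable address, and a content hash can prevent the creation of unnecessary versions when nothing has actually changed.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VersionedResource:
    """One citable resource of an edition with a linear version history."""
    base_url: str
    versions: List[Tuple[str, str]] = field(default_factory=list)  # (digest, content)

    def commit(self, content: str) -> str:
        """Store content as a new version and return its permalink.

        If the content is identical to the latest version, no new
        version is created and the existing permalink is returned.
        """
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if not self.versions or self.versions[-1][0] != digest:
            self.versions.append((digest, content))
        return self.permalink(len(self.versions))

    def permalink(self, number: int) -> str:
        """Return the address of a specific version (1-based)."""
        if not 1 <= number <= len(self.versions):
            raise IndexError(f"no version {number}")
        return f"{self.base_url}?version={number}"

    def retrieve(self, number: int) -> str:
        """Return the content of a specific version (1-based)."""
        if not 1 <= number <= len(self.versions):
            raise IndexError(f"no version {number}")
        return self.versions[number - 1][1]
```

A citation made against version 1 stays verifiable after later commits, because `retrieve(1)` still returns the text as it was cited.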
The
Scholarly editions – both in print and in digital form – are reference works for scholarly discourse. Consequently, resources provided by scholarly digital editions (SDE) should be clearly citable and persistently available. This is, however, not always the case as SDEs are often complex websites with different resources and traditional citation style guides do not sufficiently cover this type of website. Furthermore, developers of SDEs often do not provide the necessary bibliographic data and guidance to cite an edition and its resources correctly. Apart from the bibliographic data that is also required for the citation of editions in print, crucial information specific to SDEs has been identified and discussed including permalink, PID, versioning strategies, and granular citation options for digital resources.
As there is no stable physical representative of the edition in the digital realm, the provision of permalinks or PIDs for the scholarly digital edition and its resources is crucial. If online resources are updated and changed, it is necessary to provide stable identifiers for older versions and, ideally, to make them accessible for users. Furthermore, the possibilities and potential for granular citation in scholarly digital editions have been addressed by briefly discussing the strategies used by IIIF, fragment identifiers and the CTS protocol. In this context, it is important to note that a substantial number of texts used in the humanities have established titles and numbers for citation. CTS, DET, dMGH, PHI and similar projects try to translate the logic of traditional citation into a digital format. By this means not only the edition, but fragments and objects within the edition may be cited in the digital realm using existing and widely used logical structures of a text.
The survey conducted as part of this research has shown that over the past two decades, the awareness of these topics has increased among SDE developers. However, most digital editions still do not sufficiently communicate to the user what can be cited in an SDE and how to do so safely. Citation information is, nevertheless, imperative as each edition may have a different structure, it may also provide a different set of online resources, or use diverse granular citation strategies. Therefore, not providing clear information on how to cite the edition or partial resources may decrease the trust in and the citability of a digital edition. Furthermore, digital editions are produced by teams rather than individuals and the acknowledgement of the individual responsibilities of team members can and should be emphasised by citing these persons alongside the main editor(s) when appropriate. This information needs to be communicated to a user.
A further important point of consideration is the provision of all necessary
bibliographic data for citation. The most practical approach for this would appear to
be the placing of a citation statement for the entire edition in a prominent position
on the home page, and additional citation statements referring to the individual
resources provided by the edition. Permalinks and PIDs should be clearly highlighted
and it must be communicated to the user that these can and should be used for
persistent citation. As the persistent availability of online resources always
depends strongly on the institution maintaining them, a digital edition should
outline its PID/permalink strategy and policy on the website. Furthermore, users
should be informed on how an edition deals with updates and revisions of its
resources. A good place for such a statement would be a citation page or the
About page. How changes are logged and how old and new versions can be
cited is vital information. As a matter of fact, very few online editions will remain
unchanged over time. Ideally, a new permalink or PID should be created for every
version.
Currently, there is no standardised way to provide citation information in an SDE. Various strategies were discussed in this article and it was occasionally pointed out that some solutions seem to be more frequently used. This might be the first step towards the standardisation of this very important feature of an SDE. However, as there are many different types of editions covering various subjects, further research is necessary to analyse what kind of citation recommendation fits which type of edition. Currently, the best advice for editors and developers alike is to make the essential bibliographic data for the citation of an edition easily accessible to the user (ideally in human- and machine-readable form). It is essential for a user to know how to cite the individual resources of the edition and to easily find permalinks/PIDs and information on the versioning strategy of an SDE. Following these basic rules will increase the citability of an edition and help move towards more stability for scholarly online resources.