Jonathan joined the IHR in 2007 as Project Editor for British History Online's project to complete the digitisation of the Calendars of State Papers. He continues to work part of the time on BHO, and also spends time on Connected Histories and IHR web projects. ORCID ID: 0000-0002-5044-5038.
Judith Siefring is the Head of Digital Research at the Bodleian Libraries' Center for Digital Research at Oxford University. ORCID ID: 0000-0002-4385-6576.
This article discusses the culture of digital citation within the humanities, with specific reference to research done on the citation of two well-used digital resources: British History Online and the Early English Books Online Text Creation Partnership. Because these two resources are available in both print and digital form, they provide a good test case of whether academics prefer to cite print sources when they have used digital resources in their research.
Blaney and Siefring explore the culture of digital citation within the humanities.
The initial effect of printing was not that of an increased distribution of identical copies being put into numerous hands...it was, first and foremost, the very transformation of the ethics of reading and writing
There is no doubt that print sources, as opposed to digital ones, still carry immense cultural cachet. As this article will explore in detail, many scholars prefer to cite the print version of a work, even when they have only seen a digital surrogate, in part because of this perceived prestige. We shall see that such practice is problematic in a number of ways, and causes difficulties even when using resources which are widely used and respected. For example, reference works such as the
An example of this unevenness is the OED entry for Presumption, orig. towards the gods; pride, excessive self-confidence
wanton violence; insolence; lust, lewdness; (of animals) violence; an outrage; violation, rape; serious injury to the person; a loss by sea; an overbearing man. The sense development from Greek to English has been ignored by the OED, except insofar as it has confused the definer.
A much better treatment of
< ancient Greek ὀμϕαλός navel, centre, hub, round stone in the temple of Apollo at Delphi supposed to mark the centre of the earth, knob or boss (ultimately cognate with navel n.). In quot. 1847 at sense 2a via German Omphalos (1830 in the source translated)
It is clear that we are at a critical juncture in the culture of citation. Reference sources and journals are increasingly published online only, or exist online in versions expanded from their print editions. Yet, for reasons this paper will explore, there is still a widespread aversion to the citing (as opposed to the using) of digital resources. It remains a brave researcher who will cite Wikipedia in preference to the OED.
In this article we will explore the resistance to citing digital sources that is
still widespread in the humanities. We have found that this resistance is
particularly prevalent where a print version of the source also exists; for this
reason we have focused on two digital projects which can be consulted online but
whose citations can be silently
Despite our discovery of this pervasive culture of non-citation, we have been
able to find little mention of it as a research topic. It is noticeable, in
reviewing the literature on digital citation, that cross-disciplinary studies of
the subject tend to Humanities papers were excluded from the analysis
because of their long citation windows and high uncitedness
rates
The very low percentage of articles cited at least once
may be a reflection of the tendency of humanities researchers to cite
books instead of articles
Note that here we are talking about citation of articles. The focus of this
article, the citation of digital resources, appears to be the unloved son of the
unloved son. This impression is borne out by the present authors’ experience
when surveying journal editors about their digital citation practices. Many
editors assumed, in an understandable but illuminating example of
Dalbello et al. have studied the digital citations in five journals in Classical
Studies and English and come to the same broad conclusions as the present
article: supporting argument by means of electronic document
A small-scale study (with 16 participants) examined the use of e-texts, defined
as any textual material in electronic form, used as a
primary source
references clearly did not represent the extent of
e-text used in the research
This paper studies the citation of two digital resources:
British History Online
EEBO-TCP aims to create an XML-encoded edition of every monographic text printed
in English or in England in the period 1473-1700. If there is more than one
edition of a particular work, then the first edition is selected unless there is
a compelling reason to choose a later one (if the first edition is badly damaged
or incomplete, for example). The texts for BHO are chosen by an academic
advisory group from the IHR and HoP, with a view to meeting the needs of its
core users; BHO is aimed at research-level historians (one university, after a
trial of BHO's subscription content, declined to subscribe on the grounds that it
was too advanced for its undergraduates). Similarly, EEBO-TCP’s core users are generally
scholars at postgraduate level and beyond, although many undergraduates do use
the resource too
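EEBO-TCP's selection rule (the first edition, unless a compelling reason favours a later one) amounts to a simple decision procedure. The Python sketch below is illustrative only: it reduces "compelling reason" to two hypothetical flags, `damaged` and `incomplete`, taken from the examples given above, whereas in practice the choice is an editorial judgement. It also applies the project's 1473-1700 date range; the "printed in English or in England" criterion is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

FIRST_YEAR, LAST_YEAR = 1473, 1700  # EEBO-TCP's stated date range


@dataclass
class Edition:
    year: int
    damaged: bool = False     # hypothetical flag: surviving copy badly damaged
    incomplete: bool = False  # hypothetical flag: surviving copy incomplete


def in_period(edition: Edition) -> bool:
    """True if the edition falls within the project's date range."""
    return FIRST_YEAR <= edition.year <= LAST_YEAR


def select_edition(editions: List[Edition]) -> Optional[Edition]:
    """Prefer the first edition; move to a later one only if the earlier copy is unusable."""
    candidates = sorted((e for e in editions if in_period(e)), key=lambda e: e.year)
    for edition in candidates:
        if not (edition.damaged or edition.incomplete):
            return edition
    # Assumption: if every copy is flagged, fall back to the earliest in-period edition.
    return candidates[0] if candidates else None
```

For instance, given a 1640 edition flagged as damaged and an unflagged 1650 edition, the sketch selects 1650; the fallback when every copy is flagged is our own assumption, not a documented EEBO-TCP policy.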
BHO receives about 10 million page views per year. When the TIDSR analysis was
carried out in 2010-11, it became clear that, despite this high usage, BHO is
very little cited in academic literature
As set out in Webster and Blaney (2011), a search for citations of British History Online in journals (carried out in 2010) found 14 results using the Scopus service and 17 results using Google Scholar. By contrast, in blog posts from June to November 2010 alone, British History Online was referred to 84 times.
Qualitative data suggested the reason for under-citation: analysis of site
feedback over the period 2003-2010 found a number of complaints about the
non-display of the print page numbers from the digitized books. For example,
this feedback message from 2007 is representative: With
great pleasure, I have been going through your most excellent online version
of the ‘Thurloe State Papers’ for a scholarly paper which I am writing.
After consulting a particular transcribed document, I would then click on to
its approximate ‘Page’ number in the heading above, so as to search and find
the exact citation for my Endnotes. However, I now see that your current
online version does not display any original page images, thus preventing me
from determining a precise citation. How can I access the images, so as to
find the correct page-number for any given document?
EEBO-TCP’s application of the TIDSR analysis was carried out in 2012-13, as part of the Jisc-funded project
Bibliometric analysis suggested that EEBO-TCP is having a steadily increasing
impact on scholarship in relevant fields. The analysis surveyed EEBO and
EEBO-TCP-related publications in databases such as JSTOR and Scopus,
demonstrating a steady growth in such publications over the decade 2002 to 2012.
The Scopus data show the country of each publication's authors, indicating that
authors from the USA and the UK are most likely to mention their use of EEBO,
followed by authors from two other English-speaking countries, Canada and Australia.
If we look at the journals in which these articles were published, we find a
range of journals chiefly in the fields of English Literature, Language and
History of the medieval and early modern periods. The bibliographic data
therefore support the assertion that EEBO has had a growing positive
impact on scholarship, particularly in English-speaking countries
However, user feedback (particularly via a user survey discussed in detail below) indicated that, as with BHO, many scholars, particularly in the humanities, fail to cite or otherwise acknowledge their use of digital resources. The quantitative data accumulated during bibliometric analysis can therefore only be partial: if users are not citing their use of EEBO and EEBO-TCP, then any quantitative measure of their impact may significantly understate their true impact on scholarship. This disparity raises the issue of citation practice: what are users citing if they are not pointing to their use of digital collections? The image sets in EEBO, and the full texts based on them in EEBO-TCP, derive from actual printed books held in libraries around the world. Do users cite these original print copies even though they have never actually seen them?
This raises questions for all creators of digital resources: Why do users avoid citing the digital copies? What implications does this have for creators of digital resources, particularly when they need to demonstrate impact? And what measures can content creators introduce to combat the problem?
As part of the TIDSR analysis, the SECT project conducted an online survey of
EEBO-TCP users. The survey was run for around four months from the summer into
the autumn of 2012. The survey was advertised on the project website and via
Twitter, and was highlighted at the EEBO-TCP Oxford conference in 2012. Details
were also sent to faculty administrators at units specialising in the early
modern period at institutions across the UK, for circulation to their students
and staff. In total, 220 people started the EEBO-TCP survey, 208 completed at
least part of it, and 185 completed it in full. The survey asked participants
for a wide range of information about their use of EEBO and EEBO-TCP, and
some of the questions pertained to citation practice.
The survey sought to establish the impact of EEBO-TCP in both teaching and
research, and revealed interesting attitudes to citation in both areas. First we
asked users who identified themselves as spending at least one-fifth of their
time teaching
Most respondents use online resources in their teaching either daily (20%) or
several times a week (40%). Teaching academics not only use online resources for
teaching themselves but actively encourage their students to use them for their
own work. As we would expect from a survey that set out to reach EEBO-TCP users,
a high proportion of respondents encourage their students to use EEBO in particular
Almost all of these teachers encourage students to access online materials (97%),
and none of the remaining 3% of respondents actively discourage their use. The
survey also revealed that use of online resources in research is similarly
ubiquitous
It is clear that online resources are now heavily used by most teaching academics
and researchers. EEBO-TCP in particular is very widely used in early modern
studies. But is this enormous weight of use reflected in citation practice? In
order to explore this question further, researchers were asked how they
themselves cite materials from EEBO-TCP. Those who teach were asked how they
would instruct their students to acknowledge resources that they have consulted
online
The use of EEBO is now commonplace in research and teaching, and yet 34% of
respondents fail to acknowledge that they have used an EEBO text and instead
cite the print version only. A quarter of respondents actively teach their
students to cite only print. These students are being taught to ignore or
disguise their use of digital resources. The responses to this question suggest
an additional problem: many (and in the case of the EEBO-TCP-aware audience for
the survey, most) researchers want to acknowledge their use of online material
but, as there is no single established way of doing so, there is considerable
variation in practice. Some cite both print and online sources, some online
only, some simply place
Examples of “other” answers to the question of how to teach students to cite:
Examples of
These responses suggest the uncertainty that many feel when confronted with the issue of how to cite their online sources. But the deeper problem remains that many apparently don’t want to reveal that they used online sources at all.
The EEBO-TCP user survey suggested that around a third of researchers fail to indicate their use of digital resources at all. Is this indicative of wider practice? What are the reasons for users failing to cite digital material? Why are some authors reluctant to cite a digital source when they can substitute a print citation?
In order to find out more about the culture of citation, in April and May 2013 the authors emailed a short survey to 60 UK-based print journals covering the fields of literature and history, and asked a representative of each journal to reply by email. We tried to balance the selection by surveying journals covering different time periods, geographical areas, and thematic approaches. To maximise the likelihood of a reply, the survey asked three simple questions:
We received 37 replies (a response rate of 62%). Of these, 97% said they would not change
a print citation to digital, and 78% would not change a digital citation to
print. Nine asked for clarification on what was meant by
More interesting than the bald figures are the comments included by some editors
with their replies. For a number of journals digital citation by authors was
mentioned as a rare or non-existent occurrence, for example,
This raises the question of cause and effect. Do authors eschew digital citation because they think it would be frowned upon? If a journal never prints digital citations, then potential authors reading it may think this is a deliberate policy (although the editor mentioned above did not say that they would not include digital citations, only that they do not receive them).
So, it may be that journal editors receive few digital citations and researchers
rarely see such citations in articles they are using for their research. What
assumptions are fuelling this cycle? Why do many shy away from citing digital,
whether they be authors or editors?
In practical terms, including URLs is seen as a problem. Many fear that a particular
URL will no longer be active in five or ten years’ time (see, for example,
A change in legislation in 2013 required the UK’s copyright libraries to capture the
UK’s public web domain as part of their remit
The Internet Archive has released a Firefox extension,
Harvard Law School Library leads a group of libraries and others in maintaining
Perma.cc, a free service which allows anyone with an account to create a link to
an archival version of a web page
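Pointing a citation at an archived copy, rather than only at a live URL that may break, can be sketched in Python. The replay-URL pattern and the availability endpoint below follow the Internet Archive's Wayback Machine conventions as we understand them; the helper names are hypothetical, no network request is made, and anyone building a real tool should check the Internet Archive's current documentation. Perma.cc, which works through user accounts, is not modelled.

```python
from typing import Optional
from urllib.parse import urlencode

# Replay links take the form https://web.archive.org/web/<timestamp>/<original URL>.
WAYBACK_REPLAY = "https://web.archive.org/web"
# Availability endpoint: returns JSON describing the snapshot closest to a date.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def replay_url(original_url: str, timestamp: str) -> str:
    """Build a replay link; timestamp is YYYYMMDDhhmmss or a prefix such as YYYYMMDD."""
    return f"{WAYBACK_REPLAY}/{timestamp}/{original_url}"


def availability_query(original_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL asking for the snapshot closest to timestamp."""
    params = {"url": original_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the archived URL from an already-decoded availability response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def cite_with_archive(citation: str, archived: Optional[str]) -> str:
    """Append an archived copy to a citation so the reference can survive link rot."""
    return f"{citation} (archived at {archived})" if archived else citation
```

Keeping the lookup separate from the citation text means a citation remains usable, pointing only at the live URL, when no snapshot exists.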
A more pertinent barrier to digital citation might be URLs that are too long and cumbersome. Journal editors and print publishers dislike them because they look ugly and are hard to typeset or format. Academics and students too dislike their appearance, and the fact that, together with a full citation, they can affect the word or page count of a piece of work. They are unattractive for readers more generally. They have often been generated for technical reasons by the content creators with little thought for the needs of eventual users.
Philosophically, some researchers may feel that there is little difference
between the database where one accesses a text and the library where one reads a
book. Such researchers wouldn’t cite the library, so why cite a database? Many
scholars, especially early modernists, do of course cite the source library.
Those who give this as a reason for failure to cite have perhaps never
considered the question of whether it is honest to hide their use of digital
material, or thought about how such material is funded. We would further argue
that there is a clear difference in the reading experience between manuscript,
print and digital, and to elide this in citation does an injustice to each
This
It may be that leading scholars, secure in their jobs and reputations, are less nervous about citing digital resources where they think it appropriate. There is some anecdotal evidence that this is the case. Historians Paul Kennedy (internationally famous for
In a trenchant article, the historian Tim Hitchcock points to a deeper scholarly
problem that lack of transparency over digital sources is obscuring. Researchers
are trained in traditional source materials: libraries, archives, conventional
reference works. They are, usually, inexpert in using digital resources: We have not established the necessary new systems of
reference and validation that would make our use of these resources
transparent and repeatable.
Hitchcock is making a broader point than this article seeks to address: digital resources are problematic, often deeply so, and in claiming to use print where they have used digital, researchers are seriously misrepresenting their methodology. In order that scholarly work receive due scrutiny, it is essential that scholars be clear, open and honest about their use of digital resources.
What, then, can be done to improve the reputation of digital resources and to encourage users to acknowledge their use? Such change must start with the actual content creators themselves.
Fundamentally, content creators need to make it easy for their users to be open
and to properly acknowledge their use of a particular resource. If it is easy to
cite a digital resource, more users will do so. Digital resources should make
URLs as short as possible and, where feasible, human-decodable, and should include
a clear link to an automatically generated citation from the main page of a
text, image, or entry. In this way, content creators can make it as easy as
possible for their users to cite (or as difficult as possible for them not to),
with the result that citation rates should improve. A number of high-profile
sites, such as the OED, Oxford DNB, Wikipedia, and indeed British History
Online, do allow users to automatically generate citations in multiple formats,
although even these excellent resources use URLs that are not obviously
decodable for readers.
Some scholars are concerned about how to cite something that may change or be updated. Digital editors must, therefore, make it clear how to date content accessed via their resource. Release information and/or editorial updates should be made as obvious as possible. By dating digital items in this way, online resource managers can help users feel comfortable about how to clearly refer to the evidence that they are citing and when they are citing it. Sites should encourage or guide users always to give a date of access whenever they cite a digital resource, and should include such a date in automatically generated citations.
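The recommendations above (a citation generated from the page itself, with any release information and an access date always included) can be sketched as a small formatter. This Python sketch uses a hypothetical `Item` record and a citation layout of our own choosing; it is not the format produced by BHO, the OED or any other resource mentioned here, and a real generator would offer several house styles.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    title: str
    series: str
    url: str
    volume: Optional[str] = None
    pages: Optional[str] = None
    release: Optional[str] = None  # release or update label displayed by the resource


def format_citation(item: Item, accessed: date) -> str:
    """Assemble a citation string that always ends with an access date."""
    parts = [f"'{item.title}'", item.series]
    if item.volume:
        parts.append(item.volume)
    if item.pages:
        parts.append(f"pp. {item.pages}")
    citation = ", ".join(parts)
    if item.release:
        citation += f" ({item.release})"
    # ISO 8601 dates avoid day/month ambiguity between national conventions.
    return f"{citation}, {item.url} [accessed {accessed.isoformat()}]."
```

Because the access date is a required argument rather than an optional field, the generator cannot emit a citation without one, which is the behaviour the paragraph above recommends.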
Indeed, while this article was being researched and written, British History Online was relaunched with a new citation generator and a completely new URL format, designed to make its URLs more immediately meaningful to the user. (British History Online had for a number of years offered citation help, allowing the automatic generation of a citation for any page of content, but this has not had a discernible effect on citation habits.) As a secondary benefit, if British History Online disappeared completely, the new URLs would still enable a researcher to trace each URL to a portion of a print book. In practice it might be easier to locate the URL in the UK Web Archive maintained by the British Library on behalf of all copyright libraries in the UK, although this is not currently publicly available; the Internet Archive’s crawls of the site are, it seems, not comprehensive.
What was previously a database-generated number has now been converted to a human-readable series and book. Further specificity is provided by the page range within the book; where this is not possible, as with born-digital content and dictionary-like material, a meaningful subsection name has been chosen. For example, the old URL for the Survey of London, Volume 46, pages 280 to 293 was:
This has now become:
An additional advantage here is that the user who only wants the volume level, or the series level, can strip off parts of the URL intuitively:
This process was carried out semi-automatically for about 100,000 URLs, using the original database and simply concatenating database fields. Although the results are a great improvement, the process was not onerous. The team discussed which abbreviations would give the best trade-off between shortness of URL and clarity. The decisions here were, first, to use standard abbreviations where they exist. For example, for the Victoria County History (known to historians as the VCH) standard county abbreviations were used:
The process could only be semi-automatic because some fields of the database inevitably contained characters which are not allowed in URLs (such as quotation marks in the titles of Acts of Parliament); these had to be located with a regular expression and treated on a case-by-case basis (a side benefit was that this process exposed some metadata errors, which could then be fixed). Further, the URLs had to be tested for uniqueness. Some of the new URLs were not unique because several of them pointed to the same single page of a book. This was most frequently the case with the folio volumes of the journals of the House of Lords and House of Commons, where several days’ sittings might occur on the same page but had been separated for digitisation.
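The semi-automatic procedure described above (concatenating database fields into a path, locating disallowed characters with a regular expression for manual review, and testing the results for uniqueness) might look something like the following Python sketch. The `/series/vol<N>/pp<first>-<last>` layout, the safe character set and the abbreviation table are assumptions made for illustration; this is not the code BHO actually used.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Characters allowed in a path segment under this sketch's (assumed) convention.
UNSAFE_CHARS = re.compile(r"[^a-z0-9-]")

# Hypothetical abbreviation table: use a standard short form where one exists.
ABBREVIATIONS: Dict[str, str] = {"Example Series": "example"}


def segment(text: str) -> str:
    """Lower-case a database field and turn runs of whitespace into hyphens."""
    return re.sub(r"\s+", "-", text.strip().lower())


def build_path(series: str, volume: int, first_page: int, last_page: int) -> str:
    """Concatenate database fields into a series/volume/page-range path."""
    series_part = ABBREVIATIONS.get(series, segment(series))
    return f"/{series_part}/vol{volume}/pp{first_page}-{last_page}"


def needs_review(path: str) -> bool:
    """Flag paths containing characters outside the safe set, for case-by-case treatment."""
    return any(UNSAFE_CHARS.search(part) for part in path.strip("/").split("/"))


def collisions(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each path to the record ids that produced it, keeping only duplicates."""
    by_path: Dict[str, List[str]] = defaultdict(list)
    for record_id, path in rows:
        by_path[path].append(record_id)
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}


def parents(path: str) -> List[str]:
    """Strip trailing segments to get the volume-level, then series-level, paths."""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
```

Two sittings digitised separately but printed on the same folio page would yield the same path and be reported by `collisions`; a title containing quotation marks would be caught by `needs_review` rather than silently altered; and `parents` shows how a reader can strip a path back to volume or series level.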
For example, page 14 of the print version of the
As this example shows, although page ranges are easily added automatically to URLs, they are not necessarily best practice. Better still would be a meaningful string chosen by an editor: in the Lords and Commons journals this would be the date of sitting; in the VCH example above, the parish or other unit under discussion. This could not be done retrospectively in this case, but can be done conveniently as content is created. Future material digitised on the site will use an editorial decision to create each URL. For example, since the relaunch the site has published Proceedings in Parliament 1624, with each sitting given a bespoke URL encoding its date:
The new version of the site went live in December 2014, so it is still too early to say whether this change of URL convention will make a difference to the under-citation of BHO discussed above. But one of the impediments to digital citation that users have raised has, for this resource, been definitively removed. Indeed, a further refinement was added in October 2015, in response to the objection that a page range was too imprecise for an academic reference. When the user mouses over a paragraph of text, a pilcrow (¶) appears in the left-hand margin (very faintly, so as not to be distracting); clicking on the pilcrow inserts the relevant paragraph number into the URL bar, allowing a more precise reference than a traditional print one.
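The paragraph-level refinement can be approximated by numbering paragraphs when a page is rendered and exposing each number as a URL fragment. In this Python sketch the `p<n>` anchor scheme, the `pilcrow` class name and the helper functions are hypothetical, and the mouse-over behaviour itself (a browser-side effect) is not reproduced.

```python
from html import escape
from typing import List, Tuple
from urllib.parse import urlsplit, urlunsplit


def number_paragraphs(paragraphs: List[str]) -> List[Tuple[str, str]]:
    """Pair each paragraph with a stable, 1-based anchor id such as 'p3'."""
    return [(f"p{i}", text) for i, text in enumerate(paragraphs, start=1)]


def render(paragraphs: List[str]) -> str:
    """Emit HTML in which each paragraph carries an id and a pilcrow link to itself."""
    return "\n".join(
        f'<p id="{anchor}"><a class="pilcrow" href="#{anchor}">\u00b6</a>{escape(text)}</p>'
        for anchor, text in number_paragraphs(paragraphs)
    )


def paragraph_url(page_url: str, paragraph_number: int) -> str:
    """Return page_url with its fragment set to the given paragraph's anchor."""
    if paragraph_number < 1:
        raise ValueError("paragraph numbers start at 1")
    scheme, netloc, path, query, _fragment = urlsplit(page_url)
    return urlunsplit((scheme, netloc, path, query, f"p{paragraph_number}"))
```

Because the fragment is appended to the existing page-range URL, a reader who ignores it still lands on the right page, while one who follows it reaches the exact paragraph.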
These practical measures could help users feel more comfortable with the practicalities of digital citation. However, the more philosophical discomfort of according digital materials the same weight as print must also be addressed. Those working in the field of digital content creation are doubtless aware that some digital resources seem to be held in particularly high esteem. Despite the nuances of editorial practice discussed at the beginning of this article, the OED or the Oxford Dictionary of National Biography, for example, are understood to be built on sound scholarship and their medium is considered unimportant. With such examples in mind, other digital content creators could usefully consider how best to promote the scholarly rigour and importance of their own resource as a way of gradually eroding residual beliefs in the lack of respectability of the digital. As a first step, web resources should provide easily accessible editorial documentation at the point of accessing texts and images (rather than solely on project-descriptive websites), enabling users to fully understand the nature of the material they are accessing and the assumptions that they can make about it. By making clear the nature of their materials, content creators encourage researchers to use them in appropriate ways and thereby enhance their own scholarly reputations.
All of these mechanisms may help change individuals’ scholarly citation
practices. In turn, these scholars, should they become teaching academics, will
pass on their habits and expectations to their students. The EEBO-TCP user
survey asked participants how they prefer to learn about digital resources
Overwhelmingly participants prefer to explore digital resources themselves or
learn about them from peers. Uptake for library training sessions tends to be
low. Uptake for web tutorials seems rather low too – but this may be due to lack
of publicity or planning for dissemination. However, participants indicated in
the free text responses
Teaching academics play a vital role in disseminating scholarly practices. Creators of online resources could usefully prepare citation guidelines and editorial documentation for circulation to academic departments and subject administrators, for inclusion in the local documentation given to students as they begin their studies. Although beyond the scope of this article, a survey of the guidelines currently in use would provide useful data on how the current generation of students is expected to cite. Making such teaching and training materials easily available on project websites would also be helpful. One-off project-led training sessions could be worthwhile, provided that they are properly promoted to encourage good attendance. By integrating into teaching an awareness of the nature of digital resources, and of the importance of citing them directly, digital content creators can work together with academics to help shape the practices of the next generation of scholars.
An increasing problem for contributors and editors who prefer print citation for
journal articles will be that journal publishers themselves are steadily moving
towards online-only publication for their journals (for obvious economic
reasons). Alice Meadows, of Wiley Publishing, argues in a blog post that one
sticking point (the provision of print journals to members as a key benefit of
membership of a learned society) is becoming less important as
an issue
As mentioned above, most survey respondents conceived our questions about digital
citation generally as being about DOIs specifically. As also mentioned earlier,
this may be a form of
Online-only journals can, obviously, only be cited digitally. As scholars increasingly cite three kinds of item (items which exist only in print, digital versions or surrogates of items originally in print, and born-digital items), it will become more and more confusing if researchers cite some online items and not others. As citing online-only material becomes commonplace, it should become equally commonplace to cite all material accessed online, regardless of whether a print version exists.
While active solutions can be undertaken by individual projects and content
creators, a gradual shift in culture and practice may already be under way, led by
respected institutions. But as Patrick Dunleavy argues in a thought-provoking
blog post, change cannot be effected unilaterally but will be a long process of driving out legacy citation
systems
Recently the Royal Society announced its move to continuous publication, whereby
it will assign a DOI but no page numbers
Change of practice at individual scholarly level reflects and promotes change at a wider cultural level. As more and more established academics are open about their use of online resources, the belief that digital content is less scholarly should lessen, and citation should improve. Open access and changes to the REF in the UK will, as mentioned above, sharpen the imperative to cite digital versions of content. Like the OED, many journals are abandoning print. Young researchers, not just luminaries like Kennedy and Davies, should find it increasingly easy to cite what they used in research without apology; their critics will find their position increasingly untenable.
In the meantime, it falls primarily to digital content creators to heighten
awareness of the issues involved. Simply raising the issue is often enough: in
conversation, the authors of this article have found that many
scholars have never thought carefully about their digital behaviour and simply
require some prompting to reconsider their citation habits. We must continue to
talk about it, formally and informally. Conference papers, blog posts, articles
and presentations which focus on the problem of digital citation will keep the
issue current and will encourage users to consider their own practices
Digital citation is important because it is a reflection of how digital resources are valued. It is important because it helps build cases for further funding and enhancement based on evidence of use and impact. It is important because it allows readers of published research to trace and discover sources, both known and new to them, as accurately as possible. It is also honest.
We hope and expect that, in time, the currently too widespread practice of citing a print work which has neither been seen nor used will come to seem an unfortunate historical interlude, one in which the practice of scholarly transparency was briefly and lamentably abandoned. Digital resources are here to stay – it is time that they received the credit that is their due.
We would like to thank Julianne Nyhan and Jane Winters for their feedback and suggestions while we were writing this article.