DHQ Encoding Documentation
This document provides an introduction to using the DHQ markup language. Because DHQ is a customization of TEI, most of the elements in DHQ markup are fully documented as part of the TEI Guidelines. This guide is intended to provide an overview of the DHQ document structure and information about good practice; it also documents the DHQ-specific elements and usages that may depart from standard TEI usage.
To suggest new tags or to ask for more information, please email us at
editors@digitalhumanities.org.
You can find information about future schema developments at our
Schema Requirements page.
For guidance on ensuring that your encoding is complete, see the
Encoding Checklist.
For guidance on getting started, see the
DHQ Quick Start page.
Overview
DHQauthor markup is fairly intuitive, particularly if you are already familiar with the TEI Guidelines. A simple DHQ document consists of a
teiHeader containing document metadata, followed by a
text. The
text can be subdivided into
div elements, and each of these is composed of components such as paragraphs, lists, figures, and the like. Within paragraph-level components, there are smaller elements available to encode names, emphasis, quotations, bibliographic references, foreign-language words, terminology, and other aspects of the verbal texture. At the end of the article there are sections for encoding works cited. As a shortcut, we provide a
template with all required metadata elements.
Authorial Metadata
Most of the metadata in a DHQ document is supplied by DHQ, either through the encoding template or during the editing process. However, some metadata is required from the author:
*Article title
*For each author, their name, affiliation, and email address. A brief biographical paragraph will be requested if the article is accepted for publication (and this can also be supplied when the article is submitted)
*A small number of keywords for the article, indicating its subject domain. These are chosen by the author and may be anything the author feels would be useful for retrieval; DHQ will also add keywords from a standard vocabulary.
Large document structures
text
The main content of a DHQ article is contained within the
text element. Within
text, there are three major components:
*front contains a
teaser and
abstract, both of which are supplied by the author (either upon submission or after the article has been accepted for publication)
*body contains the article proper, including any appendices
*back contains the bibliography of works cited, encoded as
listBibl.
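The three components above can be sketched as the following skeleton. It is a minimal illustration only: namespace declarations and the teiHeader are omitted, the element names follow the wording of this guide (check the template for any required namespace prefix on teaser and abstract), and all text content is placeholder.

```xml
<text>
  <front>
    <!-- supplied by the author, at submission or after acceptance -->
    <teaser>Placeholder one-sentence teaser.</teaser>
    <abstract>
      <p>Placeholder abstract paragraph.</p>
    </abstract>
  </front>
  <body>
    <!-- the article proper, including any appendices -->
    <p>Placeholder first paragraph of the article.</p>
  </body>
  <back>
    <!-- works cited -->
    <listBibl>
      <bibl xml:id="smith2008" label="Smith 2008">Placeholder bibliographic entry.</bibl>
    </listBibl>
  </back>
</text>
```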
body
In a short article with no internal subdivisions, the
body element may simply contain paragraphs and paragraph-level things. In longer articles containing multiple sections, the individual sections are encoded with
div. The
div element should not be used if there are no internal subdivisions.
Appendices should be encoded with
div, with a heading "Appendix".
floatingText
To embed a complete text within a DHQ article (for instance, a letter, a game, a poem that does not originate outside the text), use the
floatingText element. Its structure is essentially the same as that of
text. The
floatingText element may go anywhere within
body or
div.
div
The
div element is used to represent subdivisions of the text. This element may nest inside itself as necessary to represent nesting subdivisions. A division typically begins with a
head and includes one or more paragraphs and other paragraph-level structures, or further subdivisions. DHQ does not use any @type or @n attributes on
div. It is also unnecessary to assign IDs to
div elements unless they are the target of a cross-reference.
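As a rough sketch of nested divisions, the fragment below shows a section containing a subsection. The headings and the identifier value "methods" are hypothetical; the only div that carries an @xml:id is the one assumed to be the target of a cross-reference elsewhere in the article.

```xml
<body>
  <div>
    <head>Placeholder section heading</head>
    <p>Placeholder paragraph.</p>
    <!-- nested subdivision; the ID is present only because it is cross-referenced -->
    <div xml:id="methods">
      <head>Placeholder subsection heading</head>
      <p>Placeholder paragraph.</p>
    </div>
  </div>
</body>
```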
Elements occurring at the top of sections
The
text and
div elements will typically start with a
head element. In the case of
text this is the title of the article. In the case of
div it is the heading for the section. The
head element may contain basic inline elements (see below). While the
head element is optional, in a long article it is good practice to include headings for the divisions. Headings should not consist simply of a number, but should give the reader some idea of where the argument is going.
Epigraphs may appear at the top of a
text or
div, following the
head. They are encoded with
epigraph. Epigraphs will typically contain a
cit element, which consists of a quotation (encoded with
quote) followed by a citation (encoded with
ptr or
ref). A graphic or example may be used in place of
cit.
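A minimal sketch of a section that opens with a heading and an epigraph, assuming a bibl with the hypothetical identifier "smith2008" exists in the bibliography; the quotation text is placeholder.

```xml
<div>
  <head>Placeholder section heading</head>
  <epigraph>
    <cit>
      <quote rend="block">Placeholder epigraph text.</quote>
      <!-- points to a bibl in the listBibl at the end of the article -->
      <ptr target="#smith2008"/>
    </cit>
  </epigraph>
  <p>Placeholder paragraph.</p>
</div>
```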
Inline elements
DHQ includes a subset of the TEI's phrase-level elements. The most commonly used elements of this type are:
foreign
The
foreign element is used to encode words and phrases that are in a language other than that of the surrounding text. It carries an @xml:lang attribute which should always be used to indicate what language its contents are in. The values for the @xml:lang attribute are two- or three-letter language codes as follows:
| Arabic | ar |
| Armenian | hy |
| Basque | eu |
| Bulgarian | bg |
| Catalan | ca |
| Chinese | zh |
| Czech | cs |
| Danish | da |
| Dutch | nl |
| Finnish | fi |
| French | fr |
| German | de |
| Greek (modern) | el |
| Greek (ancient) | grc |
| Hebrew | iw |
| Hindi | hi |
| Hungarian | hu |
| Icelandic | is |
| Italian | it |
| Japanese | ja |
| Korean | ko |
| Latin | la |
| Norwegian | no |
| Polish | pl |
| Portuguese | pt |
| Romanian | ro |
| Russian | ru |
| Sanskrit | sa |
| Spanish | es |
| Tibetan | bo |
| Welsh | cy |
For languages not listed here, consult a comprehensive list of language codes.
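For illustration, a French phrase inside an English paragraph might be encoded as follows (the sentence itself is a made-up example):

```xml
<p>The narrator returns repeatedly to the idea of the
  <foreign xml:lang="fr">mise en abyme</foreign> in the later chapters.</p>
```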
emph
The
emph element is used to encode words and phrases that are intended to be rhetorically emphatic. It will typically display in italics.
hi
The
hi element is a fall-back element used to apply necessary formatting that is not driven by one of the more semantically distinctive elements provided in the schema. For instance, it might be used to represent superscripts or subscripts, or to highlight a portion of a code sample to bring it to the reader's attention. Its @rend attribute takes the following values:
- bold
- italic
- monospace
- quotes
- smcaps
- subscript
- superscript
No other values have any processing attached to them at this time.
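A short sketch of two of the @rend values listed above, using made-up sample text:

```xml
<p>The formula H<hi rend="subscript">2</hi>O appears in the
  4<hi rend="superscript">th</hi> chapter, where the key term is set in
  <hi rend="smcaps">small capitals</hi>.</p>
```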
q
The
q element is used to represent words or phrases (other than technical terms, quotations and direct speech) that should be presented in quotation marks for some reason. It encompasses the following:
- ironic usage or "scare quotes": the "experts" tell us not to worry...
- drawing attention to specific usages: the "dungeon" in this case is intended metaphorically...
- identification of words as words: when we consider the word "phthisis" we are immediately struck...
name
The
name element is used to encode names in the text: typically the names of persons, but also place names and the names of organizations. At present this element is not processed in any special way, but in the future we may provide a more detailed rationale for encoding certain types of names (e.g. the names of significant figures in digital humanities) to support analysis. Authors wishing to encode names are free to do so.
term
The
term element is used for technical terms, which will typically be displayed with some kind of formatting (e.g. italics or quotation marks). In the future this element may serve as the basis for a DHQ glossary. Authors wishing to encode terms are free to do so.
title
The
title element is used to encode the titles of works discussed or cited in a DHQ article. Within the
bibl element it represents the title of the work being cited. In running prose, it should be used for any title mentioned.
The @rend attribute on
title may take one of three values:
- italic (for books, journals, and other standalone works such as works of e-literature)
- quotes (for journal articles, individual chapters or sections of larger works)
- none (for conferences, interviews, book series, and other titles that require no formatting)
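The three @rend values might be applied in running prose as in the sketch below; all titles are placeholders rather than real works.

```xml
<p>The argument first appeared in <title rend="quotes">Placeholder Article
  Title</title>, was expanded in <title rend="italic">Placeholder Book
  Title</title>, and was presented at the <title rend="none">Placeholder
  Annual Conference</title>.</p>
```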
In addition, for articles that are discussing markup or programming, the following additional elements may be useful:
tag
The
tag element is used to encode a complete XML tag (possibly including attributes). Content should be entered without the surrounding angle brackets.
att
The
att element is used to encode the name of an XML attribute. It will typically be formatted in a way that signals this fact: e.g. using the conventional @ prefix.
val
The
val element is used to encode an XML attribute value. It will typically be formatted in a way that signals this fact: e.g. by enclosing it within quotation marks.
eg
The
eg element is used to encode a code sample to be displayed as a block. By default its contents will be displayed in a manner similar to the HTML <pre> element, with white space and line breaks preserved, and in a fixed-width font. Within
eg, the
hi element is permitted to allow for highlighting of significant sections.
code
The
code element is used to encode a short snippet of computer code, to be displayed inline. By default, its contents will typically be displayed in a fixed-width font.
gi
The
gi element is used to encode the name of an XML element (without attributes). It will typically be formatted in a way that signals this fact: e.g. by enclosing it within angle brackets.
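The inline markup-related elements above could be combined in a sentence like this one. It is a sketch only; the attribute value "section01" and the code snippet are invented for the example.

```xml
<p>The <gi>div</gi> element may carry an <att>xml:id</att> attribute with a
  value such as <val>section01</val>. The complete start tag would be
  <tag>div xml:id="section01"</tag>, and a short inline snippet might look
  like <code>print("hello")</code>.</p>
```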
Prose, verse, and dialogue
Prose
Within the
div element (or directly within
body if there are no subdivisions), the text of the article is typically encoded as a series of prose paragraphs (the
p element) intermixed with specific elements for lists, etc. (which are covered below). However, articles may include, or may consist of, material in genres other than prose.
Verse
Verse is encoded using the
lg element, which represents a "line group" of one or more poetic lines.
Within
lg, individual lines are encoded with
l. A "line" in this context does not make significant semantic claims to being verse, but simply represents a line of text that operates as a unit that cannot be arbitrarily broken or relineated.
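A minimal sketch of a two-line line group, with placeholder text standing in for the verse:

```xml
<lg>
  <l>Placeholder first line of verse,</l>
  <l>placeholder second line of verse.</l>
</lg>
```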
Dramatic dialogue
Within dramatic dialogue, individual speeches or utterances are encoded using the
sp element. This element may contain one or more
p,
lg, or
stage elements in any order. The
stage element is used to represent stage directions.
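A speech containing a stage direction followed by a prose paragraph might be sketched as follows. Only the child elements named in this guide are shown, and the content is placeholder.

```xml
<sp>
  <stage>Enter a messenger.</stage>
  <p>Placeholder speech text.</p>
</sp>
```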
Elements of discourse and presentation
Quotations and epigraphs
The
quote element is used for quotations of material from outside the text (e.g. other articles, books, aphorisms, etc.).
- The @rend attribute takes values "inline" and "block"; inline quotations will be formatted with quotation marks; block quotations will be set off as blocks.
- When accompanied by a bibliographic citation (encoded as
ptr or ref), a quote should be enclosed within a cit element to associate the quote and the citation.
- The bibliographic reference should be encoded with
ptr unless its wording is distinctive and needs to be preserved; in that case, use ref. In both cases, the @target attribute points to a bibl in the listBibl at the end of the document. In rare cases, a ref may simply contain a brief citation without pointing to a bibl (for instance, if the quotation is from a famous speech or other non-bibliographic source), in which case ref should carry a @type attribute with the value "offline".
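The points above can be sketched with two cases: a block quotation whose citation is generated from the bibliography, and an inline quotation from a non-bibliographic source. The identifier "smith2008" and the quoted text are hypothetical.

```xml
<!-- citation text generated from the bibl's label and the loc value -->
<cit>
  <quote rend="block">Placeholder quoted passage.</quote>
  <ptr target="#smith2008" loc="19"/>
</cit>

<!-- no bibl to point to, so the ref is marked as offline -->
<cit>
  <quote rend="inline">Placeholder line from a speech</quote>
  <ref type="offline">placeholder speech, 2008</ref>
</cit>
```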
The
said element is used to represent direct speech.
Examples
The
eg element is used to encode literal examples, which may be sample code or other material that needs to be presented with its formatting and line breaks intact. If it contains an XML sample, its contents should be escaped, either by enclosing them in a CDATA section, or by escaping the individual characters (we prefer the former). All white space and lineation will be preserved. The distinction between examples and quotations may in some cases be hard to draw; in cases where the example is significant for being quoted from some specific piece of code, the
eg element may be nested inside a
quote element.
Examples that are not sample code and do not require exact preservation of white space (for instance, a sample text that will be discussed in the article) should be encoded with
dhq:example. This is a DHQ-specific element; it contains an optional
head followed by the sample text (which may be encoded with
floatingText or with other structural elements such as
p,
sp, etc. as needed).
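As a sketch of the two kinds of example, the first fragment wraps an XML sample in a CDATA section so that its angle brackets are treated as literal text, and the second uses dhq:example for a sample text that does not need its white space preserved. The dhq namespace declaration is omitted and the content is placeholder.

```xml
<eg><![CDATA[<div>
  <head>Sample heading</head>
  <p>Sample paragraph.</p>
</div>]]></eg>

<dhq:example>
  <head>Placeholder example heading</head>
  <p>Placeholder sample text to be discussed in the article.</p>
</dhq:example>
```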
Figures
Figures are encoded with the
figure element. It contains several child elements:
- an optional
head (a heading with the figure number will be automatically generated and any author-supplied heading will be appended to this)
-
figDesc contains a brief description of the figure, to be used when the figure itself cannot be viewed
-
graphic contains a @url attribute which points to the relevant image file
-
dhq:caption contains a brief caption which will be displayed below the figure
Each
figure element should carry a unique identifier to permit cross-referencing. The value of the identifier should be "figure01", "figure02", etc. for ease of proofreading.
Image files should be named figure01.jpg (etc.).
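A figure following the conventions above might look like the sketch below. The child elements appear in the order in which this guide lists them, which is an assumption about what the schema requires; the dhq namespace declaration is omitted, and the image path is shown as a bare file name because the expected directory is not specified here.

```xml
<figure xml:id="figure01">
  <head>Placeholder figure heading</head>
  <figDesc>Placeholder description for readers who cannot view the image.</figDesc>
  <graphic url="figure01.jpg"/>
  <dhq:caption>Placeholder caption displayed below the figure.</dhq:caption>
</figure>
```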
Lists
The
list element is used to encode lists of all kinds. Its @type attribute takes the following values:
- "ordered": generates numbered labels
- "unordered": generates bullet labels
- "gloss": formats the contents of
label as the label for each item
- "simple": no labels at all
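Two of the list types can be sketched as follows, using the standard TEI item element for list entries and placeholder content throughout:

```xml
<list type="ordered">
  <item>Placeholder first step</item>
  <item>Placeholder second step</item>
</list>

<list type="gloss">
  <label>Placeholder term</label>
  <item>Placeholder definition of the term</item>
</list>
```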
Tables
The
table element is used to encode tables. It contains a series of
row elements, each of which contains one or more
cell elements. It may also contain an optional heading and an optional
dhq:caption (as with
figure).
To identify a given row or cell as a label, use the @role attribute, with the value "label":
<row role="label">
<cell role="label">
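Put together, a small table with a heading row and a row-label column might be sketched like this (placeholder content; the optional dhq:caption is left out):

```xml
<table>
  <head>Placeholder table heading</head>
  <!-- the whole first row acts as column labels -->
  <row role="label">
    <cell>Placeholder column A</cell>
    <cell>Placeholder column B</cell>
  </row>
  <!-- only the first cell of this row is a label -->
  <row>
    <cell role="label">Placeholder row label</cell>
    <cell>Placeholder value</cell>
  </row>
</table>
```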
Bibliographies and bibliographic citations
The items in the bibliography for the article should be encoded with
bibl. In the future, most bibliographic items will be stored in a comprehensive bibliographic database (Biblio) and their data will be extracted and imported automatically into DHQ articles. However, all items will be represented by a "stub"
bibl element. Some items may be unsuitable for inclusion in Biblio, either because they are too odd or because they are too article-specific (e.g. "Personal interview with the author, June 2008"). These should receive a full bibliographic entry in the article itself.
The
bibl element carries two attributes:
- @xml:id, which provides a unique identifier so that references in the text can point to their bibliographic item
- @label, which provides a string that can be automatically printed in the text as a reference (e.g. "Liu 2008")
The value of @xml:id should take the form lastnameYYYY, where YYYY is the four-digit year of publication. If there is more than one item by the same author in the same year, then letters may be used for disambiguation: smith2008a, smith2008b, etc. Id values should always be all lower case and should not contain diacritics or punctuation. For multi-author items, use the last name of the first author only.
The value of the @label attribute should be the label that will be displayed in the text when the item is cited, e.g. Smith 2008. The label should be the last name(s) of the authors plus the four-digit year of publication, separated by a space, without a comma. For example:
- Smith 2008
- Smith and Jones 2008
- Smith et al. 2008
- Sinclair-Smith 2008
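Applying the identifier and label rules above, two entries might be sketched as follows. The authors and works are invented placeholders; note that the hyphen in the surname is kept in the label but dropped from the identifier.

```xml
<listBibl>
  <bibl xml:id="smith2008" label="Smith and Jones 2008">Smith, Jane, and John Jones.
    <title rend="italic">Placeholder Book Title</title>. Placeholder Press, 2008.</bibl>
  <bibl xml:id="sinclairsmith2008" label="Sinclair-Smith 2008">Sinclair-Smith, Ann.
    <title rend="quotes">Placeholder Article Title</title>.
    <title rend="italic">Placeholder Journal</title>, 2008.</bibl>
</listBibl>
```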
Internal cross-references, external linking
DHQ uses two elements for linking and cross-references, with somewhat different and specific usages.
ptr
The
ptr element is an empty element, used for internal bibliographic references and certain external URIs. Its @loc attribute can be used to provide a page number or page range (for printed sources) or a section or paragraph number (for online sources).
The
ptr element is used in two cases:
- For simple internal references to bibliographic items, in cases where the link text can be automatically generated from the @label attribute on
bibl and the @loc attribute on ptr (e.g. "Smith 2008, 19"). The @target attribute on ptr contains the @xml:id of the targeted bibl, preceded by a hash mark (#): e.g. <ptr target="#smith2008"/>. In cases where the author wants to supply some additional text for the link (e.g. "see Smith's excellent introduction, especially page 19") the ref element should be used instead of ptr.
- For simple references to external URIs from within the bibliography. In these cases, the @target attribute on
ptr carries the full URI (including the protocol: "http://", etc.).
For display purposes, the information encoded with
ptr will be presented either as a URL (in the case of external targets) or as the formatted label of a bibliographic reference (in the case of internal pointers to the bibliography). If there is a value for @loc, the display will include that information as well; for instance:
<ptr target="#jones1999" loc="4"/>
would display as [Jones 1999, 4]
<ptr target="http://www.digitalhumanities.org"/>
will be presented as
<http://www.digitalhumanities.org>.
For values of @loc that are not simple page numbers, a prefix should be included that indicates what kind of reference number is being used: for instance, "para. 1" or "section 3.2" or "item 6". The value of @loc will be displayed exactly as encoded.
ref
The
ref element is very similar to
ptr, but it must contain content. It is used for several purposes:
- cases where the wording of a bibliographic reference is significant and needs to be preserved.
- references to external URIs in the body of the article; in these cases, the URI should appear both in the content and in the value of @target. In some cases, it may be more appropriate to put the name of the site or page instead of a URI in the content of
ref; see stylistic note below.
- internal cross-references to other sections, notes, etc. in the DHQ article or in other DHQ articles. In these cases, the @target attribute should contain the @xml:id of the element being pointed to (e.g. <ref target="#figure01">Figure 1, below</ref>).
The
ref element also carries a @loc attribute; although this is not used for display, it can be used for analysis and to ensure that the page reference is captured in a formal manner.
For example:
<ref target="#jones1999" loc="note 14">See Jones 1999, note 14</ref>
<ref target="http://www.wikipedia.org">Wikipedia</ref>
For references to external URIs,
ref works just like an HTML link; the author provides the text that will be inserted into the anchor of the resulting hyperlink. DHQ house style suggests that where the URL is unfamiliar or difficult to discover, or where the reference is to a very specific page on a site, the URL be included as the content of
ref so that it remains visible if the reader prints the DHQ page; for URLs that are very familiar or easy to discover (e.g. the New York Times, major funding agencies, etc.) the name of the site be used instead to reduce clutter on the page.
ref may also be used without a
@target attribute at all. In this case, no link will be generated. This is appropriate for references that are not to specific published items but to works that have conventionalized citation systems (e.g. Homer and most classical texts, the Bible and other scriptural texts), or that are not available in published form at all (e.g. private communications to the author, keynote lectures, etc.). In these cases,
ref is used with a @type attribute whose value is "offline", indicating that there is neither a bibliographic citation nor a URL available.
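Two untargeted references of the kinds described above might be sketched like this; the wording of each reference is illustrative only.

```xml
<!-- a work with a conventional citation system -->
<ref type="offline">Iliad 1.1</ref>

<!-- a source not available in published form -->
<ref type="offline">private communication to the author, June 2008</ref>
```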
Where a citation accompanies an epigraph, it may be desirable for aesthetic reasons to give the bibliographic information in a more extended form (i.e. not the abbreviated "Emerson 1971" but the more extensive "Emerson, The American Scholar"). In these cases, the desired display text should be encoded as the content of
ref, and the @target attribute should point to the
bibl element in the usual way.
Special characters
Any Unicode character may be included directly in the data for a DHQ file. If the character can be typed directly, that is ideal. If not, then it can be represented as a numeric character reference, using the Unicode code point for the character being represented. For example, to represent an e with an acute accent:
Typed directly: é
Unicode, using the decimal notation: &#233;
Unicode, using the hexadecimal notation: &#xE9;
Accented characters should not be used in @xml:id values.
A list of commonly occurring diacritics and other special characters:
- —: em-dash, used as a punctuational dash (option-shift-hyphen on a Macintosh keyboard), &#8212;
- –: en-dash, used for number ranges (option-hyphen on a Macintosh keyboard), &#8211;
- à: &#224;
- á: &#225;
- â: &#226;
- ã: &#227;
- ä: &#228;
- è: &#232;
- é: &#233;
- ê: &#234;
- ë: &#235;
- ì: &#236;
- í: &#237;
- î: &#238;
- ï: &#239;
- ò: &#242;
- ó: &#243;
- ô: &#244;
- õ: &#245;
- ö: &#246;
- ù: &#249;
- ú: &#250;
- û: &#251;
- ü: &#252;
- ñ: &#241;
--
JuliaFlanders - 23 Jun 2008
people who've edited this page:
WendellPiez
last touched - 10 Nov 2007