DHQ ML Tag Library
This document provides an introduction to using the DHQ markup language. It documents the use of the various tags in the DHQ schema, and gives information about good practice.
To suggest new tags or to ask for more information, please email us at
editors@digitalhumanities.org.
You can find information about future schema developments at our
Schema Requirements page.
Getting started
To encode your own article using the DHQ markup language you'll need a few things:
You'll also need encoding documentation, which is provided here in several forms. This tag library covers the major areas of DHQ encoding and documents the individual tags provided. In addition, you can download reference documentation here:
The reference documentation is extracted directly from the RNG schema modules. The rendition has known limitations, but it does have the advantage of representing
exactly what is in the Relax NG schema. Prose documentation in these files is the schema's own inline documentation.
More detailed information on additional technical topics is available:
Overview
DHQauthor markup is fairly intuitive, particularly if you are already familiar with another encoding language like TEI. A simple DHQ document consists of a DHQ header followed by a
text. The
text can be subdivided into
div elements, and each of these is composed of familiar components such as paragraphs, lists, figures, and the like. Within paragraph-level components, there are smaller elements available to encode names, emphasis, quotations, bibliographic references, foreign-language words, terminology, and other aspects of the verbal texture. At the end of the article there are sections for encoding notes and works cited.
Before submitting your encoded article to DHQ, you should make sure that it is valid against the DHQauthor schema. If the article is accepted for publication, we may revise your encoding to match additional internal style guidelines.
DHQdraft and DHQarticle
The document element or "root element" of a DHQauthor document will be either
DHQdraft or
DHQarticle. The only difference between them is that in
DHQdraft, the
DHQheader element is optional. You can encode your article using the
DHQdraft element to begin with, but all articles submitted to DHQ must use the DHQarticle structure and must include a DHQheader.
In both
DHQdraft and
DHQarticle, the main body of the article is encoded as a
text.
Optionally, after
text there may be any of several back matter elements:
-
notes contains any notes accompanying the article; these may be encoded inline (in which case they will be moved automatically into the notes section).
-
listBibl contains all of the bibliographic citations for the article. It should only be used for works that are actually cited, not as a "further reading" section.
-
appendix (any number)
Note that none of these elements is required, even if the article has notes, figures, or a bibliography, since such content can also appear inline.
A minimally valid DHQdraft encoding will look like this:
<?xml version="1.0" encoding="UTF-8"?>
<DHQdraft xmlns="http://digitalhumanities.org/DHQ/namespace">
<text></text>
</DHQdraft>
Note: the namespace declaration for http://digitalhumanities.org/DHQ/namespace on the root element is required.
A minimally valid DHQarticle encoding will look like this:
<?xml version="1.0" encoding="UTF-8"?>
<DHQarticle xmlns="http://digitalhumanities.org/DHQ/namespace">
<DHQheader>
<title>Minimal DHQarticle</title>
<author>
<name>A. Scribens</name>
</author>
</DHQheader>
<text></text>
</DHQarticle>
Note: the namespace declaration for http://digitalhumanities.org/DHQ/namespace on the root element is required.
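As a rough sketch of how the optional back matter fits in, the following extends the minimal DHQarticle with one paragraph and a listBibl placed after text. The bibliography entry and the id smith2008 are invented for illustration, and the sketch has not been validated against the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<DHQarticle xmlns="http://digitalhumanities.org/DHQ/namespace">
  <DHQheader>
    <title>Article with a bibliography</title>
    <author>
      <name>A. Scribens</name>
    </author>
  </DHQheader>
  <text>
    <p>The claim is developed at length elsewhere <ptr target="#smith2008"/>.</p>
  </text>
  <!-- Back matter follows text; all of it is optional. -->
  <listBibl>
    <bibl id="smith2008">
      <label>Smith 2008</label>
      <!-- Hypothetical entry, for illustration only. -->
      Smith, J. An Invented Title. 2008.
    </bibl>
  </listBibl>
</DHQarticle>
```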
Large document structures
text
The main content of a DHQ article is contained within the
text element. In a short article with no internal subdivisions, the
text element may simply contain paragraphs and paragraph-level things. In longer articles containing multiple sections, the individual sections are encoded with
div.
xtext
To embed a complete text within a DHQ article (for instance, a letter, a game, a poem that does not originate outside the text), use the
xtext element. Its structure is the same as that of
text. The
xtext element may go anywhere within
text or
div.
div
The
div element is used to represent subdivisions of the text. This element may nest inside itself as necessary to represent nesting subdivisions. A division typically begins with a
head and includes one or more paragraphs and other paragraph-level structures.
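A small sketch of nested divisions may help here. The headings and paragraph text are placeholders, and the fragment omits the root element and namespace declaration for brevity.

```xml
<text>
  <div>
    <head>Introduction</head>
    <p>Opening paragraph of the article.</p>
  </div>
  <div>
    <head>Method</head>
    <p>Overview of the approach.</p>
    <!-- A div may nest inside another div to represent a subsection. -->
    <div>
      <head>Corpus selection</head>
      <p>Details of how the corpus was assembled.</p>
    </div>
  </div>
</text>
```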
Elements occurring at the top of sections
The
text and
div elements will typically start with a
head element. In the case of
text this is the title of the article. In the case of
div it is the heading for the section. The
head element may contain basic inline elements (see below). While the
head element is optional, in a long article it is good practice to include headings for the divisions. Headings should not consist simply of a number, but should give the reader some idea of where the argument is going.
Epigraphs may appear at the top of a
text or
div, following the
head. They are encoded with
epigraph. The structure of an epigraph is a quotation (encoded with
quote) followed by a citation (encoded with
ptr or
ref). A graphic or media object may be used in place of a quotation.
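The epigraph structure described above might look like the following. The quotation, the heading, and the bibliography id smith2008 are invented for illustration; the exact content model should be checked against the schema.

```xml
<div>
  <head>Reading at Scale</head>
  <epigraph>
    <!-- The quotation comes first, followed by its citation. -->
    <quote>Every reading is a rereading.</quote>
    <ptr target="#smith2008"/>
  </epigraph>
  <p>First paragraph of the section.</p>
</div>
```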
Inline elements
The DHQ markup language contains a number of inline or phrase-level elements which can be used to mark individual words and phrases within things like paragraphs, headings, and list items. The goal of this markup is two-fold: in the short term, to support processing (formatting, linking, etc.), and in the long term, higher forms of analysis.
The most commonly used elements of this type are:
foreign
The
foreign element is used to encode words and phrases that are in a language other than that of the surrounding text. It carries a @lang attribute which should always be used to indicate what language its contents are in. The values for the @lang attribute are two-letter language codes (or a three-letter code where no two-letter code exists, as for ancient Greek), as follows:
| Language | Code |
| Arabic | ar |
| Armenian | hy |
| Basque | eu |
| Bulgarian | bg |
| Catalan | ca |
| Chinese | zh |
| Czech | cs |
| Danish | da |
| Dutch | nl |
| Finnish | fi |
| French | fr |
| German | de |
| Greek (modern) | el |
| Greek (ancient) | grc |
| Hebrew | iw |
| Hindi | hi |
| Hungarian | hu |
| Icelandic | is |
| Italian | it |
| Japanese | ja |
| Korean | ko |
| Latin | la |
| Norwegian | no |
| Polish | pl |
| Portuguese | pt |
| Romanian | ro |
| Russian | ru |
| Sanskrit | sa |
| Spanish | es |
| Tibetan | bo |
| Welsh | cy |
For languages not shown here, consult a comprehensive list of language codes.
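For instance, a paragraph containing French and German phrases could be encoded roughly as follows (the sentence itself is a made-up example):

```xml
<p>The novel closes with a <foreign lang="fr">mise en abyme</foreign>,
  a device that complicates its account of <foreign lang="de">Bildung</foreign>.</p>
```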
emph
The
emph element is used to encode words and phrases that are intended to be emphatic. It will typically display in italics or bold type.
hi
The
hi element is a fall-back element used to apply necessary formatting that is not driven by one of the more semantically distinctive elements provided in the schema. For instance, it might be used to represent superscripts or subscripts, or to highlight a portion of a code sample to bring it to the reader's attention. Its @rend attribute takes the following values:
- bold
- italic
- monospace
- quotes
- smcaps
- subscript
- superscript
No other values have any processing attached to them at this time.
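A brief sketch of hi used for formatting that has no more specific element, using values from the list above:

```xml
<p>The formula H<hi rend="subscript">2</hi>O appears in the
  19<hi rend="superscript">th</hi>-century edition, where the printer's name
  is set in <hi rend="smcaps">small capitals</hi>.</p>
```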
called
The
called element is used to represent words or phrases (other than quotations and direct speech) that should be presented in quotation marks for some reason. It does not specify the reason (and is thus equivalent to the TEI P5 <q> element), and encompasses the following:
- ironic usage or "scare quotes": the "experts" tell us not to worry...
- drawing attention to specific usages: the "dungeon" in this case is intended metaphorically...
- identification of words as words: when we consider the word "phthisis" we are immediately struck...
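The usages above could be encoded along these lines. The sketch assumes that the quotation marks are supplied at display time and are therefore not typed inside the element; this should be confirmed against the rendered output.

```xml
<p>The <called>experts</called> tell us not to worry.</p>
<p>The <called>dungeon</called> in this case is intended metaphorically.</p>
<p>When we consider the word <called>phthisis</called> we are immediately struck by its spelling.</p>
```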
name
The
name element is used to encode names in the text: typically the names of persons, but also place names and the names of organizations. At present this element is not processed in any special way, but in the future we may provide a more detailed rationale for encoding certain types of names (e.g. the names of significant figures in digital humanities) to support analysis. Authors wishing to encode names are free to do so.
term
The
term element is used for technical terms, which will typically be displayed with some kind of formatting (e.g. italics). In the future this element may serve as the basis for a DHQ glossary. Authors wishing to encode terms are free to do so.
title
The
title element is used to encode the titles of works discussed or cited in a DHQ article. Within the
bibl element it represents the title of the work being cited. In running prose, it should be used for any title mentioned.
The @rend attribute on
title may take one of three values:
- italic (for books, journals, and other standalone works such as works of e-literature)
- quotes (for journal articles, individual chapters or sections of larger works)
- none (for conferences, interviews, book series, and other titles that require no formatting)
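To illustrate the three values in running prose, a paragraph might be encoded as follows:

```xml
<p>The chapter <title rend="quotes">The Whiteness of the Whale</title> in
  <title rend="italic">Moby-Dick</title> was the subject of a paper at
  <title rend="none">Digital Humanities 2008</title>.</p>
```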
In addition, for articles that are discussing markup or programming, the following additional elements may be useful:
tag
The
tag element is used to encode a complete XML tag (possibly including attributes). Content should be entered without the surrounding angle brackets.
att
The
att element is used to encode the name of an XML attribute. It will typically be formatted in a way that signals this fact: e.g. using the conventional @ prefix.
val
The
val element is used to encode an XML attribute value. It will typically be formatted in a way that signals this fact: e.g. by enclosing it within quotation marks.
class
The
class element is used to encode an abstract class, for document or data modelling.
eg
The
eg element is used to encode a code sample to be displayed as a block. By default its contents will be displayed in a manner similar to the HTML <pre> element, with white space and line breaks preserved, and in a fixed-width font. Within
eg, the
emph and
hi elements are permitted to allow for highlighting of significant sections.
code
The
code element is used to encode a short snippet of computer code, to be displayed inline. By default, its contents will typically be displayed in a fixed-width font.
gi
The
gi element is used to encode the name of an XML element (without attributes). It will typically be formatted in a way that signals this fact: e.g. by enclosing it within angle brackets.
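The markup-related elements might be combined as in the sketch below. It assumes that the angle brackets, the @ prefix, and the quotation marks are added at display time, so none of them is typed in the content; the sample code and file names are invented.

```xml
<p>The <gi>img</gi> element needs a <att>src</att> attribute, for example
  <tag>img src="cat.png"</tag>, where <val>cat.png</val> is the attribute value.
  Calling <code>len(rows)</code> returns the number of rows.</p>
<eg>rows = load_rows("data.csv")
<hi rend="bold">print(len(rows))</hi></eg>
```

Within eg the white space is significant, so the sample lines are not indented to match the surrounding XML.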
Prose, verse, and dialogue
Prose
Within the
div element (or directly within
text if there are no subdivisions), the text of the article is typically encoded as a series of prose paragraphs (the
p element) intermixed with specific elements for lists, etc. (which are covered below). However, articles may include, or may consist of, material in genres other than prose.
Verse
Verse is encoded using the
lg element, which represents a "line group" of one or more poetic lines.
Within
lg, individual lines are encoded with
l. A "line" in this context does not make significant semantic claims to being verse, but simply represents a line of text that operates as a unit that cannot be arbitrarily broken or relineated.
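For example, the opening two lines of Shakespeare's Sonnet 18 could be encoded as one line group:

```xml
<lg>
  <l>Shall I compare thee to a summer's day?</l>
  <l>Thou art more lovely and more temperate:</l>
</lg>
```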
Dramatic dialogue
Within dramatic dialogue, individual speeches or utterances are encoded using the
sp element. This element may contain one or more
p,
lg, or
stage elements in any order. The
stage element is used to represent stage directions.
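A minimal sketch of a single speech follows, with invented dialogue. This section does not describe how to mark the speaker's name, so the sketch leaves it out rather than guess at an element.

```xml
<sp>
  <stage>Enter a messenger.</stage>
  <p>My lord, the ships have arrived.</p>
  <stage>Exit.</stage>
</sp>
```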
Elements of discourse and presentation
Quotations
The
quote element is used for quotations of material from outside the text (e.g. other articles, books, aphorisms, etc.).
- The @rend attribute takes values "inline" and "block"; inline quotations will be formatted with quotation marks; block quotations will be set off as blocks.
- When a quotation is accompanied by a bibliographic citation (encoded as
ptr or ref), both should be enclosed within a cit element to associate the quote with the citation.
- The bibliographic reference should be encoded with
ptr unless its wording is distinctive and needs to be preserved; in that case, use ref. In both cases, the @target attribute points to a bibl in the listBibl at the end of the document.
The
q element is used to represent direct speech.
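Putting these rules together, a cited inline quotation and a piece of direct speech might be encoded roughly as follows. The quoted wording and the bibliography id smith2008 are invented for illustration.

```xml
<p>As one critic puts it,
  <cit>
    <quote rend="inline">the archive is never neutral</quote>
    <ptr target="#smith2008" loc="12"/>
  </cit>.</p>
<p><q>I never read the manual,</q> she admitted.</p>
```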
Examples
The
eg element is used to encode examples, which may be sample code or other material that needs to be presented with its formatting and line breaks intact.
Figures
The
figure element is used to encode figures. It contains several child elements:
-
figDesc contains a brief description of the figure, to be used when the figure itself cannot be viewed
-
caption contains a brief caption which will be displayed below the figure
-
graphic contains a @url attribute which points to the relevant image file
Each
figure element should carry a unique identifier, typically "figure01", "figure02", etc. for ease of proofreading. The heading "Figure 1" (etc.) is automatically generated by the stylesheet, so no heading is required.
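A figure might therefore be encoded roughly as follows. The sketch assumes that the identifier is carried on an id attribute (as described for bibl below) and that the child elements appear in the order listed above; both points, and the image path, are assumptions to check against the schema.

```xml
<figure id="figure01">
  <figDesc>Bar chart showing the number of encoded articles per year.</figDesc>
  <caption>Encoded articles per year.</caption>
  <graphic url="images/figure01.png"/>
</figure>
```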
Lists
The
list element is used to encode lists of all kinds. Its @type attribute takes the following values:
- "ordered": generates numbered labels
- "unordered": generates bullet labels
- "gloss": formats the contents of
label
- "simple": no labels at all
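As a sketch, an ordered list and a gloss list could look like this. The item element name is an assumption borrowed from TEI, as is the pairing of label with item in the gloss list, since this section does not name the child elements.

```xml
<list type="ordered">
  <item>Encode the article.</item>
  <item>Validate it against the DHQauthor schema.</item>
</list>

<list type="gloss">
  <label>ptr</label>
  <item>An empty pointer to a target.</item>
  <label>ref</label>
  <item>A reference that carries its own wording.</item>
</list>
```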
Tables
The
table element is used to encode tables. It contains a series of
row elements, each of which contains one or more
cell elements.
To identify a given row or cell as a label, use the @role attribute, with the value "label":
<row role="label">
<cell role="label">
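A complete small table, with a label row and a label cell at the start of the data row, might look like this:

```xml
<table>
  <row role="label">
    <cell>Language</cell>
    <cell>Code</cell>
  </row>
  <row>
    <cell role="label">French</cell>
    <cell>fr</cell>
  </row>
</table>
```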
Bibliographies
The items in the bibliography for the article will be encoded with
bibl. In the future, most bibliographic items will be stored in the Biblio database and their data will be extracted and imported automatically into DHQ articles. However, all items will be represented by a "stub"
bibl element. Some items may be unsuitable for inclusion in Biblio, either because they are too odd or because they are too article-specific (e.g. "Personal interview with the author, June 2008"). These will receive a full bibliographic entry in the article itself.
The
bibl element carries an @id attribute, and its first child is
label. The @id should take the form lastnameYYYY, where YYYY is the four-digit year of publication. If there is more than one item by the same author in the same year, then letters may be used for disambiguation: smith2008a, smith2008b, etc. Id values should always be all lower case and should not contain diacritics or punctuation. For multi-author items, use the last name of the first author only.
The contents of the
label element should be the label that will be displayed in the text when the item is cited, e.g. Smith 2008. The label should be the last name(s) of the authors plus the four-digit year of publication, separated by a space. For example:
- Smith 2008
- Smith and Jones 2008
- Smith et al. 2008
- Sinclair-Smith 2008
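Following these conventions, two full entries might be encoded as below. The works themselves are invented for illustration; note that the id of the multi-author item uses only the first author's last name.

```xml
<listBibl>
  <bibl id="smith2008">
    <label>Smith 2008</label>
    <!-- Hypothetical entry, for illustration only. -->
    Smith, J. An Invented Monograph. 2008.
  </bibl>
  <bibl id="jones1999">
    <label>Jones and Lee 1999</label>
    <!-- Hypothetical entry, for illustration only. -->
    Jones, A., and B. Lee. An Invented Article. 1999.
  </bibl>
</listBibl>
```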
External graphics and media
Internal cross-references, external linking
Both internal and external links are encoded using the same two elements,
ptr and
ref. Both of these elements carry a @target attribute which points either to an internal or external target.
If the target is internal, the value of @target is preceded by a hash mark (#), and points to the @id value of some other element in the same DHQ XML file.
If the target is external, the value of @target is a URL beginning with "http://".
ptr
The
ptr element is an empty element that points to an internal or external target. Internally, it is used to point to a
bibl element containing a bibliographic citation.
The @loc attribute can be used to provide a page number or page range (for printed sources) or a section or paragraph number (for online sources). Context is used to distinguish the two.
For display purposes, the information encoded with
ptr will be presented either as a URL (in the case of external targets) or as the formatted label of a bibliographic reference (in the case of internal pointers to the bibliography). If there is a value for @loc, the display will include that information as well; for instance:
<ptr target="#jones1999" loc="4"/>
would display as (Jones 1999, 4)
<ptr target="http://www.digitalhumanities.org"/>
will be presented as
<http://www.digitalhumanities.org>.
ref
The
ref element is very similar to
ptr, but it must contain content. It is used in cases where the wording of the reference is significant and needs to be preserved. Its @target attribute behaves exactly like that of
ptr (described above). The
ref element does not carry a @loc attribute, since it is assumed that any specific page reference will be described in the element's content.
For example:
<ref target="#jones1999">See Jones 1999, note 14</ref>
<ref target="http://www.wikipedia.org">Wikipedia</ref>
ref works much like an HTML link: the author provides the text that becomes the anchor text of the resulting hyperlink. This works the same way for internal and external links. Because the text is not auto-generated,
ref is more flexible than
ptr, while being correspondingly more difficult to create and maintain.
ref may also be used without a @target attribute. In this case, no hyperlink will be generated. This is an escape hatch for occasions when one of
bibl,
ptr or
ref must be provided to give the source of a citation, but that source cannot be hyperlinked. This should, however, be an exceptional case.
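For instance, a quotation from an unpublished source could be cited with a ref that has no target, along these lines (the quoted wording is invented):

```xml
<cit>
  <quote rend="inline">we never kept a written record</quote>
  <!-- No target attribute, so no hyperlink is generated. -->
  <ref>Personal interview with the author, June 2008</ref>
</cit>
```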
--
JuliaFlanders - 23 Jun 2008
people who've edited this page:
WendellPiez
last touched - 10 Nov 2007