r25 - 18 Jan 2010 - 16:02:40 - JuliaFlanders

DHQ Encoding Documentation

This document provides an introduction to using the DHQ markup language. Because DHQ is a customization of TEI, most of the elements in DHQ markup are fully documented as part of the TEI Guidelines. This guide is intended to provide an overview of the DHQ document structure and information about good practice; it also documents the DHQ-specific elements and usages that may depart from standard TEI usage.

To suggest new tags or to ask for more information, please email us at editors@digitalhumanities.org.

You can find information about future schema developments at our Schema Requirements page.

For guidance on ensuring that your encoding is complete, see the Encoding Checklist.

For guidance on getting started, see the DHQ Quick Start page.

Overview

DHQauthor markup is fairly intuitive, particularly if you are already familiar with the TEI Guidelines. A simple DHQ document consists of a teiHeader containing document metadata, followed by a text. The text can be subdivided into div elements, each of which is composed of components such as paragraphs, lists, and figures. Within paragraph-level components, smaller elements are available to encode names, emphasis, quotations, bibliographic references, foreign-language words, terminology, and other aspects of the verbal texture. At the end of the article there are sections for encoding works cited. As a shortcut, we provide a template with all required metadata elements.

Authorial Metadata

Most of the metadata in a DHQ document is supplied by DHQ, either through the encoding template or during the editing process. However, some metadata is required from the author:

  • Article title
  • For each author: name, affiliation, and email address. A brief biographical paragraph will be requested if the article is accepted for publication (it can also be supplied when the article is submitted).
  • A small number of keywords for the article, indicating its subject domain. These are chosen by the author and may be anything the author feels would be useful for retrieval; DHQ will also add keywords from a standard vocabulary.

Large document structures

text

The main content of a DHQ article is contained within the text element. Within text, there are three major components:

  • front contains a teaser and abstract, both of which are supplied by the author (either upon submission or after the article has been accepted for publication)
  • body contains the article proper, including any appendices
  • back contains the bibliography of works cited, encoded as listBibl
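As a rough illustration of the structure just described, the following skeleton shows how the teiHeader, text, front, body, and back elements might fit together. It is a sketch rather than the official template: the TEI namespace declaration is an assumption about how the file is set up, and the teaser and abstract are indicated only by a comment because their element names are not given here.

```xml
<!-- Sketch only: the namespace declaration is an assumption; use the DHQ template for real articles -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- document metadata, most of it supplied by DHQ -->
  </teiHeader>
  <text>
    <front>
      <!-- teaser and abstract, supplied by the author -->
    </front>
    <body>
      <p>The article proper begins here.</p>
    </body>
    <back>
      <listBibl>
        <!-- hypothetical bibliography entry -->
        <bibl xml:id="smith2008" label="Smith 2008">...</bibl>
      </listBibl>
    </back>
  </text>
</TEI>
```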

body

In a short article with no internal subdivisions, the body element may simply contain paragraphs and paragraph-level things. In longer articles containing multiple sections, the individual sections are encoded with div. The div element should not be used if there are no internal subdivisions.

Appendices should be encoded with div, with a heading "Appendix".

floatingText

To embed a complete text within a DHQ article (for instance, a letter, a game, a poem that does not originate outside the text), use the floatingText element. Its structure is essentially the same as that of text. The floatingText element may go anywhere within body or div.
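A minimal sketch of an embedded text, assuming (as stated above) that floatingText mirrors the structure of text and therefore wraps its content in a body. The letter itself is invented for illustration.

```xml
<div>
  <head>A letter from the archive</head>
  <p>The letter below was composed for this article.</p>
  <!-- the embedded text has its own body, like text itself -->
  <floatingText>
    <body>
      <head>To the editors</head>
      <p>I write to report on the progress of the encoding.</p>
    </body>
  </floatingText>
</div>
```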

div

The div element is used to represent subdivisions of the text. This element may nest inside itself as necessary to represent nesting subdivisions. A division typically begins with a head and includes one or more paragraphs and other paragraph-level structures, or further subdivisions. DHQ does not use any @type or @n attributes on div. It is also unnecessary to assign IDs to div elements unless they are the target of a cross-reference.
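The nesting described above could look something like the following sketch. The headings and the identifier "sampling" are invented; the identifier appears only because that subsection is imagined to be the target of a cross-reference elsewhere in the article.

```xml
<body>
  <div>
    <head>Encoding practices in small projects</head>
    <p>...</p>
    <!-- nested subdivision; xml:id is present only because it is cross-referenced -->
    <div xml:id="sampling">
      <head>Sampling and selection</head>
      <p>...</p>
    </div>
  </div>
  <!-- appendices are ordinary div elements with the heading "Appendix" -->
  <div>
    <head>Appendix</head>
    <p>...</p>
  </div>
</body>
```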

Elements occurring at the top of sections

The text and div elements will typically start with a head element. In the case of text this is the title of the article. In the case of div it is the heading for the section. The head element may contain basic inline elements (see below). While the head element is optional, in a long article it is good practice to include headings for the divisions. Headings should not consist simply of a number, but should give the reader some idea of where the argument is going.

Epigraphs may appear at the top of a text or div, following the head. They are encoded with epigraph. Epigraphs will typically contain a cit element, which consists of a quotation (encoded with quote) followed by a citation (encoded with ptr or ref). A graphic or example may be used in place of cit.
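For instance, a section opening with an epigraph might be encoded along these lines. The heading, the quotation text, and the identifier "emerson1971" are placeholders chosen for illustration; the identifier is assumed to match a bibl in the article's listBibl.

```xml
<div>
  <head>Reading at scale</head>
  <epigraph>
    <cit>
      <!-- quotation followed by its citation -->
      <quote rend="block">Text of the epigraph.</quote>
      <ptr target="#emerson1971"/>
    </cit>
  </epigraph>
  <p>...</p>
</div>
```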

Inline elements

DHQ includes a subset of the TEI's phrase-level elements. The most commonly used elements of this type are:

foreign

The foreign element is used to encode words and phrases that are in a language other than that of the surrounding text. It carries an @xml:lang attribute which should always be used to indicate what language its contents are in. The values for the @xml:lang attribute are standard language codes (two letters where a two-letter code exists, otherwise three), as follows:

Arabic ar
Armenian hy
Basque eu
Bulgarian bg
Catalan ca
Chinese zh
Czech cs
Danish da
Dutch nl
Finnish fi
French fr
German de
Greek (modern) el
Greek (ancient) grc
Hebrew he
Hindi hi
Hungarian hu
Icelandic is
Italian it
Japanese ja
Korean ko
Latin la
Norwegian no
Polish pl
Portuguese pt
Romanian ro
Russian ru
Sanskrit sa
Spanish es
Tibetan bo
Welsh cy

A comprehensive list of language codes is also available.
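A short sketch of foreign in running prose, using codes from the table above. The sentence is invented for illustration.

```xml
<p>The German term <foreign xml:lang="de">Wissenschaft</foreign> is broader
than the English word <q>science</q>, and the motto
<foreign xml:lang="la">ars longa, vita brevis</foreign> is often quoted
in this connection.</p>
```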

emph

The emph element is used to encode words and phrases that are intended to be rhetorically emphatic. It will typically display in italics.

hi

The hi element is a fall-back element used to apply necessary formatting that is not driven by one of the more semantically distinctive elements provided in the schema. For instance, it might be used to represent superscripts or subscripts, or to highlight a portion of a code sample to bring it to the reader's attention. Its @rend attribute takes the following values:

  • bold
  • italic
  • monospace
  • quotes
  • smcaps
  • subscript
  • superscript

No other values have any processing attached to them at this time.
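As an illustration, a few of the @rend values listed above might be used as follows (the sentence is invented):

```xml
<p>The formula H<hi rend="subscript">2</hi>O appears in the
3<hi rend="superscript">rd</hi> edition, where the running heads are set in
<hi rend="smcaps">small capitals</hi> and the key passage is printed in
<hi rend="bold">bold</hi>.</p>
```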

q

The q element is used to represent words or phrases (other than technical terms, quotations and direct speech) that should be presented in quotation marks for some reason. It encompasses the following:

  • ironic usage or "scare quotes": the "experts" tell us not to worry...
  • drawing attention to specific usages: the "dungeon" in this case is intended metaphorically...
  • identification of words as words: when we consider the word "phthisis" we are immediately struck...

name

The name element is used to encode names in the text: typically the names of persons, but also place names and the names of organizations. At present this element is not processed in any special way, but in the future we may provide a more detailed rationale for encoding certain types of names (e.g. the names of significant figures in digital humanities) to support analysis. Authors wishing to encode names are free to do so.

term

The term element is used for technical terms, which will typically be displayed with some kind of formatting (e.g. italics or quotation marks). In the future this element may serve as the basis for a DHQ glossary. Authors wishing to encode terms are free to do so.

title

The title element is used to encode the titles of works discussed or cited in a DHQ article. Within the bibl element it represents the title of the work being cited. In running prose, it should be used for any title mentioned.

The @rend attribute on title may take one of three values:

  • italic (for books, journals, and other standalone works such as works of e-literature)
  • quotes (for journal articles, individual chapters or sections of larger works)
  • none (for conferences, interviews, book series, and other titles that require no formatting)
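The following sketch combines several of the inline elements described so far in a single paragraph. All names and titles in it are invented for illustration.

```xml
<p><name>Jane Smith</name> argues that <term>stand-off markup</term> is
<emph>not</emph> a <q>solution</q> in any simple sense. The point is made in
her book <title rend="italic">An Invented Monograph</title>, repeated in the
article <title rend="quotes">An Invented Article</title>, and was first
presented at the <title rend="none">Invented Annual Conference</title>.</p>
```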

In addition, for articles that are discussing markup or programming, the following additional elements may be useful:

tag

The tag element is used to encode a complete XML tag (possibly including attributes). Content should be entered without the surrounding angle brackets.

att

The att element is used to encode the name of an XML attribute. It will typically be formatted in a way that signals this fact: e.g. using the conventional @ prefix.

val

The val element is used to encode an XML attribute value. It will typically be formatted in a way that signals this fact: e.g. by enclosing it within quotation marks.

eg

The eg element is used to encode a code sample to be displayed as a block. By default its contents will be displayed in a manner similar to the HTML <pre> element, with white space and line breaks preserved, and in a fixed-width font. Within eg, the hi element is permitted to allow for highlighting of significant sections.

code

The code element is used to encode a short snippet of computer code, to be displayed inline. By default, its contents will typically be displayed in a fixed-width font.

gi

The gi element is used to encode the name of an XML element (without attributes). It will typically be formatted in a way that signals this fact: e.g. by enclosing it within angle brackets.
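A paragraph discussing markup might use these elements roughly as follows. This is only a sketch; note that the content of tag omits the angle brackets, as described above, and that the XPath expression is just an example of a short inline snippet.

```xml
<p>The <gi>quote</gi> element carries a <att>rend</att> attribute. To mark a
block quotation, set its value to <val>block</val>, which gives the start tag
<tag>quote rend="block"</tag>. The expression <code>//div/head</code> would
select the section headings.</p>
```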

Prose, verse, and dialogue

Prose

Within the div element (or directly within body if there are no subdivisions), the text of the article is typically encoded as a series of prose paragraphs (the p element) intermixed with specific elements for lists, etc. (which are covered below). However, articles may include, or may consist of, material in genres other than prose.

Verse

Verse is encoded using the lg element, which represents a "line group" of one or more poetic lines.

Within lg, individual lines are encoded with l. A "line" in this context does not make significant semantic claims to being verse, but simply represents a line of text that operates as a unit that cannot be arbitrarily broken or relineated.

Dramatic dialogue

Within dramatic dialogue, individual speeches or utterances are encoded using the sp element. This element may contain one or more p, lg, or stage elements in any order. The stage element is used to represent stage directions.
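A brief sketch of verse and dramatic dialogue, using only the elements named above. The lines and the stage direction are invented.

```xml
<div>
  <!-- a line group containing two lines of verse -->
  <lg>
    <l>First line of the stanza,</l>
    <l>second line of the stanza.</l>
  </lg>
  <!-- a speech mixing a stage direction, prose, and verse -->
  <sp>
    <stage>Enter the editor, carrying proofs.</stage>
    <p>Has anyone seen the schema?</p>
    <lg>
      <l>A line that must not be relineated.</l>
    </lg>
  </sp>
</div>
```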

Elements of discourse and presentation

Quotations and epigraphs

The quote element is used for quotations of material from outside the text (e.g. other articles, books, aphorisms, etc.).
  • The @rend attribute takes values "inline" and "block"; inline quotations will be formatted with quotation marks; block quotations will be set off as blocks.
  • When accompanied by a bibliographic citation (encoded as ptr or ref), a quote should be enclosed within a cit element to associate the quote and the citation.
  • The bibliographic reference should be encoded with ptr unless its wording is distinctive and needs to be preserved; in that case, use ref. In both cases, the @target attribute points to a bibl in the listBibl at the end of the document. In rare cases, a ref may simply contain a brief citation without pointing to a bibl (for instance, if the quotation is from a famous speech or other non-bibliographic source), in which case ref should carry a @type attribute with the value "offline".

The said element is used to represent direct speech.
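The quotation practices above can be sketched as follows. The quoted words and the identifier "smith2008" are invented; the identifier is assumed to match a bibl in the article's listBibl.

```xml
<div>
  <!-- inline quotation tied to its citation by cit -->
  <p>As one critic puts it,
    <cit>
      <quote rend="inline">the archive is never neutral</quote>
      <ptr target="#smith2008" loc="19"/>
    </cit>.</p>
  <!-- quotation from a non-bibliographic source -->
  <cit>
    <quote rend="block">Text of the quotation.</quote>
    <ref type="offline">Keynote lecture, June 2008</ref>
  </cit>
  <!-- direct speech -->
  <p>She replied, <said>I have not read it yet.</said></p>
</div>
```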

Examples

The eg element is used to encode literal examples, which may be sample code or other material that needs to be presented with its formatting and line breaks intact. If it contains an XML sample, its contents should be escaped, either by containing them inside a CDATA section, or by escaping the individual characters (we prefer the former). All white space and lineation will be preserved. The distinction between examples and quotations may in some cases be hard to draw; in cases where the example is significant for being quoted from some specific piece of code, the eg element may be nested inside a quote element.

Examples that are not sample code and do not require exact preservation of white space (for instance, a sample text that will be discussed in the article) should be encoded with dhq:example. This is a DHQ-specific element; it contains an optional head followed by the sample text (which may be encoded with floatingText or with other structural elements such as p, sp, etc. as needed).
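A sketch of both kinds of example follows. The first uses a CDATA section so that the sample markup does not need character-by-character escaping. The namespace URI bound to the dhq prefix is an assumption; take the actual declaration from the DHQ template.

```xml
<!-- the dhq namespace URI below is an assumption -->
<div xmlns:dhq="http://www.digitalhumanities.org/ns/dhq">
  <!-- sample code: white space and line breaks are preserved -->
  <eg><![CDATA[<cit>
  <quote rend="inline">...</quote>
  <ptr target="#smith2008"/>
</cit>]]></eg>
  <!-- sample text that is not code -->
  <dhq:example>
    <head>A sample exchange</head>
    <p>A short passage to be discussed in the article.</p>
  </dhq:example>
</div>
```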

Figures

Figures are encoded with the figure element. It contains several child elements:
  • an optional head (a heading with the figure number will be automatically generated and any author-supplied heading will be appended to this)
  • figDesc contains a brief description of the figure, to be used when the figure itself cannot be viewed
  • graphic contains a @url attribute which points to the relevant image file
  • dhq:caption contains a brief caption which will be displayed below the figure

Each figure element should carry a unique identifier to permit cross-referencing. The value of the identifier should be "figure01", "figure02", etc. for ease of proofreading.

Image files should be named figure01.jpg (etc.).
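Putting these pieces together, a figure might be encoded roughly as below. The child elements follow the order in which they are listed above, which may not be the order the schema requires; the namespace URI for the dhq prefix and the plain-text content of dhq:caption are likewise assumptions.

```xml
<!-- the dhq namespace URI and the order of child elements are assumptions -->
<figure xml:id="figure01" xmlns:dhq="http://www.digitalhumanities.org/ns/dhq">
  <head>Distribution of elements by article</head>
  <figDesc>Bar chart comparing element counts across five articles.</figDesc>
  <graphic url="figure01.jpg"/>
  <dhq:caption>Counts are taken from the encoded files.</dhq:caption>
</figure>
```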

Lists

The list element is used to encode lists of all kinds. Its @type attribute takes the following values:
  • "ordered": generates numbered labels
  • "unordered": generates bullet labels
  • "gloss": formats the contents of label
  • "simple": no labels at all
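Two of the list types above, sketched with invented content. The item element is not mentioned in this section; it is the standard TEI child of list and is assumed to be available in DHQ markup.

```xml
<div>
  <!-- numbered labels are generated automatically -->
  <list type="ordered">
    <item>Submit the article.</item>
    <item>Revise after review.</item>
  </list>
  <!-- each label is paired with the item that glosses it -->
  <list type="gloss">
    <label>ptr</label>
    <item>an empty element used for simple references</item>
    <label>ref</label>
    <item>a reference that contains its own link text</item>
  </list>
</div>
```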

Tables

The table element is used to encode tables. It contains a series of row elements, each of which contains one or more cell elements. It may also contain an optional heading and an optional dhq:caption (as with figure).

To identify a given row or cell as a label, use the @role attribute, with the value "label":

<row role="label"> <cell role="label">
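A small table using both kinds of label might look like this sketch. The heading is assumed to be encoded with head, as for figure; the content is taken from the language-code table earlier in this document.

```xml
<table>
  <head>Sample language codes</head>
  <!-- a whole row of column labels -->
  <row role="label">
    <cell>Language</cell>
    <cell>Code</cell>
  </row>
  <!-- a single labeling cell at the start of a row -->
  <row>
    <cell role="label">French</cell>
    <cell>fr</cell>
  </row>
  <row>
    <cell role="label">Welsh</cell>
    <cell>cy</cell>
  </row>
</table>
```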

Bibliographies and bibliographic citations

The items in the bibliography for the article should be encoded with bibl. In the future, most bibliographic items will be stored in a comprehensive bibliographic database (Biblio) and their data will be extracted and imported automatically into DHQ articles. However, all items will be represented by a "stub" bibl element. Some items may be unsuitable for inclusion in Biblio, either because they are too odd or because they are too article-specific (e.g. "Personal interview with the author, June 2008"). These should receive a full bibliographic entry in the article itself.

The bibl element carries two attributes:

  • @xml:id, which provides a unique identifier so that references in the text can point to their bibliographic item
  • @label, which provides a string that can be automatically printed in the text as a reference (e.g. "Liu 2008")

The value of @xml:id should take the form lastnameYYYY, where YYYY is the four-digit year of publication. If there is more than one item by the same author in the same year, then letters may be used for disambiguation: smith2008a, smith2008b, etc. Id values should always be all lower case and should not contain diacritics or punctuation. For multi-author items, use the last name of the first author only.

The value of the @label attribute should be the label that will be displayed in the text when the item is cited, e.g. Smith 2008. The label should be the last name(s) of the authors plus the four-digit year of publication, separated by a space, without a comma. For example:

  • Smith 2008
  • Smith and Jones 2008
  • Smith et al. 2008
  • Sinclair-Smith 2008
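The identifier and label conventions above could be applied as in the following sketch. The works listed are invented, and the layout of the citation text inside bibl is only illustrative.

```xml
<listBibl>
  <bibl xml:id="smith2008" label="Smith 2008">Smith, Jane.
    <title rend="italic">An Invented Monograph</title>. 2008.</bibl>
  <!-- the hyphen is dropped from the identifier but kept in the label -->
  <bibl xml:id="sinclairsmith2008" label="Sinclair-Smith 2008">Sinclair-Smith, Ann.
    <title rend="quotes">An Invented Article</title>.
    <title rend="italic">An Invented Journal</title>, 2008.</bibl>
</listBibl>
```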

Internal cross-references, external linking

DHQ uses two elements for linking and cross-references, with somewhat different and specific usages.

ptr

The ptr element is an empty element, used for internal bibliographic references and certain external URIs. Its @loc attribute can be used to provide a page number or page range (for printed sources) or a section or paragraph number (for online sources).

The ptr element is used in two cases:

  1. For simple internal references to bibliographic items, in cases where the link text can be automatically generated from the @label attribute on bibl and the @loc attribute on ptr (e.g. "Smith 2008, 19"). The @target attribute on ptr contains the @xml:id of the targeted bibl, preceded by a hash mark (#): e.g. <ptr target="#smith2008"/>. In cases where the author wants to supply some additional text for the link (e.g. "see Smith's excellent introduction, especially page 19") the ref element should be used instead of ptr.
  2. For simple references to external URIs from within the bibliography. In these cases, the @target attribute on ptr carries the full URI (including the protocol: "http://", etc.).

For display purposes, the information encoded with ptr will be presented either as a URL (in the case of external targets) or as the formatted label of a bibliographic reference (in the case of internal pointers to the bibliography). If there is a value for @loc, the display will include that information as well; for instance:

<ptr target="#jones1999" loc="4"/>
will be presented as [Jones 1999, 4], and
<ptr target="http://www.digitalhumanities.org"/>
will be presented as <http://www.digitalhumanities.org>.

For values of @loc that are not simple page numbers, a prefix should be included that indicates what kind of reference number is being used: for instance, "para. 1" or "section 3.2" or "item 6". The value of @loc will be displayed exactly as encoded.
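For illustration, here are three pointers with different kinds of @loc value. The identifiers are invented and are assumed to match bibl elements in the bibliography. Since @loc is displayed exactly as encoded, the second pointer would presumably appear as something like [Jones 1999, para. 1].

```xml
<p>A page number: <ptr target="#smith2008" loc="19"/>.
A paragraph number in an online source: <ptr target="#jones1999" loc="para. 1"/>.
A section number: <ptr target="#jones1999" loc="section 3.2"/>.</p>
```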

ref

The ref element is very similar to ptr, but it must contain content. It is used for several purposes:

  1. cases where the wording of a bibliographic reference is significant and needs to be preserved.
  2. references to external URIs in the body of the article; in these cases, the URI should appear both in the content and in the value of @target. In some cases, it may be more appropriate to put the name of the site or page instead of a URI in the content of ref; see stylistic note below.
  3. internal cross-references to other sections, notes, etc. in the DHQ article or in other DHQ articles. In these cases, the @target attribute should contain the @xml:id of the element being pointed to (e.g. <ref target="#figure01">Figure 1, below</ref>).

The ref element also carries a @loc attribute; although this is not used for display, it can be used for analysis and to ensure that the page reference is captured in a formal manner.

For example:

<ref target="#jones1999" loc="note 14">See Jones 1999, note 14</ref>
<ref target="http://www.wikipedia.org">Wikipedia</ref>

For references to external URIs, ref works just like an HTML link; the author provides the text that will be inserted into the anchor of the resulting hyperlink. DHQ house style suggests that where the URL is unfamiliar or difficult to discover, or where the reference is to a very specific page on a site, the URL be included as the content of ref so that it remains visible if the reader prints the DHQ page; for URLs that are very familiar or easy to discover (e.g. the New York Times, major funding agencies, etc.) the name of the site be used instead to reduce clutter on the page.

ref may also be used without a target at all. In this case, no link will be generated. This is appropriate for references that are not to specific published items but to works that have conventionalized citation systems (e.g. Homer and most classical texts, the Bible and other scriptural texts), or that are not available in published form at all (e.g. private communications to the author, keynote lectures, etc.). In these cases, ref is used with a @type attribute whose value is "offline", indicating that there is neither a bibliographic citation nor a URL available.
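The different uses of ref described in this section can be sketched side by side. The identifiers and wording are invented; the last example has no @target and so would generate no link.

```xml
<p>
  <!-- bibliographic reference whose wording matters -->
  <ref target="#smith2008" loc="19">see Smith's excellent introduction, especially page 19</ref>
  <!-- external URI, kept visible in the link text -->
  <ref target="http://www.digitalhumanities.org">http://www.digitalhumanities.org</ref>
  <!-- internal cross-reference -->
  <ref target="#figure01">Figure 1, below</ref>
  <!-- conventionalized citation with no bibl and no URL -->
  <ref type="offline">Iliad 1.1</ref>
</p>
```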

Where a citation accompanies an epigraph, it may be desirable for aesthetic reasons to give the bibliographic information in a more extended form (i.e. not the abbreviated "Emerson 1971" but the more extensive "Emerson, The American Scholar"). In these cases, the desired display text should be encoded as the content of ref, and the @target attribute should point to the bibl element in the usual way.

Special characters

Any Unicode character may be included directly in the data for a DHQ file. If the character can be typed directly, that is ideal. If not, then it can be represented as a numeric character reference, using the Unicode code point for the character being represented. For example, to represent an e with an acute accent:

  • Typed directly: é
  • Unicode, using the decimal notation: &#233;
  • Unicode, using the hexadecimal notation: &#x00E9;

Accented characters should not be used in @xml:id values.
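As a small illustration, the invented entry below writes the same accented letter in two ways in the citation text while keeping the identifier free of diacritics. The code point used is í (U+00ED, decimal 237).

```xml
<!-- the identifier has no accent; the label and the citation text do -->
<bibl xml:id="garcia2008" label="García 2008">Garc&#237;a, Ana (also encoded as
  Garc&#x00ED;a). <title rend="italic">An Invented Monograph</title>. 2008.</bibl>
```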

A list of commonly occurring diacritics and other special characters:

  • —: em-dash, used as a punctuational dash (option-shift-hyphen on a Macintosh keyboard), &#x2014;
  • –: en-dash, used for number ranges (option-hyphen on a Macintosh keyboard), &#x2013;

  • à: &#x00E0;
  • á: &#x00E1;
  • â: &#x00E2;
  • ã: &#x00E3;
  • ä: &#x00E4;
  • è: &#x00E8;
  • é: &#x00E9;
  • ê: &#x00EA;
  • ë: &#x00EB;
  • ì: &#x00EC;
  • í: &#x00ED;
  • î: &#x00EE;
  • ï: &#x00EF;
  • ò: &#x00F2;
  • ó: &#x00F3;
  • ô: &#x00F4;
  • õ: &#x00F5;
  • ö: &#x00F6;
  • ù: &#x00F9;
  • ú: &#x00FA;
  • û: &#x00FB;
  • ü: &#x00FC;
  • ñ: &#x00F1;

-- JuliaFlanders - 23 Jun 2008 people who've edited this page: WendellPiez
last touched - 10 Nov 2007
