r17 - 08 Oct 2008 - 15:49:55 - JuliaFlanders

DHQ ML Tag Library

This document provides an introduction to using the DHQ markup language. It documents the use of the various tags in the DHQ schema, and gives information about good practice.

To suggest new tags or to ask for more information, please email us at editors@digitalhumanities.org.

You can find information about future schema developments at our Schema Requirements page.

Getting started

To encode your own article using the DHQ markup language you'll need a few things:

You'll also need encoding documentation, which is provided here in several forms. This tag library covers the major areas of DHQ encoding and documents the individual tags provided. In addition, you can download reference documentation here:

The reference documentation is extracted directly from the RNG schema modules. The rendition has known limitations, but it does have the advantage of representing exactly what is in the Relax NG schema. Prose documentation in these files is the schema's own inline documentation.

More detailed information on additional technical topics is available:

Overview

DHQauthor markup is fairly intuitive, particularly if you are already familiar with another encoding language like TEI. A simple DHQ document consists of a DHQ header followed by a text. The text can be subdivided into div elements, and each of these is composed of familiar components such as paragraphs, lists, figures, and the like. Within paragraph-level components, there are smaller elements available to encode names, emphasis, quotations, bibliographic references, foreign-language words, terminology, and other aspects of the verbal texture. At the end of the article there are sections for encoding notes and works cited.

Before submitting your encoded article to DHQ, you should make sure that it is valid against the DHQauthor schema. If the article is accepted for publication, we may revise your encoding to match additional internal style guidelines.

DHQdraft and DHQarticle

The document element or "root element" of a DHQauthor document will be either DHQdraft or DHQarticle. The only difference between them is that in DHQdraft, the DHQheader element is optional. You can encode your article using the DHQdraft element to begin with, but all articles submitted to DHQ must use the DHQarticle structure and must include a DHQheader.

In both DHQdraft and DHQarticle, the main body of the article is encoded within the text element.

Optionally, after text there may be any of several back matter elements:

  • notes contains any notes accompanying the article; these may be encoded inline (in which case they will be moved automatically into the notes section).
  • listBibl contains all of the bibliographic citations for the article. It should only be used for works that are actually cited, not as a "further reading" section.
  • appendix (any number)

None of these elements is required, even if your article has notes, figures or a bibliography, since these can also appear inline.
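As a rough sketch of where back matter sits, the following DHQdraft places a listBibl after text, with one paragraph citing the single bibliography entry. The entry's id, label, and wording are hypothetical, and notes and appendix are left out because their internal structure is not documented on this page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<DHQdraft xmlns="http://digitalhumanities.org/DHQ/namespace">
  <text>
    <!-- internal pointer: "#" plus the @id of the bibl below -->
    <p>Encoding choices are rarely neutral <ptr target="#smith2008" loc="12"/>.</p>
  </text>
  <!-- back matter (here only listBibl) follows text -->
  <listBibl>
    <bibl id="smith2008"><label>Smith 2008</label>Smith, John.
      <title rend="italic">A Hypothetical Monograph</title>. 2008.</bibl>
  </listBibl>
</DHQdraft>
```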

A minimally valid DHQdraft encoding will look like this:

<?xml version="1.0" encoding="UTF-8"?>
<DHQdraft xmlns="http://digitalhumanities.org/DHQ/namespace">
  <text></text>
</DHQdraft>

Note: the namespace declaration xmlns="http://digitalhumanities.org/DHQ/namespace" on the root element is required.

A minimally valid DHQarticle encoding will look like this:

<?xml version="1.0" encoding="UTF-8"?>
<DHQarticle xmlns="http://digitalhumanities.org/DHQ/namespace">
  <DHQheader>
    <title>Minimal DHQarticle</title>
    <author>
      <name>A. Scribens</name>
    </author>
  </DHQheader>
  <text></text>
</DHQarticle>


Large document structures

text

The main content of a DHQ article is contained within the text element. In a short article with no internal subdivisions, the text element may simply contain paragraphs and other paragraph-level elements. In longer articles containing multiple sections, the individual sections are encoded with div.

xtext

To embed a complete text within a DHQ article (for instance, a letter, a game, or a poem that is part of the article itself rather than quoted from an outside source), use the xtext element. Its structure is the same as that of text. The xtext element may go anywhere within text or div.
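A minimal sketch of an embedded letter inside a section follows. It assumes, per the paragraph above, that xtext takes head and p children just as text does; the letter's wording is invented for illustration.

```xml
<div>
  <head>The correspondence</head>
  <p>The first letter in the collection reads as follows:</p>
  <!-- a complete embedded text, structured like text -->
  <xtext>
    <head>Letter to the editor, 3 May 1902</head>
    <p>Dear Sir,</p>
    <p>I write to correct an error in your last issue.</p>
  </xtext>
</div>
```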

div

The div element is used to represent subdivisions of the text. This element may nest inside itself as necessary to represent nesting subdivisions. A division typically begins with a head and includes one or more paragraphs and other paragraph-level structures.
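Here is a sketch of a text with a title, two top-level sections, and one nested subsection. The headings and paragraph wording are placeholders, and the fragment is shown without its root element or namespace declaration.

```xml
<text>
  <!-- head on text is the article title -->
  <head>Hypothetical Article Title</head>
  <div>
    <head>Why encoding decisions matter</head>
    <p>The opening section states the problem.</p>
    <!-- div nests inside div for subsections -->
    <div>
      <head>A nested subsection</head>
      <p>This subsection develops one strand of the argument.</p>
    </div>
  </div>
  <div>
    <head>Toward a practical workflow</head>
    <p>A second top-level section follows.</p>
  </div>
</text>
```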

Elements occurring at the top of sections

The text and div elements will typically start with a head element. In the case of text this is the title of the article. In the case of div it is the heading for the section. The head element may contain basic inline elements (see below). While the head element is optional, in a long article it is good practice to include headings for the divisions. Headings should not consist simply of a number, but should give the reader some idea of where the argument is going.

Epigraphs may appear at the top of a text or div, following the head. They are encoded with epigraph. The structure of an epigraph is a quotation (encoded with quote) followed by a citation (encoded with ptr or ref). A graphic or media object may be used in place of a quotation.
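A sketch of an epigraph at the top of a section, placed after the head: a quote followed by a ptr citation. The quotation text, the target id smith2008, and the page number are placeholders.

```xml
<div>
  <head>Reading at scale</head>
  <epigraph>
    <quote>The quoted epigraph text goes here.</quote>
    <!-- citation points to a bibl in listBibl -->
    <ptr target="#smith2008" loc="45"/>
  </epigraph>
  <p>The section proper begins after the epigraph.</p>
</div>
```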

Inline elements

The DHQ markup language contains a number of inline or phrase-level elements which can be used to mark individual words and phrases within things like paragraphs, headings, and list items. The goal of this markup is twofold: in the short term, to support processing (formatting, linking, etc.), and in the long term, to support more sophisticated forms of analysis.

The most commonly used elements of this type are:

foreign

The foreign element is used to encode words and phrases that are in a language other than that of the surrounding text. It carries a @lang attribute, which should always be used to indicate the language of its contents. The values for the @lang attribute are the following language codes (two letters in most cases; ancient Greek uses the three-letter code grc):

Arabic ar
Armenian hy
Basque eu
Bulgarian bg
Catalan ca
Chinese zh
Czech cs
Danish da
Dutch nl
Finnish fi
French fr
German de
Greek (modern) el
Greek (ancient) grc
Hebrew iw
Hindi hi
Hungarian hu
Icelandic is
Italian it
Japanese ja
Korean ko
Latin la
Norwegian no
Polish pl
Portuguese pt
Romanian ro
Russian ru
Sanskrit sa
Spanish es
Tibetan bo
Welsh cy

A comprehensive list of language codes is also available.
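A small sketch of foreign in running prose, using the @lang attribute and two codes from the table above. The sentence itself is invented.

```xml
<p>The editors describe the page layout as a deliberately archaic
  <foreign lang="fr">mise en page</foreign>, in keeping with the press's
  motto, <foreign lang="la">festina lente</foreign>.</p>
```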

emph

The emph element is used to encode words and phrases that are intended to be emphatic. It will typically display in italics or bold type.

hi

The hi element is a fall-back element used to apply necessary formatting that is not driven by one of the more semantically distinctive elements provided in the schema. For instance, it might be used to represent superscripts or subscripts, or to highlight a portion of a code sample to bring it to the reader's attention. Its @rend attribute takes the following values:

  • bold
  • italic
  • monospace
  • quotes
  • smcaps
  • subscript
  • superscript

No other values have any processing attached to them at this time.
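To contrast the two, this sketch uses hi with two of the @rend values listed above for purely typographic effects, and emph where the stress is part of the meaning. The sentence is a placeholder.

```xml
<p>The reaction yields H<hi rend="subscript">2</hi>O in the
  10<hi rend="superscript">th</hi> step, and that step is
  <emph>not</emph> optional.</p>
```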

called

The called element is used to represent words or phrases (other than quotations and direct speech) that should be presented in quotation marks for some reason. It does not specify the reason (and is thus equivalent to the TEI P5 <q> element), and encompasses the following:

  • ironic usage or "scare quotes": the "experts" tell us not to worry...
  • drawing attention to specific usages: the "dungeon" in this case is intended metaphorically...
  • identification of words as words: when we consider the word "phthisis" we are immediately struck...
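The three usages above might be encoded as follows. The quotation marks are omitted from the source text, on the assumption that the stylesheet supplies them when rendering called.

```xml
<p>The <called>experts</called> tell us not to worry...</p>
<p>The <called>dungeon</called> in this case is intended metaphorically...</p>
<p>When we consider the word <called>phthisis</called> we are immediately struck...</p>
```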

name

The name element is used to encode names in the text: typically the names of persons, but also place names and the names of organizations as well. At present this element is not processed in any special way, but in the future we may provide a more detailed rationale for encoding certain types of names (e.g. the names of significant figures in digital humanities) to support analysis. Authors wishing to encode names are free to do so.

term

The term element is used for technical terms, which will typically be displayed with some kind of formatting (e.g. italics). In the future this element may serve as the basis for a DHQ glossary. Authors wishing to encode terms are free to do so.
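Optional name and term markup might look like this. The person is the placeholder author from the minimal DHQarticle example; the term and sentence are invented.

```xml
<p><name>A. Scribens</name> introduces the term <term>markup model</term>
  in the second chapter.</p>
```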

title

The title element is used to encode the titles of works discussed or cited in a DHQ article. Within the bibl element it represents the title of the work being cited. In running prose, it should be used for any title mentioned.

The @rend attribute on title may take one of three values:

  • italic (for books, journals, and other standalone works such as works of e-literature)
  • quotes (for journal articles, individual chapters or sections of larger works)
  • none (for conferences, interviews, book series, and other titles that require no formatting)
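A sketch showing each of the three @rend values on title in running prose; all three titles are hypothetical.

```xml
<p>The monograph <title rend="italic">A Hypothetical Monograph</title>, the
  essay <title rend="quotes">A Hypothetical Essay</title>, and the conference
  <title rend="none">A Hypothetical Symposium</title> each take a different
  rendition.</p>
```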

In addition, for articles that discuss markup or programming, the following elements may be useful:

tag

The tag element is used to encode a complete XML tag (possibly including attributes). Content should be entered without the surrounding angle brackets.

att

The att element is used to encode the name of an XML attribute. It will typically be formatted in a way that signals this fact: e.g. using the conventional @ prefix.

val

The val element is used to encode an XML attribute value. It will typically be formatted in a way that signals this fact: e.g. by enclosing it within quotation marks.
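Here tag, att, and val describe a block quotation. As documented above, the tag content omits the angle brackets, which the stylesheet is expected to supply; the sentence is illustrative.

```xml
<p>Write <tag>quote rend="block"</tag> for a long quotation: the
  <att>rend</att> attribute then has the value <val>block</val>.</p>
```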

class

The class element is used to encode an abstract class, for document or data modelling.

eg

The eg element is used to encode a code sample to be displayed as a block. By default its contents will be displayed in a manner similar to the HTML <pre> element, with white space and line breaks preserved, and in a fixed-width font. Within eg, the emph and hi elements are permitted to allow for highlighting of significant sections.
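Because eg sits inside an XML document, any markup you want to display must be escaped: write &lt; for < and &amp; for & (escaping > as &gt; is optional but keeps the sample readable). This sketch displays a cit sample with its ptr line highlighted by hi, which the paragraph above permits inside eg. Line breaks and indentation inside eg are preserved in the output.

```xml
<eg>&lt;cit&gt;
  &lt;quote rend="block"&gt;Quoted passage.&lt;/quote&gt;
  <hi rend="bold">&lt;ptr target="#smith2008" loc="12"/&gt;</hi>
&lt;/cit&gt;</eg>
```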

code

The code element is used to encode a short snippet of computer code, to be displayed inline. By default, its contents will typically be displayed in a fixed-width font.

gi

The gi element is used to encode the name of an XML element (without attributes). It will typically be formatted in a way that signals this fact: e.g. by enclosing it within angle brackets.
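A sketch combining gi for element names with code for an inline command. The command uses xmllint, one common Relax NG validator, purely as an example; the schema filename DHQauthor.rng and the file article.xml are hypothetical.

```xml
<p>Each <gi>bibl</gi> begins with a <gi>label</gi>. Before submitting, you
  might check validity with
  <code>xmllint --noout --relaxng DHQauthor.rng article.xml</code>.</p>
```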

Prose, verse, and dialogue

Prose

Within the div element (or directly within text if there are no subdivisions), the text of the article is typically encoded as a series of prose paragraphs (the p element) intermixed with specific elements for lists, etc. (which are covered below). However, articles may include, or may consist of, material in genres other than prose.

Verse

Verse is encoded using the lg element, which represents a "line group" of one or more poetic lines.

Within lg, individual lines are encoded with l. A "line" in this context does not make significant semantic claims to being verse, but simply represents a line of text that operates as a unit that cannot be arbitrarily broken or relineated.
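A two-line verse quotation (the opening of Shakespeare's Sonnet 18) encoded as a line group:

```xml
<lg>
  <l>Shall I compare thee to a summer's day?</l>
  <l>Thou art more lovely and more temperate:</l>
</lg>
```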

Dramatic dialogue

Within dramatic dialogue, individual speeches or utterances are encoded using the sp element. This element may contain one or more p, lg, or stage elements in any order. The stage element is used to represent stage directions.
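A sketch of one speech mixing prose, a stage direction, and verse, in the arbitrary order the paragraph above allows. The dialogue is invented, and no speaker markup is shown because none is documented here.

```xml
<sp>
  <p>I have said all I came to say.</p>
  <stage>She rises and goes to the door.</stage>
  <lg>
    <l>Farewell, then, and remember me.</l>
  </lg>
</sp>
```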

Elements of discourse and presentation

Quotations

The quote element is used for quotations of material from outside the text (e.g. other articles, books, aphorisms, etc.).
  • The @rend attribute takes values "inline" and "block"; inline quotations will be formatted with quotation marks; block quotations will be set off as blocks.
  • When a quotation is accompanied by a bibliographic citation (encoded as ptr or ref), the quote and the citation should be enclosed together within a cit element to associate them.
  • The bibliographic reference should be encoded with ptr unless its wording is distinctive and needs to be preserved; in that case, use ref. In both cases, the @target attribute points to a bibl in the listBibl at the end of the document.

The q element is used to represent direct speech.
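The sketch below shows an inline quotation with a ptr citation, a block quotation whose distinctive citation wording is kept with ref, and direct speech with q. Quoted wording and the target id smith2008 are placeholders, and placing a block cit between paragraphs is an assumption to check against the schema.

```xml
<p>As the introduction puts it, <cit><quote rend="inline">a short quoted
  phrase goes here</quote> <ptr target="#smith2008" loc="3"/></cit>.</p>
<cit>
  <quote rend="block">A longer quoted passage, set off as a block.</quote>
  <ref target="#smith2008">Smith's closing remarks</ref>
</cit>
<p>The interviewee replied, <q>We never planned for that.</q></p>
```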

Examples

The eg element is used to encode examples, which may be sample code or other material that needs to be presented with its formatting and line breaks intact.

Figures

The figure element is used to encode figures. It contains several child elements:
  • figDesc contains a brief description of the figure, to be used when the figure itself cannot be viewed
  • caption contains a brief caption which will be displayed below the figure
  • graphic carries a @url attribute which points to the relevant image file

Each figure element should carry a unique identifier, typically "figure01", "figure02", etc., for ease of proofreading. The heading "Figure 1" (etc.) is generated automatically by the stylesheet, so no heading is required.
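A sketch of a figure with its three children. The identifier attribute is assumed to be @id (as on bibl), the child order simply follows the list above, and the image path and descriptions are hypothetical.

```xml
<figure id="figure01">
  <!-- read when the image itself cannot be viewed -->
  <figDesc>Bar chart of article submissions for each year.</figDesc>
  <!-- displayed below the figure -->
  <caption>Article submissions by year.</caption>
  <graphic url="images/figure01.png"/>
</figure>
```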

Lists

The list element is used to encode lists of all kinds. Its @type attribute takes the following values:
  • "ordered": generates numbered labels
  • "unordered": generates bullet labels
  • "gloss": formats the contents of label
  • "simple": no labels at all
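Two sketches of lists follow. This page does not name the element for list entries, so item (as in TEI) is an assumption here; the gloss list pairs each label with an item.

```xml
<list type="ordered">
  <item>Encode the article as a DHQdraft.</item>
  <item>Add a DHQheader and switch the root element to DHQarticle.</item>
  <item>Validate against the DHQauthor schema before submitting.</item>
</list>

<list type="gloss">
  <label>ptr</label>
  <item>An empty pointer to an internal or external target.</item>
  <label>ref</label>
  <item>A pointer whose wording is supplied by the author.</item>
</list>
```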

Tables

The table element is used to encode tables. It contains a series of row elements, each of which contains one or more cell elements.

To identify a given row or cell as a label, use the @role attribute, with the value "label":

<row role="label">
<cell role="label">
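A small table sketch with a header row and a label cell at the start of each body row, using only the elements and attribute described above; the cell contents are illustrative.

```xml
<table>
  <!-- header row -->
  <row role="label">
    <cell>Element</cell>
    <cell>Purpose</cell>
  </row>
  <!-- first cell of each body row acts as a row label -->
  <row>
    <cell role="label">ptr</cell>
    <cell>Empty pointer; the citation label is generated</cell>
  </row>
  <row>
    <cell role="label">ref</cell>
    <cell>Pointer whose wording is supplied by the author</cell>
  </row>
</table>
```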

Bibliographies

The items in the bibliography for the article will be encoded with bibl. In the future, most bibliographic items will be stored in the Biblio database and their data will be extracted and imported automatically into DHQ articles. Even so, every item will be represented in the article by a "stub" bibl element. Some items may be unsuitable for inclusion in Biblio, either because they are too unusual or because they are too article-specific (e.g. "Personal interview with the author, June 2008"). These will receive a full bibliographic entry in the article itself.

The bibl element carries an @id attribute, and its first child is label. The @id should take the form lastnameYYYY, where YYYY is the four-digit year of publication. If there is more than one item by the same author in the same year, then letters may be used for disambiguation: smith2008a, smith2008b, etc. Id values should always be all lower case and should not contain diacritics or punctuation. For multi-author items, use the last name of the first author only.

The contents of the label element should be the label that will be displayed in the text when the item is cited, e.g. Smith 2008. The label should be the last name(s) of the authors plus the four-digit year of publication, separated by a space. For example:

  • Smith 2008
  • Smith and Jones 2008
  • Smith et al. 2008
  • Sinclair-Smith 2008
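Applying the @id and label rules above to three entries (authors, titles, and dates are hypothetical) shows one consequence: "Smith 2008" and "Smith and Jones 2008" both take the first author's last name and year, so their ids collide and need the letter suffixes. The hyphen is dropped from the Sinclair-Smith id, and the interview gets a full entry in the article, as described above.

```xml
<listBibl>
  <bibl id="smith2008a"><label>Smith 2008</label>Smith, John.
    <title rend="italic">A Hypothetical Monograph</title>. 2008.</bibl>
  <!-- first author's name only, so this also starts as smith2008 -->
  <bibl id="smith2008b"><label>Smith and Jones 2008</label>Smith, John,
    and Ann Jones. <title rend="quotes">A Hypothetical Article</title>.
    <title rend="italic">A Hypothetical Journal</title> 3 (2008).</bibl>
  <!-- no punctuation in ids -->
  <bibl id="sinclairsmith2008"><label>Sinclair-Smith 2008</label>Sinclair-Smith,
    Ann. Personal interview with the author, June 2008.</bibl>
</listBibl>
```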

External graphics and media

Internal cross-references, external linking

Both internal and external links are encoded using the same two elements, ptr and ref. Both of these elements carry a @target attribute which points either to an internal or external target.

If the target is internal, the value of @target is preceded by a hash mark (#), and points to the @id value of some other element in the same DHQ XML file.

If the target is external, the value of @target is a URL beginning with "http://".

ptr

The ptr element is an empty element that points to an internal or external target. Internal pointers are typically used to point to a bibl element containing a bibliographic citation.

The @loc attribute can be used to provide a page number or page range (for printed sources) or a section or paragraph number (for online sources). Context is used to distinguish the two.

For display purposes, the information encoded with ptr will be presented either as a URL (in the case of external targets) or as the formatted label of a bibliographic reference (in the case of internal pointers to the bibliography). If there is a value for @loc, the display will include that information as well; for instance:

<ptr target="#jones1999" loc="4"/>
will be presented as (Jones 1999, 4), and
<ptr target="http://www.digitalhumanities.org"/>
will be presented as <http://www.digitalhumanities.org>.

ref

The ref element is very similar to ptr, but it must contain content. It is used in cases where the wording of the reference is significant and needs to be preserved. Its @target attribute behaves exactly like that of ptr (described above). The ref element does not carry a @loc attribute, since it is assumed that any specific page reference will be described in the element's content.

For example:
<ref target="#jones1999">See Jones 1999, note 14</ref>
<ref target="http://www.wikipedia.org">Wikipedia</ref>

ref works just like an HTML link: the author provides the text that becomes the anchor text of the resulting hyperlink. This works the same way for internal and external links. Because the text is not auto-generated, ref is more flexible than ptr, but correspondingly more difficult to create and maintain.

ref may also be used without a target at all. In this case, no hyperlink will be generated. This is an escape hatch for occasions when, for example, one of bibl, ptr or ref must be provided to give the source of a citation. If the citation cannot be hyperlinked, a ref may be used without a target. This should, however, be an exceptional case.
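A sketch of that exceptional case: a quotation whose source is an unpublished conversation, cited with a ref that has no @target, so no hyperlink is generated. The wording is a placeholder.

```xml
<p>The founder recalled that <cit><quote rend="inline">nobody expected the
  archive to grow so quickly</quote> <ref>conversation with the author,
  June 2008</ref></cit>.</p>
```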

-- JuliaFlanders - 23 Jun 2008; people who've edited this page: WendellPiez
last touched - 10 Nov 2007

Topic attachments
  • DHQauthor-doc.html (309.8 K, 23 Jun 2008, WendellPiez): DHQauthor documentation (from RNG)
  • DHQauthor-header-doc.html (33.7 K, 10 Nov 2007, WendellPiez): documentation for header module of DHQauthor (from RNG)
  • DHQpublish-doc.html (9.7 K, 10 Nov 2007, WendellPiez): documentation for DHQpublish schema wrapper (with its own declarations)
DHQuarterly
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.