Writing about It: Documentation and Humanities Computing

Julia Flanders, Brown University, USA

Documentation is arguably the most important part of a humanities computing project's long-term existence, in two senses. First, in the sense that without it a project cannot maintain continuity and consistency; and second, in the sense that without it a project cannot communicate its methods to other members of the larger community, offering them for critique if in need of improvement, and making them known if worthy of emulation. Without documentation a project is effectively without "self-knowledge", by which I mean the information which the project itself as an entity needs to know in order to survive and to perpetuate itself. It is crucial to distinguish here between the knowledge belonging to individual participants in the project's work and the knowledge belonging to the project itself. What individuals know is not necessarily accessible to other project members, and this knowledge is taken away with them when they leave. What the project knows, on the other hand, is an explicit part of its internal and public existence, with several important consequences. It can be found without recourse to private knowledge; it does not depend on any individual and is not vulnerable to changes in staff. And finally, this kind of self-knowledge has a particular rhetorical status within the project and as a public expression of identity, in that it has an understood authority: it is explicitly endorsed by the project and its truth or applicability is not in question.

This is as much as to say that we need to take documentation seriously not only as a practical matter but also as a question of theory. Producers of documentation must negotiate between different rhetorical scenarios, from the didactic, developmental narrative of a training manual to the encyclopedic granularity of a reference guide. Treating this negotiation as a rhetorical problem seems to locate it in the writing itself. But in fact we can see, once we look more closely, that documentation is a specialized kind of data or content to be purveyed, and that these different scenarios amount to different approaches to data retrieval which must be accommodated. This complexity is compounded by the fact that we are concerned here with humanities computing documentation, to be used by humanists, with humanistic expectations about the relationship between that which is documentable - reproducible, deterministic, normative - and that which is subject to independent judgment and expertise.

Finally, the challenges documentation poses - its peculiar embodiment of the Arnoldian tension between "Hebraizing" and "Hellenizing", between doing and thinking - also resonate with issues central to humanities computing. Documentation both embodies a project's self-reflection and calls it to a close; it requires that reflection conclude in order that action may commence. And yet the normative statements which documentation strives to offer about a project's practice are inevitably, in a project of any scope, the occasion for discovering further issues which have not yet been decided. The perpetually unfinished work of documentation thus holds the project in a state of dynamic suspension, always trying to resolve issues and get back to work, always trying to finish work so that it can be documented.

Some specific points are worth noting here, to be discussed at greater length in the finished paper. The first is the issue, already mentioned, of the relationship between training documentation and reference documentation. The most apparent difference between these two forms is the kind of text being produced: in the first case, a developmental narrative which takes the trainee through the project's methods from basic to advanced in a way which maximizes memorability and comprehensibility; in the second case, a reference work which provides instant access to particular topics discussed as independent items with a high degree of granularity. These two modes are so different that it is often extremely difficult to convert from one to the other, increasingly so in proportion to how successfully the given mode has been realized. They also require quite different kinds of infrastructure to make them useful in the work environment: for instance, the reference model works best when accompanied by good metadata and a good retrieval system. It also requires attention to the level of granularity at which individual instructions are conceptualized, and to how related instructions will be identified and aggregated.

Producing documentation in either mode requires that one conceptualize the consumer's needs and habits in detail, and this raises a second point, also mentioned above. What role do humanists allot to documentation, broadly considered? This question points to a larger issue for humanities computing, namely the role of judgment and interpretation in the creation of humanities data, and the areas in which their exercise is appropriate. If the documentation is framed for a workplace in which comparatively unskilled workers require explicit instructions on making absolutely consistent choices, then the documentation itself needs to be equally determinate, authoritative, and exhaustive in the way it communicates. It must anticipate every alternative and leave no opening for variation; from the retrieval standpoint, it must ensure that the correct information is always discovered no matter how inept or tangential the search strategy. In short, it must make the work resemble as little as possible the kind of intellectual environment envisioned by a liberal humanities viewpoint. On the other hand, if the intention is to guide the worker in exercising judgment - that is, to indicate the principles to be applied rather than the act to be performed - then the documentation will necessarily imagine its readers as part of an ongoing investigation into the project's methods and standards.

To give these reflections some concreteness, the finished paper will also consider an actual documentation system currently in use in a major text encoding project, which has evolved over a period of seven years and is used both for training and reference. Although developed for a particular set of needs and by no means perfect, this system and the process of its development offer an example which may be of use to people currently designing or redesigning documentation systems of their own.