“Into the Depths of Data. Methods of Subject Specific Content Retrieval”
Kurt Gärtner, University of Trier, gaertnek@mailer.uni-marburg.de
Gisela Minn, University of Trier, minn@uni-trier.de
Andrea Rapp, University of Trier, rappand@uni-trier.de
Martin Raspe, University of Trier, raspe@uni-trier.de
Ruth Christmann, University of Trier, christma@uni-trier.de
Thomas Schares, University of Trier, schares@uni-trier.de
In April 1998, the Competence Centre for Electronic
Retrieval and Publishing Techniques in the Humanities was founded
at the University of Trier. One of the Centre's main aims is the use of
international, hardware- and software-independent standards such as SGML/XML
in full-text digitization, especially of critical editions, dictionaries,
and important reference works. Information
scientists and humanists from various disciplines work closely
together to ensure that the electronic resources developed at
the Centre meet scholarly requirements. Furthermore, the team aims to provide
complex and powerful retrieval mechanisms that remain easy to use thanks to a
consistently user-oriented design of graphical user interfaces. An important
overall feature that is often ignored in the field
of digitization but is characteristic of the research done at the
Competence Centre is the close linking of software development to the
scholarly background of the material.
Examples of the development of user-oriented software in different projects,
and of how the activities of the Competence Centre are embedded in
research done by universities and the German academies of sciences, are
given in the following three papers on (A) the Rhine-Meuse Net, (B) the WIRE
project, and (C) the digitization of the Deutsche
Wörterbuch: a history project, an art history project, and a German language
and literature project.
(A) The conception of the so-called Rhine-Meuse Net originated in the
activities of a Collaborative Research Centre (SFB 235) that has examined
the history of a European core area from the Ancient World to the 19th
century. Over more than 12 years, a large amount of valuable data has been
accumulated in multiple document types and formats. Not all of this
material has been published, although in many cases even the unpublished
material is of great interest to researchers inside and outside the
SFB. The existing data will therefore be encoded to ensure
its longevity and, at the same time, entered into a database. The data will
thus remain usable even though the funding of the SFB by the
Deutsche Forschungsgemeinschaft (DFG) is due to cease in 2002.
(B) In contrast to the Rhine-Meuse Net, which deals with existing
material, WIRE, the Word and Image Retrieval Environment, is primarily intended as a tool for
scholars who need support in building new (digital) collections of
texts and images relevant to research. The internet-based system allows
texts, structured data, images, and bibliographies to be integrated
into a relational database. As WIRE can be configured according to specific
needs, it supports not only individual scholars but is also
well suited to teams of scholars working together on a particular
object of research. Since various retrieval functions are implemented, WIRE
is useful not only for scholars who build new collections but also for those
who only want to browse collections built by their colleagues.
(C) The retrodigitization of the Deutsche Wörterbuch
by Jacob and Wilhelm Grimm has to be seen in the broader context of
dictionary making at the University of Trier. When work on a new Middle High
German dictionary started in 1994, lexicographers wished to have access
to as many electronic texts and dictionaries as possible. However, to fully
exploit the advantages of an electronic dictionary, one needs not only a
fairly thorough markup of the entries but also a convenient way to
present the dictionary on screen and thus make it readable: several
entries of the Deutsche Wörterbuch
cover more than 300 columns in print. The demonstration of the CD-ROM
prototype of the Deutsche Wörterbuch may serve as
a good example of how thorough in-depth retrieval contributes
to the development of software that allows the dictionary data to be
accessed in new ways.
It will be very interesting to see how new ways of accessing data of
diverse provenance and of many kinds will lead to new questions, new
methods, and new insights into the digitally edited source material.
Title A: The Information and Reference Network for the History of the Rhine-Meuse Area. An Area-Oriented Subject Information System for the Humanities
Dr. Gisela Minn, Dr. Andrea Rapp
1. General and Institutional Preconditions
Alongside the parameter "time", the parameter "area" has received increased attention in the past few years as a fundamental category of human existence. Regions in particular, as medium-sized spatial units, have established themselves in many disciplines as ideal units of investigation. In the Rhine-Meuse Net, the region serves as the central category of access and ordering for integrating research results that are far apart in time and differ in document type, method, and topic. The international research consortium of the Collaborative Research Centre "Between the Meuse and the Rhine. Connections, Encounters, and Conflicts in a European Core Area from the Ancient World to the 19th Century" (SFB 235) has acquired a large amount of valuable data that are very heterogeneous in document type, yet concern a common area of investigation and are closely connected in content. This complex body of data forms the nucleus of a projected database serving as a reference system for European regional history. The project has been funded by the Deutsche Forschungsgemeinschaft (DFG) since 1 November 2001. Besides history with all its specialist research interests, related disciplines such as art history, archaeology, history of law, and the history of the German and Romance languages are involved; they all take part in the research consortium, as do various national and international, university and non-university cooperation partners. The project therefore aims, firstly, to take into account the changed information needs of a growing international research community and, secondly, to lay the groundwork for European historical research beyond the borders of nation-states.
The network is particularly suited to this, as it opens up a European core area at the intersection of Western and Central Europe from ancient times up to the present and will present the results of international research collaboration. Long-term data conservation and platform-independent use are ensured by the consistent application of international standards based on SGML/XML.
2. Content-Related Principles of the Network
The realization of the network starts from two core units. Firstly, the annotated bibliography of the entire publication output of the SFB (about 900 items) will be edited, including all unpublished dissertations and theses, which document the whole scope of research. Owing to the area-oriented interest of the SFB, cartographic methods and techniques of representation are among its most important research procedures. Secondly, therefore, an electronic archive of maps (about 500 items) has been built and will be linked to the bibliography. On the basis of these two core units, which are representative of the whole scope of the network, thesauri of places, persons, and subjects will be compiled and structured hierarchically for in-depth indexing of the data. They form the basic framework for further indexing and will be extended into a dynamic research tool that becomes more extensive and complex with the integration of each new reference unit. A sophisticated system of indexes and metadata will guarantee the linking of these units.
3. Variety of Document Types
The document types representing the cultural heritage as well as the results of scholarly research in digital form are very heterogeneous: texts, maps, pictures, plans, images, tables, archival finding aids and repository guides, indices, bibliographies, etc. At the same time, these document types are closely related in content in a complex and multidirectional manner. In the Humanities especially, far-reaching methodological and content-related impulses are to be expected from an explicit representation of these relations. Moreover, the general approach requires interdisciplinary and comparative studies, new access to digital resources, and the development of cartographic methods for analysis and documentation. We therefore aim at the concatenation, retrieval, and integration of these digital reference units of different document types in a reference network. The following document types form the database of the network and have to be indexed and interlinked:
- Units of information referring to area and region such as local registers and catalogues, complex place lexica, single maps and series of maps, and annotated atlases that combine maps, place catalogues, and commentaries.
- Units of information referring to persons and institutions such as registers and catalogues of persons, prosopographies and biograms of persons, catalogues, lexica, tables, and lists of institutions.
- Units of information that combine information on texts and pictures such as text or picture catalogues, visualizations, and reconstructions.
- Units of information that represent sources, archival finding aids, and instruments for the documentation of research such as special bibliographies, region-related source editions of different genres, repository guides, literature and review services, and documentation of research.
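
The idea of hierarchically structured thesauri that link reference units of different document types can be illustrated with a minimal sketch. It is not the network's actual implementation, which the text describes only as being based on SGML/XML. The class names `Term` and `ReferenceUnit`, the function `retrieve`, and all sample records are assumptions introduced purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry (place, person, or subject) with its narrower terms."""
    label: str
    narrower: list[Term] = field(default_factory=list)

    def add(self, label: str) -> Term:
        """Attach a narrower term below this one and return it."""
        child = Term(label)
        self.narrower.append(child)
        return child

    def descendants(self) -> list[str]:
        """Labels of this term and of every narrower term below it."""
        labels = [self.label]
        for child in self.narrower:
            labels.extend(child.descendants())
        return labels


@dataclass
class ReferenceUnit:
    """A bibliography entry, map, or other document indexed with thesaurus labels."""
    title: str
    doc_type: str
    index_terms: set[str]


def retrieve(term: Term, units: list[ReferenceUnit]) -> list[ReferenceUnit]:
    """Return all units indexed under the term or any of its narrower terms."""
    wanted = set(term.descendants())
    return [unit for unit in units if unit.index_terms & wanted]


# Hypothetical place thesaurus and invented sample records.
region = Term("Rhine-Meuse area")
trier = region.add("Trier")
liege = region.add("Liège")

units = [
    ReferenceUnit("Sample dissertation (invented)", "bibliography", {"Trier"}),
    ReferenceUnit("Sample map (invented)", "map", {"Liège"}),
    ReferenceUnit("Sample finding aid (invented)", "finding-aid", {"Cologne"}),
]

for unit in retrieve(region, units):
    print(unit.doc_type, "-", unit.title)
```

Querying the broader term `region` returns both the bibliography entry and the map, because each is indexed under one of its narrower terms, while `retrieve(trier, units)` narrows the result to the single unit indexed under Trier. In this sketch, a shared thesaurus label is what connects units of otherwise unrelated document types.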