Cuneiform Stroke Recognition and Vectorization in 2D Images
Adéla Hamplová, Czech University of Life Sciences Prague; Avital Romach, Yale University; Josef Pavlíček, Czech University of Life Sciences Prague; Arnošt Veselý, Czech University of Life Sciences Prague; Martin Čejka, Czech University of Life Sciences Prague; David Franc, Czech University of Life Sciences Prague; Shai Gordin, Ariel University; Open University of Israel
Abstract
A vital part of the publication process of ancient cuneiform tablets is creating hand-copies, which are 2D line art representations of
the 3D cuneiform clay tablets, created manually by scholars. This research provides an innovative method using Convolutional Neural
Networks (CNNs) to identify strokes, the constituent parts of cuneiform characters, and display them as vectors — semi-automatically
creating cuneiform hand-copies. This is a major step in optical character recognition (OCR) for cuneiform texts, which would contribute
significantly to their digitization and create efficient tools for dealing with the unique challenges of Mesopotamian cultural heritage.
Our research has resulted in the successful identification of horizontal strokes in 2D images of cuneiform tablets, some of them from
very different periods, separated from each other by hundreds of years. With the Detecto algorithm, we achieved an F-measure of 81.7%
and an accuracy of 90.5%. The data and code of the project are available on GitHub.
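As a reminder of how a detection F-measure such as the one reported above is typically computed, the sketch below scores predicted stroke bounding boxes against ground truth via intersection-over-union. This is a generic illustration, not the authors' evaluation code; the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are assumptions.

```python
# Generic sketch of F-measure scoring for detected stroke bounding boxes.
# Box format (x1, y1, x2, y2) and the IoU threshold are assumptions.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def f_measure(predicted, ground_truth, thresh=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    unmatched = list(ground_truth)
    tp = 0
    for p in predicted:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thresh:
            tp += 1
            unmatched.remove(best)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

A prediction counts as a true positive only if it overlaps an as-yet-unmatched ground-truth stroke above the threshold; precision and recall then combine into the F-measure in the usual way.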
Exploring Combinatorial Methods to Produce Sonnets: An Overview of the Oupoco Project
Frédérique Mélanie-Becquet, LATTICE (CNRS & ENS/PSL & Univ. Sorbonne Nouvelle); Clément Plancq, LATTICE (CNRS & ENS/PSL & Univ. Sorbonne Nouvelle); Claude Grunspan, LATTICE (CNRS & ENS/PSL & Univ. Sorbonne Nouvelle); Mylène Maignant, LATTICE (CNRS & ENS/PSL & Univ. Sorbonne Nouvelle); Matthieu Raffard, Atelier Raffard-Roussel; Mathilde Roussel, Atelier Raffard-Roussel; Fiammetta Ghedini, RIVA Illustrations; Thierry Poibeau, LATTICE (CNRS & ENS/PSL & Univ. Sorbonne Nouvelle)
Abstract
In this paper, we describe Oupoco (l’Ouvroir de Poésie Combinatoire), a system producing new sonnets by
recombining lines of poetry from existing sonnets, following an idea that Queneau described in his
book Cent Mille Milliards de poèmes (A Hundred Thousand Billion
Poems, 1961). We first give the rationale of the project and review past experiments in poetry generation
using combinatorial methods. We then demonstrate different outputs of our implementation (a Web site, a Twitter bot
and a specifically developed device, called the Boîte à poésie) based on
a corpus of 19th century French poetry. We describe how this project was an opportunity to work with artists and
reach a new audience through the Boîte à poésie, and also through a
video clip that frequently served as an introduction to the project. Our goal is to revive people’s interest in
poetry by giving access to automatically produced sonnets through original and entertaining channels and
devices.
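Queneau's combinatorial idea can be sketched in a few lines: line i of the new sonnet is drawn from line i of a randomly chosen source sonnet, so n source sonnets yield n¹⁴ possible combinations. This toy sketch is not the Oupoco implementation (which works on a large 19th-century corpus and handles rhyme-scheme compatibility); it assumes all source sonnets already share a rhyme scheme, as in Queneau's book.

```python
# Toy sketch of Queneau-style combinatorial recombination, assuming all
# source sonnets share one rhyme scheme (not the Oupoco implementation).
import random

def recombine(sonnets, rng=None):
    """Build a new sonnet: line i comes from line i of a random source."""
    rng = rng or random.Random()
    return [rng.choice(sonnets)[i] for i in range(14)]
```

With ten source sonnets, as in Queneau's book, this yields 10¹⁴ — a hundred thousand billion — distinct poems.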
Recognition and Analysis of the Proceedings of the Greek Parliament after WWII
Epameinondas-Konstantinos Barmpounis, Athens University of Economics and Business; John Pavlopoulos, Athens University of Economics and Business; Panos Louridas, Athens University of Economics and Business; Dritsa Konstantina, Athens University of Economics and Business
Abstract
The first post-WWII years in Greece were devastating. After a brutal Nazi occupation, the Greek Civil War (1946–1949) erupted. It
wrecked the economy and the country's infrastructure and altered politics and the social fabric for decades to come. A study of the
issues discussed in the Greek Parliament during the tense and unstable first years of the conflict (1946–1947) could
facilitate our understanding of the society at the time. An obstacle is that parliament proceedings are publicly available in a
machine-readable form beginning in 1989; before that only scanned images of the original records exist. We show that text recognition
followed by natural language processing can unlock this corpus for historical research. Using Transkribus, we trained a text recogniser
(1.5% CER) that we applied to 3,156 images from 1946 and 1947. As low-quality recognition is inevitable, we trained a language model on the
transcribed text and applied it to recognised text, discarding records with high average cross-entropy. Using information extraction
techniques, we sampled speeches that were applauded, introducing the first quantification of the issues that met with applause.
All our resources are made available at https://zenodo.org/record/8302990.
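The filtering step described above — training a language model on transcribed text and discarding records whose average cross-entropy is high — can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses a character-level unigram model and an arbitrary threshold as stand-ins for their trained model.

```python
# Illustrative sketch (not the authors' pipeline) of filtering OCR output
# by average cross-entropy under a character-level language model. The
# unigram model and the 8-bit threshold are stand-ins for a trained model.
import math
from collections import Counter

def train_unigram(corpus):
    """Estimate per-character probabilities from clean transcribed text."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def avg_cross_entropy(text, model, floor=1e-6):
    """Mean negative log-probability (bits) per character."""
    return sum(-math.log2(model.get(c, floor)) for c in text) / len(text)

clean = "the parliament discussed the budget " * 50
model = train_unigram(clean)

# Well-recognised text scores low; garbled OCR output scores high.
records = ["the budget was discussed", "q#x@zq!!##@@zzqq"]
kept = [r for r in records if avg_cross_entropy(r, model) < 8.0]
```

Records whose characters the model finds surprising — typically misrecognised pages — accumulate high per-character entropy and are discarded.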
Cross-codex Learning for Reliable Scribe Identification in Medieval Manuscripts
Julius Weißmann, Media and Digital Technologies, St. Pölten University of Applied Sciences; Markus Seidl, Media and Digital Technologies, St. Pölten University of Applied Sciences; Anya Dietrich, MEG Unit, Brain Imaging Center; Martin Haltrich, Library, Klosterneuburg Abbey
Abstract
Historic scribe identification is an important task for obtaining information about the past. Uniform script
styles, such as the Carolingian minuscule, make it difficult for a classifier to focus on meaningful
features. In this paper we therefore demonstrate the importance of cross-codex training data for CNN-based,
text-independent, off-line scribe identification, to overcome codex-dependent overfitting. We report three main
findings: First, we found that preprocessing with masked grayscale images instead of RGB images clearly
increased the F1-score of the classification results. Second, we trained different neural networks on our
complex data, comparing training time and accuracy in order to identify the most reliable network
architecture. With AlexNet, the network with the best trade-off between F1-score and time, we achieved
per-class F1-scores of up to 0.96 on line level and up to 1.0 on page level. Third,
we could replicate the finding that the CNN output can be further improved by implementing a reject option,
giving more stable results. We present the results on our large-scale open-source dataset – the Codex
Claustroneoburgensis database (CCl-DB) – containing a significant number of writings from different scribes in
several codices. We demonstrate for the first time on a dataset with such a variety of codices that
paleographic decisions can be reproduced automatically and precisely with CNNs. This opens up many new
possibilities for paleographers to quickly gain insights into unlabeled material and to develop further
hypotheses.
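A reject option of the kind mentioned above can be as simple as a confidence threshold on the classifier's softmax output: predictions below the threshold are withheld rather than forced onto a scribe. The sketch below illustrates this idea; the softmax-confidence criterion and the 0.9 threshold are assumptions for illustration, and the paper's exact rejection rule may differ.

```python
# Minimal sketch of a reject option on classifier outputs. The
# softmax-confidence criterion and threshold are assumed for illustration.
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_with_reject(logits, scribes, threshold=0.9):
    """Return the predicted scribe, or None if confidence is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return scribes[best] if probs[best] >= threshold else None
```

Withholding low-confidence predictions trades coverage for precision, which is what makes the remaining, accepted attributions more stable.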
Fingerprints of British Book History: A Feminist Labor History of EEBO
Ana Quiring, University of Missouri–St. Louis
Abstract
In this essay, I give a labor history of the commonly used database Early English Books Online. EEBO began its
life as a microfilm archive produced beginning in 1940: a massive book-copying project was undertaken during World
War II to protect rare books from German bombs. These reproductions were made largely by unnamed women archivists
and led by a team of women photographers, academics, and secretaries. In this essay, I draw on existing theories
and histories of EEBO while highlighting these women's work, which manifests in the photographs through the shadows
of the fingertips they used to pin down the fragile books. This archive of reproductive labor exemplifies the
artificial divide between auteur and stenographer, artist and secretary, that animated many of the early twentieth
century’s avant-garde movements, including ones in which EEBO photographers were intimately involved. The fingers
represent the custodial labor of women working across artistic and administrative modes during World War II and the
decades that followed.
An Annotated Multilingual Dataset to Study Modality in the Gospels
Helena Bermúdez-Sabel, University of Neuchâtel; Francesca Dell'Oro, University of Neuchâtel; Swiss National Science Foundation
Abstract
This paper presents a number of resources for examining the expression of modality in the Gospels. The main resource is an XML-TEI
dataset that contains the linguistic annotation of a predefined list of potentially modal markers in both Ancient Greek and Latin. When
one of these markers conveys a modal meaning, each constituent of the modal passage (i.e., the marker, its scope, and the modal relation
between them) is annotated with a great level of detail through several linguistic features. One of the original features of our dataset
is the implementation of a cross-referencing system that enables the alignment of the potentially modal markers of both languages. To
facilitate the exploitation of our data by those unfamiliar with XML technologies, we also provide summary tables with the most relevant
features of the annotation. In addition, a program written in Apache Ant allows any user to generate the summary sheets and to align
modal passages in both Ancient Greek and Latin with any other language available in the Multilingual Bible Parallel
Corpus. This contribution presents the details of the semantic annotation and
its formalization, and how our resources may be exploited within semantics and translation studies. In addition, the encoding strategies
implemented are relevant for other projects dealing with the combination of multiple layers of (linguistic) annotation and/or tackling
the development of parallel corpora.
Building an Interface as an Argument? The Case Study of Untangling the Cordel
Elina Leblanc, University of Geneva
Abstract
The project Untangling the cordel (2020-2024) aims at studying and promoting a collection
of 19th-century Spanish chapbooks via a digital library (DL). This resource is composed of
digital scholarly editions of chapbooks and of a catalogue of woodcuts, which decorate the
first page of almost all the documents. In this paper, after presenting the project’s
editorial workflow, we focus our attention on the way we designed the interface of this DL
to represent the different facets of chapbooks (document, text, and illustrations). To do
so, we have chosen to follow a method, proposed by Andrews and van Zundert in 2018, that
considers an interface as an argument that editors make about their data and their digital
editions. Through this case study, we demonstrate the feasibility of this approach, where
each component of an interface contributes to the scientific discourse a project makes
about its goals and its perception of digital editing. We also stress the impact of this
method on user experience and on a project itself, as another way to see data and their
modelling.
Tractable Tensions: A Review of Digital Humanities: Knowledge and Critique in a Digital Age by David M. Berry and Anders Fagerjord
Onyekachi Henry Ibekwe, University of Nigeria
Abstract
This book analyzes the tensions that arise when the numerical, predictive exactitude of digital computing embraces
the humanities. Such tensions are thrown into frequent and sharp relief because the sciences
proceed on the basis of skepticism, while the humanities proceed on the basis of interpretation, criticism, and
dialectical engagement with given issues. In addition, the pervasive nature of digital computing heightens the risk
of overshadowing the humanities. The authors argue for a critical turn in the digital humanities.
Gamer Trouble: A Review
Himadri Agarwal, University of Maryland
Abstract
This review of Amanda Phillips’ book Gamer Trouble traces the contents of the book,
notes its resonances with larger scholarly discussions, and considers what it says about gaming and game culture
overall. It also provides some perspectives on the book’s importance and appeal.