Notes
[1] Funded by
the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project no. 382880410.
[2] Cf., for example: [Luebeck 1872], [Hagendahl 1958, 91–328], [Godel 1964], [Voß 1969], [Burzacchini 1975], [Burzacchini 1978], [Jakobi 2006], [Cain 2013], [Adkin 2011], and [Feichtinger 2021]. A synopsis of Jerome’s own statements on the Christian use of classical literature is
provided by Mohr (2007). [3] On the term
short citation, which is used throughout the paper, see Section
IV. [4] Analytical approaches to the
digital detection of intertextuality phenomena have been presented by Bamman and Crane (2008); Hohl Trillini and Quassdorf (2010); Schubert and Heyer (2010) with the “Citationgraph” of the eAQUA project; Büchler et al. (2014) with the “Tracer” of the eTrap project; and Manca et al. (2011) within the project Musisque Deoque. Within the Tesserae project, Coffee et al. (2013)
focus on two-word loci similes and thereby pursue
an operationalization approach specifically pertaining to classical philology; see also [Burns 2017],
[Diddams and Gawley 2017], [Coffee et al 2020]. However, problems arise in the concrete
application of the filter settings (settings, once calibrated, do not transfer to other texts; the selection lacks explainability and
transparency; and the possibilities for adjustment are insufficient), so that this project environment is not
used for the development of a differentiated classification system of citations. [5] The present project,
unlike Tesserae, does not rely on pre-existing corpora. At the same time this
circumstance ensures that the criteria arising from the demands for transparency in the filtering process are
satisfied (see Section I, n. 4). [6] Although, in theory, the Copyright Act attempts to enable precisely this with regulations
on text and data mining for the purposes of scientific research, in practice — in our experience — negotiations
remain necessary.
[7] In the first
case, the texts of the relevant classical authors were made available directly by the publisher without further
negotiations within the framework of an existing license. In the second case, the situation was more difficult
because the question of which institution holds the rights to the electronic (!) texts remained unclear until the
end, even for the publishers involved. In addition, the cost of making the texts available within the existing
license offer was estimated to be very high, which made negotiations about the cost framework
necessary.
[8] The
copyright exceptions for text-mining methods (especially for research purposes) are limited by regulations
surrounding the publication of the texts within the context of research results.
[9] For this step in particular, it
would be important for publishers to provide texts in a systematic and consistent manner in order to avoid
time-consuming adaptation processes for singular and divergent text formats.
[10] The individual
analysis steps, so far, are stored in individual scripts. The aim is to combine the scripts into a
pipeline, not least for the sake of user-friendliness. It is our intention to publish the code base
on the GitHub development platform.
[11] Thus, the citation pair Verg. ecl. 4,60–61
(Incipe, parue puer, risu cognoscere
matrem: / matri longa decem tulerunt fastidia
menses – Begin, baby boy, to recognize your mother with a smile: ten months have brought your mother
long travail; all translations for Virgil’s works are taken from Fairclough’s two-volume Loeb Classical Library edition revised by Goold, 1999–2000) and Hier.
epist. 130,16,3 (putasne, frustra infans paruulus et qui
uix matrem risu et uultus hilaritate cognoscat, qui nec boni aliquid fecit nec mali,
daemone corripitur, morbo opprimitur regio et ea sustinet, quae uidemus inpios homines non sustinere et sustinere
deo seruientes? – Do you think a very young child, one who barely recognizes its mother by her
laughter and her face by her joy, who has done neither good nor bad, is seized by the devil for no reason, or
prostrated by jaundice, or endures such things for no reason, which we see godless people do not suffer, but
those who serve God do?; all translations for Jerome’s letters are by the authors) contains exactly two
shared words (risu – risu; matrem – matrem). Other similar word forms, such as declined, conjugated, and
derived forms of the same lemma (matri – matrem; cognoscere – cognoscat; parue – paruulus), as well as synonyms (puer – infans), although particularly relevant for the hermeneutical reading process, initially go undetected in
the digital detection process.
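The exact-match behaviour described in this note can be illustrated with a minimal Python sketch. It is not the project's code: the function names are hypothetical, and tokenization is reduced to lower-casing and extracting runs of Latin letters, which is an assumption made for illustration. Applied to the two passages quoted above, it reports only the identical word forms and misses inflected forms, derived forms, and synonyms.

```python
import re


def word_forms(text):
    """Return the set of lower-cased word forms (runs of the letters a-z)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def shared_word_forms(source, target):
    """Word forms that occur identically in both passages."""
    return word_forms(source) & word_forms(target)


# Passages as quoted in the note (Verg. ecl. 4,60-61 and Hier. epist. 130,16,3).
VERGIL = (
    "Incipe, parue puer, risu cognoscere matrem: / "
    "matri longa decem tulerunt fastidia menses"
)
JEROME = (
    "putasne, frustra infans paruulus et qui uix matrem risu et uultus "
    "hilaritate cognoscat, qui nec boni aliquid fecit nec mali, daemone "
    "corripitur, morbo opprimitur regio et ea sustinet, quae uidemus inpios "
    "homines non sustinere et sustinere deo seruientes?"
)

shared = shared_word_forms(VERGIL, JEROME)
print(sorted(shared))  # ['matrem', 'risu']
```

Pairs such as matri – matrem or cognoscere – cognoscat would only be found if the comparison operated on lemmata rather than on surface forms.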
[12] On the establishment of optimization rules, cf. [Revellio 2022, 128–135]. [14] In this routine, the matching process with Simillima replaces Tesserae in its basic function of text
comparison, [Revellio 2022, 123]. Nevertheless, the process remains structurally comparable
to Tesserae. [15] During the algorithmic procedure, the end of a sentence is indicated by a period, question
mark, or exclamation mark. Of course, modern punctuation is not an unproblematic criterion; its employment
and its effects on the results of the study must always be reflected upon; see [Revellio 2022, 116, n. 384]. Sentences are also separated where a colon is combined with quotation marks, i.e., at the
beginning of quoted speech. If, on the other hand, a period, question mark, or exclamation mark occurs within a
parenthesis (marked, for example, by brackets), separation of sentences is suppressed in order to keep the
parenthesis and the surrounding sentence together as one unit. However, insertions marked by dashes can only be
addressed programmatically in a few cases, because dashes can also occur individually (whereas brackets always
occur in pairs). Therefore, the separation of sentences is not suppressed for an insertion marked by dashes with
a period, question mark, or exclamation mark. This can be considered unproblematic, since insertions with such
punctuation marks are usually long and it is therefore unlikely that quote-constituting split word material can
be found around this insertion (distance criterion). Sentences of direct speech which are interrupted by
speech-introducing words such as inquit, ait, dixit, or dicit are also held together. The fact that only the most
common speech introductions can be considered here is negligible, because it is rather unlikely that a quotation
within a literal speech is interrupted (if at all) by an insertion other than these standardized variants.
Additionally, in order to improve the algorithm, periods as part of abbreviations should be treated as exceptions
during the separation of sentences. These exceptions have not yet been implemented, as the Georgics, Eclogues, Aeneid, and Jerome’s
letters do not contain any abbreviations. With a view to other corpora (e.g., Cicero), however, such an extension is
necessary and proposed. Moreover, the sentence-tokenization feature of the Classical
Language Toolkit (CLTK) is currently being tested as an alternative. [16] In
the case of poetry, the line in which a sentence “begins” is taken as the text reference.
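The sentence-separation rules described in n. 15 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the function and constant names are hypothetical, the sample text is invented, and the rule that keeps direct speech together around inquit, ait, dixit, or dicit is omitted. As in the note, insertions marked by dashes and periods in abbreviations receive no special treatment.

```python
TERMINALS = ".?!"
OPEN_BRACKETS = "(["
CLOSE_BRACKETS = ")]"
OPENING_QUOTES = "\"“„«"
CLOSING_QUOTES = "\"”»"


def split_sentences(text):
    """Split at . ? ! and at a colon followed by a quotation mark,
    but never inside a bracketed parenthesis."""
    sentences = []
    start = 0   # index where the current sentence begins
    depth = 0   # nesting depth of brackets
    i = 0
    while i < len(text):
        ch = text[i]
        end = None
        if ch in OPEN_BRACKETS:
            depth += 1
        elif ch in CLOSE_BRACKETS:
            depth = max(depth - 1, 0)
        elif depth == 0 and ch in TERMINALS:
            end = i + 1
            # Keep further terminal marks and a closing quote with the sentence.
            while end < len(text) and text[end] in TERMINALS + CLOSING_QUOTES:
                end += 1
        elif depth == 0 and ch == ":":
            following = text[i + 1:].lstrip()[:1]
            if following and following in OPENING_QUOTES:
                end = i + 1  # quoted speech starts a new unit
        if end is not None:
            sentences.append(text[start:end].strip())
            start = end
            i = end
        else:
            i += 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


# Invented sample text, for illustration only.
SAMPLE = 'Legi epistulam tuam (quid dicam? nihil.) et gaudeo. Scribis: "ueni ad me!" Ualeo.'
for sentence in split_sentences(SAMPLE):
    print(sentence)
```

The bracket counter is what keeps the parenthesis and its surrounding sentence together as one unit; because dashes do not occur in matching pairs, no comparable counter can be kept for them.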
[17] On this filter approach, [Revellio 2022, 158]. Because the filter considers only the number of consecutive shared words and disregards their actual
order, which may differ between the target and the source text, it does return some false (i.e., irrelevant) matches,
but only in small numbers. For the Georgica, Eclogae
and Aeneid, for example, the Complura filter yields 76
correctly identified matches and only five “false” matches, which can be rejected at first glance with
little manual effort. [18] This is due to the removal of stopwords, see Step 3. [19] All translations for Virgil’s works are taken from Fairclough’s two-volume
Loeb Classical Library edition, revised by Goold
(1999–2000). [20] All translations for Jerome’s
letters are by the authors.
[21] On
stoplists, see Step 3. [22] On the deduction of these criteria and discussion
thereof, [Revellio 2022, 136–145]. [23] On the application of such stoplists, [Revellio 2022, 156–159]. In addition to
the standard list of the Perseus project, seven corpus-based stoplists consisting of
the most frequent words are used. Although Virgil’s three works are combined with Jerome’s Epistulae, Jerome’s vocabulary clearly dominates among
the autosemantics captured by the most-frequent-words approach. For example, in the first stoplist containing
the words occurring at least 250 times, the word Christi (445 times) appears more
frequently than many synsemantics such as quoque (too, 441 times) or
tu (you, 433 times). For comparison, the word Aeneas only appears in the list of words that occur at least 150 times. Approaches that could correct such disproportions,
for instance by weighting the two texts differently or by taking into account the frequency of a word within each text (cf. Zou et al.,
2006), have been considered here but could not yet be
implemented. [24] The distance of 2 was preferred to the distance of 3 based on previous observations [Revellio 2022, 144]. On the implementation, see [Revellio 2022, 156–159]. [25] So far, for Virgil’s Eclogae, Georgica, and Aeneid, stoplist 6, which contains words with more than 100
occurrences in the Eclogae, Georgica, Aeneid, and Jerome’s Epistulae, has turned out to be optimal
in this regard.
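The construction of a corpus-based stoplist from the most frequent words (n. 23 and n. 25) can be illustrated with a short Python sketch. All names are hypothetical, the toy corpus is invented, and the threshold is read as "at least `min_count` occurrences", which is an assumption; the thresholds used in the project (e.g., 250, 150, or 100 occurrences) apply to the full texts of Virgil and Jerome.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its word forms in order."""
    return re.findall(r"[a-z]+", text.lower())


def build_stoplist(texts, min_count):
    """Collect all word forms that occur at least `min_count` times overall."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {word for word, n in counts.items() if n >= min_count}


def remove_stopwords(shared_words, stoplist):
    """Drop stoplisted forms from the shared words of a citation pair."""
    return set(shared_words) - stoplist


# Invented toy corpus; real stoplists are built from the complete texts.
TOY_CORPUS = [
    "et in terra et in caelo",
    "Christi et matrem risu",
    "in Christi nomine",
]

stoplist = build_stoplist(TOY_CORPUS, min_count=2)
print(sorted(stoplist))  # ['christi', 'et', 'in']
print(sorted(remove_stopwords({"et", "matrem", "risu"}, stoplist)))  # ['matrem', 'risu']
```

Even in this toy corpus an autosemantic word (christi) enters the stoplist alongside synsemantics, which mirrors the disproportion noted for Christi in n. 23.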
[26] The filter searches the sentences of the target text for authorial
information (e.g., Uergilius, poeta, Aeneis, etc.) and returns all matches for the respective sentence if a match is found. However, if the
filter is applied immediately after matching (Simillima), Uergilius alone for Hier. epist. 121,10,5 already
produces 382 matches with various passages of Virgil, all of which, except for Verg. georg. 2,256, result from only two synsemantics such as et or in and cannot be considered pretexts. Therefore, it is
advisable to apply the filter only after the stoplist of the Perseus project has been removed, in
order to exclude as many synsemantics as possible, but no author-specific ones. If corpus-based stoplists were
applied, precisely those quotations whose words are among the most frequent words of a certain author would no longer be
detectable, even if this author, his work, or the like is mentioned explicitly in the surrounding text.
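A filter of this kind could be sketched in Python as shown below. The sketch rests on several assumptions: the marker stems, the dictionary layout of a match, and the example sentences are all hypothetical, and matching by word-initial stem is merely one way to catch inflected forms such as Uergilii or poetae.

```python
import re

# Hypothetical marker stems; the note names Uergilius, poeta, and Aeneis as examples.
AUTHOR_MARKERS = ("uergili", "poeta", "aeneis", "aeneid")


def mentions_author(sentence, markers=AUTHOR_MARKERS):
    """True if any word of the sentence starts with one of the marker stems."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return any(word.startswith(stem) for word in words for stem in markers)


def authorial_filter(matches):
    """Keep all matches whose target sentence carries authorial information."""
    return [m for m in matches if mentions_author(m["target_sentence"])]


# Invented matches, assumed to be the output of matching after removal of a general stoplist.
MATCHES = [
    {"target_sentence": "ut ait Uergilius: omnia uincit amor", "shared": ["omnia", "amor"]},
    {"target_sentence": "omnia enim amor sustinet", "shared": ["omnia", "amor"]},
]

kept = authorial_filter(MATCHES)
print(len(kept))  # 1
```

Because the filter returns every match of a marked sentence, its usefulness depends on how much frequent word material has already been removed, which is the point made in this note about the order of the steps.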
[27] After applying the filter,
pairs are kept where the split word material is separated by the same number of “soft” punctuation marks. In
this case, there could be a longer quotation containing, for instance, a subordinate clause.
[28] With
et (underlined), the pair contains, strictly speaking, another word shared by both
texts (two instances in Virgil, three in Jerome). As a frequent word (and synsemantic), however, the conjunction
does not factor into the evaluation of the quotation (see Step 6). [29] The fact that the shared words do not belong together syntactically in Virgil either
(pluuia ingenti is ablative; sata laeta is the direct
object of diluit) is irrelevant with regard to the separation of the word material in
the target text.
[30] The numbers refer to returned
matches after stoplist 6 is applied. The filter is still at an early stage of testing. Moreover, it can only be
applied to potential new quotations that have exactly two shared words after application of the
stoplists. Beyond an extension of the filter’s general capabilities, there is potential for further optimization
in two respects. First, only an unequal number of punctuation marks between the shared words is currently
used as a decisive criterion (i.e., a match is rejected as soon as one of the two texts has more commas than the
other). It would also be possible to set a maximum limit for the number of allowed “soft”
punctuation marks between the shared words, since the plausibility of the finding being an actual
quotation decreases when the number of intervening “soft” punctuation marks is high, even if it is the same in both texts. Second, there is
a need to optimize the filter also with respect to cases where one of the two shared words is
contained more than once in a phrase. For example, when applying the filter to Verg. ecl. 10,52–54 (certum est in siluis inter spelaea ferarum /
malle pati tenerisque meos incidere amores / arboribus: crescent illae, crescetis, amores – Well I know that in the woods, amid wild
beasts’ dens, it is better to suffer and carve my love on the young trees. They will grow, and you, my love, will
grow with them; shared words: meos; amores [2x]) two text segments are examined for their “soft” punctuation marks: incidere between meos and the first instance of amores as well as incidere amores / arboribus: crescent illae, crescetis,
between meos and the second instance of amores. This
results in 0 or 2 commas and 0 or 1 colon. At the moment, this finding is retained as soon as the text of Jerome
also contains either 0 or 2 commas and either 0 or 1 colon. Thus the finding is only eliminated if neither combination
matches the text material of Jerome.
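The handling of repeated shared words can be made concrete with a small Python sketch. It is an illustration under assumptions, not the project's filter: the function names are hypothetical, only commas and colons are counted as “soft” punctuation marks (as in the example above), the two target strings are invented, and a match is kept when at least one combination of counts coincides in both texts, which is one possible reading of the rule described in this note.

```python
import re
from itertools import product

SOFT_MARKS = (",", ":")  # assumption: only the marks named in the example


def positions(text, word):
    """Start and end offsets of every whole-word occurrence of `word`."""
    return [m.span() for m in re.finditer(r"\b%s\b" % re.escape(word), text)]


def punctuation_profiles(text, word_a, word_b):
    """All (commas, colons) counts found between an occurrence of each word."""
    profiles = set()
    for (a_start, a_end), (b_start, b_end) in product(
        positions(text, word_a), positions(text, word_b)
    ):
        if a_end <= b_start:
            segment = text[a_end:b_start]
        else:
            segment = text[b_end:a_start]
        profiles.add(tuple(segment.count(mark) for mark in SOFT_MARKS))
    return profiles


def keep_match(source, target, word_a, word_b):
    """Keep the pair if at least one punctuation profile occurs in both texts."""
    return bool(
        punctuation_profiles(source, word_a, word_b)
        & punctuation_profiles(target, word_a, word_b)
    )


VERGIL = (
    "certum est in siluis inter spelaea ferarum / malle pati tenerisque meos "
    "incidere amores / arboribus: crescent illae, crescetis, amores"
)
# Invented target strings, for illustration only.
TARGET_KEPT = "meos semper amores"
TARGET_REJECTED = "meos, ut dixi, amores"

print(sorted(punctuation_profiles(VERGIL, "meos", "amores")))  # [(0, 0), (2, 1)]
print(keep_match(VERGIL, TARGET_KEPT, "meos", "amores"))       # True
print(keep_match(VERGIL, TARGET_REJECTED, "meos", "amores"))   # False
```

For Verg. ecl. 10,52–54 the two profiles correspond to the segments examined in the note: no marks before the first amores, and two commas plus one colon before the second.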
[31] On the
development of the historical text-reuse grammar, [Revellio 2022, 146–152] as well as [Revellio 2022, 159–160] on the deduction and implementation of the HTRG filter. [32] “Potential finds of computer-assisted text comparison whose shared word material, in terms of its
part-of-speech structure, consists of at least two nouns or two verbs, or of a combination of these two
parts of speech, [are] particularly predestined to establish a meaning-producing text-text
relationship.” On the discussion of this criterion, [Revellio 2022, 146–152]. [33] On the implementation of the filter, [Revellio 2022, 159–160]. [34] The complete list of all Virgilian quotations in Jerome
will be the subject of a further publication.
[35] These include words such
as a, ab, ac, ad, ante, atque, autem, caelo, centum, contra, corpore, cui, cum, de, dies, dum,
ea, enim, est, et, etiam, ex, frater, haec, hanc, hic,
his, hoc, hominum, iam, illa, ille, in, inter, ipse, ita,
manu, me, mihi, mundi, ne, nec, neque, nihil, non, nos,
nunc, oculos, omnes, omnia, per, possumus, post, procul, quae, quam, quem, qui, quibus, quid, quidquid, quis, quo, quorum, sanguine, se, sed, si,
sint, sit, siue, sua, sub, sunt, super, te, terra, uerbo,
uix, unde, ut.
[36] For the Eclogues, the following numbers result: Of 89 potential new quotations (stoplist 6 applied), 87 have
exactly two shared words; of these, 46 share additional words (53%). Similarly in the case of Georgica: Of 167 potential citation pairs (stoplist 6 applied), 153 have exactly two
shared words; of these, 122 share further material (80%). Finally, the following numbers are
obtained for the Aeneid: Of 606 potential new quotations (stoplist 6 applied), 583
have exactly two shared words; of these, 414 share further words (71%).
[37] In relation to the list of potential new
quotations after application of stoplist 6.
[38] The same problem arises in Pliny the Younger; in
ep. 5,10 to Suetonius he uses the words rumpe iam moras with the goal of persuading his addressee to publish a literary work. Cf.
Schwerdtner: “Since Pliny’s rumpe iam moras cannot be assigned with
complete certainty to a single source, it is problematic to label the passage a quotation from Virgil,
even though Pliny otherwise prefers to quote from Virgil and already draws on the proem of the Georgics in ep. 5,8” [Schwerdtner 2015, 250–255, esp. p. 254]. [39] Thus, the total number of identical word forms would amount to three; the quotation would
then not be listed among those quotations of Virgil in Jerome with exactly two identical words. On this problem
of the textual tradition, see 3. Optimization strategy. [40] The same
phenomenon is even more noticeable in another returned match consisting of Verg. georg. 1,401–403 and Hier. epist.
116,5,2. Again, there are the same two shared words culmine and
summo. However, in addition to the semantically different prepositions (de vs. in) and the attributive genitive (auctoritatis), which activates the figurative meaning, the expression is made even more complex by a second
attribute (caelesti).
Verg. georg. 1,401–403 | Hier. epist. 116,5,2
at nebulae magis ima petunt campoque recumbunt, / solis et occasum seruans de culmine summo / nequiquam seros exercet noctua cantus | immo uero sanctam scripturam in summo et caelesti auctoritatis culmine conlocatam de ueritate eius certus ac securus legam
But the mists are prone to seek the valleys, and rest on the plain, and the owl, as she watches the sunset from some high peak, vainly plies her evening song | but of course I should read the holy scriptures, set on the highest, heavenly summit of validity, firmly convinced of their truth
Table 7. In this example, culmine and summo are highlighted in both texts. Additionally, de is underlined in the text from Virgil, and in, along with the phrase et caelesti auctoritatis, is underlined in the text from Jerome.
[41] Virgil:
preposition + prepositional object + attributive adjective; Jerome: preposition + attributive adjective I [+
conjunction] + attributive adjective II + attributive genitive + prepositional object.
[42] The status as an intentional quotation of the
Aeneid cannot be questioned because of the references to Uergilius (Hier. epist. 126,2,2) and the poeta eloquentissimus (129,4,3).
[43] Another
occurrence of the phrase within the works of Jerome can be found in In Isaeam
5,21,13–17. Hagendahl (1958, p. 230 n. 4) already points to the fact that
uagantes is present in a number of late manuscripts of the Aeneid. Indeed, the critical apparatus of Conte’s Teubner edition lists
uagantes as another reading instead of furentes, which can,
besides Jerome, also be traced back to two Codices Bernenses from the 9th and 10th
centuries. [45] Variants are considered
comprehensively, but more extensive editorial changes such as omissions, conjectures, etc. cannot be
addressed.
[46] Cf., for example,
Ov. ars 3,511 and Sen. Thy. 81.
Works Cited
Adkin 2011 Adkin, N. (2011) “Catullus in Jerome? Notes on the
Cohortatoria de paenitentia ad Sabinianum (Epist. 147)”, VChr, 65 (4), pp.
408–424.
Bamman and Crane 2008 Bamman, D. and Crane, G. (2008) “The
logic and discovery of textual allusion”,
Proceedings of the second workshop on
language technology for cultural heritage data (LaTeCH 2008). Available at:
http://hdl.handle.net/10427/42685 (Accessed: 13 October
2022).
Büchler et al 2014 Büchler, M., Burns, P. R., Müller, M.,
Franzini, E. and Franzini, G. (2014) “Towards a Historical Text Re-use Detection” in
Biemann, C. and Mehler, A. (eds.) Text mining. From ontology learning to automated text
processing applications: Festschrift in honor of Gerhard Heyer. Cham: Springer, pp. 221–238.
Burns 2017 Burns, P. J. (2017) “Measuring and mapping
intergeneric allusion in Latin poetry using Tesserae”,
Journal of Data Mining and
Digital Humanities, pp. 1–15. Available at:
https://jdmdh.episciences.org/3821 (Accessed: 13 October 2022).
Burzacchini 1975 Burzacchini, G. (1975) “Note sulla
presenza di Persio in Girolamo”, GIF, 27, pp. 50–72.
Burzacchini 1978 Burzacchini, G. (1978) “Marginalia
hieronymiana”, BstudLat, 8, pp. 270–272.
Cain 2013 Cain, A. (2013) “Two allusions to Terence, Eunuchus 579
in Jerome”, CQ, 63 (1), pp. 407–412.
Coffee et al 2013 Coffee, N., Koenig, J.-P., Poornima, S., Forstall, C. W.,
Ossewaarde, R. and Jacobson, S. (2013) “The Tesserae Project. Intertextual analysis of Latin
poetry”, Literary and Linguistic Computing, 28 (2), pp. 221–228.
Coffee et al 2020 Coffee, N., Forstall, C., Galli Milić, L. and Nelis, D.
(2020) Intertextuality in Flavian Epic Poetry. Berlin/Boston: Walter de
Gruyter.
Diddams and Gawley 2017 Diddams, C. and Gawley, J. (2017) “Measuring the presence of Roman rhetoric. An intertextual analysis of Augustine's De Doctrina
Christiana IV”, Mouseion, 14 (3), pp. 391–408.
Feichtinger 2021 Feichtinger, B. (2021) “Quid facit cum psalterio Horatius? (Hier. ep.
22,29,7). Untersuchung zu Hieronymus' Umgang mit klassischen und biblischen Referenzen am Beispiel von
Epistula 3 ad Rufinum”, VChr, 75, pp. 389–454.
Godel 1964 Godel, R. (1964) “Réminiscences de poètes profanes
dans les lettres de St-Jérôme”, MH, 21 (1), pp. 65–70.
Hagendahl 1958 Hagendahl, H. (1958) Latin fathers and the
classics. A study on the apologists, Jerome and other Christian writers. Göteborg: Almqvist &
Wiksell.
Hohl Trillini and Quassdorf 2010 Hohl Trillini, R. and Quassdorf, S.
(2010) “A 'key to all quotations'? A corpus-based parameter model of intertextuality”,
Literary and Linguistic Computing, 25 (3), pp. 269–286.
Jakobi 2006 Jakobi, R. (2006) “Argumentieren mit Terenz. Die
Praefatio der ‘Hebraicae Quaestiones in Genesim’”, Hermes, 134 (2), pp. 250–255.
Luebeck 1872 Luebeck, E. (1872) Hieronymus quos nouerit
scriptores et ex quibus hauserit. Leipzig: Teubner.
Manca et al 2011 Manca, M., Spinazzè, L., Mastandrea, P., Tessarolo, L. and
Boschetti, F. (2011) “Musisque Deoque: Text Retrieval on Critical Editions”, Journal for Language Technology and Computational Linguistics, 26, pp. 129–140.
Mohr 2007 Mohr, A. (2007) “Jerome, Virgil, and the captive
maiden. The attitude of Jerome to classical literature” in Scourfield, J. H. D. (ed.) Texts and culture in late antiquity. Inheritance, authority, and change. Swansea: Classical Press of
Wales, pp. 299–322.
Revellio 2022 Revellio, M. (2022) Zitate der Aeneis in den Briefen des Hieronymus. Eine digitale Intertextualitätsanalyse zur
Untersuchung kultureller Transformationsprozesse. Berlin/Boston: Walter de Gruyter. Available at:
https://doi.org/10.1515/9783110760828 (Accessed: 13 October 2022).
Schubert and Heyer 2010 Schubert, C. and Heyer, G. (2010) “Neue Methoden der geisteswissenschaftlichen Forschung – Eine Einführung in das Portal eAQUA”
in Schubert, C. and Heyer, G. (eds.) Das Portal eAQUA – Neue Methoden in der
geisteswissenschaftlichen Forschung I. Leipzig: Universität Leipzig, pp. 4–9.
Schwerdtner 2015 Schwerdtner, K. (2015) Plinius und seine
Klassiker. Studien zur literarischen Zitation in den Pliniusbriefen. Berlin/Boston: Walter de
Gruyter.
Voß 1969 Voß, B. R. (1969) “Vernachlässigte Zeugnisse
klassischer Literatur bei Augustin und Hieronymus”, RhM, 112 (2), pp.
154–166.
Zou et al 2006 Zou, F., Wang, F. L., Deng, X., Han, S. and Wang, L. S. (2006)
“Automatic construction of Chinese stop word list”, Proceedings of
the 5th WSEAS International Conference on Applied Computer Science, pp. 1010–1015.