Digital Humanities Abstracts

“"Grotefend", a tool for deciphering ancient syllabic scripts”
Heikki S. Särkkä University of Joensuu, School of Translation Studies SARKKA@cc.joensuu.fi

The "Grotefend" program introduced in this paper is intended to facilitate the formulation of hypotheses concerning the decipherment of an unknown syllabic script. In essence, the program consists of two modules, one for assigning readings to the signs and the other for analysing the text. The two modules can be used independently of each other; that is, a text can be analysed without making any assumptions about how any given sign should be read. The program was written in Visual Basic at the Department of Computer Science of the University of Joensuu by Tuomo Pusa and Kari Tanskanen.
At the outset, the user establishes the range of signs used in a given text and gives each sign a number. This converts the text into a sequence of numbers and, as the case may be, gaps representing word separators. The use of numbers is a purely practical device that obviates the need to choose a standard visual representation for each sign. The strings of numbers, together with any word separators where these can be recognized, are then entered into the program as data. Once this has been done, tentative readings can be given to the individual signs by means of an input table. Whenever a reading is assigned to a given sign, the same reading is automatically given to all occurrences of that sign in the text. Any reading can be changed afterwards without affecting the readings of other signs.
The following reports can be generated by the program:
  • 1. Total frequency of basic signs
  • 2. Total frequency of basic signs that may occur word-initially
  • 3. Total frequency of basic signs that may occur word-finally
  • 4. Basic signs only occurring word-initially and their absolute frequency
  • 5. Basic signs only occurring word-finally and their absolute frequency
  • 6. Repeated strings. This is a list of strings of 2-5 basic signs occurring more than once in the text with their line numbers.
  • 7. Repeated strings found on the assumption that the text runs boustrophedon. If repeated strings are found on lines X and Y (X and Y being line numbers) and the difference X-Y is an odd number, it is probable that the text was written left to right and right to left on alternate lines. The more often repeated strings occur under these conditions and the longer they are, the higher the probability that the text does in fact run boustrophedon.
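The reports listed above can be approximated in a few lines of Python. This is only an illustrative sketch, not the original Visual Basic program: the nested-list encoding of the text, the sample data, the function names and the placeholder readings are all assumptions made for the example. Repeated strings are searched within each line with word separators ignored, and the boustrophedon report is interpreted here as looking for a string on one line and its mirror image on a line an odd number of lines away.

```python
from collections import Counter

# Hypothetical sample: each line is a list of words, each word a list of sign numbers.
text = [
    [[1, 2, 3], [4, 2]],
    [[5], [3, 2, 1]],
    [[1, 2, 3], [5]],
]

def transliterate(text, readings):
    """Apply tentative readings to every occurrence of a sign; unread signs keep their number."""
    return [["-".join(readings.get(s, str(s)) for s in word) for word in line]
            for line in text]

def sign_frequencies(text):
    """Reports 1-3: total, word-initial and word-final frequency of each sign."""
    total, initial, final = Counter(), Counter(), Counter()
    for line in text:
        for word in line:
            total.update(word)
            initial[word[0]] += 1
            final[word[-1]] += 1
    return total, initial, final

def only_in_position(total, positional):
    """Reports 4-5: signs whose every occurrence falls in the given position."""
    return {s: n for s, n in positional.items() if n == total[s]}

def line_ngrams(text, min_len=2, max_len=5):
    """Yield (line_number, string) for every run of min_len..max_len signs in a line."""
    for line_no, line in enumerate(text, start=1):
        signs = [s for word in line for s in word]  # assumption: separators ignored
        for n in range(min_len, max_len + 1):
            for i in range(len(signs) - n + 1):
                yield line_no, tuple(signs[i:i + n])

def repeated_strings(text):
    """Report 6: strings of 2-5 signs occurring more than once, with their line numbers."""
    where = {}
    for line_no, gram in line_ngrams(text):
        where.setdefault(gram, []).append(line_no)
    return {g: lines for g, lines in where.items() if len(lines) > 1}

def boustrophedon_hits(text):
    """Report 7 (as interpreted here): a string on line x mirrored on line y, x - y odd."""
    where = {}
    for line_no, gram in line_ngrams(text):
        where.setdefault(gram, set()).add(line_no)
    hits = []
    for gram, lines in where.items():
        mirror = gram[::-1]
        if gram >= mirror:  # skip palindromes and handle each mirrored pair once
            continue
        for x in sorted(lines):
            for y in sorted(where.get(mirror, ())):
                if abs(x - y) % 2 == 1:
                    hits.append((gram, x, y))
    return hits

total, initial, final = sign_frequencies(text)
print(transliterate(text, {1: "pa", 2: "ti"}))  # placeholder readings, not real values
print(repeated_strings(text))
print(boustrophedon_hits(text))
```

Keeping the readings in a separate lookup table mirrors the independence of the two modules described above: the frequency and repetition reports work on the sign numbers alone, and changing one reading leaves all the others untouched.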
Apart from the above, the data generated can serve as a basis for deriving further, perhaps more interesting data such as the relative frequencies of different signs, which in turn should be helpful for formulating hypotheses about the genetic or structural kinship of the language concerned with languages of known structural characteristics. A fundamental problem that has to be addressed in calculating relative sign frequencies is the nature of the unknown system: whether we are dealing with a syllabary consisting of vowels and open syllables only, like Linear B or the two kana syllabaries of Japanese, or with a system like Akkadian cuneiform, which uses closed syllables as well, in a seemingly unsystematic way that allows the same word to be written in numerous different ways. Given a long enough text, the number of different signs is likely to give us a clue to the nature of the system used. Strings of signs regularly repeating in the text would indicate a stable graphemic system, while few repetitions would lead one to expect a "variable key" system comparable to cuneiform, which would be correspondingly more difficult to break in a way that commands confidence.
Worth studying is the question of how long a text should be for us to be able to draw valid inferences about the language. Naturally, that depends on the type of inferences we would like to make. At the most basic level, the decipherer has to make sure that the textual material s/he is looking at consists of samples of the language. If we manage to determine the minimal length of text that is needed for the identification of a given text as representing a given language with a given degree of probability, we are in a better position to collect a corpus of texts that are indeed written in the same language. Any advance made in the decipherment on the basis of one text could then be checked against other texts in the same language.
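The derived measures discussed above (relative sign frequencies, the number of different signs, and the amount of repetition) could be computed along the following lines. This is a hedged sketch rather than a feature of the program as described: the function names are hypothetical, the text is assumed to be a flat list of sign numbers, and the repetition ratio is just one possible way of quantifying how "stable" a graphemic system looks.

```python
from collections import Counter

def relative_frequencies(signs):
    """Share of the text accounted for by each sign."""
    counts = Counter(signs)
    return {s: c / len(signs) for s, c in counts.items()}

def inventory_growth(signs):
    """Number of different signs seen so far, after each position in the text.

    A curve that flattens out suggests the text is long enough to have
    exhausted most of the sign inventory.
    """
    seen, growth = set(), []
    for s in signs:
        seen.add(s)
        growth.append(len(seen))
    return growth

def repetition_ratio(signs, n=2):
    """Fraction of n-sign strings in the text that belong to a string occurring more than once."""
    grams = [tuple(signs[i:i + n]) for i in range(len(signs) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    return sum(c for c in counts.values() if c > 1) / len(grams)

# Hypothetical sample text as a flat sequence of sign numbers.
sample = [1, 2, 1, 3, 1, 2]
print(relative_frequencies(sample))
print(inventory_growth(sample))
print(repetition_ratio(sample))
```

What values of these measures would point to an open-syllable syllabary rather than a "variable key" system is left open here; any thresholds would have to be calibrated against scripts of known type.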
Problems of decipherment are further compounded by the fact that the script tells us very little about the phonology of the language unless we know how faithfully it reflects phonemic contrasts. A case in point would be the differences between the older and the younger futhark in Scandinavia. Even in the face of these uncertainties concerning the fit between the graphology and the phonology of a language, certain features are a priori likely to prove more fertile than others. In the case of a syllabic script, word-initial vowels are an obvious starting point. Even if 'a' is the most frequent vowel across a range of languages, absolute frequencies will vary depending on the historical phonology of the language concerned. Depending on the language, identification of proper nouns may occasionally be possible because of their greater length. The following regularity is suggested: in a text consisting of otherwise shorter recurring units, a recurring unit considerably longer than average is indicative of a proper noun. The rationale behind this may be either that sentential strings like 'Marduk will help him' are used as names or that a longish repeated element consists of a noun plus one or more epithets. Through comparison with known languages from the same area, the identification of proper nouns will in turn give valuable clues to the surrounding textual material in terms of both its semantic content and its syntactic function.
The reports given by the program are only an aid to researchers, one that takes away some of the tedious spadework necessary for successful decipherment. As such, however, the program is a tool that should speed up the process by allowing scholars to direct their creative efforts towards more demanding tasks.
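The proper-noun regularity suggested above lends itself to a simple filter. The sketch below is an assumption-laden illustration, not part of the program as described: it presumes that word separators are recognizable so that the "units" are words, and the `factor` of 1.5 used to decide what counts as "considerably longer than average" is an arbitrary placeholder.

```python
from collections import Counter
from statistics import mean

def proper_noun_candidates(words, factor=1.5):
    """Return recurring words that are considerably longer than the average recurring word.

    `words` is a list of words, each a list of sign numbers; `factor` is a
    hypothetical cut-off, not a value taken from the paper.
    """
    counts = Counter(tuple(w) for w in words)
    recurring = [w for w, c in counts.items() if c > 1]
    if not recurring:
        return []
    average = mean(len(w) for w in recurring)
    return [w for w in recurring if len(w) > factor * average]

# Hypothetical sample: two short recurring words and one long recurring word.
words = [[1, 2], [3, 4], [1, 2], [3, 4],
         [5, 6, 7, 8, 9, 1], [5, 6, 7, 8, 9, 1], [2, 3]]
print(proper_noun_candidates(words))
```

Candidates found in this way are only starting points; whether a long recurring unit is a name, a sentential name or a noun with epithets still has to be judged by the decipherer.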

References

Problems in Decipherment. Ed. Yves Duhoux, Thomas G. Palaima, and John Bennet. Louvain-la-Neuve: Peeters, 1989.