“Computational Methods for the Study of Multilingual Corpora”

Silvia Hansen, University of the Saarland, Germany

Corpus linguistics is becoming increasingly important for translation studies (cf. Baker, 1995; Granger, 1999). In the past, the application of corpus linguistic methods was limited to the applied branch of the discipline. In particular, they were used in terminology, translation aids (e.g., to develop translation memories or machine translation programs), translation criticism and translation training (to improve the final product with the help of corpus-based contrastive analysis and the study of translationese). More recently, corpus linguistic methods have also been introduced in the theoretical and descriptive branches of translation studies. One issue that is receiving increasing attention is the question of whether translation constitutes a particular text type (Baker, 1996; Laviosa-Braithwaite, 1996; Teich, 1999; Hansen, 1999). In this paper, I present the analysis of a corpus of translated texts and its comparison with a corpus of originals produced in the target language in order to investigate the universal features of translations (cf. Baker, 1996). Furthermore, on the basis of this analysis of universal features, I analyse the source language texts in order to see what has happened during the translational process. Thus, my aim is to identify both the universal features of translation (comparing the translation corpus with the originals of the target language) and, on this basis, the translation procedures (comparing the translation corpus with the originals of the source language). In particular, I discuss the use of various standard corpus tools, such as concordance programs, aligners and taggers, for the analysis of parallel and comparable corpora. The use of these tools is limited, however: only parts of speech and grammatical categories can be analysed with their help.
Thus, it is not possible to say anything about translation procedures, translation strategies or the translational process, because the results gained through standard corpus tools are quantitative values (cf. Hansen & Teich, 1999). To test hypotheses concerning the translational process and the universal features of translations, however, we need qualitative data, i.e., a linguistic description of the phenomena which occur in the translations. In order to use the information provided by the standard corpus tools and to carry out deeper investigations, we need tools which are able to analyse more abstract linguistic categories. For this reason, we use the tool TATOE (http://www.darmstadt.gmd.de/~rostek/tatoe.htm), with which we annotate the corpus using Systemic Functional Linguistics (SFL; Halliday, 1978; Halliday, 1985). The systemic functional model, which allows the analysis of the relationships between the different linguistic levels (grammar, semantics, context), is used in various disciplines, e.g., language teaching, functional stylistics, grammatical text analysis, and computational linguistics (in the latter especially for automatic text generation; cf. Teich, 1995; Bateman, 1997). TATOE enables us to define systemic functional categories and, on this basis, to annotate the texts. These annotations make a systemic functional analysis of the parallel and comparable corpora possible, and thus a cross-linguistic description of the phenomena which occur in the texts. On this basis, hypotheses concerning the translational process and the universal features of translations can be tested and new ones can be generated.
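To illustrate the kind of quantitative output that standard corpus tools such as concordance programs produce, the following is a minimal sketch in Python. It is not the tooling used in this study: the tokenizer is deliberately naive, the function names (tokenize, kwic, relative_frequency) are hypothetical, and the two sample sentences are invented toy data standing in for a translation corpus and a corpus of target-language originals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines with `window` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def relative_frequency(tokens, keyword, per=1000):
    """Return the number of occurrences of `keyword` per `per` tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[keyword] * per / len(tokens)


# Invented toy data (assumption): stand-ins for the two comparable corpora.
translated = "The report was written by the committee and the report was approved."
original = "The committee wrote the report and approved it."

for name, text in (("translated", translated), ("original", original)):
    tokens = tokenize(text)
    print(name, relative_frequency(tokens, "was"), kwic(tokens, "was", window=2))
```

The output consists of counts and context lines only. Whether a difference in frequency reflects a translation procedure or a universal feature of translations still has to be decided by a linguistic description of the hits, which is the gap the annotation-based analysis is meant to fill.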

Literature

M. Baker. “Corpora in translation studies: An overview and some suggestions for future research.” Target. 1995. 7: 223-243.

M. Baker. “Corpus-based translation studies: The challenges that lie ahead.” Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. Ed. H. Somers. Amsterdam: Benjamins, 1996. 175-186.

J. Bateman. KPML Development Environment: multilingual linguistic resource development and sentence generation. Deutsches Forschungszentrum Informationstechnik (GMD). Bonn (Birlinghoven): Institut für Integrierte Publikations- und Informationssysteme (IPSI), 1997.

S. Granger, ed. Proceedings of the Symposium 'Contrastive Linguistics and Translation Studies. Empirical Approaches', Louvain-la-Neuve, Belgium, February 1999. 1999.

M. A. K. Halliday. Language as social semiotic. London: Edward Arnold, 1978.

M. A. K. Halliday. An introduction to Functional Grammar. London: Edward Arnold, 1985.

S. Hansen. “A Contrastive Analysis of Multilingual Corpora (English-German).” University of the Saarland, Saarbrücken, 1999.

S. Hansen and E. Teich. “Kontrastive Analyse von Übersetzungskorpora: ein funktionales Modell.” Sammelband der Jahrestagung der GLDV 99. Ed. J. Gippert. Frankfurt a. Main, 1999. 311-322.

S. Laviosa-Braithwaite. “The English Comparable Corpus (ECC): A Resource and a Methodology for the Empirical Study of Translation.” UMIST, Manchester, 1996.

E. Teich. “Towards a methodology for the construction of multilingual resources for multilingual generation.” Proceedings of the IJCAI workshop on multilingual generation, International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, August 1995. 1995. 136-148.

E. Teich. “Towards a model for the description of cross-linguistic divergence and commonality in translation.” Beyond content: Exploring translation and multilingual text production. Ed. E. Steiner and C. Yallop. Berlin: Mouton de Gruyter, 1999.