Computational Methods for the Study of Multilingual Corpora
Silvia Hansen
University of the Saarland, Germany
Corpus linguistics is becoming increasingly important for translation studies
(cf. Baker, 1995; Granger, 1999). In the past, the application of corpus
linguistic methods was limited to the applied branch of this discipline, in
particular to terminology, translation aids (e.g., to develop translation
memories or machine translation programs), translation criticism and
translation training (to improve the final product with the help of
corpus-based contrastive analysis and the study of translationese). Recently,
corpus linguistic methods have also been introduced in the theoretical and
descriptive branches of translation studies. One issue in particular that is
receiving more and more attention is the question of translation as a
particular text type (Baker, 1996; Laviosa-Braithwaite, 1996; Teich, 1999;
Hansen, 1999).

In this paper, I present an analysis of a corpus of translated texts and compare
it with a corpus of originals produced in the target language in order to
investigate the universal features of translation (cf. Baker, 1996).
Furthermore, on the basis of these universal features, I analyse the source
language texts in order to see what has happened during the translational
process. My aim is thus to identify both the universal features of translation
(by comparing the translation corpus with the originals in the target language)
and, on this basis, the translation procedures (by comparing the translation
corpus with the originals in the source language). In particular, I
discuss the use of various standard corpus tools, such as concordance programs,
aligners and taggers, for the analysis of parallel and comparable corpora.
However, these tools are of limited use: only parts of speech and grammatical
categories can be analysed with their help. Because the results obtained with
standard corpus tools are quantitative values (cf. Hansen & Teich, 1999), they
do not allow us to say anything about translation procedures, translation
strategies or the translational process. To test hypotheses concerning the
translational process and the universal features of translations, we need
qualitative data, i.e., a linguistic description of the phenomena which occur
in the translations. In order to use the information provided by the standard
corpus tools and to carry out deeper investigations, we therefore need tools
that can analyse more abstract linguistic categories.

For this reason, we use the tool TATOE
(http://www.darmstadt.gmd.de/~rostek/tatoe.htm) to annotate the corpus with the
categories of Systemic Functional Linguistics (SFL; Halliday, 1978; Halliday,
1985). The systemic functional model, which allows the analysis of the
relationships between the different linguistic levels (grammar, semantics,
context), is used in various disciplines, e.g., in language teaching, functional
stylistics, grammatical text analysis and computational linguistics (in the
latter especially for automatic text generation; cf. Teich, 1995; Bateman,
1997). TATOE enables us to define systemic functional categories and, on this
basis, to annotate the texts. These annotations make possible a systemic
functional analysis of the parallel and comparable corpora and thus a
cross-linguistic description of the phenomena which occur in the texts. On this
basis, hypotheses concerning the translational process and the universal
features of translations can be tested and new ones can be generated.
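The limitation of standard corpus tools described above can be illustrated with a minimal Python sketch of what a concordance program does: it lists each occurrence of a word together with its context (keyword in context, KWIC) and counts the word's relative frequency in a translation corpus and in a corpus of target-language originals. The naive tokenizer, the function names `tokenize`, `kwic` and `relative_frequency`, and the two toy "corpora" are hypothetical illustrations. This is not TATOE or any tool used in the study.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lower-case word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines: `width` tokens of left and right context per hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in one column.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


def relative_frequency(tokens: list[str], keyword: str, per: int = 1000) -> float:
    """Occurrences of `keyword` per `per` tokens; 0.0 for an empty corpus."""
    if not tokens:
        return 0.0
    return Counter(tokens)[keyword] * per / len(tokens)


if __name__ == "__main__":
    # Invented toy sentences standing in for the two corpora (not data from the study).
    translated = tokenize("He said that the plan was good. She knew that it would work.")
    originals = tokenize("He said the plan was good. She knew it would work.")

    for line in kwic(translated, "that", width=2):
        print(line)

    # The comparison yields only numbers: how often a form occurs, not why.
    print(f"translations: {relative_frequency(translated, 'that'):.1f} per 1000 tokens")
    print(f"originals:    {relative_frequency(originals, 'that'):.1f} per 1000 tokens")
```

Whatever the difference between the two frequencies, the output is a count. Interpreting it in terms of translation procedures or universal features of translation requires a qualitative description of the kind that the systemic functional annotation is meant to provide.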
Literature
M. Baker. “Corpora in translation studies: An overview and some suggestions for future research.” Target 7 (1995): 223-243.
M. Baker. “Corpus-based translation studies: The challenges that lie ahead.” Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. Ed. H. Somers. Amsterdam: Benjamins, 1996. 175-186.
J. Bateman. KPML Development Environment: multilingual linguistic resource development and sentence generation. Deutsches Forschungszentrum Informationstechnik (GMD). Bonn (Birlinghoven): Institut für Integrierte Publikations- und Informationssysteme (IPSI), 1997.
S. Granger, ed. Proceedings of the Symposium “Contrastive Linguistics and Translation Studies: Empirical Approaches”, Louvain-la-Neuve, Belgium, February 1999.
M. A. K. Halliday. Language as social semiotic. London: Edward Arnold, 1978.
M. A. K. Halliday. An Introduction to Functional Grammar. London: Edward Arnold, 1985.
S. Hansen. “A Contrastive Analysis of Multilingual Corpora (English-German).” University of the Saarland, Saarbrücken, 1999.
S. Hansen and E. Teich. “Kontrastive Analyse von Übersetzungskorpora: ein funktionales Modell” [Contrastive analysis of translation corpora: a functional model]. Sammelband der Jahrestagung der GLDV 99. Ed. J. Gippert. Frankfurt am Main, 1999. 311-322.
S. Laviosa-Braithwaite. “The English Comparable Corpus (ECC): A Resource and a Methodology for the Empirical Study of Translation.” UMIST, Manchester, 1996.
E. Teich. “Towards a methodology for the construction of multilingual resources for multilingual generation.” Proceedings of the IJCAI workshop on multilingual generation, International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, August 1995. 136-148.
E. Teich. “Towards a model for the description of cross-linguistic divergence and commonality in translation.” Beyond content: Exploring translation and multilingual text production. Ed. E. Steiner and C. Yallop. Berlin: Mouton de Gruyter, 1999.