Corpora in Translation Studies

Baker (1995) describes various types of electronic corpora that are of specific interest to translation scholars. In Baker's terminology, a parallel corpus consists of texts originally written in a language A alongside their translations into a language B. Parallel corpora exist for several language pairs, including English-French (Salkie 1995; see also Church and Gale's (1991) work using the Canadian Hansard), English-Italian (Marinai et al. 1992), and English-Norwegian (Johansson and Hofland 1994; Johansson et al. 1996). Parallel corpora can be used to provide information on language-pair-specific translational behaviour, or to posit certain equivalence relationships between lexical items or structures in source and target languages (Marinai et al. 1992). Typical applications of parallel corpora include translator training, bilingual lexicography and machine translation.

Baker (1995) uses the term comparable corpus to describe a collection of texts originally written in a language, say English, alongside a collection of texts translated (from one or more languages) into English. She suggests that comparable corpora have the potential to reveal most about features specific to translated text, i.e., those features that occur exclusively, or with unusually low or high frequency, in translated text as opposed to other types of text production, and that cannot be traced back to the influence of any one particular source text or language.

Translation theorists such as Shlesinger (1991), Toury (1980), Vanderauwera (1985) and Baker (1993) have posited the following as features of translated text: translated texts tend to be more explicit, less ambiguous, and grammatically and lexically more conventional than source texts or other texts produced in the target language.

Using collocation as an indicator of conservatism

The idea that translations are more conventional than their source texts or other target language texts can also be tested by investigating collocational patterns. Should familiar collocational patterns be somehow flouted in a source text, then the point at which this happens will also have special textemic status in that source text. This could occur, for example, at points where the preference of a word under investigation for collocates of a particular semantic set is not respected in a text, or, more specifically, where there are "departures ... from the expected profiles of semantic prosodies" (Louw 1993: 157). Corpus linguistics provides interesting techniques for spotting recurring and, by contrast, unconventional patterns of co-occurrence in vast quantities of text (Clear 1993; Louw 1993) and such techniques are being extended to bilingual corpora (Peters and Picchi 1996; Smadja et al. 1996). If translators really are under pressure to conform to target-language norms, one could expect unconventional co-occurrences in source texts to be replaced by more conventional collocations in the target text. The current doctoral research represents an attempt to use collocation as an indicator of conservative tendencies amongst translators. It involves the building of a parallel corpus of contemporary German fiction translated into English. Unconventional lexical co-occurrences are to be identified in the German source texts, by comparing the source texts with a large reference corpus of German, and using the tools of collocation analysis (Clear 1993; Barnbrook 1996). The translation into English of such unusual lexical combinations will then be investigated to see whether these are conventionalized in any way. Such conventionalization can, of course, only be established with reference to a large corpus of fiction originally written in English, in other words, using a comparable corpus.
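
The comparison of source-text co-occurrences against a reference corpus can be illustrated with a minimal sketch. It is not the procedure used in this research: the function names, the span of one word either side of the node, the "unattested in the reference corpus" criterion and the toy token lists are all assumptions made for illustration. A real study would use a large reference corpus and a statistical measure rather than raw attestation.

```python
from collections import Counter


def window_collocates(tokens, node, span=4):
    """Count word forms found within `span` tokens either side of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


def unattested_collocates(source_tokens, reference_tokens, node, span=4, max_ref_freq=0):
    """Return source-text collocates of `node` seen at most `max_ref_freq` times
    with `node` in the reference corpus (a crude proxy for 'unconventional')."""
    src = window_collocates(source_tokens, node, span)
    ref = window_collocates(reference_tokens, node, span)
    return sorted(w for w in src if ref[w] <= max_ref_freq)


# Hypothetical toy data, invented for illustration only.
reference = "der kalte wind weht der kalte regen kommt der wind ist kalt".split()
source = "der kalte mond weht".split()

print(unattested_collocates(source, reference, "kalte", span=1))  # ['mond']
```

In this toy case "mond" is flagged because it never occurs next to "kalte" in the reference tokens, whereas "der" is attested there and is therefore not flagged.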

A pilot investigation

This poster sets out specifically to report on a pilot test designed to investigate collocational patterns in a small number of German source and English target texts. The principal issues at stake are: how to choose node words worth investigating in the original German texts; and how to identify statistically significant collocations and, by contrast, unusual co-occurrences in the source and target texts. Various approaches are taken in the literature: Stubbs (1996), for example, investigates the collocates of culturally significant nodes; other researchers (Clear 1993; Smadja 1993) report on approaches that compute collocation patterns for every word form in a corpus from the outset, only to later jettison those combinations that fall below an arbitrary threshold of significance. It is also well known that different measures of statistical significance yield different results in automatic collocation recognition (Clear 1993; Smadja 1993). By comparing approaches, it is hoped that this pilot test will indicate how the research should proceed when it is scaled up to include the full set of German source texts. It is also intended to reveal problems that may be specific to the identification of collocations in two different languages, and specifically, whether unconventional (free) lexical combinations can be fruitfully used as a springboard for investigating conservative linguistic tendencies among literary translators.
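
The point that different significance measures yield different results can be made concrete with a small sketch. It uses one common formulation of mutual information and the t-score from the corpus-linguistic literature, in which the expected co-occurrence frequency is estimated from the frequencies of node and collocate, the corpus size and the window size. The formulation, the window of eight positions and all frequency counts below are assumptions for illustration, not figures or methods taken from the pilot test.

```python
import math


def expected_frequency(f_node, f_coll, n_tokens, window=8):
    """Co-occurrences expected by chance within `window` positions around the node."""
    return f_node * f_coll * window / n_tokens


def mutual_information(observed, f_node, f_coll, n_tokens, window=8):
    """log2 of observed over expected co-occurrence frequency."""
    return math.log2(observed / expected_frequency(f_node, f_coll, n_tokens, window))


def t_score(observed, f_node, f_coll, n_tokens, window=8):
    """(observed - expected) / sqrt(observed), a common approximation of the t statistic."""
    expected = expected_frequency(f_node, f_coll, n_tokens, window)
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts: a 1,000,000-token corpus and a node occurring 1,000 times.
N, F_NODE = 1_000_000, 1_000
candidates = {
    # collocate label: (corpus frequency of collocate, observed co-occurrences with node)
    "frequent_word": (50_000, 600),
    "rare_word": (20, 10),
}

scores = {
    word: (mutual_information(obs, F_NODE, f, N), t_score(obs, F_NODE, f, N))
    for word, (f, obs) in candidates.items()
}

by_mi = sorted(scores, key=lambda w: scores[w][0], reverse=True)
by_t = sorted(scores, key=lambda w: scores[w][1], reverse=True)
print(by_mi)  # ['rare_word', 'frequent_word']
print(by_t)   # ['frequent_word', 'rare_word']
```

With these invented counts, mutual information ranks the rare collocate first while the t-score ranks the frequent one first, so the choice of measure, and of any cut-off threshold, would directly affect which combinations count as conventional.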

References

M. Baker. “Corpus Linguistics and Translation Studies: Implications and Applications.” Text and Technology: In Honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli. Amsterdam/Philadelphia: John Benjamins, 1993. 223-250.

M. Baker. “Corpora in Translation Studies: An Overview and Some Suggestions for Future Research.” Target. 1995. 7: 223-243.

Text and Technology: In Honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli. Amsterdam/Philadelphia: John Benjamins, 1993.

G. Barnbrook. Language and Computers. Edinburgh: Edinburgh University Press, 1996.

K. Church and W. Gale. “Concordances for Parallel Text.” Using Corpora: Proceedings of the Seventh Annual Conference of the UW Centre for the New OED & Text Research. Oxford: St. Catherine's, 1991.

J. Clear. “From Firth Principles: Computational Tools for the Study of Collocation.” Text and Technology: In Honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli. Amsterdam/Philadelphia: John Benjamins, 1993. 271-292.

S. Johansson and K. Hofland. “Towards an English-Norwegian parallel corpus.” Creating and using English language corpora, Papers from the Fourteenth International Conference on English Language Research on Computerized Corpora, Zürich 1993. Ed. U. Fries, G. Tottie and P. Schneider. Zürich, 1993.

S. Johansson, J. Ebeling and K. Hofland. “Coding and Aligning the English-Norwegian Parallel Corpus.” Languages in Contrast: Papers from a Symposium on Text-based Cross-linguistic Studies, Lund 4-5 March 1994. Ed. K. Aijmer, B. Altenberg and M. Johansson. Lund: Lund University Press, 1996. 87-112.

B. Louw. “Irony in the Text or Insincerity in the Writer? The Diagnostic Potential of Semantic Prosodies.” Text and Technology: In Honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli. Amsterdam/Philadelphia: John Benjamins, 1993. 157-176.

E. Marinai, C. Peters and E. Picchi. “Bilingual Reference Corpora: Creation, Querying, Applications.” Papers in Computational Lexicography Complex '92. Ed. F. Kiefer, G. Kiss and J. Pajzs. Budapest: Linguistics Institute, Hungarian Academy of Sciences, 1992.

C. Peters and E. Picchi. “Bilingual reference corpora for translators and translation studies.” Paper presented at Unity in Diversity, International Translation Studies Conference, Dublin City University, 9-11 May 1996.

R. Salkie. “Intersect: a Parallel Corpus Project at Brighton University.” Computers and Texts. 1995. 9.

M. Shlesinger. “Interpreter Latitude vs. Due Process. Simultaneous and Consecutive Interpretation in Multilingual Trials.” Empirical Research in Translation and Intercultural Studies. Ed. S. Tirkkonen-Condit. Tübingen: Gunter Narr, 1991.

F. Smadja. “Retrieving Collocations from Text: Xtract.” Computational Linguistics. 1993. 19: 143-177.

F. Smadja, K. McKeown and V. Hatzivassiloglou. “Translating Collocations for Bilingual Lexicons: A Statistical Approach.” Computational Linguistics. 1996. 22: 1-38.

M. Stubbs. Text and Corpus Analysis. Oxford: Blackwell, 1996.

G. Toury. In Search of a Theory of Translation. Tel Aviv: The Porter Institute for Poetics and Semiotics, 1980.

R. Vanderauwera. Dutch Novels Translated into English: The Transformation of a "Minority" Literature. Amsterdam: Rodopi, 1985.