Digital Humanities Abstracts

“Computer-Aided Acquisition of Language Teaching Materials from Corpora”
Svetlana Sheremetyeva, New Mexico State University, USA

Awareness of the domain-specific linguistic peculiarities of expository texts helps develop students' reading and writing competency in terms of genre literacies. Support for this point of view comes from the analysis of academic written genres, competing demands for limited resources, the tyranny of scheduling, and graduate students' verbal protocols about their reading process (Sengupta 1997). The genre literacy, or sublanguage, approach to instructed SLA advocated in this paper tries to exploit the lexical, morphological, syntactic and semantic restrictions on the specialized languages used by experts for communication in certain fields of knowledge or in particular types of texts (technical and scientific articles, instructions, installation manuals, etc.). Notions of sublanguage distinctiveness rely on linguistic knowledge concerning different kinds of sublanguage regularities and restrictions (Kittredge and Lehrberger 1982). Sublanguages are special subsystems of a natural language with restricted vocabulary and grammar which, on the one hand, share some properties with the language as a whole and, on the other hand, are characterized by some deviations from "general" language. As far as language instruction is concerned, both defining the content of this knowledge and finding ways to elicit it are problems without a single answer. Despite a long-standing interest in the analysis of written genres (sublanguages), little research has focused on how to put genre specificity to use in language instruction. This presentation explores critical issues in the selection of an appropriate methodological framework for the analysis of profession-related texts.
It aims to suggest the kind of sublanguage analysis method that is intended to form the basis for a system of typological parameters useful in acquiring teaching materials, and thus in tuning language instruction to the needs of professional communication. To describe a particular sublanguage it is necessary to study both the laws underlying natural language phenomena and the laws which make a sublanguage differ from the language as a whole. Sublanguages can be described in many ways. Language instruction is influenced by such practical parameters as the scope and nature of the vocabulary, grammar specificity, potential for ambiguity, and lexical and grammatical correlation, if any, which can and should be discovered on the basis of corpus analysis (Biber et al. 1998; Wichmann et al. 1997). This study focuses on verbs, as they are central to the structure of a sentence and consequently to text structure (Levin 1993; Aarts and Meyer 1995). The reason is that in professional reading most problems derive not from technical nouns and noun expressions, which are relatively easy to find in specialized dictionaries, but from grammar, which is often characterized by extended sentences with long and telescopic embedded structures. The current study also proposes and tests a sublanguage-specific hypothesis of correlation between the lexical meaning, morphological representation (tense, voice, finiteness) and syntactic realization (subject, object, predicate, attribute, etc.) of a particular verb in a sublanguage. Material for the research includes five corpora of 50,000 words each from different technical sublanguages: aerospace engineering, automobile engineering, mechanical engineering, technology engineering and patents. The sample corpora are taken from four technical journals (Space Flight, Automobile and Tractor, Materials Engineering, and Machine Design) and a corpus of US patent claims.
The main method of analysis is a computer-aided, corpus-based combination of qualitative and quantitative (statistical) techniques applied to a pre-tagged corpus, an approach which has proved useful for linguistic knowledge elicitation (Sheremetyeva 1998). Tagging, done manually by trained linguists, codes the morphosyntactic realizations of sublanguage verbs. For example, in the sentence "Making_TIA this apparatus they used_2IA a new technology", the tag TIA means that the verb "make" is used as an adverbial modifier in the form of the Present Participle, while the tag 2IA shows that the verb "use" is realized as a predicate in the form of the Past Simple Active. This methodology allows a standard automatic frequency count procedure to be applied to provide:
  • a) a verb inventory and its size in terms of verb occurrences;
  • b) a verb morphology and grammar inventory and their sizes in terms of occurrences of specific values of tense, aspect, voice, finiteness/nonfiniteness and syntactic functions as well as in terms of co-occurrence of grammatical features (for example, in the sublanguage of automobile engineering the most frequently used nonfinite realization of verbs is the Past Participle in the function of attribute, while no realization of verbs as Gerunds or Infinitives in the function of subject was found);
  • c) an inventory of lexical and morphosyntactic correlations (for example, in the sublanguage of automobile engineering the verb "use" is most often realized as the Past Participle in the function of attribute while the most frequent realization of the same verb in the aerospace engineering sublanguage is the Present Participle in the function of adverbial modifier).
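The frequency count over a tagged corpus can be illustrated with a minimal Python sketch. It is not the procedure used in the study. It assumes only that each tagged verb appears as a word form joined to its tag by an underscore, as in the example sentence above. The lemma table, the function names and the regular expression are hypothetical and serve illustration only; a real tagset and lemmatizer would replace them.

```python
import re
from collections import Counter

# Assumed token shape: word form, underscore, tag (e.g. "Making_TIA", "used_2IA").
TAGGED_VERB = re.compile(r"\b([A-Za-z]+)_([0-9A-Z]+)\b")

# Hypothetical lemma table, for illustration only.
LEMMAS = {"making": "make", "used": "use"}


def count_realizations(text):
    """Return three frequency inventories built from a tagged text:
    verbs (lemma counts), tags (tag counts) and pairs ((lemma, tag) counts)."""
    verbs, tags, pairs = Counter(), Counter(), Counter()
    for form, tag in TAGGED_VERB.findall(text):
        # Fall back to the lowercased form when the lemma is unknown.
        lemma = LEMMAS.get(form.lower(), form.lower())
        verbs[lemma] += 1
        tags[tag] += 1
        pairs[(lemma, tag)] += 1
    return verbs, tags, pairs


def most_frequent_realization(pairs, lemma):
    """Return the most frequent tag for a lemma, or None if the lemma is absent."""
    candidates = {tag: n for (lem, tag), n in pairs.items() if lem == lemma}
    return max(candidates, key=candidates.get) if candidates else None


sample = "Making_TIA this apparatus they used_2IA a new technology"
verbs, tags, pairs = count_realizations(sample)
```

Here `verbs` corresponds to inventory (a), `tags` to inventory (b) and `pairs` to inventory (c). Running the same counts over each corpus separately would allow the most frequent realization of a verb such as "use" to be compared across sublanguages.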
Qualitative analysis of each of the above inventories included sense analysis. The number of senses for each lexeme in a sublanguage is, on average, much smaller than in the language as a whole. Thus, of the seven senses of the word "engage" in the Cobuild English Language Dictionary, the patent sublanguage uses only one, which places this word in the following synonym set: engage, hold, attach, lock, join, clamp, fasten. Clearly, paradigmatic and syntagmatic relations are different in a sublanguage.

Conclusions

The paper presents a computer-aided methodology and the results of selecting teaching materials for optimizing students' reading and writing competencies in terms of genre literacies on the material of four technical sublanguages. The results of the study show "deviations" of every sublanguage from the general language and from each other. They also confirm that there exists a correlation between the lexical meanings of many sublanguage verbs and their morphosyntactic realizations. These deviations can be used for selecting professionally oriented language teaching materials that most effectively foster language proficiency development. The approach was tested and proved to be very useful at the Department of Foreign Languages of South Ural State University (Russia). It is expected to be portable to other sublanguages and can be used for developing both theoretical and practical issues in applied linguistics.

References

The Verb in Contemporary English: Theory and Description. Ed. B. Aarts and Ch. F. Meyer. Cambridge: Cambridge University Press, 1995.
D. Biber, S. Conrad, and R. Reppen. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press, 1998.
R. Kittredge and J. Lehrberger. Sublanguage: Studies of Language in Restricted Domains. Berlin, 1982.
B. Levin. English Verb Classes and Alternations. Chicago: University of Chicago Press, 1993.
S. Sengupta. “Academic reading skills for L2 learners: Does teaching selective reading help?” Proceedings of the Annual Conference of the American Association for Applied Linguistics. Seattle, March 13-17, 1997.
S. Sheremetyeva. “Acquisition of Language Resources for Special Applications.” Proceedings of the workshop Adapting Lexical and Corpus Resources to Sublanguages and Applications, in conjunction with the First International Conference on Language Resources and Evaluation, Granada, Spain, May 1998.
Teaching and Language Corpora. Ed. A. Wichmann, S. Fligelstone, T. McEnery, and G. Knowles. New York: Addison Wesley Longman Inc., 1997.