Digital Humanities Abstracts

“Phraseological Database Extended by Educational Material for Learning Scientific Style”
Elena I. Bolshakova, Center of Computer Investigations (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico, elena@pollux.cic.ipn.mx

Literary styles and specialized sublanguages, which serve communicative goals in particular fields of human activity, share the main features of natural language as a whole, yet deviate from it in syntax, morphology, and lexicon (Grishman and Kittredge 1986). As a rule, each functional style has its own phraseology, i.e. a system of word stereotypes (clichés) used as stable, ready-made formulas that optimize communication. Among functional styles, that of scientific and technical (sci-tech) prose is widely regarded as the most distinctive, primarily because of its intensive use of scientific phraseology, including special sci-tech terms (Mitrofanova 1973). The style covers documents of various genres and types: manuals, research papers, technical reports, instructions, patents, etc. Scientific phraseology provides economical ways to express ideas in sci-tech texts, which are marked by factuality, informativeness, and precision.
Teaching and learning literary styles is important not only for students in the humanities but also for students in the technical and natural sciences. Students' competence in their particular fields should be supplemented with the ability to write sci-tech documents of sufficiently high quality. Education in the technical and natural sciences should therefore include some knowledge from the humanities, in particular knowledge of scientific style.
The phraseology of specialized scientific sublanguages includes both sci-tech terms and common scientific phraseology. Acquiring the latter is the major difficulty in learning scientific style, because terms can usually be found in specialized dictionaries, whereas few dictionaries of typical scientific phraseological expressions are available. Students therefore need educational information and/or an assistant system for acquiring scientific phraseology.
We describe a computer system that has been under development for two years and that integrates a phraseological database of the Russian scientific language with explanatory educational material. It is intended to help students improve their linguistic competence in the scientific style and its genres, and it belongs to the class of hybrid computer systems that support both the process of sci-tech writing and the learning of its fundamentals. Another example of such a hybrid system is the experimental system described in (Bolshakova 2000). In designing the phraseology database, we considered the principles of several computer lexical databases (Fellbaum 1998, Bolshakov 1994).

Features of the System

From the user's point of view, the system can be regarded as a linguistic database supplied with a computer reference guide that accumulates general explanatory information about scientific style and phraseology. The text of the guide has been specially written and structured for presentation in hypertext form, since the usefulness of hypertext for learning is well acknowledged (Brusilovsky 1996). Each page of the reference guide thus presents a relatively independent topic and is connected by hypertext links with other pages of the guide and with pages presenting items of the phraseology database. In turn, hypertext pages with phraseological expressions are both interconnected and connected with the guide pages that explain the necessary concepts. Besides browsing through the pages, the user can search for phraseological expressions containing given words; the search returns the relevant page. The system is flexibly organized: it allows free navigation through the pages of the reference guide and of the phraseological database, so that the user can view the information in any desired sequence. At the same time, a student can work through the educational material in a predetermined, systematic order recommended for beginners. Such flexibility, envisioned from a liberal humanities viewpoint, has proved to be a more effective learning strategy.
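The page organization and word search described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation; the names `Page`, `find_expressions`, and `linked_pages`, the page identifiers, and the "guide"/"phrase" labels are all assumptions introduced for illustration, and the sample page texts only reuse expressions quoted in this abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One hypertext page: a guide topic or a phraseology database item."""
    page_id: str
    kind: str    # "guide" or "phrase" (labels assumed for this sketch)
    title: str
    text: str
    links: list[str] = field(default_factory=list)  # ids of linked pages


def find_expressions(pages: dict[str, Page], words: list[str]) -> list[Page]:
    """Return the phrase pages whose text contains every given word."""
    wanted = [w.lower() for w in words]
    hits = []
    for page in pages.values():
        if page.kind != "phrase":
            continue
        # Treat "shows/yields" as two separate words when matching.
        tokens = page.text.lower().replace("/", " ").split()
        if all(w in tokens for w in wanted):
            hits.append(page)
    return hits


def linked_pages(pages: dict[str, Page], page_id: str) -> list[Page]:
    """Follow the hypertext links of one page, skipping unknown ids."""
    return [pages[i] for i in pages[page_id].links if i in pages]


# Hypothetical sample content: one guide page and two phrase pages.
pages = {
    "guide:variables": Page(
        "guide:variables", "guide", "Common scientific variables",
        "Words such as problem, analysis, result.",
        ["phrase:analysis-shows", "phrase:question-results"],
    ),
    "phrase:analysis-shows": Page(
        "phrase:analysis-shows", "phrase", "analysis shows",
        "objective analysis shows/yields", ["guide:variables"],
    ),
    "phrase:question-results": Page(
        "phrase:question-results", "phrase", "question the results",
        "to question the results", ["guide:variables"],
    ),
}
```

Calling `find_expressions(pages, ["analysis"])` returns only the phrase page for "objective analysis shows/yields", and `linked_pages` then leads from that page back to the guide page that explains the concept. Keeping links as page identifiers rather than object references is one simple way to let guide pages and database pages point at each other freely.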

Covered Phraseology

The phraseology represented in the database was gathered from several printed dictionaries of common scientific phraseology (see, for example, DICT 1973) and then complemented by phraseological data obtained through manual scanning of scientific texts in several fields. Units of common scientific phraseology, including domain-independent word stereotypes and colloquial templates specific to particular scientific genres, were systematized and arranged according to their functions in texts. The largest group of expressions concerns words regarded as common scientific variables, e.g. "problem", "analysis", "result". Examples of phraseological expressions with such variables are "objective analysis shows/yields" and "to question the results". Another group comprises units of metatextual character, which shape and organize the narrative of a scientific text. It includes expressions that connect different parts of a text ("in addition", "mentioned above", etc.), expressions that indicate an information source (such as "in their/our opinion"), and evaluative expressions (e.g., "it seems reasonable"). Each item of the phraseological database integrates all semantically equivalent variants (synonyms) of a particular expression. The variants are described by a semantico-syntactic pattern with associated information, including an explanation of its meaning and examples of typical sentences that use it. Empty valences of the expression are indicated in the pattern, together with their semantic roles.
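The structure of a database item can be sketched as a small Python record. This is only an illustration under stated assumptions: the abstract does not give the pattern notation, so the slash convention for variants (borrowed from the "shows/yields" example), the slot name `X`, the role label "finding", the explanation text, and the sample sentence are all hypothetical, as are the names `Valence` and `PhraseItem`.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Valence:
    """An empty slot of an expression together with its semantic role."""
    slot: str   # placeholder used in the pattern, e.g. "X" (assumed notation)
    role: str   # semantic role label (hypothetical)


@dataclass
class PhraseItem:
    """One database item: a pattern covering all synonymous variants."""
    function: str              # functional group of the expression
    pattern: str               # "a/b" marks interchangeable words (assumed)
    valences: list[Valence]
    explanation: str
    examples: list[str] = field(default_factory=list)

    def variants(self) -> list[str]:
        """Expand slash alternatives into the full list of variants."""
        choices = [token.split("/") for token in self.pattern.split()]
        return [" ".join(combo) for combo in product(*choices)]

    def render(self, variant: str, fillers: dict[str, str]) -> str:
        """Fill every valence slot of one variant with concrete text."""
        if variant not in self.variants():
            raise ValueError(f"unknown variant: {variant!r}")
        missing = [v.slot for v in self.valences if v.slot not in fillers]
        if missing:
            raise ValueError(f"unfilled valences: {missing}")
        return " ".join(fillers.get(tok, tok) for tok in variant.split())


# Hypothetical item built around an expression quoted in the text.
item = PhraseItem(
    function="common scientific variable: analysis",
    pattern="objective analysis shows/yields X",
    valences=[Valence(slot="X", role="finding")],
    explanation="Introduces a conclusion drawn from an analysis.",
    examples=["Objective analysis shows a steady decline."],
)
```

Here `item.variants()` expands the pattern into "objective analysis shows X" and "objective analysis yields X", and `item.render(...)` refuses to produce a sentence until every valence has a filler. Storing one pattern per item, rather than a list of independent strings, mirrors the idea that synonymous variants share a single explanation and a single set of valences.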

Conclusions

We have described both the methodological framework and the main features of a computer system intended for learning the phraseology of Russian sci-tech texts. Its interrelated components, i.e. the phraseology database and the educational material represented in hypertext form, are partially implemented with the tools of the Borland Delphi environment. Among the directions of improvement now under consideration, we should point out further extension of the phraseological lexicon. Text corpora reflecting contemporary sci-tech language usage are expected to be used for this purpose, since the features of any style or sublanguage can be revealed exhaustively through corpus analysis (Biber et al. 1998). Another direction is merging the scientific phraseologies of several natural languages into a common database. A preliminary comparative study of the scientific phraseology of Russian, English, and Spanish shows an evident similarity of their word stereotypes. This fact can be used for systematic computer-aided teaching of foreign scientific phraseology.

References

D. Biber, S. Conrad, R. Reppen. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press, 1998.
I. Bolshakov. “Multifunctional Thesaurus for Russian Word Processing.” Proceedings of 4th Conference on Applied Natural Language Processing, Stuttgart, 13-15 October, 1994. 200-202.
P. Brusilovsky. “Methods and Techniques of Adaptive Hypermedia.” User Modeling and User-Adapted Interaction. 1996. 6: 87-129.
E. Bolshakova. “Computer Assistance in Writing Technical and Scientific Texts.” Proceedings of 2nd International Symposium "Las Humanidades en la Educacion Tecnica ante el Siglo XXI", Mexico, 27-29 September, 2000. 59-63.
[DICT] Dictionary of Verb-Noun Combinations of the Common Scientific Speech. Moscow: Nauka Publ., 1973.
WordNet: An Electronic Lexical Database. Ed. C. Fellbaum. Cambridge: MIT Press, 1998.
Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Ed. R. Grishman and R. Kittredge. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1986.
O. Mitrofanova. Language of Scientific and Technical Literature. Moscow University Press, 1973.