Digital Humanities Abstracts

“The Academic vs. Subject Corpus: Development of Criteria for the Teaching of ESP According to Lexical Needs in Spanish Polytechnic Courses”
Alejandro Curado Fuentes, Universidad de Extremadura, Spain

This paper reports the findings of a lexical analysis of English texts read in Information Science-related majors at Spanish universities. In this textual collection, lexical items are arranged according to word frequency and range across text types and genres, or within given subject fields and topics in the information science and technology disciplines. The strength of lexical combinations, measured statistically by their M.I. (Mutual Information) score, is also pertinent to our study, so that the degree of collocation is assessed in the light of common coreness. How consistently these patterns occur in our corpus is a key characteristic, since it allows a reference to be established against the total number of texts and running words. Finally, because the findings show that some lexical items are representative of only a limited number of texts, keywords must also be explored.

The results drawn from these three approaches - word frequency / range, collocations and keywords - are observed with a focus on both the text and the subject matter. This follows the priority, from the ESP (English for Specific Purposes) perspective, of working with language and content together. Consequently, lexical items are categorized by specified kinds of context. Two of these, text type and genre, concern the environments of text and discourse, which are of prime importance for situating lexical items within academic linguistic competence. Text types are approached in terms of how a text is organized and how it reflects coherence and cohesion, while genre registers the writer's intention to produce discourse for a community (e.g. the academic community). The other two parameters on which the distribution of lexical items in our corpus is based are subject and topic. These provide a content-based framework, and the findings yield the core lexis according to thematic / conceptual fields.
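
The M.I. measurement of collocational strength mentioned above can be illustrated with a minimal sketch. It assumes the common pointwise formulation, MI = log2(f(x,y) * N / (f(x) * f(y))), applied to adjacent word pairs; the abstract does not state which variant or window span was used, so the function name, the adjacency window and the toy sentence are all illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(tokens, min_pair_count=1):
    """Score adjacent word pairs by pointwise mutual information (log base 2).

    Assumption: pairs are adjacent tokens, and the corpus size N (number of
    running words) is used as the denominator for both word and pair
    probabilities, a common simplification.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), f_pair in pair_counts.items():
        if f_pair < min_pair_count:
            continue  # skip pairs too rare to be scored reliably
        # MI = log2( f(w1, w2) * N / (f(w1) * f(w2)) )
        scores[(w1, w2)] = math.log2(
            f_pair * n / (word_counts[w1] * word_counts[w2])
        )
    return scores


# Hypothetical toy input, not taken from the corpus described here.
sample = "information retrieval systems support information retrieval tasks".split()
for pair, score in sorted(mutual_information(sample).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Because M.I. tends to favour pairs of rare words, a threshold such as the hypothetical min_pair_count parameter is usually raised above 1 when working with a real corpus.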

a. Word frequency and range.

In this first division, the most frequent text type words are provided according to how recurrent they are across six sets of ten texts. These are grouped as follows:
  1. Definitions.
  2. Descriptions.
  3. Classifications.
  4. Exemplifications.
  5. Discussions.
  6. Conclusions.
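
Frequency and range over text sets of this kind can be counted along the following lines. This is only a sketch under stated assumptions: range is taken to mean the number of texts (and, separately, the number of text type sets) in which a word occurs, each text is assumed to be already tokenized, and the function name and toy data are hypothetical rather than drawn from the corpus.

```python
from collections import Counter


def frequency_and_range(text_sets):
    """Count word frequency and range over grouped texts.

    text_sets maps a text type name to a list of texts, each text being a
    list of tokens. Returns three Counters: total frequency, number of
    texts containing each word, and number of sets containing each word.
    """
    frequency = Counter()
    text_range = Counter()
    set_range = Counter()
    for texts in text_sets.values():
        seen_in_set = set()
        for tokens in texts:
            frequency.update(tokens)   # every occurrence counts
            unique = set(tokens)
            text_range.update(unique)  # each text counts once per word
            seen_in_set |= unique
        set_range.update(seen_in_set)  # each set counts once per word
    return frequency, text_range, set_range


# Hypothetical toy data standing in for two of the six sets.
toy_sets = {
    "Definitions": [["data", "is", "defined"], ["data", "model"]],
    "Descriptions": [["the", "system", "stores", "data", "data"]],
}
freq, by_text, by_set = frequency_and_range(toy_sets)
print(freq["data"], by_text["data"], by_set["data"])
```

Keeping the three counts separate makes it possible to distinguish a word that is frequent in one long text from one that recurs across many texts and text types.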
The samples are taken randomly to represent the rhetorical functions and sub-sections of genres with which the learner must cope. The relevant vocabulary analyzed from this perspective is classified as argumentative, procedural and discourse/grammar items, examined in demarcated domains such as distinctive subject fields and genres.

Our immediate concern is thus to represent visually all the interrelationships among the subject fields, so that text samples can be selected accordingly. We offer figures which give the number of sources belonging to four specified disciplines in Informatics-related majors, indicated by abbreviations (e.g. 'C.S.' stands for Computer Science and so forth). In addition, capital letters refer to the codes used in our corpus for the subjects/topics within disciplines, as shown in the Appendices (Appendix 1). As will be observed, in addition to all the labels A - F, each single subject field is also represented individually by some texts (not shared with other studies). There are more texts in the 'F' category, which is shared by the four scientific areas - Computer Science, Information Science, Audio-visual Communication and Optical/Wave communication. In contrast, the subject 'Communication Theory', included in the Audio-visual Communication, Information Science and Telecommunication programs of study, comprises only four sources. In turn, Audio-visual Communication is the discipline with the fewest readings - only one text for each genre.

The texts are selected using their overall distribution and length in the corpus as yardsticks. Thus, if as many as five descriptions (out of 10 possible ones) fall within the 'F' or 'All disciplines' category, this is because such passages are quite common in these readings. In addition, these five samples are not as long as other types in this division, such as definitions.
Finally, as has been pointed out, keeping the balance in relation to the entire corpus is a chief consideration. So that the text type findings based on frequency and range are also framed by detailed knowledge of the subjects and topics involved, the distribution of text type sources within each sub-category or label must be provided. The maximum number of texts in any sub-category is three - e.g. in the case of Descriptions on the topic of 'Information infrastructure' (F6 category). This reflects both the larger number of readings in the corpus dealing with issues of this kind and the recurrence of this rhetorical function in sub-division F6. In contrast, where a given sub-category contains no samples, the reason is that the rhetorical model in question was either less developed or not included at all in the content of the text (e.g. Conclusions in 'Perspectives on information' [F1], 'Media theory' [D2], 'Media documentation' [C2], 'Automated Knowledge-based systems' [B3], etc).

A final comment must be made regarding the importance of keeping a balance in the representation of three academic genres - textbooks, reports and research articles - in the construction of the corpus. The intent of this arrangement is to offer a weighted basis for text selection and analysis. Since the aim of this organizing procedure is to provide adequate ground for lexical sifting, the text type sub-corpus should incorporate as many different language and content settings - i.e. contextual factors - as possible. In this sense, even text units as characteristic of a single genre as the Discussion and Conclusion sections of research articles can be located in the other two genres (e.g. a discussion appearing in a textbook on Communication Theory [E1] or a conclusion taken from a report on Software Programming [A1], as figure 4 shows).
The end results should thus be suitable for the design of both written and oral lexical activities and tasks that reveal the importance of academic and subject lexis, based on the analysis of common texts across different disciplines.