“That Was Then: Canonicity in the Trésor”
Susy C. Santos
University of Manitoba
umsant06@UManitoba.CA
Paul A. Fortier
Centre on Aging, University of Manitoba
Fortier@cc.umanitoba.ca
The Trésor de la Langue Française (TLF) corpus (http://www.lib.uchicago.edu/efts/ARTFL/databases/TLF/index.html) was set up almost half a century ago. When one reads the description of
how this was done, the distance becomes evident. Professor Imbs quite openly
admits that the goal is to reflect "elite" usage of the French language;
texts were chosen after consultation of histories of literature, some of
which were quite dated even then (Imbs 1971, I, xv-xl). Considerations of
inclusiveness and of representativity, as discussed in Scholes (1992) or von
Hallberg (1984), do not seem to have concerned the committee that finalized
the corpus. One is entitled to wonder to what extent this corpus represents
the interests of scholars of French literature a half century later.
Purpose
It is legitimate to evaluate the extent to which the texts included in the TLF database represent important trends in French literature, both as judged by what interested scholars at the time the database was constituted and as reflected by what interests scholars today. More specifically, one can test whether the choices embodied in the TLF reflect what scholars of the time judged important by comparing, for a single genre (the novel), the number of texts by each author chosen for the TLF with the number of lines devoted to that author in the Oxford Companion to French Literature (Harvey & Heseltine 1959). Similarly, the MLA Bibliography (http://www.mla.org/publications/bibliography) provides online data showing the number of publications in the modern languages and literatures for the periods 1963-90 and 1991 to the present. A comparison between the number of publications mentioning a novelist in this bibliography and the number of texts by the same novelist in the TLF will show the extent to which the choices made by the TLF group have been confirmed by the interest of later scholars. Given the volume of data involved, these questions must be addressed statistically.
Data
A subset of the TLF database was chosen for analysis: novels published between 1789 and 1954 (see Table 1). For each novelist, the name (Author), the number of novels included in the database (Texts), and the publication date of the included text (Pub Date) were recorded; when the TLF contains more than one novel by a given author, Pub Date records the date of the earliest. Authors better known for genres other than prose fiction were removed from the test data, because they would be a source of ambiguity. These numbers were compared with three series of test data. The column OxC in Table 1 records the number of lines devoted to the novelist and to that author's included novels in the Oxford Companion to French Literature (Harvey & Heseltine 1959), a volume contemporary with the formation of the TLF database. Columns MLA 1 and MLA 2 record the number of articles mentioning the novelist or the work(s) in the MLA online bibliography of scholarly articles on language and literature: MLA 1 covers the period 1963-1990 and MLA 2 the period 1991-2000. The entire set of 128 frequencies concerning novels was analysed first. Subsets with roughly equal numbers of authors were then formed, covering the periods 1789-1859 (33 authors), 1860-1907 (35), 1908-23 (25), and 1925-54 (35).
Table 1 (excerpt): Novels in the TLF and test data
Author | Pub Date | Texts | OxC | MLA 1 | MLA 2 |
---|---|---|---|---|---|
Abellio | 1946 | 1 | 0 | 9 | 0 |
About | 1857 | 2 | 14 | 1 | 0 |
Adam | 1902 | 1 | 25 | 1 | 4 |
Alain-Fournier | 1913 | 1 | 93 | 29 | 4 |
Ambriere | 1946 | 1 | 0 | 1 | 0 |
Aragon | 1936 | 1 | 25 | 445 | 305 |
Arland | 1929 | 1 | 0 | 37 | 4 |
Ayme | 1933 | 1 | 7 | 38 | 9 |
Baillon | 1927 | 1 | 0 | 3 | 6 |
Balzac | 1824 | 16 | 577 | 1986 | 781 |
Barbusse | 1916 | 1 | 16 | 52 | 13 |
Barres | 1888 | 5 | 87 | 93 | 72 |
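The period subsets described above can be sketched as follows. This is a minimal illustration, assuming the inclusive year ranges exactly as printed and using only the Table 1 rows reproduced here (not the full set of 128 authors); the names `ROWS`, `PERIODS`, `period_of` and `split_by_period` are hypothetical. Because Pub Date holds the date of an author's earliest novel in the TLF, each author falls into at most one period.

```python
# Rows copied from the Table 1 excerpt: (Author, Pub Date, Texts, OxC, MLA 1, MLA 2).
ROWS = [
    ("Abellio", 1946, 1, 0, 9, 0),
    ("About", 1857, 2, 14, 1, 0),
    ("Adam", 1902, 1, 25, 1, 4),
    ("Alain-Fournier", 1913, 1, 93, 29, 4),
    ("Ambriere", 1946, 1, 0, 1, 0),
    ("Aragon", 1936, 1, 25, 445, 305),
    ("Arland", 1929, 1, 0, 37, 4),
    ("Ayme", 1933, 1, 7, 38, 9),
    ("Baillon", 1927, 1, 0, 3, 6),
    ("Balzac", 1824, 16, 577, 1986, 781),
    ("Barbusse", 1916, 1, 16, 52, 13),
    ("Barres", 1888, 5, 87, 93, 72),
]

# Inclusive period bounds as printed in the Data section.
PERIODS = [(1789, 1859), (1860, 1907), (1908, 1923), (1925, 1954)]


def period_of(pub_date):
    """Return the (start, end) period containing pub_date, or None."""
    for start, end in PERIODS:
        if start <= pub_date <= end:
            return (start, end)
    return None


def split_by_period(rows):
    """Group author names by the period of their earliest TLF novel."""
    groups = {p: [] for p in PERIODS}
    for author, pub_date, *_ in rows:
        p = period_of(pub_date)
        if p is not None:
            groups[p].append(author)
    return groups


for (start, end), authors in split_by_period(ROWS).items():
    print(f"{start}-{end}: {len(authors)} authors {authors}")
```

A date outside every printed range (1924, for instance, is covered by neither 1908-23 nor 1925-54) is left unassigned rather than forced into a neighbouring period, since the source does not say how such a case would be handled.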
Method
A glance at the frequencies of texts recorded for individual authors shows a large number of authors with one text and a very small number with ten or more, a distribution pattern quite familiar to those who work with word frequencies in natural languages. These data do not form the bell-shaped curve typical of the Gaussian, or normal, distribution. Since the data are not normally distributed, Pearson's product-moment correlation cannot legitimately be used on them. Similarly, in a contingency table for a chi-squared analysis these data would produce a very high proportion of expected frequencies smaller than 5, so that method cannot be employed either. The usual remedy for this problem, grouping the data, is not appropriate here, since it is the treatment of individual authors that is of interest. Spearman's rank correlation requires neither normally distributed data nor expected frequencies greater than five; it was therefore chosen as the primary analytic technique and applied pairwise to the full data set and to each of the four subsets. In addition, the jackknifed outlier analysis provided by JMP-IN (Sall & Lehman 1996) was used to identify the authors whose values depart most from the overall trends in the data.
Results
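Before the figures, the two procedures described in the Method section can be sketched in plain Python. This is a hedged illustration, not the JMP-IN implementation. Spearman's rho is computed as Pearson's r on ranks, with tied values given the average of the ranks they occupy (ties are very common here, since most authors have one text). The outlier measure is assumed to be a jackknifed Mahalanobis distance, in which each author is measured against the mean and covariance of all the other authors; it is shown for two variables only. The data are the Table 1 rows, so the values will not match Table 2, which is based on all 128 authors. All function names are hypothetical.

```python
import math

# Rows copied from the Table 1 excerpt: (Author, Texts, OxC).
ROWS = [
    ("Abellio", 1, 0), ("About", 2, 14), ("Adam", 1, 25),
    ("Alain-Fournier", 1, 93), ("Ambriere", 1, 0), ("Aragon", 1, 25),
    ("Arland", 1, 0), ("Ayme", 1, 7), ("Baillon", 1, 0),
    ("Balzac", 16, 577), ("Barbusse", 1, 16), ("Barres", 5, 87),
]


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; give each their mean.
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def jackknife_distances(x, y):
    """Mahalanobis distance of each point from the mean and sample
    covariance of all the *other* points (two variables only)."""
    n = len(x)
    dists = []
    for i in range(n):
        xs = [v for k, v in enumerate(x) if k != i]
        ys = [v for k, v in enumerate(y) if k != i]
        m = n - 1
        mx, my = sum(xs) / m, sum(ys) / m
        a = sum((v - mx) ** 2 for v in xs) / (m - 1)  # var(x)
        c = sum((v - my) ** 2 for v in ys) / (m - 1)  # var(y)
        b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / (m - 1)  # cov
        det = a * c - b * b
        dx, dy = x[i] - mx, y[i] - my
        # Quadratic form using the inverse of the 2x2 matrix [[a, b], [b, c]].
        d2 = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
        dists.append(math.sqrt(d2))
    return dists


names = [r[0] for r in ROWS]
texts = [r[1] for r in ROWS]
oxc = [r[2] for r in ROWS]

print("Spearman rho (Texts, OxC):", round(spearman_rho(texts, oxc), 4))
dists = jackknife_distances(texts, oxc)
for name, d in sorted(zip(names, dists), key=lambda t: -t[1])[:3]:
    print(f"{name}: jackknife distance {d:.2f}")
```

Leaving each point out of its own reference statistics is what makes the distance "jackknifed": an extreme author such as Balzac cannot inflate the covariance against which it is itself judged. The sketch assumes neither column is constant, since a zero variance would make the denominators vanish.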
Taken as a whole, the data show a substantial degree of positive correlation among the number of texts in the TLF database, the number of lines in the Oxford Companion, and the two sets of MLA Bibliography data (see Table 2), with Spearman's rho ranging from 0.40 to 0.91. In every case the probability that the correlation is due to chance alone is below .0001.
Table 2: Nonparametric Measure of Association
Variable | by Variable | Spearman Rho | Prob>\|Rho\| |
---|---|---|---|
OxC | Texts | 0.5528 | <.0001 |
MLA_1 | Texts | 0.4475 | <.0001 |
MLA_1 | OxC | 0.6101 | <.0001 |
MLA_2 | Texts | 0.4047 | <.0001 |
MLA_2 | OxC | 0.5918 | <.0001 |
MLA_2 | MLA_1 | 0.9084 | <.0001 |
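As a rough check on the Prob>|Rho| column, one can apply the large-sample approximation under which rho * sqrt(n - 1) is approximately standard normal when there is no association, using the published rho values and n = 128. This is a sketch under that assumption: the source does not say which approximation JMP-IN used (a t-based approximation is also common), and the names below are hypothetical.

```python
import math

# Spearman rho values copied from Table 2; n = 128 authors (see Data).
TABLE_2 = {
    ("OxC", "Texts"): 0.5528,
    ("MLA_1", "Texts"): 0.4475,
    ("MLA_1", "OxC"): 0.6101,
    ("MLA_2", "Texts"): 0.4047,
    ("MLA_2", "OxC"): 0.5918,
    ("MLA_2", "MLA_1"): 0.9084,
}
N = 128


def approx_two_sided_p(rho, n):
    """Two-sided p-value assuming rho * sqrt(n - 1) ~ N(0, 1) under H0."""
    z = rho * math.sqrt(n - 1)
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)); erfc stays accurate in the tail.
    return math.erfc(abs(z) / math.sqrt(2))


for (var, by_var), rho in TABLE_2.items():
    print(f"{var} by {by_var}: rho={rho:.4f}, approx p={approx_two_sided_p(rho, N):.1e}")
```

Even the weakest association in Table 2 (MLA_2 by Texts, rho = 0.4047) gives z of about 4.56 and an approximate p-value of the order of 10^-6, consistent with the reported <.0001.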
Conclusion
The analysis carried out on the number of novel texts included in the TLF database shows that the selection is broadly what a different team of scholars might have produced had they drawn it up in the late 1950s. Similarly, the works included correspond, particularly for the period up to 1908, to what scholars of our day find sufficiently interesting to discuss in their published studies. It is thus reasonable to conclude that the TLF database is a valid representation of important French literary texts for the period from 1789 to 1954. As more and more databases become commercially available, the method presented here, which validates the representativity of a database against readily available online bibliographical information, would seem to have significance beyond modern French literature.
Acknowledgements
The research reported here has been supported by the Social Sciences and Humanities Research Council of Canada (SSHRCC) under grant number 410-98-1348.
Bibliography
Harvey, Paul, and J. E. Heseltine. The Oxford Companion to French Literature. Oxford: Oxford UP, 1959.
Imbs, Paul. Le Trésor de la Langue Française: Dictionnaire de la langue du XIXe et du XXe siècle. 16 vols. Paris: CNRS, 1971.
Sall, John, and Ann Lehman. JMP Start Statistics. Belmont, CA: SAS Institute, 1996.
Scholes, Robert. “Canonicity and Textuality.” Introduction to Scholarship in Modern Languages and Literatures. Ed. Joseph Gibaldi. New York: MLA, 1992.
von Hallberg, Robert. Canons. Chicago: U of Chicago P, 1984.