Volume 3 Number 2
Vive la Différence! Text Mining Gender Difference in French Literature
Abstract
In this study, a corpus of 300 male-authored and 300 female-authored French literary and historical texts is classified for author gender using SVMLight, an implementation of Support Vector Machines (SVMs), achieving up to 90% classification accuracy. The sets of words most useful in distinguishing male from female writing are extracted from the support vectors. The results reinforce previous findings from statistical analyses of the same corpus and exhibit remarkable cross-linguistic parallels with the results of SVM models trained to classify author gender on selections from the British National Corpus. Female authors use personal pronouns and negative polarity items at a much higher rate than their male counterparts, while male authors show a strong preference for determiners and numerical quantifiers. Among the words that characterize male or female writing consistently over the time period spanned by the corpus, a number of cohesive semantic groups are identified. Male authors, for example, use religious terminology rooted in the church, while female authors use secular language to discuss spirituality. Such differences would take an enormous human effort to discover through close reading of so large a corpus, but once identified through text mining, they frame intriguing questions that scholars may address using traditional methods of critical analysis.
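The abstract says the distinguishing words are extracted from the support vectors. Assuming a linear kernel (the kernel is not stated in this section), each word's weight can be recovered as w = Σ α_i y_i x_i, where x_i are the support vectors, y_i ∈ {+1, −1} their class labels and α_i their learned coefficients. The following is a minimal sketch with hypothetical toy data, using the sign convention of the weight tables (positive = male, negative = female); it does not reproduce how SVMLight itself stores these values.

```python
# A minimal sketch (not SVMLight's own code): recover the weight vector of a
# linear SVM from its support vectors and rank features by weight.
# Sign convention follows the weight tables: +1 = male, -1 = female.

def weight_vector(support_vectors, alphas, labels):
    """Return w = sum_i alpha_i * y_i * x_i for a linear-kernel SVM."""
    n_features = len(support_vectors[0])
    w = [0.0] * n_features
    for x, alpha, y in zip(support_vectors, alphas, labels):
        for j, x_j in enumerate(x):
            w[j] += alpha * y * x_j
    return w


def rank_features(vocab, w, k):
    """Return the k most male-leaning (largest positive weight) and
    the k most female-leaning (most negative weight) features."""
    ranked = sorted(zip(vocab, w), key=lambda pair: pair[1], reverse=True)
    male = ranked[:k]
    female = ranked[::-1][:k]
    return male, female


if __name__ == "__main__":
    # Hypothetical toy data: two support vectors over a three-word vocabulary.
    vocab = ["qui", "elle", "un"]
    support_vectors = [[2, 0, 3], [1, 4, 1]]
    alphas = [0.5, 0.5]
    labels = [+1, -1]  # first vector from a male-authored text, second female
    w = weight_vector(support_vectors, alphas, labels)
    print(w)  # [0.5, -2.0, 1.0]
    print(rank_features(vocab, w, 1))  # ([('un', 1.0)], [('elle', -2.0)])
```

Only support vectors contribute to w, which is why the ranked word lists can be read off the trained model without revisiting the rest of the corpus.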
Amanda Bonner: What I said was true, there's no difference between the sexes. Men, women, the same.
Adam Bonner: They are?
Amanda Bonner: Well, maybe there is a difference, but it's a little difference.
Adam Bonner: Well, you know as the French say...
Amanda Bonner: What do they say?
Adam Bonner: Vive la difference!
Amanda Bonner: Which means?
Adam Bonner: Which means hurrah for that little difference. (Adam's Rib, 1949)
Introduction
Comparison With Previous Research
Experimental Design
Machine Learning Runs
Author gender | Word | Lemma | PoS | PoSgroup |
Male | 88.3% | 87.3% | 73.0% | 69.7% |
Female | 83.3% | 84.4% | 75.7% | 78.7% |
All | 85.7% | 85.9% | 74.4% | 74.2% |
Author gender | Word | Lemma | PoS | PoSgroup |
Male | 91.3% | 92.4% | 73.9% | 73.9% |
Female | 81.5% | 81.5% | 78.3% | 69.6% |
All | 86.4% | 87.0% | 76.1% | 71.7% |
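The tables above do not state whether the per-class percentages are recall (the share of each gender's texts labelled correctly) or precision (the share of texts assigned to a class that truly belong to it), nor how large the test sets were. The sketch below, with hypothetical counts, computes both from a 2×2 confusion matrix; the function and parameter names are illustrative, not the article's.

```python
def gender_metrics(male_correct, male_wrong, female_correct, female_wrong):
    """Metrics from a 2x2 confusion matrix.

    male_correct:   male-authored texts labelled male
    male_wrong:     male-authored texts labelled female
    female_correct: female-authored texts labelled female
    female_wrong:   female-authored texts labelled male
    """
    predicted_male = male_correct + female_wrong
    predicted_female = female_correct + male_wrong
    total = male_correct + male_wrong + female_correct + female_wrong
    return {
        "recall_male": male_correct / (male_correct + male_wrong),
        "recall_female": female_correct / (female_correct + female_wrong),
        "precision_male": male_correct / predicted_male,
        "precision_female": female_correct / predicted_female,
        "accuracy": (male_correct + female_correct) / total,
    }


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 50 + 50 texts.
    m = gender_metrics(male_correct=45, male_wrong=5,
                       female_correct=40, female_wrong=10)
    for name, value in m.items():
        print(f"{name}: {value:.1%}")
```

With equally sized classes, as in a 300/300 corpus, overall accuracy is exactly the mean of the two recalls: here 85.0% against recalls of 90.0% and 80.0%.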
153 persistent features in Male-authored documents: 1, a, abord, action, affaire, ajouta, amie, article, au, aura, auteur, autour, autre, aux, avons, bas, bouche, bras, c, capitaine, cent, chacun, chair, champ, charles, chez, christ, ciel, cinq, comment, comtesse, contre, corps, coup, coups, crime, côté, d', des, deux, diable, dis, docteur, doigts, dont, doute, droite, du, entre, est, face, fait, façon, femme, feu, fin, fit, fois, foule, gens, gros, haut, histoire, homme, hé, hôtel, ils, in, jacques, jean, juge, jusqu', la, laquelle, le, les, leurs, ligne, long, lorsque, main, mains, maîtresse, messieurs, mis, mit, moins, monseigneur, monsieur, montre, mot, même, nez, nom, nombre, nos, oeil, oeuvres, ordre, oreille, ou, oui, où, par, passage, pied, pieds, présente, président, prêtre, quatre, quelqu', quelque, quelques, question, qui, quoi, reprit, reste, rue, récit, saint, saints, salut, sang, second, seconde, selon, ses, seulement, simple, sire, soit, sous, sur, table, tirer, tour, toute, trente, trois, un, v, ventre, vers, vieux, village, vin, vingt, voici, y, yeux, à
192 persistent features in Female-authored documents: absence, admiration, afin, agréable, ai, aimable, aime, aimer, aller, amitié, amour, anglais, angleterre, auguste, auprès, aurais, avais, avait, avec, avez, avoir, beaucoup, belle, bien, bonheur, bonne, brillante, but, cacher, car, caractère, celle, chagrin, chercher, chère, coeur, comprendre, compte, comte, confiance, conserver, cour, crois, destinée, disant, donner, douceur, douleur, doux, elle, elles, empêcher, encore, enfance, enfant, enfants, entièrement, envie, esprit, espérance, estime, eût, faisait, fallait, faut, fièvre, fleurs, france, frère, fût, gloire, goût, grande, grandes, généreux, henri, hiver, ici, il, imagination, impossible, inquiétude, inspire, inspirer, instant, intérêt, jamais, jardin, jours, liberté, lui, lumières, m, ma, mais, malgré, manière, manières, me, moi, mon, montrer, mère, ne, ni, nécessaire, opinion, parce, parler, parlez, passion, pauvre, pays, personne, personnes, petite, peut, peuvent, plaire, plaisir, pleurs, plusieurs, possible, pourquoi, pourrais, pouvait, prince, princes, princesse, pu, puisque, puissance, père, quand, que, quitter, regarder, reine, repos, retrouver, revenir, roi, sais, sait, sans, savoir, secret, sentiment, sentir, seule, si, son, souffrir, souvenir, souvent, soyez, suis, supporter, surprise, tant, toi, toujours, tous, toutes, trop, trouva, trouver, très, tu, utile, veux, vie, vit, vivre, voir, vois, vos, votre, voulait, voulut, vous, voyage, voyant, véritable, âme, éducation, égard, égards, émotion, épouser, était, êtes
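This section does not define "persistent". One plausible reading, consistent with the abstract's reference to words that characterize male or female writing consistently over the time period spanned by the corpus, is a word that appears in the same gender's feature list for every period-specific model. A minimal sketch under that assumption; the function name and the period lists are hypothetical.

```python
def persistent_features(period_lists):
    """Return, sorted, the words that appear in every period's feature list."""
    if not period_lists:
        return []
    common = set(period_lists[0])
    for words in period_lists[1:]:
        common &= set(words)  # keep only words also present in this period
    return sorted(common)


if __name__ == "__main__":
    # Hypothetical male-feature lists from three period-specific models.
    periods = [
        ["qui", "un", "le", "monsieur"],
        ["un", "le", "deux", "qui"],
        ["le", "un", "sire"],
    ]
    print(persistent_features(periods))  # ['le', 'un']
```

A stricter variant could also require the word's weight to keep the same sign in every period; the plain intersection above only checks membership.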
Enduring Male Terms | Enduring Female Terms |
Conclusion
Male Features | | Female Features | |
Word | Weight | Word | Weight |
qui | 3.032 | elle | -4.270 |
un | 2.706 | ne | -2.768 |
à | 2.568 | vous | -2.256 |
le | 2.512 | pas | -1.812 |
des | 2.392 | et | -1.594 |
du | 1.993 | avec | -1.435 |
les | 1.847 | mais | -1.433 |
au | 1.598 | lui | -1.365 |
monsieur | 1.396 | était | -1.346 |
est | 1.302 | si | -1.245 |
deux | 1.264 | avait | -1.178 |
de | 1.250 | me | -1.127 |
sur | 1.033 | ma | -1.069 |
a | 0.953 | pour | -0.952 |
homme | 0.884 | sans | -0.811 |
par | 0.867 | moi | -0.794 |
ce | 0.746 | consuelo | -0.779 |
madame | 0.690 | quand | -0.779 |
d' | 0.656 | bien | -0.702 |
une | 0.594 | roi | -0.676 |
ces | 0.590 | l' | -0.666 |
ses | 0.586 | il | -0.614 |
dont | 0.566 | beaucoup | -0.570 |
quelque | 0.554 | n' | -0.560 |
femme | 0.535 | henri | -0.543 |
ils | 0.528 | m' | -0.535 |
où | 0.511 | jamais | -0.523 |
tems | 0.496 | reine | -0.513 |
charles | 0.493 | je | -0.482 |
ou | 0.487 | princesse | -0.479 |
autre | 0.451 | toujours | -0.470 |
aux | 0.449 | car | -0.465 |
yeux | 0.429 | ai | -0.462 |
main | 0.417 | votre | -0.459 |
fit | 0.392 | esprit | -0.453 |
leurs | 0.386 | avais | -0.447 |
quelques | 0.384 | m | -0.444 |
leur | 0.380 | personne | -0.430 |
cette | 0.379 | albert | -0.419 |
fait | 0.379 | temps | -0.400 |
après | 0.374 | mon | -0.393 |
avois | 0.374 | bonne | -0.383 |
reste | 0.363 | être | -0.381 |
mille | 0.355 | dans | -0.379 |
même | 0.327 | ça | -0.371 |
saint | 0.326 | se | -0.365 |
fille | 0.324 | liberté | -0.364 |
francs | 0.309 | la | -0.360 |
tout | 0.307 | âme | -0.356 |
lettre | 0.299 | très | -0.356 |
étoit | 0.298 | enfants | -0.349 |
entre | 0.287 | peut | -0.347 |
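Assuming a linear SVM, each weight in the table above is a coefficient of the decision function f(x) = w · x + b: a text's feature values are multiplied by their weights and summed, and the sign of the result picks the class. The sketch below scores two hypothetical documents using only six weights copied from the table; the bias b (taken as 0) and the scaling of feature values (treated as relative frequencies) are assumptions, since neither is reported in this section.

```python
# Weights copied from the first three rows of each column in the table above.
# Positive weights favour the male class, negative weights the female class.
WEIGHTS = {
    "qui": 3.032, "un": 2.706, "à": 2.568,
    "elle": -4.270, "ne": -2.768, "vous": -2.256,
}


def decision_value(features, weights, bias=0.0):
    """Linear decision function f(x) = w . x + b over a sparse feature dict.
    Words missing from `weights` contribute nothing."""
    return sum(weights.get(word, 0.0) * value
               for word, value in features.items()) + bias


def predict(features, weights, bias=0.0):
    """Label a document 'male' if f(x) > 0, otherwise 'female'."""
    return "male" if decision_value(features, weights, bias) > 0 else "female"


if __name__ == "__main__":
    # Hypothetical relative frequencies for two short documents.
    doc_a = {"qui": 0.01, "elle": 0.02}
    doc_b = {"qui": 0.02, "un": 0.02, "elle": 0.005}
    print(predict(doc_a, WEIGHTS))  # female
    print(predict(doc_b, WEIGHTS))  # male
```

The large magnitude of elle explains why a modest frequency of that single pronoun can outweigh several male-leaning function words.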
Male Features | | Female Features | |
Word | Weight | Word | Weight |
qui | 3.043 | elle | -4.291 |
un | 2.716 | ne | -2.780 |
à | 2.578 | vous | -2.265 |
le | 2.522 | pas | -1.820 |
des | 2.400 | et | -1.599 |
du | 2.000 | avec | -1.441 |
les | 1.856 | mais | -1.439 |
au | 1.603 | lui | -1.366 |
monsieur | 1.400 | était | -1.348 |
est | 1.305 | si | -1.250 |
deux | 1.269 | avait | -1.179 |
de | 1.252 | me | -1.127 |
sur | 1.037 | ma | -1.072 |
a | 0.956 | pour | -0.956 |
homme | 0.888 | sans | -0.814 |
par | 0.870 | moi | -0.795 |
ce | 0.749 | quand | -0.782 |
madame | 0.690 | bien | -0.706 |
d' | 0.657 | roi | -0.679 |
une | 0.597 | l' | -0.668 |
ces | 0.592 | il | -0.621 |
ses | 0.587 | beaucoup | -0.572 |
dont | 0.568 | n' | -0.564 |
quelque | 0.555 | henri | -0.549 |
femme | 0.537 | m' | -0.536 |
ils | 0.530 | jamais | -0.526 |
où | 0.513 | reine | -0.515 |
tems | 0.498 | je | -0.483 |
charles | 0.495 | princesse | -0.481 |
ou | 0.488 | toujours | -0.471 |
autre | 0.452 | car | -0.466 |
aux | 0.450 | ai | -0.462 |
yeux | 0.430 | votre | -0.460 |
main | 0.418 | esprit | -0.455 |
fit | 0.394 | avais | -0.447 |
leurs | 0.387 | m | -0.445 |
quelques | 0.386 | personne | -0.431 |
cette | 0.381 | albert | -0.420 |
leur | 0.381 | temps | -0.402 |
fait | 0.380 | mon | -0.392 |
après | 0.375 | bonne | -0.385 |
avois | 0.375 | être | -0.380 |
reste | 0.364 | dans | -0.378 |
mille | 0.356 | ça | -0.375 |
même | 0.329 | se | -0.366 |
saint | 0.327 | liberté | -0.365 |
fille | 0.325 | la | -0.358 |
francs | 0.311 | très | -0.358 |
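The two weight tables list almost the same words with nearly identical weights. One simple way to quantify their agreement is the Jaccard overlap of the top-k words on each side; the sketch below applies it to the 20 strongest female features copied from each table. The choice of Jaccard (which ignores rank order) and of k = 20 is illustrative, not the article's.

```python
def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| of two word lists, ignoring order."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


# Top 20 female features, in rank order, from each of the two weight tables.
FEMALE_TOP20_FIRST = ["elle", "ne", "vous", "pas", "et", "avec", "mais", "lui",
                      "était", "si", "avait", "me", "ma", "pour", "sans", "moi",
                      "consuelo", "quand", "bien", "roi"]
FEMALE_TOP20_SECOND = ["elle", "ne", "vous", "pas", "et", "avec", "mais", "lui",
                       "était", "si", "avait", "me", "ma", "pour", "sans", "moi",
                       "quand", "bien", "roi", "l'"]

if __name__ == "__main__":
    print(round(jaccard(FEMALE_TOP20_FIRST, FEMALE_TOP20_SECOND), 3))  # 0.905
    print(set(FEMALE_TOP20_FIRST) - set(FEMALE_TOP20_SECOND))  # {'consuelo'}
    print(set(FEMALE_TOP20_SECOND) - set(FEMALE_TOP20_FIRST))  # {"l'"}
```

The lists share 19 of 21 distinct words; the only differences are the proper name consuelo, present only in the first table, and l', which enters the second table's top 20 in its place.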
Works Cited