“Multi-Authorship of the Scriptores Historiae Augustae: How the Use of Subsets Can Win or Lose the Case.”
Penelope J. Gurney
University of Ottawa
pgurney@uottawa.ca
Lyman W. Gurney
Themis Research Corporation
This paper describes our recent research on the arbitration of disputed authorship attribution. In an earlier study, we fully lemmatized and disambiguated the thirty biographies of the Scriptores Historiae Augustae. The results of that analysis provide a benchmark and control group for further analysis of two major related questions:
- Can statistical analysis be based successfully upon the argument that "... stylometric theory posits the existence of homogeneity within a single work by a single author" (Ledger & Merriam, 1994)?
- Does such homogeneity imply that stylometric analysis of a subset can be equivalent to the analysis of the whole?
Results
The first results demonstrated that the structure of a work, which a reader recognizes subjectively, appears to be reflected in the stylometric analysis: the various subsets of a work are not necessarily homogeneous, and the various quarters appear to be incompatible with one another to varying degrees. For example, the seven works by Spartianus, studied as quarter-works, became twenty-eight works, which can be analyzed in the same manner as the original thirty. The only measure that gives results even remotely comparable to those of the original text, however, is the analysis of the usage of conjunctions, in which the set of final quarters of the seven texts stands off very clearly from the others. This exposes a major drawback to the use of subsets: the results for the final quarter can be seen to be relevant and important only by using the fully disambiguated original text as a control. The subsets cannot be used as controls for themselves.
In the half-text analyses, one test, that of the 15 most frequently used lemmas which appear in each of the 30 segments, gave almost complete differentiation, but most of the other tests gave very indifferent results. In the analysis of function words in the 500-word segments, there was good differentiation of only three authors: Capitolinus, Spartianus, and Lampridius. The various quarter-texts provided mixed results: on one test, quarter #1 provided some differentiation of specific authors, whereas on a different test and a different quarter, other authors were separated. Hence, it is clear that no one test on subsets can provide clear discrimination of authorship, although many do appear to demonstrate a degree of multiple authorship.
Discussion of Results
As the various analyses were carried out, several points became clear:
- The larger the size of the subset in relation to the full text, the better the result.
- Any increase above the minimum number of key words needed to generate full differentiation of authorship in the full texts produced only slight increases in the degree of differentiation. Such increases, when applied to the subsets, however, generated large increases in differentiating power, but without reaching anything near the degree of resolution of the full texts; in no case was the discrimination clear-cut.
- There appears to be no homogeneity within any specific work of an individual author; rather, each work is suffused with sufficient internal variety to maintain the interest of the reader. This variety may be the fundamental reason why analysis of small subsets is not equivalent to analysis of the whole.
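The subset comparisons described in the Results (half- and quarter-texts, 500-word segments, the most frequent lemmas shared by every segment) can be illustrated with a minimal sketch. It is not the procedure used in the study. It assumes each text is already available as a list of lemmatized tokens. The marker list `CONJUNCTIONS`, the Euclidean `distance` measure, and all function names are hypothetical choices made only for illustration.

```python
from collections import Counter
import math

# Hypothetical marker list: a few common Latin conjunctions, for illustration only.
CONJUNCTIONS = ("et", "sed", "atque", "nam", "aut")


def split_into_parts(tokens, parts):
    """Split a token list into `parts` contiguous, nearly equal subsets
    (parts=2 gives half-texts, parts=4 gives quarter-texts)."""
    size, extra = divmod(len(tokens), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        out.append(tokens[start:end])
        start = end
    return out


def fixed_segments(tokens, length=500):
    """Cut a token list into consecutive segments of `length` tokens.
    The last segment may be shorter."""
    return [tokens[i:i + length] for i in range(0, len(tokens), length)]


def profile(tokens, markers=CONJUNCTIONS):
    """Relative frequency of each marker word within the given tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on an empty subset
    return [counts[m] / total for m in markers]


def distance(p, q):
    """Euclidean distance between two frequency profiles (an assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def top_shared_lemmas(segments, n=15):
    """The n most frequent lemmas among those that occur in every segment."""
    shared = set(segments[0]).intersection(*map(set, segments[1:]))
    totals = Counter(t for seg in segments for t in seg if t in shared)
    return [lemma for lemma, _ in totals.most_common(n)]
```

With these helpers, each quarter of a text can be compared against the whole by computing `distance(profile(quarter), profile(tokens))` for every element of `split_into_parts(tokens, 4)`. The full-text profile plays the role of the control, since the subsets cannot serve as controls for themselves.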
Bibliography
J. F. Burrows and D. H. Craig. “Lyrical Drama and the "Turbid Mountebanks": Styles of Dialogue in Romantic and Renaissance Tragedy.” Computers and the Humanities. 1994. 28: 63-86.
A. Ellegård. A Statistical Method for Determining Authorship: The "Junius" Letters, 1769-1772. Acta Universitatis Gothoburgensis. Göteborg, 1962.
P. J. Gurney and L. W. Gurney. “Enhanced Content-Analysis of Inflected Languages Through a System of Computer Assisted Lemmatization.” Presented at 'Consensus ex Machina?', ALLC/ACH, Paris, 19-23 April 1994.
P. J. Gurney and L. W. Gurney. “Disputed Authorship: 30 Biographies and Six Reputed Authors. A New Analysis by Full-Text Lemmatization of the Historia Augusta.” Presented at ALLC/ACH '96, Bergen, June 25-29, 1996.
T. Janson. “The Problems of Measuring Sentence-Length in Classical Texts.” Studia Linguistica. 1964. 18: 26-36.
A. Kenny. A Stylometric Study of the New Testament. Oxford University Press, 1986.
G. R. Ledger and T. V. N. Merriam. “Shakespeare, Fletcher, and the Two Noble Kinsmen.” Literary and Linguistic Computing. 1994. 9: 235-248.