Mathematics Colloquium



Department of Mathematics
University of Nebraska at Omaha


WHEN:
Thursday, January 29, 2009 at 2:30 PM

WHERE:
Durham Science Center 109

WHAT:


Dr. Mikhail Malyutov

Northeastern University


will give a talk on

The MDL-Principle in Attributing Authorship of Texts


ABSTRACT:
We study a new context-free, computationally simple, stylometry-based attributer: the sliced conditional compression complexity (SCCC) of literary texts, which is inspired by the incomputable Kolmogorov conditional complexity. Other stylometry tools can occasionally give almost coinciding values for different authors. In contrast, our CCC attributer, introduced by the author in 2005, is asymptotically strictly minimal for the true author under four conditions: the query texts are sufficiently large, they are much smaller than the training texts, the universal compressor is good, and sampling bias is avoided. This classifier simplifies the Ryabko and Astola (2006) homogeneity test (which is partly based on compression) when the unconditional complexities of the training and query texts differ insignificantly. This condition can be verified in two ways. For IID and Markov sources, one can use the asymptotic normality of these complexities, proved by Szpankowski (2001) and in other work. For real literary texts, one can use normal plots.

The asymptotic SCCC study is complemented by our attribution of the Federalist Papers (Madison vs. Hamilton), which agrees with previous results obtained with various classifiers. We also showed a significant (beyond any doubt) mean SCCC difference between two Russian translations of Shakespeare's sonnets and between the two parts of M. Sholokhov's early short novel. We also discovered intriguing SCCC relations between certain Elizabethan poems. At the same time, two different novels by S. Brodsky, deliberately written in different styles, showed an insignificant mean CCC difference.
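A minimal sketch of the compression-based attribution idea follows. It is not the speaker's implementation, and it rests on several assumptions. The conditional complexity of a query text Q given a training text T is approximated as C(TQ) minus C(T), where C is the compressed length. Python's standard `lzma` module stands in for the "good universal compressor." The slicing scheme (cutting the query into equal contiguous pieces and averaging) and all function names are hypothetical illustrations.

```python
import lzma
from statistics import mean


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes (stand-in for the complexity C)."""
    return len(lzma.compress(data))


def conditional_complexity(query: bytes, training: bytes) -> int:
    """Approximate C(query | training) as C(training + query) - C(training)."""
    return compressed_size(training + query) - compressed_size(training)


def sliced_ccc(query: str, training: str, n_slices: int = 4) -> float:
    """Mean conditional complexity of the query's slices given the training text.

    Hypothetical slicing scheme: cut the query into n_slices contiguous
    pieces of (nearly) equal length and average their conditional complexities.
    """
    q = query.encode("utf-8")
    t = training.encode("utf-8")
    bounds = [len(q) * k // n_slices for k in range(n_slices + 1)]
    slices = [q[bounds[k]:bounds[k + 1]] for k in range(n_slices)]
    return mean(conditional_complexity(s, t) for s in slices)


def attribute(query: str, corpora: dict[str, str], n_slices: int = 4) -> str:
    """Return the candidate author whose training text minimizes the sliced CCC."""
    return min(corpora, key=lambda author: sliced_ccc(query, corpora[author], n_slices))


if __name__ == "__main__":
    # Toy corpora (invented for illustration, not real literary data).
    training = {
        "A": "the cat sat on the mat. " * 200,
        "B": "quantum chromodynamics describes gluons. " * 200,
    }
    query = "the cat sat on the mat. " * 20
    print(attribute(query, training))
```

In this toy run, the query repeats author A's phrase, so its conditional complexity given A's training text should be far smaller than given B's. `lzma` was chosen over `zlib` because zlib's 32 KB window cannot reference most of a large training text. A compressor with a large dictionary is closer to the abstract's requirement that training texts be much larger than query texts.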

