Echoes of the Troubadours: A Corpus of Troubadour Poetry for Stylometric Analysis and Authorship Attribution
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present TrobaCor, a curated corpus of medieval troubadour poetry comprising 1668 unique Old Occitan texts by a large variety of authors. Clustering and stylometric experiments show that authorial style can be modeled accurately beyond topical content, although formulaic or topically diverse genres remain challenging. Furthermore, we can model and detect traces of an author’s stylistic "DNA" even in short-form collaborative poetry, offering a uniquely fine-grained perspective in the field. We also provide self-organizing map visualizations that give an interpretable view of stylistic patterns across authors. TrobaCor is publicly released to support reproducible research in NLP and digital humanities on this low-resource historical language.