LREC 2026 main

Echoes of the Troubadours: A Corpus of Troubadour Poetry for Stylometric Analysis and Authorship Attribution

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5kd5docuu6qb

Abstract

We present TrobaCor, a curated corpus of medieval troubadour poetry comprising 1668 unique Old Occitan texts by a wide variety of authors. Clustering and stylometric experiments show that we can accurately model authorial style beyond topical content, even though formulaic or topically diverse genres remain challenging. Furthermore, we can model and detect traces of an author’s stylistic "DNA" even in short-form collaborative poetry, offering a uniquely fine-grained perspective in the field. In addition, we provide self-organizing map visualizations that offer an interpretable view of stylistic patterns across authors. TrobaCor is publicly released to support reproducible research in NLP and digital humanities on this low-resource historical corpus.

Details

Paper ID
lrec2026-main-069
Pages
pp. 905-918
BibKey
langhe-etal-2026-echoes
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Loic De Langhe

  • Orphee De Clercq

  • Veronique Hoste

Links