
Automatic Metrical Scansion of Poetry in a Low-Resource Setting

Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026

DOI:10.63317/4pzd6u7388jm

Abstract

We present the first neural systems for automatic metrical scansion of poetry in Galician, a Romance language close to Portuguese and Spanish. The task is threefold: first, identifying metrical syllables from lexical ones (the two series may differ because metrical licenses modify a line’s syllable structure to enable stress-related rhythms); second, identifying stress patterns; and third, determining the metrical syllable count based on stressed positions. We manually annotated a corpus of 4,287 examples, the first of its kind for Galician, and fine-tuned an 8B-parameter LLM specialized in Galician and Portuguese as well as two encoder–decoder models: ByT5, a token-free byte-to-byte model, and the multilingual mT5, which covers Galician. We also tested our recent symbolic scansion system. Several fine-tuning setups reached exact per-line accuracy above 90% on our test set for all three scansion subtasks, using orthographic syllables with explicit stress marks as input. The encoder–decoders outperformed the LLM. The token-free ByT5 performed best, particularly when the two surrounding lines were added to the input. The symbolic system (89.9% accuracy) handled metaplasms that are infrequent in the training data better than the neural models, so the two approaches can be seen as complementary.
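As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of exact per-line accuracy computed separately for the three subtasks. It is not the authors' evaluation code: the `Scansion` record, its field names, the `exact_line_accuracy` helper, and the placeholder syllables (`s1`, `s2`, ...) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scansion:
    """Hypothetical record for one scanned line (field names are assumptions)."""
    metrical_syllables: tuple   # metrical syllables after applying licenses
    stressed_positions: tuple   # 1-based positions of stressed syllables
    syllable_count: int         # metrical syllable count for the line


def exact_line_accuracy(gold, pred, field):
    """Share of lines whose predicted `field` matches the gold value exactly."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of lines")
    if not gold:
        return 0.0
    hits = sum(getattr(g, field) == getattr(p, field) for g, p in zip(gold, pred))
    return hits / len(gold)


# Placeholder data, not taken from the corpus: the second predicted line
# has the wrong stressed position but the right syllables and count.
gold = [
    Scansion(("s1", "s2", "s3"), (2,), 3),
    Scansion(("s1", "s2"), (1,), 2),
]
pred = [
    Scansion(("s1", "s2", "s3"), (2,), 3),
    Scansion(("s1", "s2"), (2,), 2),
]

scores = {
    field: exact_line_accuracy(gold, pred, field)
    for field in ("metrical_syllables", "stressed_positions", "syllable_count")
}
```

Under this definition a line counts as correct for a subtask only if the whole predicted value matches the gold value, so a single misplaced stress makes the line wrong for the stress subtask while leaving the other two subtasks unaffected.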

Details

Paper ID
lrec2026-ws-llms4ssh-12
Pages
pp. 114-125
BibKey
ruizfabo-etal-2026-automatic
Editors
Arturo Montejo-Raez, Cristina Grisot, Joanna Blochowiak, Nikola Ljubešić, Elena Battaner, German Rigau
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Pablo Ruiz Fabo
  • Anxo Alonso Pérez
  • Pablo Rodríguez Fernández
  • Pablo Gamallo
