Audio-Lyrics Alignment Dataset for Italian Arias

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4cpveetxxtmx

Abstract

Aligning song lyrics with sung audio is challenging, especially for languages and music styles where annotated datasets are scarce. We address this gap by presenting the first dataset of Italian opera arias annotated with lyrics and word-level timestamps. The dataset comprises 24 arias drawn from well-known operas of the 18th to 20th centuries, with a total audio duration of nearly two hours. We benchmark both music alignment models and speech forced-alignment models and show that existing methods face significant challenges on this dataset, with performance dropping by 45% compared to other datasets. Multilingual and speech-based models perform relatively better on this dataset. We also evaluate few-shot fine-tuning of these models on the new dataset and find that, while it yields only marginal overall improvement, it produces localized gains on specific arias, suggesting that limited exposure helps the model adapt to some patterns but cannot fully overcome differences in language or musical style.

Details

Paper ID
lrec2026-main-454
Pages
pp. 5757-5766
BibKey
jajoria-etal-2026-audio
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Pushkar Jajoria
  • Arianna Graciotti
  • Giovanna Casali
  • Jesujoba Alabi
  • Rodolfo Delmonte
  • Angelo Pompilio
  • Rocco Tripodi
  • James McDermott
  • Dietrich Klakow

Links