South Tyrolean Dialect-to-Standard Speech Translation: A Resource

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

DOI:10.63317/3visgk9f8s7z

Abstract

This paper presents a developing oral resource for South Tyrolean, a German dialect spoken in Northern Italy. The dialect is ubiquitous in spoken communication but lacks a standardised orthography. In this context, strict transcription into dialect is of limited to no utility to the local community. Instead, there is a distinct and strong demand for technology capable of directly translating spoken dialect into Standard German. To address this specific need, we introduce a dynamic, incrementally growing dataset designed to fine-tune ASR models for this translation task. Our corpus aggregates diverse sources, including media and research interviews, totalling over 13 hours of aligned audio. We describe a collaborative workflow where community partners contribute audio archives in exchange for automated transcriptions, creating a virtuous cycle of data improvement. Additionally, we detail our iterative model fine-tuning strategy, data collection challenges and the resulting improvements in model performance.

Details

Paper ID
lrec2026-ws-dialres-19
Pages
pp. 188-194
BibKey
franzini-etal-2026-south
Editors
Antonis Anastasopoulos, Stella Markantonatou, Angela Ralli, Marcos Zampieri, Stavros Bompolas, Vivian Stamou
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Greta H. Franzini

  • Luca Ducceschi
