WikIPA: Integrating WikiPron and Lingua Libre for Multilingual IPA Transcription

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2am4iw3bfhjb

Abstract

We present WikIPA, a new multilingual benchmark designed for automatic speech-to-IPA (STIPA) transcription. By integrating human-curated IPA transcriptions from WikiPron with spoken recordings and metadata from Lingua Libre, WikIPA connects textual phonetic representations with real speech across 78 languages. This open resource supports both broad (phonemic) and narrow (phonetic) transcription tasks, enabling fine-grained evaluation of multilingual phonetic transcription systems. WikIPA provides over 289,000 paired entries and serves as a large-scale foundation for STIPA. We benchmark several state-of-the-art STIPA systems, including MultIPA, (Lo)WhIPA, and ZIPA. Results show that ZIPA achieves the lowest mean error rates across most languages, outperforming Whisper- and Wav2Vec-based baselines. Error analyses reveal that remaining discrepancies largely stem from minor phonetic confusions rather than complete transcription failures, emphasizing the challenge of modeling fine-grained articulatory variation. WikIPA thus establishes the first systematic, multilingual evaluation framework for speech-to-IPA transcription and highlights the potential of combining open, community-driven resources to advance STIPA evaluation.
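The abstract reports mean error rates without naming the exact metric. As a rough illustration only, the sketch below computes a symbol-level edit-distance error rate between a reference IPA transcription and a system hypothesis, then averages it over a set of pairs. The function names, the Unicode NFD normalization step, and the choice to treat every code point (including diacritics) as one symbol are assumptions made for this example. They are not taken from the paper, whose evaluation may tokenize and normalize IPA differently.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute, cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ipa_symbols(transcription):
    """Split an IPA string into symbols.

    Assumption: after NFD normalization each code point counts as one symbol,
    so a diacritic is scored separately from its base character.
    """
    normalized = unicodedata.normalize("NFD", transcription)
    return [ch for ch in normalized if not ch.isspace()]


def error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference symbols."""
    ref = ipa_symbols(reference)
    hyp = ipa_symbols(hypothesis)
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)


def mean_error_rate(pairs):
    """Average the per-item error rate over (reference, hypothesis) pairs."""
    rates = [error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical example pairs, not drawn from WikIPA.
    pairs = [("kʰæt", "kæt"), ("kæt", "kæt")]
    print(mean_error_rate(pairs))
```

Under this scoring, a hypothesis that misses only an aspiration mark is penalized by a single symbol, which matches the kind of minor phonetic confusion the abstract distinguishes from complete transcription failures.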

Details

Paper ID
lrec2026-main-668
Pages
pp. 8465-8475
BibKey
cassotti-etal-2026-wikipa
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Pierluigi Cassotti

  • Jacob Lee Suchardt

  • Domenico De Cristofaro

Links