An Evaluation of Croatian ASR Models for Čakavian Transcription

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/5h3kqeaojrye

Abstract

To assist in the documentation of Čakavian, an endangered language variety closely related to Croatian, we test four currently available ASR models that are trained with Croatian data and assess their performance in the transcription of Čakavian audio data. We compare the models’ word error rates, analyze the word-level error types, and showcase the most frequent Deletion and Substitution errors. The evaluation results indicate that the best-performing system for transcribing Čakavian was a CTC-based variant of the Conformer model.
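For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how word error rate and word-level error counts (Substitution, Deletion, Insertion) are commonly computed from a minimum edit distance alignment between a reference transcript and an ASR hypothesis. It is not the authors' evaluation code. The function names `error_counts` and `wer` and the placeholder tokens are assumptions made for illustration, and the paper may have used different text normalization or tooling.

```python
from typing import List, Tuple


def error_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for one optimal
    word-level alignment of hyp against ref (illustrative helper)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, ins)          # match, no cost
            else:
                diag = (c + 1, s + 1, d, ins)  # substitution
            c, s, d, ins = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)
            # Ties are broken in favor of the diagonal move, then deletion.
            dp[i][j] = min(diag, deletion, insertion, key=lambda t: t[0])
    _, subs, dels, inss = dp[n][m]
    return subs, dels, inss


def wer(ref: List[str], hyp: List[str]) -> float:
    """Word error rate: (S + D + I) / number of reference words."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    subs, dels, inss = error_counts(ref, hyp)
    return (subs + dels + inss) / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real Čakavian or Croatian data.
    reference = "w1 w2 w3 w4".split()
    hypothesis = "w1 wX w4".split()
    print(error_counts(reference, hypothesis), wer(reference, hypothesis))
```

When several alignments share the same minimum cost, the split between error types can depend on the tie-breaking rule, so per-type counts from different tools may differ slightly even when the overall word error rate agrees.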

Details

Paper ID
lrec2024-main-0098
Pages
pp. 1098-1104
BibKey
zhang-etal-2024-evaluation
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Shulin Zhang
  • John Hale
  • Margaret Renwick
  • Zvjezdana Vrzić
  • Keith Langston
