
Collecting Linguistic Resources for Assessing Children’s Pronunciation of Nordic Languages

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/3iqgkmrkxnwd

Abstract

This paper reports on the experience of collecting several corpora of Nordic languages spoken by children. The aim of the data collection is to provide annotated data for developing and evaluating computer-assisted pronunciation assessment systems, both for non-native children learning a Nordic language (L2) and for L1 children with speech sound disorder (SSD). The paper presents the challenges encountered in recording and annotating data for Finnish, Swedish and Norwegian, as well as the ethical considerations related to making this data publicly available. We hope that sharing this experience will encourage others to collect similar data for other languages. Of the different data collections, we were able to make the Norwegian corpus publicly available, in the hope that it will serve as a reference in pronunciation assessment research.

Details

Paper ID
lrec2024-main-0313
Pages
pp. 3529-3537
BibKey
olstad-etal-2024-collecting
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Anne Marte Haug Olstad
  • Anna Smolander
  • Sofia Strömbergsson
  • Sari Ylinen
  • Minna Lehtonen
  • Mikko Kurimo
  • Yaroslav Getman
  • Tamás Grósz
  • Xinwei Cao
  • Torbjørn Svendsen
  • Giampiero Salvi
