Deep Learning-Based Multi-Aspect Pronunciation Assessment for Individuals with Down Syndrome

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4g3dwy2kmira

Abstract

This paper explores the use of an annotated speech corpus to assess multiple dimensions of speech quality (particularly the phonetic, fluency, and prosodic dimensions) in individuals with Down syndrome, with the aim of informing the development of automated assessment tools. We conducted a series of experiments using the GOPT model, together with representations extracted from Wav2Vec models fine-tuned for phoneme classification. Model predictions were compared against expert annotations from a speech-language pathologist using Pearson correlation. Results demonstrate significant improvements over prior work, with correlations of up to 0.49 for certain aspects, particularly the phonetic and fluency dimensions, while prosody remained more challenging to model. The study highlights the potential of Transformer-based architectures for atypical speech assessment and underscores the challenges inherent in that task, particularly the variability linked to specific disfluency types.
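As an illustration of the evaluation metric named in the abstract, the following minimal sketch computes a Pearson correlation between model scores and expert scores. The function name `pearson` and the score values are hypothetical, chosen only for demonstration; they are not taken from the paper or its corpus, and the authors' actual evaluation code may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example values (NOT data from the paper):
expert_scores = [1, 2, 2, 3, 4]           # e.g. ratings by a speech-language pathologist
model_scores = [1.2, 1.9, 2.5, 2.8, 3.6]  # e.g. predictions for the same utterances

r = pearson(model_scores, expert_scores)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 indicates that the model ranks and scales utterances much as the expert does. The 0.49 reported in the abstract therefore corresponds to a moderate positive agreement.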

Details

Paper ID
lrec2026-main-667
Pages
pp. 8455-8464
BibKey
fernndezgarca-etal-2026-deep
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • David Fernández-García

  • César González-Ferreras

  • Valentín Cardeñoso-Payo

  • Mario Corrales-Astorgano

Links