Object Realisation in Spoken Guadeloupan French: Evaluating NLP Models for an Under-Resourced Variety

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/29g57awqis6v

Abstract

This paper contributes to the evaluation of natural language parsing models applied to colloquial speech in lesser-studied varieties of a language. We report on the performance of speech recognition and Universal Dependencies (UD) parsing models on a radio corpus of colloquial French spoken in Guadeloupe (GuaFr), which is in contact with a typologically distant language, French-based Guadeloupean Creole (GuaCr). The corpus poses particular challenges due to the phonetic and syntactic specifics of GuaFr, as well as the occurrence of code-switching to GuaCr. We show that weakening the ASR decoder's language model (LM) along various parameters avoids hallucination of null objects, which have been described as typical of spoken GuaFr, but not of non-standard object clitic positioning. For UD parsing, we investigate utterance segmentation as the primary lever for affecting model performance and compare different segmentation sources (ASR punctuation, manual chunking, UD parser tokenization) and their combinations. We highlight both strengths and pitfalls of the models, again focussing on the expression of syntactic objects.

Details

Paper ID
lrec2026-main-180
Pages
pp. 2299-2308
BibKey
npoles-etal-2026-object
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Amalia Canes Nápoles

  • Sophie Repp

Links