The Added Value of Metadata and Annotations: Evidence from Two Large-Scale, Naturalistic Corpus Studies

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4c4triganae3

Abstract

This paper presents two case studies that highlight both the challenges and benefits of working with large-scale, naturalistic phonetic data. Our aim is to encourage researchers not to shy away from phonetic data found “in the wild”, even when such data are messy, noisy, or incomplete – because they can yield robust, novel insights beyond the reach of controlled laboratory studies. We focus on challenges that are endemic to large corpora, including degraded audio quality, sparse or inconsistent annotations, and missing speaker metadata. By comparing two corpus-based studies that diverge in methodology and statistical design, we show how different approaches can mitigate these limitations while still extracting meaningful patterns.

Details

Paper ID
lrec2026-main-455
Pages
pp. 5767-5775
BibKey
popescu-etal-2026-added
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Anisia Popescu

  • Johanna Cronenberg

  • Ioana Vasilescu

  • Ioana Chitoran

  • Lori Lamel

  • Martine Adda-Decker

Links