LREC 2026 (main conference)

spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age and Gender

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/58hgwvgvkz6g

Abstract

We present spINAch, a large diachronic corpus of French speech drawn from radio and television archives. It is balanced for speakers' gender and age (20 to 95 years old) and spans 60 years, from 1955 to 2015. The dataset includes over 320 hours of recordings from more than two thousand speakers. We describe the methodology used to build the corpus, focusing on the acoustic quality of the collected samples. The data were automatically transcribed and phonetically aligned to allow studies at the phonemic level. More than 3 million oral vowels were analyzed to provide their fundamental frequency and formant values. The corpus, available to the community for research purposes, is valuable for describing the evolution of Parisian French through its representation of gender and age. The presented analyses also demonstrate that the diachronic nature of the corpus allows the observation of various phonetic phenomena, such as the evolution of voice pitch over time (which does not differ by gender in our data) and the neutralization of the /a/-/ɑ/ opposition in Parisian French during this period.

Details

Paper ID
lrec2026-main-459
Pages
pp. 5805-5820
BibKey
devauchelle-etal-2026-spinach
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Simon Devauchelle
  • David Doukhan
  • Remi Uro
  • Lucas Ondel
  • Valentin Pelloin
  • Olympia Imbert-Brégégère
  • Véronique Lefort
  • Kévin Picard
  • Emeline Seignobos
  • Albert Rilliard
