SOMVOICE: A First Dataset to Study the Effects of Sleep Deprivation on Voice Characteristics of Healthy French Speakers
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Excessive sleepiness is a significant public health issue and a critical personal health indicator associated with various disorders. Given its high prevalence in the general population, clinicians need tools, such as automatic speech analysis, to regularly measure patients’ sleepiness levels in natural settings. In this article, we introduce the SOMVOICE corpus, the first French corpus containing read-speech recordings from the same participants both after a normal night and after a night of total sleep deprivation. Participants were selected according to strict inclusion and exclusion criteria based on both medical characteristics and reading proficiency. The recordings were labelled with both objective and subjective measures of sleepiness, as well as with measures of fatigue and anxiety. After introducing the data-collection methodology, we use linear mixed models to conduct a preliminary investigation of the effect of total sleep deprivation on the collected sleepiness-related measures and on participants’ reading behaviour. We found that sleep deprivation strongly influences objective and subjective sleepiness measurements as well as fatigue self-reports, but has a lesser effect on anxiety. Regarding reading behaviour, sleep deprivation is associated with a lower speech rate (as measured by recording duration and phoneme rate) and more pauses (as measured by number of pauses and pause ratio).
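For readers unfamiliar with linear mixed models, the following is a minimal sketch of one common specification for a repeated-measures design of this kind. The abstract does not state the exact model, so the random-intercept structure, the absence of covariates, and all symbol names below are assumptions made for illustration only.

```latex
% Illustrative random-intercept model (assumed form, not taken from the paper).
% y_{ij}  : outcome (e.g. a sleepiness score or the pause ratio) for
%           participant i in recording j
% SD_{ij} : 1 if recording j was made after total sleep deprivation, else 0
% u_i     : participant-specific random intercept
y_{ij} = \beta_0 + \beta_1\,\mathrm{SD}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this assumed form, the fixed effect \beta_1 captures the average change in the outcome associated with sleep deprivation, while u_i absorbs stable between-participant differences, which is what makes the model suited to recordings of the same participants in both conditions.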