A Shoal of Voices: Parallel Read Speech from Professional Swedish Narrators
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present a shoal of voices: Storspigg–TBI, a legally cleared, professionally recorded Swedish speech corpus derived from talking-book production at the Swedish Agency for Accessible Media (MTM). The corpus contains 1 000 information messages read by 99 narrators under controlled studio conditions. The material has undergone a full legal assessment and a three-sweep adoption process, conducted in collaboration with the national research infrastructure Språkbanken Tal, to ensure provenance, FAIR/FACT compliance, and reproducibility. The paper describes the legal framework, the data-selection and curation pipeline, and an initial automatic transcription using Swedish Whisper and wav2vec 2.0 models. The resulting corpus provides a high-quality reference resource for speech science and technology, supporting research on inter-speaker variation and prosody, as well as evaluation under consistent acoustic and linguistic conditions.