
Supplementary Resources and Analysis for Automatic Speech Recognition Systems Trained on the Loquacious Dataset

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4zsvhm25r7zf

Abstract

The recently published Loquacious dataset aims to replace established English automatic speech recognition (ASR) datasets such as LibriSpeech or TED-LIUM. The main goal of the Loquacious dataset is to provide properly defined training and test partitions across many acoustic and language domains, under an open license suitable for both academia and industry. To further promote benchmarking on this new dataset and improve its usability, we present additional, openly and publicly accessible resources: n-gram language models (LMs), a grapheme-to-phoneme (G2P) model, and pronunciation lexica. Using these resources, we show experimental results across a wide range of ASR architectures with different label units and topologies. Our initial experimental results indicate that the Loquacious dataset offers a valuable case study for a variety of common challenges in ASR.

Details

Paper ID
lrec2026-main-462
Pages
pp. 5839-5848
BibKey
rossenbach-etal-2026-supplementary
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Nick Rossenbach

  • Robin Schmitt

  • Tina Raissi

  • Simon Berger

  • Larissa Kleppel

  • Ralf Schlüter
