
MUSCAT: MUltilingual, SCientific ConversATion Benchmark

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2q8inmh4zmpa

Abstract

The goal of multilingual speech technology is to facilitate seamless communication between individuals speaking different languages, creating an experience as though everyone were a multilingual speaker. To create this experience, speech technology needs to address several challenges: handling mixed multilingual input, domain-specific vocabulary, and code-switching. However, there is currently no dataset that benchmarks this situation. We propose a new benchmark to evaluate whether current Automatic Speech Recognition (ASR) systems are able to handle these challenges. The benchmark consists of bilingual discussions on scientific papers between multiple speakers, each conversing in a different language. We provide a standard evaluation framework that goes beyond Word Error Rate (WER), enabling consistent comparison of ASR performance across languages. Experimental results demonstrate that the proposed dataset remains an open challenge for state-of-the-art ASR systems. The dataset is available at https://huggingface.co/datasets/goodpiku/muscat-eval
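
For readers unfamiliar with the baseline metric named above, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition only; it is not the authors' evaluation framework. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and whitespace splitting would not be adequate for languages written without spaces.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: whitespace tokenization, no text normalization.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("uses" -> "use") and one deletion ("layers")
# against a five-word reference.
score = word_error_rate(
    "the model uses attention layers",
    "the model use attention",
)
print(f"WER = {score:.2f}")  # WER = 0.40
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions.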

Details

Paper ID
lrec2026-main-471
Pages
pp. 5926-5937
BibKey
sinhamahapatra-etal-2026-muscat
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Supriti Sinhamahapatra

  • Thai-Binh Nguyen

  • Yiğit Oğuz

  • Enes Yavuz Ugan

  • Jan Niehues

  • Alexander Waibel

Links