Paper Information
A Dataset for Evaluating ASR on Specialized Vocabulary
Paper Fields
Evaluating the ability of Automatic Speech Recognition (ASR) models to transcribe specialized vocabulary remains a persistent challenge, as standard datasets predominantly feature common words and thus obscure weaknesses on rare or out-of-vocabulary (OOV) terms. To address this limitation, we introduce a linguistically curated bilingual dataset (English and Portuguese) comprising 13,846 utterances (18.7 hours) distributed across synthetic and literature-derived subsets, with OOV rates reaching up to 100%. We further propose a diagnostic evaluation framework that partitions recognition performance into Biased Word Error Rate (B-WER), targeting domain-specific jargon, and Unbiased Word Error Rate (U-WER), focusing on general vocabulary. Baseline evaluations using Whisper models (medium, large-v3, and large-v3-turbo) confirm the necessity of this framework. On the most challenging datasets, B-WER reaches 0.88–0.90, whereas U-WER remains as low as 0.06–0.19, demonstrating that conventional WER masks critical failure modes in jargon recognition. Additionally, an oracle upper bound experiment shows that providing correct jargon via prompting reduces B-WER by 0.50–0.70 absolute, quantifying the considerable potential for contextual biasing. We release the datasets and evaluation scripts as a reproducible benchmark to foster research on domain-aware contextual biasing and OOV handling in ASR systems.
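The B-WER / U-WER split described in the abstract can be illustrated with a minimal sketch. It is not the released evaluation script. It assumes single-word jargon terms, lowercase whitespace tokenization, and one common convention from the contextual-biasing literature: substitutions and deletions are attributed by the reference word, insertions by the inserted word. The function names `align` and `split_wer` are hypothetical.

```python
def align(ref, hyp):
    """Word-level Levenshtein alignment.

    Returns a list of (op, ref_word, hyp_word) with op in
    {"ok", "sub", "del", "ins"}.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Backtrace from the bottom-right corner to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                ops.append(("ok" if cost == 0 else "sub", ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def split_wer(ref_text, hyp_text, jargon):
    """Return (b_wer, u_wer) for one utterance.

    Assumption: `jargon` is a set of single-word terms; multi-word
    terms would need phrase matching, which this sketch omits.
    """
    ref = ref_text.lower().split()
    hyp = hyp_text.lower().split()
    jargon = {w.lower() for w in jargon}

    b_ref = sum(w in jargon for w in ref)  # jargon words in the reference
    u_ref = len(ref) - b_ref               # general-vocabulary words
    b_err = u_err = 0
    for op, r, h in align(ref, hyp):
        if op == "ok":
            continue
        # Insertions have no reference word, so classify by the hypothesis word.
        word = h if op == "ins" else r
        if word in jargon:
            b_err += 1
        else:
            u_err += 1

    b_wer = b_err / b_ref if b_ref else 0.0
    u_wer = u_err / u_ref if u_ref else 0.0
    return b_wer, u_wer


if __name__ == "__main__":
    print(split_wer("administer epinephrine now",
                    "administer epinephrin now",
                    {"epinephrine"}))
```

In this example the only error falls on the jargon term, so B-WER is high while U-WER is zero. A pooled WER over all three words would report one error in three and hide that the jargon term was missed entirely, which is the masking effect the abstract describes.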