
ViMedCSS: A Vietnamese Medical Code-Switching Speech Dataset & Benchmark

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/58uwrquo3znb

Abstract

Code-switching (CS), in which Vietnamese speech incorporates English words such as drug names or procedures, is common in Vietnamese medical communication. This poses challenges for Automatic Speech Recognition (ASR) systems, especially for low-resource languages like Vietnamese. Most current ASR systems struggle to correctly recognize English medical terms within Vietnamese sentences, and no existing benchmark addresses this challenge. In this paper, we construct ViMedCSS, a 34-hour Vietnamese Medical Code-Switching Speech dataset containing 16,576 utterances. Each utterance includes at least one English medical term drawn from a curated bilingual lexicon covering five medical topics. Using this dataset, we evaluate several state-of-the-art ASR models and examine fine-tuning strategies for improving medical term recognition, in order to identify the most effective approach on this dataset. Experimental results show that Vietnamese-optimized models perform better on general segments, while multilingual pretraining helps capture English insertions. Combining both approaches yields the best balance between overall and code-switched accuracy. This work provides the first benchmark for Vietnamese medical code-switching and offers insights into effective domain adaptation for low-resource, multilingual ASR systems.

Details

Paper ID
lrec2026-main-445
Pages
pp. 5657-5665
BibKey
nguyen-etal-2026-vimedcss
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Tung X. Nguyen
  • Nhu Vo
  • Giang Son Nguyen
  • Duy Mai Hoang
  • Chien Dinh Huynh
  • Inigo Jauregi Unanue
  • Massimo Piccardi
  • Wray Buntine
  • Dung D. Le
