MetaCORA: A Meta-Learned Curriculum for Adversarial and Contrastive Robustness in Speech Recognition
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Pre-trained speech models such as Whisper demonstrate impressive performance under ideal conditions but still face robustness challenges in low-resource language scenarios. We introduce Meta Curriculum Optimization for Robust ASR (MetaCORA), a novel meta-curriculum adaptive framework that improves speech recognition for low-resource Hong Kong Cantonese by integrating adversarial training with feature-level contrastive learning. Our approach dynamically adjusts three critical hyperparameters (adversarial perturbation magnitude, optimization step size, and contrastive learning temperature), allowing the model to adapt to varying training difficulty throughout the learning process. Unlike traditional meta-learning approaches, our framework does not rely on end-to-end differentiability; instead, it uses validation performance as the signal that guides hyperparameter adjustments. Experimental results demonstrate that our approach achieves a lower word error rate (WER) than standard Whisper fine-tuning, commercial speech recognition systems, and LLM-based methods. Ablation studies confirm the necessity of each component: removing any single element leads to a measurable drop in performance. The model also remains robust under noisy conditions, achieving consistently lower WER than baseline systems. Further analysis shows that MetaCORA effectively compresses the distance between clean and adversarial feature representations while maintaining well-separated class boundaries in the embedding space, providing a mechanistic explanation for its performance gains.
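The validation-driven schedule described above can be sketched as a simple controller. This is a minimal illustrative sketch, not the paper's implementation: the class name, initial values, and update factors are all assumptions; only the idea (adjusting perturbation magnitude, step size, and contrastive temperature from validation WER, without differentiating through the training loop) comes from the abstract.

```python
class CurriculumController:
    """Hypothetical sketch: adjust three training hyperparameters
    from validation WER, as in a meta-curriculum of the kind the
    abstract describes.

    epsilon: adversarial perturbation magnitude
    alpha:   adversarial optimization step size
    tau:     contrastive learning temperature
    """

    def __init__(self, epsilon=0.01, alpha=0.004, tau=0.1):
        self.epsilon, self.alpha, self.tau = epsilon, alpha, tau
        self.best_wer = float("inf")

    def update(self, val_wer):
        # No gradients flow through this rule: validation WER alone
        # decides whether to raise or lower the training difficulty.
        if val_wer < self.best_wer:
            self.best_wer = val_wer
            # Model is coping: make adversarial examples harder and
            # sharpen the contrastive objective (smaller temperature).
            self.epsilon *= 1.1
            self.alpha *= 1.1
            self.tau = max(0.05, self.tau * 0.95)
        else:
            # Model regressed: ease the curriculum.
            self.epsilon *= 0.9
            self.alpha *= 0.9
            self.tau = min(0.5, self.tau * 1.05)
        return self.epsilon, self.alpha, self.tau
```

After each validation pass, the returned values would parameterize the next epoch's adversarial example generation and contrastive loss; the specific multiplicative factors and bounds here are placeholders.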