Less can be More: Towards a Parameter-Efficient Fine-Tuning of Wav2Vec2 XLSR for Low-Resource Cape Verdean Creole ASR

Proceedings of Resources for African Indigenous Languages (RAIL) 2026 @ LREC 2026

Abstract

Automatic Speech Recognition (ASR) for low-resource languages remains challenging due to limited annotated data and high linguistic variability. We investigate parameter-efficient fine-tuning strategies for Cape Verdean Creole ASR using the Wav2Vec 2.0 XLSR model, evaluating the impact of structured layer freezing on recognition performance, training stability, and computational efficiency. Experiments on a newly curated Santiago-dialect dataset show that full fine-tuning achieves the lowest error rates (WER 0.212, CER 0.120). However, several freezing configurations achieve comparable recognition performance while substantially reducing the number of trainable parameters and converging more stably. These results highlight a trade-off between adaptability and efficiency, and suggest that selective freezing can serve as an effective regularization strategy in low-resource settings. This work provides practical insights into parameter-efficient adaptation for under-resourced Creole languages.
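For readers less familiar with the two metrics reported above, word error rate (WER) and character error rate (CER) can both be sketched as a Levenshtein edit distance divided by the reference length, so that a WER of 0.212 corresponds to roughly 21 edits per 100 reference words. The snippet below is a minimal illustration under stated assumptions, not the scoring pipeline used in this work: the function names are hypothetical, the token strings are placeholders rather than Cape Verdean Creole data, and no text normalization (casing, punctuation) is applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: whitespace counts as a character; some toolkits strip it.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Placeholder tokens only; one of four reference words is substituted.
    print(wer("w1 w2 w3 w4", "w1 wX w3 w4"))
    print(cer("abcd", "abxd"))
```

Both values are fractions rather than percentages, matching the convention of the figures quoted in the abstract. Because the denominator is the reference length, either metric can exceed 1.0 when the hypothesis contains many insertions.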