Eraserhead at OSACT7 Shared Task: ASR Consistency Filtering and Speaker-Adaptive Post-Processing for Arabic Speech Diacritization

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

Arabic speech diacritization is the task of restoring short vowel marks to undiacritized text derived from speech input. It remains difficult because ASR output can be noisy, dialectal variation is substantial, and speakers often differ in how they realize word-final diacritics. In this paper, we describe our submission to Task 2 of the KSAA-2026 Shared Task on Arabic Speech Dictation with Automatic Diacritization, where our system ranked 4th on the official leaderboard. Our approach builds on a pretrained ASR-aware diacritization model and adds three components: ASR Consistency Filtering, confidence-based ensembling of three checkpoints, and speaker-adaptive post-processing for word-final diacritics. Instead of removing training examples whose ASR transcripts are unreliable, our filtering strategy replaces those transcripts with the undiacritized gold text, which keeps the full training set and makes training more stable. On the official test set, our system achieved a Diacritic Error Rate (DER) of 8.23, a Word Error Rate (WER) of 30.37, and a Sentence Error Rate (SER) of 80.79 under the With Case Endings (WCE), Including No Diacritic (Incl. 0) evaluation setting. It also outperformed the organizers’ fine-tuned Text+ASR baseline in three of the four main evaluation settings.
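The transcript-replacement idea behind ASR Consistency Filtering can be illustrated with a minimal sketch. It assumes that consistency is measured as the character error rate between the ASR hypothesis and the undiacritized gold text, and that a transcript is replaced when this rate exceeds a fixed threshold. The function names, the error-rate criterion, and the 0.3 threshold are hypothetical choices made for illustration and are not taken from the submitted system.

```python
# Arabic diacritic marks U+064B..U+0652 (tanween, short vowels, shadda, sukun)
# plus the dagger alif U+0670.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Remove diacritic marks, leaving the base letters untouched."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(hyp: str, ref: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(hyp, ref) / len(ref)


def filter_asr(asr_text: str, gold_diacritized: str, max_cer: float = 0.3) -> str:
    """Return the ASR transcript if it is consistent with the gold text,
    otherwise fall back to the undiacritized gold text.

    The CER criterion and the default threshold are assumptions of this sketch.
    """
    gold_plain = strip_diacritics(gold_diacritized)
    if char_error_rate(strip_diacritics(asr_text), gold_plain) > max_cer:
        return gold_plain  # unreliable transcript: replace, do not drop the example
    return asr_text


if __name__ == "__main__":
    gold = "كَتَبَ الوَلَدُ"
    for hyp in ("كتب الولد", "ذهب"):
        print(filter_asr(hyp, gold))
```

Because every example is kept and only its ASR input is substituted, the number of training examples stays the same regardless of the threshold; the threshold only controls how many examples are trained on noisy ASR text versus clean undiacritized text.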