TantaArabNLP at KSAA-2026 Task 2: Adapting CATT-Whisper for Arabic Speech Dictation with Automatic Diacritization

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

DOI:10.63317/46cm97fcekow

Abstract

We present our submission to the KSAA-2026 Shared Task (Subtask 2): Automatic Diacritization of Speech Dictation. Building upon the CATT-Whisper multimodal architecture, which fuses representations from a pre-trained CATT text encoder and the Whisper speech encoder, we fine-tune the model end-to-end on the official shared task training data. To further enhance performance on speech-dictated Arabic text, we apply careful post-processing to the model outputs. Our best submission achieves a Diacritic Error Rate (DER) of 7.04, a Word Error Rate (WER) of 24.39, and a Sentence Error Rate (SER) of 71.65 on the hidden test set, securing 2nd place in the competition. These results demonstrate the effectiveness of adapting a strong multimodal baseline to the speech-aware diacritization setting and highlight the value of task-specific fine-tuning and output refinement for bridging the gap between spoken transcripts and fully diacritized Arabic text.
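The three metrics reported above can be illustrated with a minimal sketch. It assumes the definitions commonly used for text diacritization: DER counts base characters whose diacritic marks differ from the reference, WER counts words containing at least one such character, and SER counts sentences containing at least one such word. It also assumes the hypothesis and reference share the same base letters and word boundaries, which may not hold for speech-derived output. The shared task's official scorer may differ in its details, and the helper names `split_diacritics` and `error_rates` are hypothetical.

```python
# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return a list of (base_char, marks) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(references, hypotheses):
    """Return (DER, WER, SER) as percentages over parallel sentence lists.

    Assumption: each hypothesis has the same words and base letters as its
    reference, so only the diacritic marks can differ.
    """
    if len(references) != len(hypotheses):
        raise ValueError("reference and hypothesis lists differ in length")
    char_total = char_err = word_total = word_err = sent_err = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        if len(ref_words) != len(hyp_words):
            raise ValueError("word counts differ; alignment is out of scope here")
        sentence_wrong = False
        for rw, hw in zip(ref_words, hyp_words):
            rp, hp = split_diacritics(rw), split_diacritics(hw)
            if [b for b, _ in rp] != [b for b, _ in hp]:
                raise ValueError("base letters differ; alignment is out of scope here")
            # Sort marks so that shadda+vowel and vowel+shadda compare equal.
            errs = sum(sorted(rm) != sorted(hm) for (_, rm), (_, hm) in zip(rp, hp))
            char_total += len(rp)
            char_err += errs
            word_total += 1
            word_err += errs > 0
            sentence_wrong = sentence_wrong or errs > 0
        sent_err += sentence_wrong
    return (
        100.0 * char_err / char_total,
        100.0 * word_err / word_total,
        100.0 * sent_err / len(references),
    )


# Usage: two reference sentences, one hypothesis with a wrong final mark.
refs = ["\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064C",
        "\u0630\u064E\u0647\u064E\u0628\u064E"]
hyps = ["\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064B",
        "\u0630\u064E\u0647\u064E\u0628\u064E"]
der, wer, ser = error_rates(refs, hyps)
```

Because a single wrong mark makes its whole word and its whole sentence count as errors, SER is always at least as high as WER under these definitions, and WER is typically far above DER, which matches the ordering of the figures in the abstract.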

Details

Paper ID
lrec2026-ws-osact-30
Pages
pp. 229-233
BibKey
esmaeil-etal-2026-tantaarabnlp
Editors
Hend Al-Khalifa, Mo El-Haj, Saad Ezzini
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Nada Adel Esmaeil

  • Reda M. Elbasiony

  • Mohamed T. Faheem
