
Inverse Text Normalization for Arabic Numbers in Streaming ASR

Proceedings of the Third Workshop on Computation and Written Language (CAWL 2026) @ LREC 2026

DOI: 10.63317/4wy5qhn7npqa

Abstract

Streaming multilingual speech recognition benefits from unified systems that produce numbers in their written form (‘34’) rather than their spoken form (‘thirty-four’). By generating digits directly, these systems eliminate the post-processing latency inherent in cascaded architectures that require a separate inverse text normalization (ITN) step. Arabic presents a formidable challenge for ITN: the system must not only determine the correct numerical value but also navigate complex rules for gender, number, and case marking that are determined by the counted noun. For instance, the digit ‘7’ (as in ‘47’) exhibits gender polarity: it must take a masculine form when modifying a feminine noun (e.g., Halala) and a feminine form when modifying a masculine noun (e.g., Riyal). While Arabic dialects typically have simplified numeral systems that omit case and gender markers, they vary significantly in their verbalization patterns. This study explores the efficacy of a unified streaming Automatic Speech Recognition (ASR) system with integrated ITN, comparing it against a traditional cascaded approach that applies a rule-based ITN module as post-processing. We use a FastConformer cache-aware streaming model trained on English and a diverse Arabic corpus spanning Modern Standard Arabic (MSA), dialectal Arabic, and Classical Arabic, retaining diacritics where contextually appropriate. We evaluate the system using Word Error Rate (WER) for ASR accuracy and exact match for ITN capability. Our results demonstrate that integrating ITN does not degrade core ASR performance and that the unified model achieves accuracy competitive with cascaded systems across Arabic variants. However, error analysis reveals that the primary ITN failures are rooted in diacritization, gender polarity, and orthographic variation, highlighting the challenges that Arabic’s linguistic features pose for end-to-end modeling.
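The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not specify tokenization, text normalization, or the unit over which exact match is computed, so everything here is an assumption: the function names `word_error_rate` and `itn_exact_match` are hypothetical, words are split on whitespace, and exact match is taken per item after stripping surrounding whitespace. It is not the authors' evaluation code.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are whitespace-separated and compared verbatim
    (no case folding, no diacritic stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def itn_exact_match(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Fraction of items whose hypothesis equals the reference exactly.

    Assumption: each item is one string (for example, one number span or
    one utterance), compared after stripping surrounding whitespace.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    if not references:
        raise ValueError("at least one item is required")
    matches = sum(r.strip() == h.strip() for r, h in zip(references, hypotheses))
    return matches / len(references)
```

For instance, under these assumptions, `word_error_rate("the price is 34 riyals", "the price is thirty four riyals")` counts one substitution and one insertion against a five-word reference. The example shows why a system that leaves numbers in spoken form is penalized when the reference is in written form.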

Details

Paper ID
lrec2026-ws-cawl-11
Pages
pp. 101-106
BibKey
albasiri-etal-2026-inverse
Editors
Kyle Gorman
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Third Workshop on Computation and Written Language (CAWL 2026) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Enas Albasiri
  • Myungjong Kim
  • Nourchene Ferchichi
  • Oluwatobi Olabiyi
