Paper Information

lrec2026-ws-cawl-11

Inverse Text Normalization for Arabic Numbers in Streaming ASR

Title

Inverse Text Normalization for Arabic Numbers in Streaming ASR

Abstract

Streaming multilingual speech recognition benefits from unified systems that produce numbers in their written form ‘34’ rather than their spoken form ‘thirty-four’. By generating digits directly, these systems eliminate the post-processing latency inherent in cascaded architectures that require a separate inverse text normalization (ITN) step. Arabic presents a formidable challenge for ITN; the system must not only determine the correct numerical value but also navigate complex rules for gender, number, and case marking that are determined by the counted noun. For instance, the digit ‘7’ (as in ‘47’) exhibits gender polarity: it must take a masculine form if modifying a feminine noun (e.g., Halala) and a feminine form if modifying a masculine noun (e.g., Riyal). While Arabic dialects typically exhibit simplified numeral systems by omitting case and gender markers, they vary significantly in verbalization patterns. This study explores the efficacy of a unified streaming Automatic Speech Recognition (ASR) system with integrated ITN features, comparing it against a traditional cascaded approach utilizing a post-processing rule-based ITN module. We utilize a FastConformer cache-aware streaming model trained on English and a diverse Arabic corpus spanning Modern Standard (MSA), dialectal, and Classical Arabic, while maintaining diacritics where contextually appropriate. We evaluate the system using Word Error Rate (WER) for ASR accuracy and exact match for ITN capability. Our results demonstrate that integrating ITN does not degrade core ASR performance and that the unified model achieves accuracy competitive with cascaded systems across Arabic variants. However, error analysis reveals that the primary failures in ITN are rooted in diacritization, gender polarity, and orthographic variation, highlighting the challenges of Arabic’s unique linguistic features in end-to-end modeling.

