Misraj AI at AR-MS NAKBA-NLP 2026: A State-of-the-Art VLM in Arabic Handwritten Text Recognition

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

Abstract

Handwritten Text Recognition (HTR) for Arabic presents unique challenges due to the script’s cursive nature, varying writer styles, and morphological complexity. While modern Vision-Language Models (VLMs) have significantly advanced document parsing, their direct application to highly specific cursive domains requires strategic adaptation. This paper details our submission to the Nakba OCR competition, which adapts a 3B-parameter VLM to recognize historical Arabic manuscripts. We employ a progressive training pipeline that utilizes domain-matched data augmentation to bridge the gap between standard printed Arabic OCR and historical handwritten manuscripts. Moving beyond standard decoder-only Supervised Fine-Tuning (SFT), we fine-tune the entire encoder-decoder architecture using differential learning rates. This approach, followed by a final checkpoint merge, allows the model to better resolve the fine visual details of cursive Arabic script. Our final unified model (submitted under the team name Misraj AI) establishes a new state-of-the-art (SOTA) on the Nakba dataset, achieving a Word Error Rate (WER) of 0.24 and a Character Error Rate (CER) of 0.08, and officially securing first place on the leaderboard.
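For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length in words or characters. The function names are illustrative, and the competition's actual scoring script (including any text normalization, diacritic handling, or averaging across samples) is not described here, so this should be read as the standard definition rather than the official implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Under this definition, a WER of 0.24 corresponds to roughly one word-level edit for every four reference words, and a CER of 0.08 to roughly eight character-level edits per hundred reference characters.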