Not Gemma at AR-MS NakbaNLP 2026: Mubsir OCR: End-to-End Recognition of Arabic Handwritten Text
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Abstract
Historical Arabic handwritten OCR is difficult because of cursive script, fine diacritics, mixed numerals, and degraded media. Classical segmentation pipelines compound errors, whereas end-to-end vision-language models (VLMs) can adapt when fine-tuned on in-domain data. We present Mubsir OCR, a systematic evaluation on the NAKBA dataset, which comprises an annotated set (15,962 training line crops and 2,095 validation lines with ground truth, used for all nine experiments) and a separate blind AR-MS (Subtask 2) set (2,671 images, scored only via official submission). We compare external vs. in-house VLMs (Qwen2.5-VL 3B, Qwen3-VL-4B-Instruct, Gemma3), inference backends (vLLM/bf16 vs. HuggingFace/bf16), training length (16 vs. 32 epochs), and test-time preprocessing (CLAHE + unsharp masking). The best configuration (HuggingFace bf16) reaches 8.59% character error rate (CER) and 25.87% word error rate (WER) on the annotated validation set, and 11.00% CER / 31.26% WER on the blind set. Domain-specific fine-tuning beats general-purpose checkpoints; test-time preprocessing helps only marginally and is not recommended without train-time augmentation.
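For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how CER and WER are commonly computed from Levenshtein edit distance. It is not the official scorer: the function names are hypothetical, and the sketch assumes corpus-level (micro) averaging, whitespace tokenization for words, and no text normalization (for example, no diacritic stripping), none of which is specified in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edits divided by total reference tokens (assumed micro-average)."""
    edits = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return edits / total if total else 0.0


def cer(refs, hyps):
    """Character error rate: tokens are single characters, spaces included."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)


if __name__ == "__main__":
    # Toy example: the hypothesis drops one letter from the first word.
    refs = ["كتاب جديد"]
    hyps = ["كتب جديد"]
    print(f"CER = {cer(refs, hyps):.4f}, WER = {wer(refs, hyps):.4f}")
```

Because one wrong character makes a whole word wrong, WER is typically several times larger than CER on the same output, which matches the gap between the reported figures.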