Latent Narratives at AR-MS NakbaNLP 2026: Reducing Character Errors in Arabic Manuscript Transcription: A CER Oriented System

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

DOI:10.63317/3gkghprpz2sb

Abstract

Historic Arabic handwritten texts present significant challenges due to varied handwriting styles, cursive structure, diverse diacritics, and inconsistent character and word sizes. In this work, we introduce Historic-Arabic-OCR, a vision-language OCR system built upon Qari-OCR, which itself is based on Qwen2-VL-2B-Instruct, and further fine-tuned using Low-Rank Adaptation (LoRA) for Arabic manuscript transcription. The proposed approach incorporates contrast enhancement using CLAHE and deterministic decoding strategies to reduce character-level errors. Our model achieves competitive performance, with a Word Error Rate (WER) of 0.28 and a Character Error Rate (CER) of 0.10 on historical Arabic texts, including low-resolution images. The final submitted system uses CLAHE preprocessing with deterministic greedy decoding to minimize character-level errors.

Keywords: Arabic OCR, Vision-Language Models, Qwen2-VL, LoRA, CER Optimization
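
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between reference and predicted transcription, divided by the reference length in characters or words. The function names (edit_distance, cer, wer) are illustrative assumptions, and the workshop's official scorer may apply text normalization (for example of diacritics or whitespace) that is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example pair: one character is dropped from the second word.
    reference = "هذا كتاب قديم"
    hypothesis = "هذا كتب قديم"
    print(f"CER = {cer(reference, hypothesis):.3f}")
    print(f"WER = {wer(reference, hypothesis):.3f}")
```

Under this definition, a CER of 0.10 corresponds to roughly one character edit per ten reference characters. A single wrong character makes the whole word count as an error, which is consistent with the reported WER (0.28) being higher than the CER.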

Details

Paper ID
lrec2026-ws-nakbanlp-43
Pages
pp. 275-279
BibKey
aldesouky-2026-latent
Editors
Mustafa Jarrar, Mo El-Haj, Amal Haddad, Serin Atiani, Shadi Abudalfa, Terry Regier, Paul Rayson, Khalil Sima’an, Camille Mansour
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Sara Abdulmonem Al desouky