LREC 2026 workshop

Al-Warraq at AR-MS NAKBA-NLP 2026: Adapting Vision-Language and Transformer Models for Automatic Manuscript OCR/HTR

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

DOI:10.63317/2fv6385csb36

Abstract

We present our submission to the NAKBA NLP 2026 Automatic Manuscript OCR/HTR shared task on Arabic manuscripts. The task aims to transcribe manuscript line images into machine-readable Arabic text. Our approach followed an iterative pipeline of model selection, training, error analysis, test-time augmentation, and postprocessing. After evaluating several OCR/HTR models, we selected the most suitable one and trained it on the provided manuscript line images and transcriptions. Error analysis showed that the model performed better at the character level than at the word level, which motivated the use of test-time augmentation and text cleaning to improve robustness. The final system achieved a character error rate (CER) of 0.1142 and a word error rate (WER) of 0.378, placing fifth in the shared task. These results show that simple but targeted improvements can support effective Arabic manuscript transcription.
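CER and WER are conventionally computed as the Levenshtein edit distance (substitutions, insertions, and deletions) between the system output and the reference transcription, divided by the reference length: over characters for CER and over words for WER. The sketch below is illustrative only. It assumes corpus-level aggregation (total edits divided by total reference length) and plain whitespace tokenization for WER; the shared task's official scorer may normalize Arabic text differently, and the function names `levenshtein`, `cer`, and `wer` are hypothetical.

```python
from typing import List, Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, insertions, and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def _corpus_error_rate(refs: List[Sequence], hyps: List[Sequence]) -> float:
    # Assumption: errors are summed over all lines, then divided by total reference length.
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference transcriptions are empty")
    return sum(levenshtein(r, h) for r, h in zip(refs, hyps)) / total


def cer(refs: List[str], hyps: List[str]) -> float:
    """Character error rate: strings are compared character by character (spaces included)."""
    return _corpus_error_rate(refs, hyps)


def wer(refs: List[str], hyps: List[str]) -> float:
    """Word error rate: assumes words are separated by whitespace."""
    return _corpus_error_rate([r.split() for r in refs], [h.split() for h in hyps])


if __name__ == "__main__":
    # Hypothetical line: the output drops one alif from the second word.
    refs = ["كتب الكتاب"]
    hyps = ["كتب الكتب"]
    print(cer(refs, hyps), wer(refs, hyps))
```

In this toy example a single missing character costs only 1 edit out of 10 reference characters (CER 0.1) but turns one of the two words into an error (WER 0.5). Because one wrong character is enough to make a whole word wrong, WER is typically higher than CER, in line with the gap between character-level and word-level performance reported above.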

Details

Paper ID
lrec2026-ws-nakbanlp-26
Pages
pp. 191-195
BibKey
youssef-etal-2026-al
Editors
Mustafa Jarrar, Mo El-Haj, Amal Haddad, Serin Atiani, Shadi Abudalfa, Terry Regier, Paul Rayson, Khalil Sima’an, Camille Mansour
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Ahmad Edris Youssef
  • Aya Hafiz Faris
  • Alhasan Hamood
  • Zainab Kamil
  • Jana Alqasem
  • Sara Ali Hamed
