
NAKBA NLP 2026: Shared Task on Arabic Handwritten Manuscript Understanding (Palestine Memory–Omar Al-Saleh Memoir)

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

DOI:10.63317/3iakmct86er7

Abstract

Transcribing historical Arabic manuscripts into machine-readable text is essential for preserving cultural heritage and enabling computational research in the humanities, yet it remains a challenging task due to handwriting variability, page degradation, and the complexity of Arabic script. To advance research in this area, we introduce the NAKBA NLP 2026 shared task on Arabic manuscript understanding, comprising two complementary tracks: a manual transcription track, in which participating teams annotate unlabelled handwritten line images, and an automatic system track for handwritten text recognition (HTR). Both tracks use the Omar Al-Saleh Memoir Collection, a corpus of 6,395 scanned pages and approximately 1.6 million words, written between 1951 and 1965 and provided by the Palestine Memory Project. The dataset, evaluation scripts, and system outputs are publicly available.[1] In Subtask 1 (Transcription Track), three teams contributed manual line-level transcriptions; evaluation on hidden ground-truth samples yielded Character Error Rates (CER) between 0.06 and 0.11. In Subtask 2 (Systems Track), seven teams submitted HTR systems. The top-performing system, by Misraj AI, achieved a corpus-level CER of 0.079 and Word Error Rate (WER) of 0.244, outperforming the organiser baseline (CER 0.368, WER 0.691). Rankings shift between corpus-level and per-line evaluation: the 3reeq team achieved the lowest per-line CER (0.082). All contributed transcriptions and system outputs are released under CC-BY-4.0 to support continued research in Arabic manuscript recognition and digital humanities.

[1] https://acr.ps/1L9BaeY
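
The abstract notes that rankings shift between corpus-level and per-line evaluation. The sketch below illustrates one common way this can happen, assuming corpus-level CER pools edits and characters over all lines while per-line CER averages each line's own error rate. The function names and toy strings are hypothetical, and the shared task's released evaluation scripts may differ (for example in text normalisation or handling of empty lines).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_cer(refs, hyps):
    """Pooled CER: total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def per_line_cer(refs, hyps):
    """Mean of line-level CERs; assumes every reference line is non-empty."""
    rates = [edit_distance(r, h) / len(r) for r, h in zip(refs, hyps)]
    return sum(rates) / len(rates)


# Toy data (not from the shared task): one long line and one short line.
refs = ["abcdefghij", "ab"]
sys_a = ["abcdefghij", "xb"]   # one error, on the short line
sys_b = ["abcdefghxx", "ab"]   # two errors, on the long line

for name, hyps in [("A", sys_a), ("B", sys_b)]:
    print(name, round(corpus_cer(refs, hyps), 3), round(per_line_cer(refs, hyps), 3))
```

In this toy case system A has the lower corpus-level CER because it makes fewer edits overall, while system B has the lower per-line CER because A's single error falls on a very short line and dominates that line's rate. WER can be computed the same way by passing `ref.split()` and `hyp.split()` to `edit_distance`.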

Details

Paper ID
lrec2026-ws-nakbanlp-07
Pages
pp. 70-79
BibKey
hamoud-etal-2026-nakba
Editors
Mustafa Jarrar, Mo El-Haj, Amal Haddad, Serin Atiani, Shadi Abudalfa, Terry Regier, Paul Rayson, Khalil Sima’an, Camille Mansour
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Hadi Hamoud
  • Ahmad Ali Chamseddine
  • Bilal Shalash
  • Firas Ben Abid
  • Mustafa Jarrar
  • Chadi Abou Chakra
  • Bernard Ghanem
  • Fadi A. Zaraket
