NAKBA NLP 2026: Shared Task on Arabic Handwritten Manuscript Understanding (Palestine Memory–Omar Al-Saleh Memoir)
Transcribing historical Arabic manuscripts into machine-readable text is essential for preserving cultural heritage and enabling computational research in the humanities, yet it remains a challenging task due to handwriting variability, page degradation, and the complexity of Arabic script. To advance research in this area, we introduce the NAKBA NLP 2026 shared task on Arabic manuscript understanding, comprising two complementary tracks: a manual transcription track, in which participating teams annotate unlabelled handwritten line images, and an automatic system track for handwritten text recognition (HTR). Both tracks use the Omar Al-Saleh Memoir Collection, a corpus of 6,395 scanned pages and approximately 1.6 million words, written between 1951 and 1965 and provided by the Palestine Memory Project. The dataset, evaluation scripts, and system outputs are publicly available.[1] In Subtask 1 (Transcription Track), three teams contributed manual line-level transcriptions; evaluation on hidden ground-truth samples yielded Character Error Rates (CER) between 0.06 and 0.11. In Subtask 2 (Systems Track), seven teams submitted HTR systems. The top-performing system, by Misraj AI, achieved a corpus-level CER of 0.079 and Word Error Rate (WER) of 0.244, outperforming the organiser baseline (CER 0.368, WER 0.691). Rankings shift between corpus-level and per-line evaluation: the 3reeq team achieved the lowest per-line CER (0.082). All contributed transcriptions and system outputs are released under CC-BY-4.0 to support continued research in Arabic manuscript recognition and digital humanities. [1] https://acr.ps/1L9BaeY
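The abstract notes that rankings shift between corpus-level and per-line evaluation. The sketch below illustrates one common way the two aggregations differ, and why a system can rank differently under each. It is an assumption about how such metrics are typically computed, not the shared task's official evaluation script, which may apply text normalisation or other rules not described here. The function names `edit_distance`, `corpus_cer`, and `per_line_cer` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total edits divided by total reference characters (assumed definition).

    Long lines carry more weight because they contribute more characters.
    Assumes the references are not all empty.
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def per_line_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Mean of each line's own CER (assumed definition).

    Every line counts equally, so errors on short lines weigh more.
    Assumes every reference line is non-empty.
    """
    rates = [edit_distance(r, h) / len(r) for r, h in zip(refs, hyps)]
    return sum(rates) / len(rates)


# Toy data: one long line transcribed perfectly, one short line with one error.
refs = ["abcdefghij", "ab"]
hyps = ["abcdefghij", "ax"]
corpus_score = corpus_cer(refs, hyps)      # 1 edit over 12 characters
per_line_score = per_line_cer(refs, hyps)  # mean of 0.0 and 0.5
```

On this toy input the single error on the short line barely moves the corpus-level score but dominates the per-line average. Passing token lists (for example `ref.split()`) to `edit_distance` instead of strings gives a word-level analogue in the same way.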