Doaa Sulaiman at AR-MS NakbaNLP 2026: Faithful Diplomatic Transcription of Arabic Manuscripts Using a Human-Centred Annotation Framework
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Abstract
This paper describes my participation in the Human Transcription Track (Subtask 1) of the NAKBA-NLP 2026 Arabic Manuscript Understanding Shared Task, which focuses on historical handwritten documents related to Palestinian Nakba narratives. Participants were asked to manually transcribe approximately 500 cropped line images and to design a comprehensive transcription guideline from scratch. To produce research-grade gold-standard data, I adopted a faithful diplomatic transcription philosophy that preserves original spelling, punctuation, diacritics, and layout features without editorial normalisation. Building on this philosophy, I developed a 26-convention annotation framework organised into three layers: editorial-structural symbols (11 conventions), faithful-copying rules (12 conventions), and documentation labels (3 types), supported by a four-step quality-control pipeline. My submission covered all 500 assigned lines and attained an official Character Error Rate (CER) of 0.02 and an accuracy of 0.98, indicating that the proposed framework supports high-precision transcription.
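For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level CER can be computed, assuming the common definition (character-level edit distance divided by the number of reference characters) and assuming that accuracy is reported as 1 − CER, which is consistent with the figures above. The function names `levenshtein` and `cer` are illustrative, and the official shared-task scorer may apply text normalisation or aggregation steps not reproduced here.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


# A vocalised word (three letters, each followed by a fatha) compared with
# the same word transcribed without its diacritics.
reference = "\u0643\u064e\u062a\u064e\u0628\u064e"
hypothesis = "\u0643\u062a\u0628"
score = cer([reference], [hypothesis])
accuracy = 1.0 - score  # assumed relationship between accuracy and CER
```

Because Arabic diacritics are separate Unicode code points, a measure of this kind counts every omitted diacritic as one character edit, which is why a diplomatic transcription that preserves them matters for the score.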