
Doaa Sulaiman at AR-MS NakbaNLP 2026: Faithful Diplomatic Transcription of Arabic Manuscripts Using a Human-Centred Annotation Framework

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

DOI:10.63317/3ti98qkygayb

Abstract

This paper describes my participation in the Human Transcription Track (Subtask 1) of the NAKBA-NLP 2026 Arabic Manuscript Understanding Shared Task, which focuses on historical handwritten documents related to Palestinian Nakba narratives. Participants were asked to manually transcribe approximately 500 cropped line images and to design a comprehensive transcription guideline from scratch. I adopted a faithful diplomatic transcription philosophy that preserves original spelling, punctuation, diacritics, and layout features without editorial normalisation, in order to create research-grade gold-standard data. Building on this philosophy, I developed a 26-convention annotation framework organised into three layers: editorial-structural symbols (11 conventions), faithful-copying rules (12 conventions), and documentation labels (3 types), supported by a four-step quality-control pipeline. My submission covered all 500 assigned lines and attained an official Character Error Rate (CER) of 0.02 and an accuracy of 0.98, indicating that the proposed framework supports high-precision transcription.
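For readers unfamiliar with the metric, the sketch below shows one common way to compute CER: the character-level edit distance between a reference line and a transcribed line, divided by the reference length. It is an illustration only. The shared task's official scoring script is not described here, so the function names, the absence of any text normalisation, and the relation accuracy = 1 - CER (which happens to match the reported 0.98 and 0.02) are all assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref (classic dynamic-programming formulation)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference character
                            curr[j - 1] + 1,      # drop a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length.
    Hypothetical helper; the official scorer may normalise text differently."""
    if not ref:
        raise ValueError("reference line must not be empty")
    return levenshtein(ref, hyp) / len(ref)


def accuracy(ref: str, hyp: str) -> float:
    """Assumed here to be 1 - CER, floored at zero."""
    return max(0.0, 1.0 - cer(ref, hyp))
```

Because a diplomatic transcription keeps diacritics and original punctuation, each of those marks counts as a character under this definition, so omitting or adding one is scored as an error.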

Details

Paper ID
lrec2026-ws-nakbanlp-12
Pages
pp. 108-112
BibKey
sulaiman-2026-doaa
Editors
Mustafa Jarrar, Mo El-Haj, Amal Haddad, Serin Atiani, Shadi Abudalfa, Terry Regier, Paul Rayson, Khalil Sima’an, Camille Mansour
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Doaa Bahjat Sulaiman

Links