AlSaifTeam at AR-MS NAKBA-NLP 2026: Building Expert-Quality Ground Truth for Arabic Handwritten Manuscripts
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Abstract
This paper describes our participation in Subtask 1 of the NAKBA NLP 2026 Arabic Manuscript Understanding Shared Task, which focuses on the manual creation of expert-quality, line-level transcriptions for Arabic handwritten manuscripts. To ensure reliable ground truth, we adopt a protocol-driven methodology based on fixed transcription rules, collaborative verification, and confidence-based quality control. The proposed approach aims to improve consistency, reduce annotation bias, and support the creation of trustworthy benchmark resources for future Arabic OCR and HTR research.
Keywords: Arabic handwritten manuscripts, ground truth construction, manual transcription, handwritten text recognition, optical character recognition, benchmark enrichment