LREC 2026 workshop
Oblevit at AR-MS NAKBA NLP 2026 Subtask 2: Hybrid CNN–BiLSTM–CTC Framework with Linguistic Refinement for Arabic Handwritten Manuscript Recognition
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Abstract
Arabic handwritten manuscript recognition is challenging because the script is cursive, many letters differ only in their dots, and historical documents are often degraded. In this work, we propose an end-to-end OCR system based on a CNN–BiLSTM–CTC architecture. The convolutional layers extract visual features, the bidirectional LSTM captures sequential dependencies, and the CTC objective allows training without explicit character-level alignment. Arabic-specific decoding and post-processing techniques are then applied to reduce character and spacing errors. Experimental results show competitive performance in recognizing complex handwritten Arabic text.
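To make the decoding stage concrete, the following is a minimal sketch of greedy CTC decoding followed by a simple spacing cleanup. It is not the authors' implementation: the blank index, the toy alphabet, and the helper names `ctc_greedy_decode` and `normalize_arabic_spacing` are assumptions chosen for illustration, and the cleanup rules (removing the tatweel elongation character and collapsing whitespace) are only one plausible example of Arabic-specific post-processing.

```python
import re

BLANK = 0          # assumed index of the CTC blank symbol
TATWEEL = "\u0640"  # Arabic elongation character, purely decorative


def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding: take the best class per frame,
    collapse consecutive repeats, then drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if index != previous and index != BLANK:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


def normalize_arabic_spacing(text):
    """Hypothetical post-processing: strip tatweel, collapse runs of
    whitespace into one space, and trim the ends."""
    text = text.replace(TATWEEL, "")
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Toy alphabet (assumption): index 0 is the blank.
alphabet = ["<blank>", "ب", "ا", " "]

# Per-frame class scores, as a recognizer might output after softmax.
frame_scores = [
    [0.10, 0.80, 0.05, 0.05],  # ب
    [0.10, 0.80, 0.05, 0.05],  # ب again, collapsed as a repeat
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # ا
    [0.10, 0.70, 0.10, 0.10],  # ب
]

decoded_text = ctc_greedy_decode(frame_scores, alphabet)
cleaned_text = normalize_arabic_spacing("  بـاب   كبير ")
```

In this sketch a blank between two identical predictions keeps them as separate characters, which is how CTC represents doubled letters. A beam search with a language model could replace the greedy step; that choice is outside what the abstract specifies.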