Paper Information
A Dataset of Wolof Ajami Manuscripts for HTR and OCR
We present the first dataset of manually segmented and transcribed Ajami manuscripts written in Wolof. The term Ajami refers to modified Arabic-script orthographies used to transcribe African languages. Handwritten text recognition (HTR) and optical character recognition (OCR) models for Arabic-script languages perform poorly on African languages written in Ajami orthographies because these languages are not represented in the models' pre-training data. As a result, recognition models fail to extract the unique Arabic-script letters and ubiquitous diacritics used in African languages, and they struggle to adapt to the various calligraphy styles used across Africa. We release the following as an open-source dataset: an ALTO formatting of high-quality images of handwritten and printed 20th-century Wolof manuscripts; manual segmentation (region and line); and manual transcriptions. We extend our contribution by evaluating several Arabic-script recognition models intended for historical manuscripts and find that they produce character error rates (CER) of 61–81%. We also release the transcriptions produced by the evaluated recognition models, as well as a keyboard for transcribing Wolof Ajami manuscripts. The digitally transcribed text in the dataset can also be used for various natural language processing (NLP) and historical linguistic tasks.
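For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed: the character-level edit distance between a model's output and the manual transcription, divided by the length of the manual transcription. The abstract does not state the exact evaluation protocol, so the function names `levenshtein` and `cer` and the NFC normalization step are assumptions made for illustration, not the authors' implementation.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn hyp into ref (standard dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: both strings are NFC-normalized first, so the same letter and
    diacritic sequence is not penalized for differing Unicode encodings.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Toy Latin-script strings, used only to show the call.
    print(f"CER: {cer('kitten', 'sitting'):.2f}")
```

In Python, each Arabic-script diacritic is a separate code point, so under this sketch a missed or wrong diacritic counts as one full character error. A CER above 100% is possible when the hypothesis is much longer than the reference.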