
Reclaiming African Voices: Surveying Indigenous Writing Systems for Inclusive NLP

Proceedings of Resources for African Indigenous Languages (RAIL) 2026 @ LREC 2026

DOI:10.63317/2zgo4ih7o2ch

Abstract

Multilingual NLP has expanded rapidly through large-scale pretraining and cross-lingual transfer, yet this progress remains structurally uneven across writing systems. This survey reframes multilingual NLP around scripts rather than languages, arguing that writing systems constitute an under-theorized axis of computational inequality. Focusing on African scripts that are Indigenous (Vai, Ge’ez, Tifinagh), modern (ADLaM, N’Ko), or adapted Arabic-based (Ajami), we analyze how script properties interact with digital infrastructure, tokenization, and downstream task performance. We organize the literature across four analytical layers: infrastructural (Unicode and input systems), representational (segmentation efficiency and vocabulary allocation), functional (task-level disparities), and epistemic (evaluation bias and the "low-resource" framing). Synthesizing evidence from 47 studies, we show that performance gaps across scripts arise primarily from engineering design choices rather than intrinsic linguistic complexity. We conclude by outlining a research agenda for native multiscript foundation models, including script-aware scaling laws, tokenizer equity metrics, and evaluation reform. We argue that multiscript equity is not a peripheral concern but a structural precondition for genuine multilingual inclusion.

Details

Paper ID: lrec2026-ws-rail-10
Pages: pp. 96-106
BibKey: traore-etal-2026-reclaiming
Editors: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of Resources for African Indigenous Languages (RAIL) 2026 @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Mamady Traore
  • Ngoc Tan Le
  • Fatiha Sadat
