The Hundzula Retreat-Based Infrastructure Model for African Natural Language Processing
Proceedings of Resources for African Indigenous Languages (RAIL) 2026 @ LREC 2026
Abstract
The development of Natural Language Processing (NLP) resources for African indigenous languages remains constrained by limited data availability, fragmented expertise, and a lack of sustainable, locally grounded infrastructures for language research. While much existing work focuses on producing discrete resources such as corpora or lexicons, less attention has been paid to the social, institutional, and methodological conditions that enable such resources to be created and sustained. This paper presents the Hundzula Retreat for NLP and Linguistics as a retreat-based resource infrastructure model that addresses these constraints. We conceptualise Hundzula not as a once-off event, but as a structured, upstream research infrastructure that facilitates human capacity development, interdisciplinary collaboration between linguistics and NLP, ethical data practices, and the early-stage incubation of language resources for African indigenous languages. Drawing on evidence from multiple iterations of the retreat, we describe the design principles, workflows, and governance mechanisms that support resource development, including training pipelines, human-in-the-loop methodologies, and collaborative project formation. Rather than focusing on already formalised outputs, the paper foregrounds the infrastructural conditions and enabling ecosystems that make such outputs possible in under-resourced contexts. We argue that retreat-based infrastructures constitute an essential but under-recognised category of language resources and show how the Hundzula model can be adapted and replicated in other low-resourced language contexts. The paper contributes a transferable framework for sustainable NLP resource development grounded in African linguistic realities.