Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

  1. Click the edit button next to a field to report a correction.
  2. Fill in the suggested correction value for each field you want to correct.
  3. Provide your name and email so we can contact you if needed.

Paper Information

lrec2026-ws-rail-12

The Hundzula Retreat-Based Infrastructure Model for African Natural Language Processing

Paper Fields

Title

The Hundzula Retreat-Based Infrastructure Model for African Natural Language Processing

Abstract

The development of Natural Language Processing (NLP) resources for African indigenous languages remains constrained by limited data availability, fragmented expertise, and a lack of sustainable, locally grounded infrastructures for enabling language research. While much existing work focuses on producing discrete resources such as corpora or lexicons, less attention has been paid to the social, institutional, and methodological conditions that enable such resources to be created, maintained, and sustained. This paper presents the Hundzula Retreat for NLP and Linguistics as a retreat-based resource infrastructure model that addresses these constraints. We conceptualise Hundzula not as a once-off event, but as a structured, upstream research infrastructure that facilitates human capacity development, interdisciplinary collaboration between linguistics and NLP, ethical data practices, and the early-stage incubation of language resources for African indigenous languages. Drawing on evidence from multiple iterations of the retreat, we describe the design principles, workflows, and governance mechanisms that support resource development, including training pipelines, human-in-the-loop methodologies, and collaborative project formation. Rather than focusing on already formalised outputs, the paper foregrounds the infrastructural conditions that make such outputs possible within under-resourced contexts. In doing so, the paper shifts attention from outputs to the enabling ecosystems required for their production. We argue that retreat-based infrastructures constitute an essential but under-recognised category of language resources and demonstrate how the Hundzula model can be adapted and replicated in other low-resourced language contexts. The paper contributes a transferable framework for sustainable NLP resource development grounded in African linguistic realities.


Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.


PDF Attachment

You may attach a corrected version of the paper as a PDF. Maximum file size: 10 MB. Only PDF files are accepted.

Your Information

Author Declaration *

Select at least one field to correct using the edit buttons above.