Adja-French Parallel Corpus: A New Resource for Machine Translation of a West African Under-Resourced Language

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

We present the first parallel text corpus for machine translation of Adja, an under-resourced Gbe language spoken by approximately 1,000,000 people in Benin and Togo. The corpus contains 10,000 French-Adja sentence pairs and provides a foundation for machine translation research. We establish baseline results with fine-tuned NLLB and ByT5 models, reaching a chrF++ score of 28 in the French→Adja direction and up to 34 in the Adja→French direction. As the first publicly available machine translation resource for Adja, this work provides benchmarks for future studies on this under-resourced West African language. The dataset is available at https://huggingface.co/datasets/JosueG/french-adja-parallel-corpus.