
Adja-French Parallel Corpus: A New Resource for Machine Translation of a West African Under-Resourced Language

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5bk4g2k7mpmu

Abstract

We present the first parallel text corpus for machine translation of Adja, an under-resourced Gbe language spoken by approximately 1,000,000 people in Benin and Togo. The corpus contains 10,000 French-Adja sentence pairs, providing a foundation for machine translation research. We establish baseline results using fine-tuned NLLB and ByT5 models, achieving a chrF++ of 28 in the French→Adja direction and up to 34 in the Adja→French direction. This work is the first public machine translation resource for Adja and provides benchmarks for future studies on this under-resourced West African language. The dataset is available at https://huggingface.co/datasets/JosueG/french-adja-parallel-corpus.

Details

Paper ID
lrec2026-main-299
Pages
pp. 3742-3749
BibKey
godeme-etal-2026-adja
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Josue Frejus Godeme

  • Rolando Coto-Solano

Links