KinyCOMET: Automatic Evaluation of Machine Translation Systems for Kinyarwanda-English
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper presents KinyCOMET, a new automatic evaluation metric for Kinyarwanda-English machine translation (MT). Current MT evaluation in Rwanda relies mainly on BLEU and chrF, which have been shown to correlate poorly with human judgments. To address this gap, we created a Direct Assessment (DA) dataset for Kinyarwanda-English translations and used it to fine-tune COMET models for this language pair. We evaluate two variants: KinyCOMET XLM-RoBERTa, trained from a multilingual encoder whose pre-training data did not include Kinyarwanda, and KinyCOMET Unbabel, a fine-tuned version of the Unbabel COMET model. Both models correlate strongly with human evaluations, and KinyCOMET Unbabel outperforms all baselines, including AfriCOMET, chrF, and BLEU. Our results show that fine-tuning pre-trained multilingual models can yield high-quality evaluators even for low-resource languages on which the base model was never trained. We release both the models and the annotated dataset publicly to foster further research on MT evaluation for African languages.
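To illustrate how a COMET-style metric of this kind is typically applied at inference time, the sketch below scores source/translation/reference triples with the open-source unbabel-comet Python package. The checkpoint identifier is a stand-in placeholder, since this section does not give the location of the released KinyCOMET models.

```python
# Minimal sketch: scoring MT output with a COMET-style model via the
# unbabel-comet package (pip install unbabel-comet).
# NOTE: the checkpoint name below is a hypothetical stand-in; substitute
# the released KinyCOMET checkpoint once its identifier is known.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")  # placeholder checkpoint
model = load_from_checkpoint(model_path)

# Each sample pairs a source sentence (src), a machine translation (mt),
# and a human reference (ref), mirroring the Direct Assessment setup.
data = [
    {
        "src": "Muraho, amakuru?",
        "mt": "Hello, how are you?",
        "ref": "Hello, how is it going?",
    },
]

output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # segment-level quality scores
print(output.system_score)  # corpus-level (averaged) score
```

Segment-level scores are what get correlated with human DA judgments when validating a metric such as KinyCOMET.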