Back to Main Conference 2010
LREC 2010main

English — Oromo Machine Translation: An Experiment Using a Statistical Approach

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/3qvtih8q3aoc

Abstract

This paper deals with translation of English documents to Oromo using statistical methods. Whereas English is the lingua franca of online information, Oromo, despite its relative wide distribution within Ethiopia and neighbouring countries like Kenya and Somalia, is one of the most resource scarce languages. The paper has two main goals: one is to test how far we can go with the available limited parallel corpus for the English ― Oromo language pair and the applicability of existing Statistical Machine Translation (SMT) systems on this language pair. The second goal is to analyze the output of the system with the objective of identifying the challenges that need to be tackled. Since the language is resource scarce as mentioned above, we cannot get as many parallel documents as we want for the experiment. However, using a limited corpus of 20,000 bilingual sentences and 163,000 monolingual sentences, translation accuracy in terms of BLEU Score of 17.74% was achieved.

Details

Paper ID
lrec2010-main-470
Pages
N/A
BibKey
adugna-eisele-2010-english
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • SA

    Sisay Adugna

  • AE

    Andreas Eisele

Links