Back to Main Conference 2016
LREC 2016main

Lexical Resources to Enrich English Malayalam Machine Translation

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4azqkmazt5oo

Abstract

In this paper we present our work on the usage of lexical resources for the Machine Translation English and Malayalam. We describe a comparative performance between different Statistical Machine Translation (SMT) systems on top of phrase based SMT system as baseline. We explore different ways of utilizing lexical resources to improve the quality of English Malayalam statistical machine translation. In order to enrich the training corpus we have augmented the lexical resources in two ways (a) additional vocabulary and (b) inflected verbal forms. Lexical resources include IndoWordnet semantic relation set, lexical words and verb phrases etc. We have described case studies, evaluations and have given detailed error analysis for both Malayalam to English and English to Malayalam machine translation systems. We observed significant improvement in evaluations of translation quality. Lexical resources do help uplift performance when parallel corpora are scanty.

Details

Paper ID
lrec2016-main-098
Pages
pp. 620-627
BibKey
s-bhattacharyya-2016-lexical
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • SS

    Sreelekha S

  • PB

    Pushpak Bhattacharyya

Links