Back to Main Conference 2016
LREC 2016main

That’ll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/26ub639pd9es

Abstract

Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In absence of such lexical resources, multilingual topic models have been used to create coarse lexical resources in the past, using a Cartesian product approach. Our results show that for morphologically rich languages like Hindi, the Cartesian product approach is detrimental for MT. We then present a novel `sentential' approach to use this coarse lexical resource from a multilingual topic model. Our coarse lexical resource when injected with a parallel corpus outperforms a system trained using parallel corpus and a good quality lexical resource. As demonstrated by the quality of our coarse lexical resource and its benefit to MT, we believe that our sentential approach to create such a resource will help MT for resource-constrained languages.

Details

Paper ID
lrec2016-main-349
Pages
pp. 2199-2203
BibKey
kanojia-etal-2016-thatll
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • DK

    Diptesh Kanojia

  • AJ

    Aditya Joshi

  • PB

    Pushpak Bhattacharyya

  • MC

    Mark James Carman

Links