Back to Main Conference 2016
LREC 2016main

Using BabelNet to Improve OOV Coverage in SMT

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/5aphganta7fy

Abstract

Out-of-vocabulary words (OOVs) are a ubiquitous and difficult problem in statistical machine translation (SMT). This paper studies different strategies of using BabelNet to alleviate the negative impact brought about by OOVs. BabelNet is a multilingual encyclopedic dictionary and a semantic network, which not only includes lexicographic and encyclopedic terms, but connects concepts and named entities in a very large network of semantic relations. By taking advantage of the knowledge in BabelNet, three different methods ― using direct training data, domain-adaptation techniques and the BabelNet API ― are proposed in this paper to obtain translations for OOVs to improve system performance. Experimental results on English―Polish and English―Chinese language pairs show that domain adaptation can better utilize BabelNet knowledge and performs better than other methods. The results also demonstrate that BabelNet is a really useful tool for improving translation performance of SMT systems.

Details

Paper ID
lrec2016-main-002
Pages
pp. 9-15
BibKey
du-etal-2016-using
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • JD

    Jinhua Du

  • AW

    Andy Way

  • AZ

    Andrzej Zydron

Links