Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/3c5z2emsocx6

Abstract

Bilingual lexicon extraction from comparable corpora is usually based on distributional methods when dealing with single word terms (SWT). These methods often treat SWT as single tokens without considering their compositional property. However, many SWT are compositional (composed of roots and affixes), and this information, if taken into account, can be very useful for matching translation pairs, especially for infrequent terms, where distributional methods often fail. For instance, the English compound xenograft, which is composed of the root xeno and the lexeme graft, can be translated into French compositionally by aligning each of its elements (xeno with xéno and graft with greffe), resulting in the translation xénogreffe. In this paper, we experiment with several distributional models at the morpheme level, which we apply to the compositional translation of a subset of French and English compounds. We show promising results using distributional analysis at the root and affix levels. We also show that the adapted approach significantly improves bilingual lexicon extraction from comparable corpora compared to the word-level approach.
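
The xenograft example can be illustrated with a minimal sketch. It is not the authors' implementation: the morpheme lexicon below is a hand-written toy (in the paper, morpheme alignments are obtained through distributional analysis), and the greedy longest-match segmentation and the names `MORPHEME_LEXICON`, `segment`, and `translate_compositionally` are assumptions made purely for illustration.

```python
# Toy English-to-French morpheme lexicon. Only these two pairs come from
# the abstract's example; a real lexicon would be learned, not hand-written.
MORPHEME_LEXICON = {
    "xeno": "xéno",
    "graft": "greffe",
}


def segment(word, morphemes):
    """Split a word into known morphemes by greedy longest match.

    Returns the list of morphemes, or None if some part of the word is
    not covered. This is a simplification: it does not backtrack.
    """
    parts = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in morphemes:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # no known morpheme starts at position i
    return parts


def translate_compositionally(word, lexicon=MORPHEME_LEXICON):
    """Translate a compound by translating each morpheme and concatenating."""
    parts = segment(word, lexicon)
    if parts is None:
        return None
    return "".join(lexicon[p] for p in parts)


print(translate_compositionally("xenograft"))
```

With this toy lexicon, the compound is split into xeno and graft, and the translated morphemes are concatenated. A word containing a morpheme missing from the lexicon yields `None`, which is where a distributional fallback would be needed.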

Details

Paper ID
lrec2016-main-496
Pages
pp. 3110-3115
BibKey
hazem-daille-2016-bilingual
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Amir Hazem

  • Béatrice Daille

Links