Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Abstract
There is a rich flora of word space models that have proven their efficiency in many different applications, including information retrieval (Dumais, 1988), word sense disambiguation (Schutze, 1992), various semantic knowledge tests (Lund, 1995; Karlgren, 2001), and text categorization (Sahlgren, 2005). Based on the assumption that each model captures some aspects of word meaning and provides its own empirical evidence, we present in this paper a systematic exploration of the principal corpus-based word space models for bilingual terminology extraction from comparable corpora. We find that, once the best procedures have been identified, a very simple combination approach leads to significant improvements over the individual models.