LREC 2006 - Proceedings sorted by papers

Title	Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation
Authors	M. Miháltz, G. Pohl
Abstract	In this paper we present an experiment to automatically generate annotated training corpora for a supervised word sense disambiguation module operating in an English-Hungarian and a Hungarian-English machine translation system. Training examples for the WSD module of the MT system are produced by annotating ambiguous lexical items in the source language (words having several possible translations) with their proper target language translations. Since manually annotating training examples is very costly, we are experimenting with a method to produce examples automatically from parallel corpora. Our algorithm relies on monolingual and bilingual lexicons and dictionaries in addition to statistical methods in order to annotate examples extracted from a large English-Hungarian parallel corpus accurately aligned at sentence level. In the paper, we present an experiment with the English noun state, where we categorized the different occurrences in the Hunglish parallel corpus. For this noun, most of the examples were covered by multiword lexical items originating from our lexical sources.
Keywords	word sense disambiguation, machine translation, parallel corpora
Full paper	Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation

Title

Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation

Authors

M. Miháltz, G. Pohl

Abstract

In this paper we present an experiment to automatically generate annotated training corpora for a supervised word sense disambiguation module operating in an English-Hungarian and a Hungarian-English machine translation system. Training examples for the WSD module of the MT system are produced by annotating ambiguous lexical items in the source language (words having several possible translations) with their proper target language translations. Since manually annotating training examples is very costly, we are experimenting with a method to produce examples automatically from parallel corpora. Our algorithm relies on monolingual and bilingual lexicons and dictionaries in addition to statistical methods in order to annotate examples extracted from a large English-Hungarian parallel corpus accurately aligned at sentence level. In the paper, we present an experiment with the English noun state, where we categorized the different occurrences in the Hunglish parallel corpus. For this noun, most of the examples were covered by multiword lexical items originating from our lexical sources.

Keywords

word sense disambiguation, machine translation, parallel corpora

Full paper

Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation