LREC 2004, main conference

Word Sense Disambiguation as a Wordnets’ Validation Method in Balkanet

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/4abt25cx4nhh

Abstract

BalkaNet is a European project that aims at developing monolingual wordnets for five languages of the Balkan area (Bulgarian, Greek, Romanian, Serbian and Turkish) and at improving the Czech wordnet developed in the EuroWordNet project. The wordnets are aligned to the Princeton WordNet according to the principles established by the EuroWordNet consortium. One of the main concerns of the project is the interlingual validation of the wordnet alignment. To this end, we have developed a WSD system based on parallel corpora, which exploits the common intuition that words which are reciprocal translations in parallel texts should be linked to the same (or closely related) interlingual concepts. An embedded word aligner supplies the wordnet-based algorithm described in the paper with pairs of words that are reciprocal translations and that mutually disambiguate each other. While the wordnets are still under construction, the WSD system is useful mainly for validation, pinpointing wrong interlingual alignments and incomplete or missing synsets in one or the other of the wordnets. With robust wordnets, the system becomes a proper word sense disambiguation tool for parallel corpora. Disambiguation is performed at the sense granularity of the Princeton WordNet. The advantage of this approach, besides its high accuracy and fine-grained disambiguation, is that it can automatically sense-tag corpora not in just one language but in several at once, using the same sense inventory. WSD is evaluated on a Romanian-English bitext extracted from the multilingual parallel corpus "1984", against a hand sense-tagged gold standard.
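The core intuition of the abstract (translation equivalents should point to the same interlingual concept) can be illustrated with a small Python toy. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes each wordnet is modelled as a dictionary from lemma to the set of interlingual-index (ILI) ids of its synsets, that the word aligner has already produced the translation pairs, and it uses hypothetical lemmas, ILI labels and gold tags. A pair whose candidate senses share exactly one ILI id is tagged; a pair sharing none is flagged as a possible validation problem; the precision/recall scoring is likewise an assumed, standard way of comparing against a gold standard.

```python
"""Toy sketch of translation-equivalence WSD over ILI-aligned wordnets.

Assumption: each wordnet is a dict from lemma to the set of ILI ids of its
synsets. All lemmas, ILI labels and gold tags are hypothetical toy data,
not real BalkaNet or Princeton WordNet identifiers.
"""

from typing import Dict, List, Set, Tuple

Wordnet = Dict[str, Set[str]]
Pair = Tuple[str, str]

EN_WN: Wordnet = {
    "bank": {"ILI:bank.finance", "ILI:bank.river"},
    "clock": {"ILI:clock.timepiece"},
}
RO_WN: Wordnet = {
    "bancă": {"ILI:bank.finance", "ILI:bench"},
    "mal": {"ILI:bank.river", "ILI:shore"},
    # "ceas" is deliberately missing, as in a wordnet under construction.
}


def disambiguate(pairs: List[Pair], src_wn: Wordnet, tgt_wn: Wordnet):
    """Tag each aligned pair with the ILI id shared by both words.

    Returns (tags, unresolved, flagged):
      tags       -- pair -> the single shared ILI id (a sense assignment)
      unresolved -- pairs sharing several ILI ids (need further heuristics)
      flagged    -- pairs sharing none: candidate validation errors
                    (wrong alignment, missing or incomplete synset)
    """
    tags: Dict[Pair, str] = {}
    unresolved: List[Pair] = []
    flagged: List[Pair] = []
    for src, tgt in pairs:
        # Unknown lemmas contribute an empty candidate set.
        shared = src_wn.get(src, set()) & tgt_wn.get(tgt, set())
        if len(shared) == 1:
            tags[(src, tgt)] = next(iter(shared))
        elif shared:
            unresolved.append((src, tgt))
        else:
            flagged.append((src, tgt))
    return tags, unresolved, flagged


def precision_recall(tags: Dict[Pair, str], gold: Dict[Pair, str]):
    """Score sense assignments against a (hypothetical) hand-tagged gold standard."""
    correct = sum(1 for pair, ili in tags.items() if gold.get(pair) == ili)
    precision = correct / len(tags) if tags else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    aligned = [("bank", "bancă"), ("bank", "mal"), ("clock", "ceas")]
    tags, unresolved, flagged = disambiguate(aligned, EN_WN, RO_WN)
    gold = {
        ("bank", "bancă"): "ILI:bank.finance",
        ("bank", "mal"): "ILI:bank.river",
        ("clock", "ceas"): "ILI:clock.timepiece",
    }
    print(tags)
    print(flagged)
    print(precision_recall(tags, gold))
```

Here the two occurrences of "bank" receive different senses through their Romanian equivalents, while the pair ("clock", "ceas") is flagged because the toy Romanian wordnet lacks "ceas", which is the kind of gap the validation use is meant to expose. The abstract also allows "closely related" concepts to count as a match; this sketch checks only exact ILI identity, and a relatedness measure over the wordnet hierarchy would be needed to cover that case.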

Details

Paper ID: lrec2004-main-119
Pages: N/A
BibKey: tufis-etal-2004-word
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 2-9517408-1-6
Conference: Fourth International Conference on Language Resources and Evaluation
Location: Lisbon, Portugal
Date: 26-28 May 2004

Authors

  • Dan Tufis
  • Radu Ion
  • Nancy Ide
