LREC 2002 (main conference)

Lexical token alignment: experiments, results and applications

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/5bptnb3g7hxa

Abstract

Lexical alignment is one of the most challenging tasks in processing and exploiting parallel texts. Numerous applications can benefit from an accurate lexical alignment of bilingual and multilingual corpora. In this paper we describe a hypothesis-testing approach to the automatic extraction of translation equivalents from sentence-aligned, tagged parallel corpora. The algorithm was used to automatically extract six bilingual lexicons, with English as the source language and Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovene as the target languages, as well as a 7-language lexicon with English as the hub language and the six Central and Eastern European (CEE) languages. For the experiments described here we used the 7-language aligned corpus based on Orwell's novel "1984".
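The abstract names a hypothesis-testing approach but does not state the test statistic. Purely as an illustration, the sketch below assumes Dunning's log-likelihood ratio (G²) computed on a 2×2 table of how often a source token and a target token co-occur in aligned sentence pairs, plus a hypothetical filter that only pairs tokens carrying the same POS tag. The function names (`g2`, `extract_equivalents`), the `(word, pos)` token representation and the 3.84 threshold (the chi-square critical value for one degree of freedom at p = 0.05) are all assumptions, not the paper's actual method.

```python
import math
from collections import Counter
from itertools import product


def _term(observed, expected):
    # Contribution observed * ln(observed / expected); empty cells add nothing.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def g2(n11, n1p, np1, n):
    """Log-likelihood ratio (G^2) for a 2x2 co-occurrence contingency table.

    n11: aligned pairs containing both tokens
    n1p: aligned pairs containing the source token
    np1: aligned pairs containing the target token
    n:   total number of aligned sentence pairs
    """
    observed = [n11, n1p - n11, np1 - n11, n - n1p - np1 + n11]
    row = [n1p, n1p, n - n1p, n - n1p]
    col = [np1, n - np1, np1, n - np1]
    return 2.0 * sum(_term(o, r * c / n) for o, r, c in zip(observed, row, col))


def extract_equivalents(bitext, threshold=3.84):
    """Hypothetical extractor: bitext is a list of (source_tokens, target_tokens)
    sentence pairs, each token a (word, pos) tuple."""
    n = len(bitext)
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_set, t_set = set(src), set(tgt)
        src_freq.update(s_set)
        tgt_freq.update(t_set)
        for s, t in product(s_set, t_set):
            if s[1] == t[1]:  # assumed constraint: candidates share a POS tag
                joint[(s, t)] += 1
    result = []
    for (s, t), n11 in joint.items():
        # Keep only positive associations: observed co-occurrence above chance.
        if n11 * n <= src_freq[s] * tgt_freq[t]:
            continue
        score = g2(n11, src_freq[s], tgt_freq[t], n)
        if score >= threshold:
            result.append((s[0], t[0], score))
    return sorted(result, key=lambda item: -item[2])


# Toy English-Romanian example (invented data, for illustration only).
bitext = [
    ([("cat", "N"), ("sleeps", "V")], [("pisica", "N"), ("doarme", "V")]),
    ([("cat", "N"), ("eats", "V")], [("pisica", "N"), ("mananca", "V")]),
    ([("dog", "N"), ("sleeps", "V")], [("cainele", "N"), ("doarme", "V")]),
    ([("dog", "N"), ("eats", "V")], [("cainele", "N"), ("mananca", "V")]),
]
for src_word, tgt_word, score in extract_equivalents(bitext):
    print(f"{src_word} -> {tgt_word}  G2={score:.3f}")
```

In this toy corpus every surviving pair co-occurs in exactly two of four sentence pairs and never appears apart, so each scores 8·ln 2 ≈ 5.545 and clears the threshold. The positive-association check matters because G² alone is also large when two tokens avoid each other.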

Details

Paper ID
lrec2002-main-032
Pages
N/A
BibKey
tufis-barbu-2002-lexical
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29–31 May 2002

Authors

  • Dan Tufiş

  • Ana-Maria Barbu
