Back to Main Conference 2008
LREC 2008main

Harvesting Multi-Word Expressions from Parallel Corpora

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/4n54z9nkoscs

Abstract

The paper presents a set of approaches to extend the automatically created Slovene wordnet with nominal multi-word expressions. In the first approach multi-word expressions from Princeton WordNet are translated with a technique that is based on word-alignment and lexico-syntactic patterns. This is followed by extracting new terms from a monolingual corpus using keywordness ranking and contextual patterns. Finally, the multi-word expressions are assigned a hypernym and added to our wordnet. Manual evaluation and comparison of the results shows that the translation approach is the most straightforward and accurate. However, it is successfully complemented by the two monolingual approaches which are able to identify more term candidates in the corpus that would otherwise go unnoticed. Some weaknesses of the proposed wordnet extension techniques are also addressed.

Details

Paper ID
lrec2008-main-257
Pages
N/A
BibKey
vintar-fiser-2008-harvesting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • ŠV

    Špela Vintar

  • DF

    Darja Fišer

Links