
Using Cooccurrence Statistics and the Web to Discover Synonyms in a Technical Language

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/48yvhipoi6dv

Abstract

Turney (2001) has shown that computing the mutual information of a pair of words by using cooccurrence counts obtained via queries to the AltaVista search engine performs very effectively in a synonym detection task. Since manual synonym detection is a challenging task for terminologists, we investigate whether the AltaVista-based Mutual Information (AVMI) method can be applied to the task of finding pairs of synonyms in the lexicon of a specialized sub-language. In particular, we experiment with synonyms in the field of nautical terminology. Our results indicate that AVMI is very good at spotting synonym couples among pairs of unrelated terms (with precision close to 90% at 62.5% recall) and that it outperforms more standard methods based on contextual cosine similarity. However, AVMI is not able to distinguish between synonyms and other semantically related terms. Thus, AVMI can be used for synonym mining only if it is combined with techniques to filter out other semantic relations.
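As a rough illustration of the idea behind a count-based mutual information score, the sketch below computes pointwise mutual information from hit counts and ranks candidate term pairs by it. The abstract does not give the exact AVMI formula or query syntax, so the standard PMI definition is assumed here; the function names `pmi` and `rank_pairs`, the `total` parameter (an estimate of the number of indexed pages), and all counts are hypothetical.

```python
import math


def pmi(hits_xy: int, hits_x: int, hits_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) from raw counts.

    Assumption: standard PMI, log2(P(x, y) / (P(x) * P(y))), with
    probabilities estimated as count / total. This is not necessarily
    the exact formula used in the paper.
    """
    if hits_xy <= 0 or hits_x <= 0 or hits_y <= 0:
        # No cooccurrence evidence: treat the pair as maximally unassociated.
        return float("-inf")
    return math.log2((hits_xy * total) / (hits_x * hits_y))


def rank_pairs(counts: dict, total: int) -> list:
    """Return term pairs sorted from highest to lowest PMI score.

    counts maps (term_a, term_b) -> (joint hits, hits for a, hits for b).
    """
    scored = [
        (pair, pmi(joint, hits_a, hits_b, total))
        for pair, (joint, hits_a, hits_b) in counts.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up counts for illustration only; they are not results from the paper.
example_counts = {
    ("term_a", "term_b"): (8, 16, 32),   # cooccur far more often than chance
    ("term_a", "term_c"): (1, 16, 64),   # cooccur exactly at chance level
    ("term_a", "term_d"): (0, 16, 128),  # never cooccur
}
ranking = rank_pairs(example_counts, total=1024)
```

In a setup like this, pairs whose score exceeds some threshold would be proposed as synonym candidates. As the abstract notes, a high score alone does not separate synonyms from other semantically related terms, so a further filtering step would still be needed.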

Details

Paper ID
lrec2004-main-130
Pages
N/A
BibKey
baroni-bisi-2004-using
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26-28 May 2004

Authors

  • Marco Baroni

  • Sabrina Bisi
