LREC 2006 main

Second Order Co-occurrence PMI for Determining the Semantic Similarity of Words

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/2aovvs53yg45

Abstract

This paper presents a new corpus-based method for calculating the semantic similarity of two target words. Our method, called Second Order Co-occurrence PMI (SOC-PMI), uses Pointwise Mutual Information to sort lists of important neighbor words of the two target words. Then we consider the words which are common to both lists and aggregate their PMI values (from the opposite list) to calculate the relative semantic similarity. Our method was empirically evaluated using Miller and Charles's (1991) 30 noun pair subset, Rubenstein and Goodenough's (1965) 65 noun pairs, 80 synonym test questions from the Test of English as a Foreign Language (TOEFL), and 50 synonym test questions from a collection of English as a Second Language (ESL) tests. Evaluation results show that our method outperforms several competing corpus-based methods.
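The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence-bounded co-occurrence window, the parameter names `window` and `beta`, the restriction to positive PMI values, and the normalization by list length are all assumptions made for illustration, and the three-sentence corpus is a toy example.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count word frequencies and symmetric co-occurrences within a window.

    Assumption: windows do not cross sentence boundaries.
    """
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in sentences:
        word_freq.update(tokens)
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j] != w:
                    pair_freq[(w, tokens[j])] += 1
                    pair_freq[(tokens[j], w)] += 1
    return word_freq, pair_freq


def pmi(w, t, word_freq, pair_freq, total):
    """Pointwise mutual information of neighbor t with target w (0.0 if unseen)."""
    joint = pair_freq.get((t, w), 0)
    if joint == 0:
        return 0.0
    return math.log2(joint * total / (word_freq[t] * word_freq[w]))


def top_neighbors(w, word_freq, pair_freq, total, beta):
    """Return the beta neighbors of w with the highest positive PMI."""
    scores = {t: pmi(w, t, word_freq, pair_freq, total)
              for t in word_freq if t != w}
    positive = [(t, s) for t, s in scores.items() if s > 0]
    positive.sort(key=lambda kv: kv[1], reverse=True)
    return dict(positive[:beta])


def soc_pmi_similarity(w1, w2, sentences, window=2, beta=10):
    """Aggregate, for words common to both lists, the PMI from the opposite list."""
    word_freq, pair_freq = cooccurrence_counts(sentences, window)
    total = sum(word_freq.values())
    n1 = top_neighbors(w1, word_freq, pair_freq, total, beta)
    n2 = top_neighbors(w2, word_freq, pair_freq, total, beta)
    common = sorted(set(n1) & set(n2))
    if not common:
        return 0.0
    from_w2_list = sum(n2[t] for t in common)  # w1's side scored with w2's PMI
    from_w1_list = sum(n1[t] for t in common)  # w2's side scored with w1's PMI
    # Assumed normalization: divide each sum by the size of its own list.
    return from_w2_list / len(n1) + from_w1_list / len(n2)


# Hypothetical toy corpus, for illustration only.
corpus = [
    ["car", "engine", "wheel"],
    ["automobile", "engine", "wheel"],
    ["banana", "fruit", "peel"],
]
```

On this toy corpus, `soc_pmi_similarity("car", "automobile", corpus)` is higher than `soc_pmi_similarity("car", "banana", corpus)`, because "car" and "automobile" share the neighbors "engine" and "wheel" while "car" and "banana" share none.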

Details

Paper ID
lrec2006-main-134
Pages
N/A
BibKey
islam-inkpen-2006-second
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24-26 May 2006

Authors

  • Md. Aminul Islam

  • Diana Inkpen

Links