Using the Web as a Corpus for the Syntactic-Based Collocation Identification

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/3hjwepohqd4h

Abstract

This paper presents an experiment that uses a Web search engine and a robust parser for the Web-based identification of collocations, that is, statistically significant word associations representing “a conventional way of saying things” (Manning and Schütze, 1999). We identify the possible collocates of a given word by parsing the text snippets that the search engine returns when queried for that word. We then rank the retrieved syntactic co-occurrences by the collocational strength of each pair, using different statistical measures.
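The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical measures used, so pointwise mutual information (PMI) is assumed here purely as an example of an association measure. The function name `rank_by_pmi` and the toy co-occurrence pairs are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def rank_by_pmi(pairs):
    """Rank (word, collocate) pairs by pointwise mutual information.

    `pairs` holds one tuple per observed syntactic co-occurrence.
    PMI is an assumed, illustrative measure; the paper's own
    measures are not specified in the abstract.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    collocate_counts = Counter(collocate for _, collocate in pairs)
    total = len(pairs)

    scores = {}
    for (word, collocate), n in pair_counts.items():
        # PMI = log2( P(word, collocate) / (P(word) * P(collocate)) )
        scores[(word, collocate)] = math.log2(
            n * total / (word_counts[word] * collocate_counts[collocate])
        )
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented toy data standing in for parser output on search-engine snippets.
toy_pairs = [
    ("decision", "make"), ("decision", "make"), ("decision", "make"),
    ("decision", "take"),
    ("photo", "take"), ("photo", "take"),
]
ranking = rank_by_pmi(toy_pairs)
for pair, score in ranking:
    print(pair, round(score, 3))
```

PMI is known to overrate rare pairs, which is one reason a study of this kind might compare several measures rather than rely on a single one.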

Details

Paper ID
lrec2004-main-387
Pages
N/A
BibKey
seretan-etal-2004-using
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 to 28 May 2004

Authors

  • Violeta Seretan

  • Luka Nerima

  • Eric Wehrli