An Inverted Index for Storing and Retrieving Grammatical Dependencies

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/45egmuj3j9is

Abstract

Web count statistics gathered from search engines have been widely used as a resource in a variety of NLP tasks. For some tasks, however, the information they exploit is not fine-grained enough. We propose an inverted index over grammatical relations as a fast and reliable resource to access more general and also more detailed frequency information. To build the index, we use a dependency parser to parse a large corpus. We extract binary dependency relations, such as he-subj-say (“he” is the subject of “say”) as index terms and construct the index using publicly available open-source indexing software. The unit we index over is the sentence. The index can be used to extract grammatical relations and frequency counts for these relations. The framework also provides the possibility to search for partial dependencies (say, the frequency of “he” occurring in subject position), words, strings and a combination of these. One possible application is the disambiguation of syntactic structures.
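The indexing scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses a dependency parser and existing open-source indexing software, whereas this sketch uses hand-written toy triples and a plain in-memory dictionary. The term formats (`he-subj-*`, `word:say`), the function names, and the sample data are all assumptions made for illustration. Counts here are the number of sentences matching a query, since the sentence is the indexed unit.

```python
from collections import defaultdict


def index_terms(triple):
    """Return index terms for one (dependent, relation, head) triple.

    The wildcard and "word:" term formats are hypothetical conventions
    chosen for this sketch, not taken from the paper.
    """
    dep, rel, head = triple
    return {
        f"{dep}-{rel}-{head}",  # full dependency, e.g. he-subj-say
        f"{dep}-{rel}-*",       # partial: "he" in subject position
        f"*-{rel}-{head}",      # partial: any subject of "say"
        f"word:{dep}",          # plain word terms
        f"word:{head}",
    }


def build_index(parsed_sentences):
    """Map each index term to the set of sentence ids containing it."""
    index = defaultdict(set)
    for sent_id, triples in enumerate(parsed_sentences):
        for triple in triples:
            for term in index_terms(triple):
                index[term].add(sent_id)
    return index


def frequency(index, *terms):
    """Count sentences that contain every given term (conjunctive query)."""
    postings = [index.get(term, set()) for term in terms]
    return len(set.intersection(*postings)) if postings else 0


# Hand-written toy parses (assumed data, not real parser output).
PARSED = [
    [("he", "subj", "say"), ("it", "obj", "say")],
    [("he", "subj", "go")],
    [("she", "subj", "say")],
]

INDEX = build_index(PARSED)

full_count = frequency(INDEX, "he-subj-say")               # full dependency
partial_count = frequency(INDEX, "he-subj-*")              # partial dependency
combined_count = frequency(INDEX, "he-subj-*", "word:say")  # combination
```

Storing partial dependencies as separate terms trades index size for lookup speed; a real indexing library would more likely answer such queries through fielded or wildcard search.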

Details

Paper ID
lrec2008-main-336
Pages
N/A
BibKey
atterer-schutze-2008-inverted
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 to 30 May 2008

Authors

  • Michaela Atterer

  • Hinrich Schütze
