Back to Main Conference 2010
LREC 2010main

Czech Information Retrieval with Syntax-based Language Models

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/3razd5sc9fko

Abstract

In recent years, considerable attention has been dedicated to language modeling methods in information retrieval. Although these approaches generally allow exploitation of any type of language model, most of the published experiments were conducted with a classical n-gram model, usually limited only to unigrams. A few works exploiting syntax in information retrieval can be cited in this context, but significant contribution of syntax based language modeling for information retrieval is yet to be proved. In this paper, we propose, implement, and evaluate an enrichment of language model employing syntactic dependency information acquired automatically from both documents and queries. Our experiments are conducted on Czech which is a morphologically rich language and has a considerably free word order, therefore a syntactic language model is expected to contribute positively to the unigram and bigram language model based on surface word order. By testing our model on the Czech test collection from Cross Language Evaluation Forum 2007 Ad-Hoc track, we show positive contribution of using dependency syntax in this context.

Details

Paper ID
lrec2010-main-147
Pages
N/A
BibKey
strakova-pecina-2010-czech
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • JS

    Jana Straková

  • PP

    Pavel Pecina

Links