LREC 2004 (main conference)
A Search Tool for Corpora with Positional Tagsets and Ambiguities
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
This article describes POLIQARP, a corpus indexing and query tool, which understands positional tagsets and which does not assume that word forms are annotated with unique morphosyntactic tags. POLIQARP is designed to be applicable to a variety of languages and tagsets: it works with XML-encoded texts, uses the UTF-8 character set, and allows for an external specification of the tagset. Currently, POLIQARP is used for indexing and searching a morphosyntactically annotated corpus of Polish.
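The abstract names two ideas without detailing them: positional tags that are decoded against an externally supplied tagset specification, and tokens that may carry several candidate tags instead of one. The Python sketch below illustrates both under stated assumptions. The tagset fragment, attribute names, and the `parse_tag`/`matches` helpers are hypothetical. They are loosely modelled on Polish positional tags such as `subst:sg:nom:f`. They are not POLIQARP's query language, index format, or API. The "any"/"all" modes are one plausible way to interpret a constraint over ambiguous annotation. They are not a claim about POLIQARP's actual semantics.

```python
from dataclasses import dataclass

# Hypothetical, simplified tagset specification (an assumption, not the real
# tagset): for each grammatical class, the ordered attributes its tag encodes.
TAGSET = {
    "subst": ["number", "case", "gender"],
    "adj": ["number", "case", "gender", "degree"],
    "fin": ["number", "person", "aspect"],
}

# Hypothetical admissible values for each attribute, used to validate tags.
VALUES = {
    "number": {"sg", "pl"},
    "case": {"nom", "gen", "dat", "acc", "inst", "loc", "voc"},
    "gender": {"m1", "m2", "m3", "f", "n"},
    "degree": {"pos", "com", "sup"},
    "person": {"pri", "sec", "ter"},
    "aspect": {"imperf", "perf"},
}


def parse_tag(tag: str) -> dict[str, str]:
    """Decode a positional tag like 'subst:sg:nom:f' into named attributes."""
    pos, *values = tag.split(":")
    attrs = TAGSET.get(pos)
    if attrs is None:
        raise ValueError(f"unknown grammatical class: {pos!r}")
    if len(values) != len(attrs):
        raise ValueError(f"{tag!r}: expected {len(attrs)} values, got {len(values)}")
    decoded = {"pos": pos}
    # The meaning of each value is determined by its position in the tag.
    for attr, value in zip(attrs, values):
        if value not in VALUES[attr]:
            raise ValueError(f"{tag!r}: invalid {attr} value {value!r}")
        decoded[attr] = value
    return decoded


@dataclass
class Token:
    form: str
    tags: list[str]  # several tags when the annotation is ambiguous


def matches(token: Token, query: dict[str, str], mode: str = "any") -> bool:
    """Test attribute constraints against a token's candidate interpretations.

    mode="any": at least one interpretation satisfies every constraint.
    mode="all": every interpretation satisfies every constraint.
    """
    if mode not in ("any", "all"):
        raise ValueError(f"unknown mode: {mode!r}")

    def satisfies(tag: str) -> bool:
        decoded = parse_tag(tag)
        return all(decoded.get(k) == v for k, v in query.items())

    results = [satisfies(t) for t in token.tags]
    return any(results) if mode == "any" else all(results)


if __name__ == "__main__":
    # Polish "kobiety" ('woman'): genitive singular or nom/acc/voc plural.
    tok = Token("kobiety", ["subst:sg:gen:f", "subst:pl:nom:f",
                            "subst:pl:acc:f", "subst:pl:voc:f"])
    print(matches(tok, {"case": "nom"}))                          # True
    print(matches(tok, {"case": "nom"}, mode="all"))              # False
    print(matches(tok, {"pos": "subst", "gender": "f"}, "all"))   # True
```

Keeping the tagset in a separate table, rather than hard-coding attribute positions in the matcher, mirrors the abstract's point that the tagset is specified externally. Swapping in another language's tagset only changes `TAGSET` and `VALUES`.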