Back to Main Conference 2008
LREC 2008main

Tools for Collocation Extraction: Preferences for Active vs. Passive

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/4b5tair76zjc

Abstract

We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision evaluation, on administrative texts from the European Union: we find a considerable amount of specialized collocations, as well as general ones and complex predicates; overall the precision is considerably higher than that of a statistical extractor used as a baseline.

Details

Paper ID
lrec2008-main-076
Pages
N/A
BibKey
heid-weller-2008-tools
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • UH

    Ulrich Heid

  • MW

    Marion Weller

Links