Back to Main Conference 2008
LREC 2008main

Head or Non-head? Semi-automatic Procedures for Extracting and Classifying Subcategorisation Properties of Compounds.

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/2veqsvn6ai2v

Abstract

In this paper we discuss an approach to the semi-automatic extraction and classification of the compounds extracted from German corpora. Compound nominals are semi-automatically extracted from text corpora along with their sentential complements. In this study we concentrate on that­, wh­ or if subclauses although our methods can be applied to other complements as well. We elaborate an architecture using linguistic knowledge about the phenomena we extract, and aim at answering the following questions: how can data about subcategorisation properties of nominal compounds be extracted from text corpora, and how can compounds be classified according to their subcategorisation properties? Our classification is based on the relationships between the subcategorisation of nominal compounds, e.g. Grundfrage, Wettstreit and Beweismittel, and that of their constituent parts, such as Frage, Streit, Beweis, etc. We show that there are cases which do not match the commonly accepted assumption that the head of a compound is its valency bearer. Such cases should receive a specific treatment in NLP dictionary building. This calls for tools to identify and classify such cases by means of data extraction from corpora. We propose precision-oriented semi­automatic extraction which can operate on tokenized, tagged and lemmatized texts. In the future, we are going to extend the kinds of extracted complements beyond subclauses and analyze the nature of the non-head valency-bearer of compounds, as well as an extension of the kinds of extracted complements beyond subclauses.

Details

Paper ID
lrec2008-main-449
Pages
N/A
BibKey
lapshinova-koltunski-heid-2008-head
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • EL

    Ekaterina Lapshinova-Koltunski

  • UH

    Ulrich Heid

Links