Learning Verb Subcategorization from Corpora: Counting Frame Subsets

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/28xd7kymxz55

Abstract

We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents of a verb in the Czech treebank as either arguments or adjuncts. Using our techniques, we are able to achieve 88% accuracy on unseen parsed text.

Details

Paper ID
lrec2000-main-107
Pages
N/A
BibKey
zeman-sarkar-2000-learning
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Daniel Zeman

  • Anoop Sarkar
