Back to Main Conference 2008
LREC 2008main

Induction of Treebank-Aligned Lexical Resources

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/2e6g3wrj9r3p

Abstract

We describe the induction of lexical resources from unannotated corpora that are aligned with treebank grammars, providing a systematic correspondence between features in the lexical resource and a treebank syntactic resource. We first describe a methodology based on parsing technology for augmenting a treebank database with linguistic features. A PCFG containing these features is created from the augmented treebank. We then use a procedure based on the inside-outside algorithm to learn lexical resources aligned with the treebank PCFG from large unannotated corpora. The method has been applied in creating a feature-annotated English treebank based on the Penn Treebank. The unsupervised estimation procedure gives a substantial error reduction (up to 31.6%) on the task of learning the subcategorization preference of novel verbs that are not present in the annotated training sample.

Details

Paper ID
lrec2008-main-151
Pages
N/A
BibKey
deoskar-rooth-2008-induction
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • TD

    Tejaswini Deoskar

  • MR

    Mats Rooth

Links