Back to Main Conference 2010
LREC 2010main

Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/43r2e2g3kyw8

Abstract

In this paper, we describe our hybrid parsing model on the Mandarin Chinese processing. In particular, we work on the Tsinghua Chinese Treebank (TCT), whose annotation has both constitutes and the head information of each constitute. The model we design combines the mainstream constitute parsing and dependency parsing. We present in detail 1) how to (partially) encode the head information into the constitute parsing, 2) how to encode constitute information into the dependency parsing, and 3) how to restore the head information using the dependency structure. For each of them, we take different strategies to deal with different cases. In an open shared task evaluation, we achieve an f1-score of 85.23% for the constitute parsing, 82.35% with partial head information, and 74.27% with complete head information. The error analysis shows the challenge of restoring multiple-headed constitutes and also some potentials to use the dependency structure to guide the constitute parsing, which will be our future work to explore.

Details

Paper ID
lrec2010-main-583
Pages
N/A
BibKey
wang-zhang-2010-hybrid
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • RW

    Rui Wang

  • YZ

    Yi Zhang

Links