Back to Main Conference 2006
LREC 2006main

Skeleton Parsing in Chinese: Annotation Scheme and Guidelines

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/5gd74xujyr2o

Abstract

This paper presents my manual skeleton parsing on a sample text of approximately 100,000 word tokens (or about 2,500 sentences) taken from the PFR Chinese Corpus with a clearly defined parsing scheme of 17 constituent labels. The manually-parsed sample skeleton treebank is one of the very few extant Chinese treebanks. While Chinese part-of-speech tagging and word segmentation have been the subject of concerted research for many years, the syntactic annotation of Chinese corpora is a comparatively new field. The difficulties that I encountered in the production of this treebank demonstrate some of the peculiarities of Chinese syntax. A noteworthy syntactic property is that some serial verb constructions tend to be used as if they were compound verbs. The two transitive verbs in series, unlike common transitive verbs, do not take an object separately within the construction; rather, the serial construction as a whole is able to take the same direct object and the perfective aspect marker le. The skeleton-parsed sample treebank is evaluated against Eyes & Leech (1993)’s criteria and proves to be accurate, uniform and linguistically valid.

Details

Paper ID
lrec2006-main-022
Pages
N/A
BibKey
wong-2006-skeleton
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 26 May 2006

Authors

  • MW

    May Lai-Yin Wong

Links