
Converting the Sinica Treebank of Mandarin Chinese to Universal Dependencies

Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

DOI:10.63317/3pt2nyh69dpv

Abstract

This paper describes the conversion of the Sinica Treebank, one of the major Mandarin Chinese treebanks, to Universal Dependencies (UD). The conversion is rule-based and involves POS tag mapping, head adjustment in line with the UD scheme, and dependency conversion. Linguistic insights into Mandarin Chinese along with the conversion are also discussed. The resulting corpus, the UD Chinese Sinica Treebank, contains more than fifty thousand tree structures annotated according to the UD scheme. The dataset can be downloaded at https://github.com/ckiplab/ud.

Details

Paper ID
lrec2022-ws-law-04
Pages
pp. 23-30
BibKey
hsieh-etal-2022-converting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
Date
20 June 2022 to 25 June 2022

Authors

  • Yu-Ming Hsieh

  • Yueh-Yin Shih

  • Wei-Yun Ma

Links