Back to Main Conference 2016
LREC 2016main

A Dependency Treebank of the Chinese Buddhist Canon

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/3gf8gpnufck7

Abstract

We present a dependency treebank of the Chinese Buddhist Canon, which contains 1,514 texts with about 50 million Chinese characters. The treebank was created by an automatic parser trained on a smaller treebank, containing four manually annotated sutras (Lee and Kong, 2014). We report results on word segmentation, part-of-speech tagging and dependency parsing, and discuss challenges posed by the processing of medieval Chinese. In a case study, we exploit the treebank to examine verbs frequently associated with Buddha, and to analyze usage patterns of quotative verbs in direct speech. Our results suggest that certain quotative verbs imply status differences between the speaker and the listener.

Details

Paper ID
lrec2016-main-265
Pages
pp. 1679-1683
BibKey
wong-lee-2016-dependency
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • TW

    Tak-sum Wong

  • JL

    John Lee

Links