Developing Guidelines and Ensuring Consistency for Chinese Text Annotation

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/48iewajwhmd3

Abstract

With growing interest in Chinese language processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets, and bracketing guidelines, which makes comparisons difficult. As a first step towards addressing this issue, we have been preparing a 100-thousand-word bracketed corpus since late 1998 and plan to release it to the public in summer 2000. In this paper, we address several challenges in building the corpus, namely creating annotation guidelines, ensuring annotation accuracy, and maintaining a high level of community involvement.

Details

Paper ID
lrec2000-main-217
Pages
N/A
BibKey
xia-etal-2000-developing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Fei Xia
  • Martha Palmer
  • Nianwen Xue
  • Mary Ellen Okurowski
  • John Kovarik
  • Fu-Dong Chiou
  • Shizhe Huang
  • Tony Kroch
  • Mitch Marcus
