Hua Yu: A Word-segmented and Part-Of-Speech Tagged Chinese Corpus

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/3h64u9ibfcai

Abstract

As the outcome of a three-year joint effort between the Department of Computer Science, Tsinghua University, and the Language Information Processing Institute, Beijing Language and Culture University, Beijing, China, a word-segmented and part-of-speech tagged Chinese corpus of 2 million Chinese characters, named HuaYu, has been established. This paper first briefly introduces some basics of HuaYu, such as its genre distribution, the fundamental considerations in its design, and its word segmentation and part-of-speech tagging standards. It then gives the complete tag set used in HuaYu, along with typical examples for each tag. Finally, several pieces of annotated text from each genre are included for the reader's reference.

Details

Paper ID
lrec2000-main-277
Pages
N/A
BibKey
sun-etal-2000-hua
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Maosong Sun
  • Honglin Sun
  • Changning Huang
  • Pu Zhang
  • Hongbing Xing
  • Qiang Zhou
