Quality Assurance of Automatic Annotation of Very Large Corpora: a Study based on heterogeneous Tagging System

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/57n65geyq5bt

Abstract

We propose a set of heuristics for efficiently improving the annotation quality of very large corpora. The Xinhua News portion of the Chinese Gigaword Corpus was tagged independently with both the Peking University ICL tagset and the Academia Sinica CKIP tagset. The corpus-based POS tag mapping will serve as a basis for examining possible contrasts in grammatical systems between the PRC and Taiwan. It can also serve as a basic model for mapping between the CKIP and ICL tagging systems for any data.

Details

Paper ID
lrec2008-main-106
Pages
N/A
BibKey
huang-etal-2008-quality
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28-30 May 2008

Authors

  • Chu-Ren Huang
  • Lung-Hao Lee
  • Wei-guang Qu
  • Jia-Fei Hong
  • Shiwen Yu
