A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/4tsdtb39fuc6

Abstract

We adopt a corpus-informed approach to selecting example sentences for the construction of a reference grammar. In the process, we construct a database of sentences carefully selected by linguistic experts. The database covers the full range of linguistic facts described in an authoritative Chinese Reference Grammar and is structured according to that grammar. A search engine is developed to help users find the most typical examples they need to study a linguistic problem or prove their hypotheses. Computational linguists can also use the database as a training corpus for Chinese word segmentation, POS tagging, and sentence parsing models.

Details

Paper ID
lrec2012-main-207
Pages
pp. 3140-3144
BibKey
xu-etal-2012-grammar
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Hongzhi Xu
  • Helen Kaiyun Chen
  • Chu-Ren Huang
  • Qin Lu
  • Dingxu Shi
  • Tin-Shing Chiu
