LREC 2022 main conference

CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/3wk3vkyg4y6v

Abstract

The construct of linguistic complexity has been widely used in language learning research, and several text analysis tools have been created to analyze it automatically. However, the indexes supported by existing Chinese text analysis tools are limited and differ from tool to tool, because each tool was built for a different research purpose. CTAP is an open-source tool for extracting linguistic complexity measures that can support a variety of research purposes. Although it was originally developed for English, the Unstructured Information Management Architecture (UIMA) framework it uses allows the integration of other languages. In this study, we integrate a Chinese component into CTAP, describe the index set it incorporates, and compare it with three linguistic complexity tools for Chinese. The index set comprises 196 linguistic complexity indexes at four levels: character, word, sentence, and discourse. So far, CTAP has implemented automatic calculation of complexity features for four languages, aiming to help linguists without an NLP background study language complexity.

Details

Paper ID
lrec2022-main-592
Pages
pp. 5525-5538
BibKey
cui-etal-2022-ctap
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 to 25 June 2022

Authors

  • Yue Cui
  • Junhui Zhu
  • Liner Yang
  • Xuezhi Fang
  • Xiaobin Chen
  • Yujie Wang
  • Erhong Yang
