
Improving Crowdsourcing-Based Annotation of Japanese Discourse Relations

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/28vbj6p6xyoq

Abstract

Although discourse parsing is an important and fundamental task in natural language processing, few languages have corpora annotated with discourse relations, and those that exist are small. Creating a new corpus of discourse relations by hand is costly and time-consuming. To cope with this problem, Kawahara et al. (2014) constructed a Japanese corpus with discourse annotations through crowdsourcing. However, they did not evaluate the quality of the annotation. In this paper, we evaluate the quality of the annotation using expert annotations. We find that crowdsourcing-based annotation still leaves much room for improvement. Based on the error analysis, we propose improvement techniques based on language tests. We re-annotated the corpus with discourse annotations using the improvement techniques, and achieved approximately 3% improvement in F-measure. We will make the re-annotated data publicly available.

Details

Paper ID
lrec2018-main-637
Pages
N/A
BibKey
kishimoto-etal-2018-improving
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Yudai Kishimoto

  • Shinnosuke Sawada

  • Yugo Murawaki

  • Daisuke Kawahara

  • Sadao Kurohashi
