The CRECIL Corpus: a New Dataset for Extraction of Relations between Characters in Chinese Multi-party Dialogues

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

Abstract

We describe a new, freely available Chinese multi-party dialogue dataset for the automatic extraction of dialogue-based character relationships. The data has been extracted from the original TV scripts of a Chinese sitcom called “I Love My Home”, which features complex, family-based everyday spoken conversations in Chinese. First, we introduce a human annotation scheme for both the global character relationship map and character reference relationships. We then generate the dialogue-based character relationship triples. The corpus annotates relationships between 140 entities in total. We also carry out a data exploration experiment by deploying a BERT-based model to extract character relationships on the CRECIL corpus and on another existing relation extraction corpus, DialogRE (CITATION). The results demonstrate that extracting character relationships is more challenging in CRECIL than in DialogRE.

Details

Paper ID

lrec2022-main-250

Pages

pp. 2337-2344

DOI

10.63317/2xwrtnbc5pyr

BibKey

jiang-etal-2022-crecil

Editors

Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-38-2

Conference

Thirteenth Language Resources and Evaluation Conference

Location

Marseille, France

Date

20 - 25 June 2022

Authors

Yuru Jiang
Yang Xu
Yuhang Zhan
Weikai He
Yilin Wang
Zixuan Xi
Meiyun Wang
Xinyu Li
Yu Li
Yanchao Yu
