LREC 2012, main conference

Extending the MPC corpus to Chinese and Urdu - A Multiparty Multi-Lingual Chat Corpus for Modeling Social Phenomena in Language

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/37mzudcdyibb

Abstract

In this paper, we report on our efforts to build a multi-lingual, multi-party online chat corpus in order to develop a firm understanding of a set of social constructs, such as agenda control, influence, and leadership, and to computationally model such constructs in online interactions. These automated models will help capture the dialogue dynamics that are essential for developing, among other things, realistic human-machine dialogue systems, including autonomous virtual chat agents. We first introduce our experiment design and data collection method for Chinese and Urdu, and then report on the current stage of our data collection. We annotated the collected corpus on four levels: communication links, dialogue acts, local topics, and meso-topics. Analyses of the annotated data across languages reveal several interesting phenomena, which we report here.

Details

Paper ID
lrec2012-main-491
Pages
pp. 2868-2873
BibKey
liu-etal-2012-extending
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Ting Liu
  • Samira Shaikh
  • Tomek Strzalkowski
  • Aaron Broadwell
  • Jennifer Stromer-Galley
  • Sarah Taylor
  • Umit Boz
  • Xiaoai Ren
  • Jingsi Wu
