MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

We introduce a large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge. Our dataset complements similar datasets in that we focus on stories about everyday activities, such as going to the movies or working in the garden, and that the questions require commonsense knowledge, or more specifically, script knowledge, to be answered. We show that our mode of data collection via crowdsourcing results in a substantial number of such inference questions. The dataset forms the basis of a shared task on commonsense and script knowledge organized at SemEval 2018 and provides challenging test cases for the broader natural language understanding community.

Details

Paper ID

lrec2018-main-564

Pages

N/A

DOI

10.63317/2x36g84ob95p

BibKey

ostermann-etal-2018-mcscript

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7-12 May 2018

Authors

Simon Ostermann
Ashutosh Modi
Michael Roth
Stefan Thater
Manfred Pinkal
