Evaluating Machine Reading Systems through Comprehension Tests

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI: 10.63317/2tqr7jf622r8

Abstract

This paper describes a methodology for testing and evaluating the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. The methodology is being used in QA4MRE (Question Answering for Machine Reading Evaluation), one of the labs of CLEF. The task was to answer a series of multiple-choice tests, each based on a single document. This allows complex questions to be asked while keeping evaluation simple and fully automatic. The evaluation architecture is fully multilingual: test documents, questions, and their answers are identical in all the supported languages. Background text collections are comparable collections harvested from the web for a set of predefined topics. Each test received an evaluation score between 0 and 1 using c@1, a measure that encourages systems to reduce the number of incorrect answers, while maintaining the number of correct ones, by leaving some questions unanswered. Twelve groups participated in the task, submitting 62 runs in three languages (German, English, and Romanian). All runs were monolingual; no team attempted a cross-language task. We report here the conclusions and lessons learned after the first campaign in 2011.
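As context for the abstract, the c@1 measure credits unanswered questions at the accuracy the system achieved overall, so abstaining scores better than answering incorrectly. Below is a minimal sketch of the measure as it is commonly defined (c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions, and n the total); the function name is ours:

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Compute the c@1 evaluation score.

    Unanswered questions earn partial credit proportional to the
    system's accuracy over all questions (n_correct / n_total), so a
    system gains nothing by guessing at random but loses nothing by
    abstaining when unsure.
    """
    if n_total == 0:
        return 0.0
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# Example: 6 correct, 2 unanswered, 10 questions in total.
# c@1 = (6 + 2 * 6/10) / 10 = 0.72, higher than the plain
# accuracy 0.6 the system would get by answering the last 2 wrongly.
print(c_at_1(6, 2, 10))
```

Note that with no unanswered questions the measure reduces to ordinary accuracy, which is why it is comparable across systems that do and do not abstain.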

Details

Paper ID
lrec2012-main-148
Pages
pp. 1143–1147
BibKey
penas-etal-2012-evaluating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21–27 May 2012

Authors

  • Anselmo Peñas

  • Eduard Hovy

  • Pamela Forner

  • Álvaro Rodrigo

  • Richard Sutcliffe

  • Corina Forascu

  • Caroline Sporleder
