A Benchmark Corpus for the Diagnostic Assessment of Content in L2 English Speech

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

When evaluating second language (L2) learners’ speech, human raters pay significant attention to its content, and diagnostic feedback on content helps learners improve their speaking ability. Because human scoring and feedback are time-consuming and costly, automatic models have been developed to provide such feedback, specifically models that detect whether certain content, i.e., key points, is included in a learner’s speech. However, previous studies have targeted only integrated test items, in which learners speak based on material they have listened to or read, and the data they used are not publicly available. In this study, we construct a speech corpus for key point detection. We extend the target to test items in which learners speak based on their own experiences and opinions, which show greater content diversity than integrated test items, and we handle this diversity with an approach that annotates content together with its connections. Analysis of the constructed data showed that the annotated elements are associated with speech content scores. We also found that large language models are generally successful at locating content element spans, although their predicted spans are often broader than the human-annotated ones. The corpus and annotation guidelines are available at https://language.sakura.ne.jp/icnale/download.html.