A Benchmark Corpus for the Diagnostic Assessment of Content in L2 English Speech

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/56kmiu3fnmbt

Abstract

When evaluating second language (L2) learners’ speech, human raters pay close attention to its content, and diagnostic feedback on content helps improve learners’ speaking ability. Because human scoring and feedback are time-consuming and costly, automatic models have been developed to provide such feedback, specifically models that detect whether certain content, i.e., key points, is included in learners’ speech. However, previous studies target only integrated test items, in which learners speak based on materials they have listened to or read, and the data used are not publicly available. In this study, we construct a speech corpus for key point detection. We extend the target to test items in which learners speak based on their own experiences and opinions, which show greater content diversity than integrated test items, using an approach that annotates content along with its connections. Analysis of the constructed data showed that the annotated elements are associated with speech content scores. We also found that large language models are generally successful at locating content element spans, although their predicted spans are often broader than the human-annotated ones. The corpus and annotation guidelines are available at https://language.sakura.ne.jp/icnale/download.html.

Details

Paper ID
lrec2026-main-146
Pages
pp. 1869-1877
BibKey
doi-etal-2026-benchmark
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Kosuke Doi

  • Justin Vasselli

  • Taro Watanabe