Using LLMs to Extract Instances of Schematic Constructions from Unannotated L2 Learner Corpora

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3ieeohd75wkj

Abstract

Our previous study found that generative LLMs can successfully identify instances of schematic constructions (as defined in Construction Grammar) in unannotated L1 corpus data. This study tests whether LLMs can also identify instances of constructions in unannotated L2 data. L2 learner corpora are notoriously difficult to annotate and query because they contain errors, so using LLMs can simplify the retrieval of construction data from them. Identifying instances of constructions in L2 learner data has many possible uses in pedagogical applications of Construction Grammar and constructicography, such as identifying error-prone constructions (or properties of constructions) and determining the distribution of constructional instances across CEFR levels. Using the Estonian Nominal Quantifier Construction as the example construction and an Estonian CEFR-graded learner corpus as the source of L2 data, we tested several prompts and several models (OpenAI’s o3-mini, o3, gpt-5-mini and gpt-5, Google DeepMind’s Gemini Flash 2.5, Anthropic’s Claude Sonnet 4.5 and Opus 4.1). We found that the best model, gpt-5, achieved F1-scores from 0.90 to 0.96, depending on the level of detail of the prompt.

Details

Paper ID
lrec2026-main-824
Pages
pp. 10517-10524
BibKey
kallas-etal-2026-llms
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Jelena Kallas
  • Ahto Kiil
  • Heete Sahkai
  • Geda Paulsen
  • Kertu Saul
