The Linguist’s Lie Detector: Linguistic Knowledge in Large Language Models

Proceedings of Natural Scientific Language Processing (NSLP) @ LREC 2026

DOI:10.63317/2f7zobe6yo7h

Abstract

We present a benchmark and evaluation pipeline for assessing how well large language models (LLMs) handle linguistic knowledge. Starting from a curated subcorpus of 11 syntax-focused articles published in Glossa: A Journal of General Linguistics (2016–2026), we design a pipeline that (1) segments article text into sentences, (2) extracts atomic, verifiable statements, and (3) classifies them into linguistic categories (language-specific, typological, theoretical, citation, or structural). Each stage is evaluated against human gold annotations produced by three annotators, with inter-annotator agreement measured via Krippendorff’s α and Cohen’s κ. We compare several LLMs on extraction and classification, using BERTScore-style similarity for extraction and macro F1 for classification. Finally, we generate contradictions of the true linguistic statements and test whether LLMs can distinguish true from false claims. On a challenge set of 705 linguistic statements, we compare eight LLMs, with Gemini 3 Flash achieving the highest F1 score of 0.66, indicating that current models possess limited but non-trivial linguistic knowledge.
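The abstract names macro F1 as the classification metric and Cohen's κ as one of the agreement measures. As a rough illustration of what these two quantities compute, here is a minimal pure-Python sketch. The function names, the choice to average F1 only over labels that occur in the data, and the toy labels in the usage note are assumptions for illustration; they are not taken from the authors' pipeline.

```python
from collections import Counter


def macro_f1(gold, pred, labels=None):
    """Unweighted mean of per-label F1 scores.

    Assumption: when `labels` is not given, only labels that occur in
    `gold` or `pred` are averaged over.
    """
    if labels is None:
        labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label distribution.
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

With hypothetical inputs such as `gold = ["typological", "citation", "theoretical", "citation"]` and a prediction list of the same length, `macro_f1(gold, pred)` weights each category equally regardless of how many statements fall into it, which matters when categories such as citation statements are far more frequent than others.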

Details

Paper ID
lrec2026-ws-nslp-23
Pages
pp. 235-246
BibKey
catalngris-etal-2026-linguist
Editors
Georg Rehm, Stefan Dietze, Danilo Dessi, Diana Maynard, Sonja Schimmler
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Natural Scientific Language Processing (NSLP) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Lucía Catalán Gris

  • Kim Gerdes

  • John S. Y. Lee
