
LLMs as Annotators: Evaluating Model–Human Alignment in Detecting Contentious Language in Historical Corpora

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3dhy55mxo9zb

Abstract

Historical texts often contain terminology that reflects outdated or harmful social values. Identifying such contentious terms is essential for the Galleries, Libraries, Archives, and Museums (GLAM) community, but manual annotation requires cultural expertise and is difficult to scale. This study evaluates whether large language models (LLMs) can support this process by aligning with human judgments of contentiousness in historical Dutch corpora. Using the Dutch Contentious Contexts Corpus (ConConCor), we formalize the task as context-dependent binary classification and compare two LLMs across multiple prompt configurations and evaluation scenarios. The models achieve near-human-level agreement on explicit cases but diverge when contextual or historical reasoning is required. Analysis of disagreement patterns shows that LLMs capture overtly harmful expressions yet tend to over-predict contentiousness for identity-related and colonial terms and under-predict for semantically shifted or figurative uses. These findings suggest that LLMs can act as auxiliary annotators for sensitive language detection in historical materials, provided that human oversight and contextual interpretation remain central to annotation workflows.

Details

Paper ID
lrec2026-main-852
Pages
pp. 10883-10896
BibKey
zhao-etal-2026-llms
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Yahui Zhao
  • Clemencia Siro
  • Laura Hollink
