Corpus-Linguists’ Little Helpers? Evaluating LLMs for Linguistic Annotation: The Case of Sensationalist Headlines Corpus
Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"
Abstract
Manual annotation of pragmastylistic features in sensationalist media is a resource-intensive bottleneck for corpus-based research, particularly for lower-resource languages. This paper evaluates whether Large Language Models (LLMs) can reliably automate this process. We benchmark two proprietary models, OpenAI’s GPT-5 and Google’s Gemini 2.5 Pro, on annotating eight sensationalist linguistic and orthographic features within a corpus of 508 Serbian celebrity magazine headlines. Our methodology involves a systematic comparison of five prompting strategies: zero-shot, few-shot (1, 3, and 5 examples), and chain-of-thought. Results demonstrate that LLMs can achieve high alignment with a manually curated gold standard, reaching a peak macro-F1 score of 98.76%. Notably, the most effective and cost-efficient configuration was GPT-5 using a simple zero-shot prompt. Qualitative error analysis reveals that remaining inaccuracies are systematic, primarily involving pragmatic conventions, discourse scope, and quoted speech. We conclude that LLMs are viable for first-pass annotation of well-defined features in Serbian, though implicit and genre-dependent cues require further study. To support reproducibility and future research on underrepresented languages, we provide our full prompting setup, evaluation procedures, and a detailed cost comparison.
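For readers unfamiliar with the headline metric, the following is a minimal sketch of one common way to compute macro-F1 for binary, per-feature annotation: an F1 score is computed for each feature against the gold standard, and the scores are averaged with equal weight. It is an illustration under stated assumptions and not necessarily the exact evaluation procedure used here. The function names, the dictionary-based annotation format, and the convention of scoring a feature with no positive gold or predicted labels as 0.0 are all assumptions made for this example.

```python
from typing import Dict, List

# Assumed format: one dict per headline, mapping feature name -> present or absent.
Annotation = Dict[str, bool]


def f1_for_feature(gold: List[Annotation], pred: List[Annotation], feature: str) -> float:
    """F1 of the positive class for a single binary feature."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g[feature] and p[feature])
    fp = sum(1 for g, p in pairs if not g[feature] and p[feature])
    fn = sum(1 for g, p in pairs if g[feature] and not p[feature])
    denom = 2 * tp + fp + fn
    # Assumed convention: a feature with no positives at all scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold: List[Annotation], pred: List[Annotation], features: List[str]) -> float:
    """Unweighted mean of the per-feature F1 scores."""
    return sum(f1_for_feature(gold, pred, f) for f in features) / len(features)
```

Because every feature contributes equally to the average, a rare feature that the model handles poorly lowers the score as much as a frequent one would, which is why macro-averaging is often preferred when feature frequencies are unbalanced.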