Revisiting German Complex Word Identification: Contextualized LLMs and Feature Injection

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

DOI:10.63317/4z33nthezmjs

Abstract

Complex word identification (CWI) is essential in text simplification, yet work on German CWI remains comparatively limited. To address this gap, we investigate the capabilities of three state-of-the-art LLMs and compare them to previously proposed baseline systems. We fine-tune the LLMs in three setups: (i) using the target expression only, (ii) using the target expression together with its sentence-level context, and (iii) using the context and injection of classical machine learning features. Our results show that while pretrained-only LLMs fall short, fine-tuned LLMs set new benchmarks for both binary and probabilistic CWI. In addition, embedding the target in its context sentence improves performance, whereas feature injection has no clearly measurable effect. All models in this paper are trained on the probabilistic CWI task and additionally evaluated on the binary task; thus, we publish a single model that supports both evaluation views. We release all accompanying resources (https://github.com/tschomacker/german-cwi-llm) and model checkpoints (https://huggingface.co/collections/tschomacker/german-cwi-llm).

Details

Paper ID
lrec2026-ws-readixtsar-01
Pages
pp. 1-11
BibKey
schomacker-etal-2026-revisiting
Editors
Matthew Shardlow, Thomas François, Raquel Amaro, Jorge Baptista, Rémi Cardon, Eugénio Ribeiro, Horacio Saggion, Regina Stodden, Amalia Todirascu, Rodrigo Wilkens
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Thorben Schomacker
  • Seid Muhie Yimam
  • Chris Biemann
  • Marina Tropmann-Frick
