
A Benchmark for Overgeneration Detection in Biomedical Text Simplification

Proceedings of the 2nd Workshop on Evaluating Text Difficulty in a Multilingual Context (DeTermIt! 2026)

DOI:10.63317/3aodve3ow6a7

Abstract

Large Language Models deployed for biomedical text simplification frequently produce overgeneration: extraneous content appended beyond the faithful simplification, including leaked model instructions, ungrounded medical claims, and repetitive text. Despite its prevalence, this failure mode remains largely unaddressed. We present a benchmark for document-level overgeneration detection, releasing two resources: SimpleOG-manual, 500 abstract-level examples with human-validated positive labels, and SimpleOG-auto, over 46,000 automatically labeled abstract-level examples derived from submissions to the CLEF 2025 SimpleText Track. Our method exploits the positional regularity of overgeneration in simplification output through sequence alignment, identifying trailing content that lacks a corresponding segment in the source. Human validation of 117 automatically flagged positives confirms ∼95% precision, with leaked model instructions accounting for 75.7% of confirmed cases. Analysis across teams and models reveals that overgeneration is primarily driven by system-level choices, such as prompting and post-processing, rather than by model architecture. We evaluate three detection paradigms and find that sentence similarity (F1 = 0.731, ROC-AUC = 0.915) surprisingly outperforms both NLI-based and LLM-based approaches, suggesting that overgenerated content occupies distinct semantic regions from source material.
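The trailing-content idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the sentence splitter, the token-overlap (Jaccard) score standing in for the alignment step, the 0.2 threshold, and the function name `trailing_unaligned` are all assumptions chosen for illustration. The sketch walks backwards from the end of the simplified output and collects sentences until it finds one that matches some source sentence.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def token_set(sentence):
    # Lowercased alphanumeric tokens, as a set.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    # Token-set overlap in [0, 1]; 0.0 if either set is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def trailing_unaligned(source, output, threshold=0.2):
    """Return the trailing output sentences with no source sentence
    scoring at least `threshold` (a hypothetical cut-off)."""
    src_sets = [token_set(s) for s in split_sentences(source)]
    out_sents = split_sentences(output)
    cut = len(out_sents)
    # Scan from the end; stop at the first sentence that aligns to the source.
    for i in range(len(out_sents) - 1, -1, -1):
        out_set = token_set(out_sents[i])
        best = max((jaccard(out_set, s) for s in src_sets), default=0.0)
        if best >= threshold:
            break
        cut = i
    return out_sents[cut:]


source = (
    "Aspirin reduces the risk of heart attack in adults. "
    "Side effects include stomach bleeding."
)
output = (
    "Aspirin lowers the risk of heart attack in adults. "
    "Side effects can include stomach bleeding. "
    "Please rewrite the following text in plain language."
)
print(trailing_unaligned(source, output))
```

In this toy example only the final, instruction-like sentence is flagged, because it shares almost no vocabulary with either source sentence. A realistic system would likely need a more robust alignment and similarity measure than token overlap, since faithful simplifications often paraphrase heavily.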

Details

Paper ID: lrec2026-ws-determit-02
Pages: pp. 12-21
BibKey: chakar-etal-2026-benchmark
Editors: Giorgio Maria Di Nunzio, Federica Vezzani, Liana Ermakova, Hosein Azarbonyad, Jaap Kamps
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the 2nd Workshop on Evaluating Text Difficulty in a Multilingual Context (DeTermIt! 2026)
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Berkay Chakar
  • Liana Ermakova
  • Jaap Kamps
