A Meta-evaluation of Automatic Metrics for Elaborative Simplification

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

Abstract

Elaborative simplification aims to improve the readability of texts by adding content that helps readers. However, evaluating these elaborations remains challenging due to their subjective nature and the lack of suitable annotated datasets. To support the evaluation of elaborative simplification models, we introduce a new dataset with human ratings of elaborations generated by Large Language Models (LLMs), focusing on two quality criteria: cohesion and informativeness. Using these human judgments as a reference, we conduct a meta-evaluation of existing automatic evaluation approaches, with a focus on LLM-as-a-judge strategies. Our experiments suggest that evaluations made by smaller LLMs correlate poorly with human judgments, while larger models with structured prompting exhibit higher agreement. Informativeness evaluation proved challenging due to its subjectivity, as evidenced by its lower inter-annotator agreement compared to cohesion.
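As an illustration of the kind of computation a meta-evaluation involves, the following is a minimal sketch of measuring how well an automatic judge's scores track human ratings. It uses Spearman rank correlation, with tied scores receiving averaged ranks. This is an assumption for illustration only: the abstract does not state which correlation coefficient(s) the paper reports. The function names (`average_ranks`, `pearson`, `spearman`) and the toy scores are hypothetical and are not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy scores for five elaborations (illustrative, not from the paper).
human = [4.5, 2.0, 3.5, 1.0, 5.0]  # e.g. mean human cohesion rating per item
judge = [4, 2, 4, 1, 5]            # e.g. LLM-as-a-judge score per item
print(round(spearman(human, judge), 3))
```

A rank-based coefficient is a common choice when judge outputs are coarse ordinal scores, since it rewards agreement in ordering rather than in absolute scale; in practice, a library routine such as SciPy's `spearmanr` would typically replace the hand-written version.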

Details

Paper ID

lrec2026-ws-readixtsar-15

Pages

pp. 193-209

DOI

10.63317/3bhnb2uoif7o

BibKey

alshatti-etal-2026-meta

Editors

Matthew Shardlow, Thomas François, Raquel Amaro, Jorge Baptista, Rémi Cardon, Eugénio Ribeiro, Horacio Saggion, Regina Stodden, Amalia Todirascu, Rodrigo Wilkens

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Abdullah Alshatti
Steven Schockaert
Fernando Alva-Manchego
