Taming CATS: Controllable Automatic Text Simplification through Instruction Fine-Tuning with Control Tokens

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

DOI:10.63317/2zbasvvoqiae

Abstract

Controllable Automatic Text Simplification (CATS) produces user-tailored outputs, yet controllability is often treated as a decoding problem and evaluated with metrics that do not reflect the degree of control actually achieved. We observe that controllability in ATS is significantly constrained by data and evaluation. To address this, we introduce a domain-agnostic CATS framework based on instruction fine-tuning with discrete control tokens, steering open-source models toward target readability levels and compression rates. Across three model families of different sizes (Llama, Mistral, Qwen; 1-14B) and four domains (medicine, public administration, news, encyclopedic text), we find that smaller models (1-3B) can be competitive, but reliable controllability depends strongly on whether the training data encodes sufficient variation in the target attribute. Readability control (FKGL, ARI, Dale-Chall) is learned consistently, whereas compression control underperforms due to limited signal variability in existing corpora. We further show that standard simplification and similarity metrics are insufficient for measuring control, motivating error-based measures of target-output alignment. Finally, our sampling and stratification experiments demonstrate that naive splits can introduce distributional mismatch that undermines both training and evaluation.
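
To make the idea of control tokens and error-based evaluation concrete, the following is a minimal sketch and not the authors' implementation. The token format (`<FKGL_6>`, `<RATIO_0.8>`), the prompt wording, the word-count definition of compression ratio, the vowel-group syllable heuristic, and all function names are illustrative assumptions; only the FKGL formula itself is the standard one.

```python
import re


def count_syllables(word: str) -> int:
    # Assumption: rough heuristic that counts groups of consecutive vowels,
    # with a minimum of one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(text: str) -> float:
    # Standard Flesch-Kincaid Grade Level:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def compression_ratio(source: str, output: str) -> float:
    # Assumption: ratio of whitespace-separated word counts (output / source).
    return len(output.split()) / len(source.split())


def build_prompt(source: str, target_fkgl: int, target_ratio: float) -> str:
    # Hypothetical control-token format prepended to the instruction.
    return f"<FKGL_{target_fkgl}> <RATIO_{target_ratio:.1f}> Simplify: {source}"


def control_error(target: float, achieved: float) -> float:
    # Error-based control measure: absolute gap between the requested
    # and the achieved attribute value.
    return abs(target - achieved)
```

In such a setup, a model output would be scored by recomputing the attribute on the generated text, for example `control_error(6, fkgl(output))`, so that the score reflects how closely the output matches the requested level and not how similar it is to a reference simplification.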

Details

Paper ID
lrec2026-ws-readixtsar-03
Pages
pp. 26-48
BibKey
hubarava-etal-2026-taming
Editors
Matthew Shardlow, Thomas François, Raquel Amaro, Jorge Baptista, Rémi Cardon, Eugénio Ribeiro, Horacio Saggion, Regina Stodden, Amalia Todirascu, Rodrigo Wilkens
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Hanna Hubarava

  • Yingqiang Gao

Links