Cross-linguistic Readability and Controllable Difficulty: A Corpus-Based Comparison of Human and LLM Translations of Children’s Literature in Romanian
Proceedings of the 2nd Workshop on Evaluating Text Difficulty in a Multilingual Context (DeTermIt! 2026)
Abstract
Translation can systematically alter text difficulty, particularly when the target language is morphologically rich. This study examines whether readability-constrained Large Language Models (LLMs) can mitigate the difficulty shifts observed in English–Romanian translation of children’s literature. We construct a paired four-condition corpus comprising English originals, published Romanian translations, readability-constrained LLM translations, and human readability adaptations (12 aligned passages; approx. 23,000 words). Readability is assessed with three kinds of measure: a Romanian grade-level index (LEMI) designed to be educationally comparable to the Flesch–Kincaid Grade Level (FKGL), the cross-linguistic LIX metric, and morphologically informed measures derived from spaCy. Published Romanian translations are significantly more difficult than their English originals, with higher LIX scores, higher grade-level estimates, and greater morphological variation. Readability-constrained LLM translation substantially reduces difficulty relative to the published versions (median change of approx. −1.46 grade levels), with significant decreases in LIX, morphological feature density, and lexical diversity (MTLD). Human adaptation yields a smaller reduction (median change of approx. −0.26 grade levels). Although the direct difference between LLM and human adaptation is only marginally significant (p = .055, r = .64), LLM outputs generally produce larger reductions. These findings indicate that translation-induced difficulty shifts are measurable and that controllable LLM translation can modulate readability across structural, lexical, and morphological dimensions in multilingual educational contexts.
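For readers unfamiliar with LIX, the following is a minimal sketch of the standard formula: mean sentence length plus the percentage of words longer than six letters. The function name `lix`, the regex-based tokenization, and the naive sentence splitting are illustrative assumptions and are not taken from the study's pipeline, which may segment text differently.

```python
import re


def lix(text: str) -> float:
    """Return LIX = (words / sentences) + 100 * (long words / words).

    A "long word" has more than six letters (the standard LIX definition).
    Tokenization here is a simplifying assumption, not the study's method.
    """
    # Naive sentence split on terminal punctuation; a real pipeline would
    # likely use a proper sentence segmenter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words as runs of Unicode letters (digits and underscores excluded),
    # so precomposed Romanian diacritics such as "ș" or "ă" count as letters.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)
```

Because LIX relies only on word and sentence lengths rather than syllable counts, it can be applied to English and Romanian text with the same code, which is what makes it usable as a cross-linguistic measure.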