SENTA: Sentence Simplification System for Slovene
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstract
Ensuring universal access to written content, regardless of users’ language proficiency and cognitive abilities, is of paramount importance. Sentence simplification, which converts complex sentences into more accessible forms while preserving their meaning, plays a crucial role in enhancing text accessibility. This paper introduces SENTA, a system for sentence simplification in Slovene. The system consists of two components: first, a neural classifier identifies sentences that require simplification; second, a large Slovene language model based on the T5 architecture is fine-tuned to transform complex texts into a simpler form, achieving a SARI score of 41. Both automatic and qualitative evaluations provide insights into the problem, highlighting multilingual applications and fluency maintenance as areas for future research. Finally, SENTA is integrated into a freely accessible, user-friendly interface, offering a valuable service to less-fluent Slovene users.
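The two-component design described above can be illustrated with a minimal sketch. It is not the SENTA implementation: the class name `TwoStagePipeline`, the callables `is_complex` and `rewrite`, and the toy stubs are all hypothetical stand-ins for the neural classifier and the fine-tuned T5-based model. The sketch also assumes that sentences the classifier does not flag are passed through unchanged, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TwoStagePipeline:
    # Hypothetical stand-ins: is_complex plays the role of the neural
    # classifier, rewrite plays the role of the fine-tuned T5-based model.
    is_complex: Callable[[str], bool]
    rewrite: Callable[[str], str]

    def simplify(self, sentences: List[str]) -> List[str]:
        # Assumption: only sentences flagged by the classifier are rewritten;
        # all other sentences are returned as they are.
        return [self.rewrite(s) if self.is_complex(s) else s for s in sentences]


# Toy stubs for illustration only; they are not the paper's models.
def toy_is_complex(sentence: str) -> bool:
    # Flag a sentence as complex when it has more than 8 whitespace tokens.
    return len(sentence.split()) > 8


def toy_rewrite(sentence: str) -> str:
    # Keep the first 8 tokens, drop a trailing comma or semicolon, add a period.
    return " ".join(sentence.split()[:8]).rstrip(",;") + "."


pipeline = TwoStagePipeline(is_complex=toy_is_complex, rewrite=toy_rewrite)
result = pipeline.simplify([
    "Short sentence.",
    "This is a rather long sentence, which has many more words than needed.",
])
```

Keeping the classifier and the generator behind separate callables means either stage can be replaced or evaluated independently, which matches the separation of concerns the abstract describes.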