CEFR Level Prediction for Short Russian L2 Texts: Evaluating Classifiers and Instruction-Based LLMs
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This study explores the automated prediction of text complexity levels for short Russian texts on the Common European Framework of Reference for Languages (CEFR) scale. The dataset consists of 7,322 nonfiction fragments (15–30 words) extracted from textbooks for learners of Russian as a second language and filtered according to the linguistic feature distributions typical of each CEFR level, with additional validation by four human experts. Each fragment was annotated with 127 linguistic features covering lexical, morphological, syntactic, and length-based characteristics. We evaluate several approaches to text complexity assessment: traditional machine learning classifiers, fine-tuned transformer models, and instruction-based large language models (LLMs). Among all models, RuBERT achieved the best strict F1-score (47.8%) and the lowest mean absolute error (0.56), while instruction-based LLMs such as YandexGPT captured overall complexity trends but underperformed in exact level classification. Feature ablation experiments demonstrated that lexical features are the most informative for CEFR prediction. Our findings confirm that fine-tuned language models currently offer the most reliable results for short-text CEFR assessment in Russian, whereas instruction-based LLMs show potential for qualitative analysis of text difficulty patterns.
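For concreteness, the sketch below illustrates how the two reported metrics could be computed, assuming "strict F1" denotes macro-averaged exact-match F1 and that MAE is measured over CEFR levels mapped to ordinal indices; the paper does not specify these details, so the mapping, function, and variable names here are illustrative.

```python
# A minimal sketch of the two reported metrics, assuming CEFR levels
# are mapped to ordinal indices (A1=0, ..., C2=5). Names are illustrative,
# not taken from the paper's implementation.
import numpy as np
from sklearn.metrics import f1_score

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_TO_IDX = {level: i for i, level in enumerate(CEFR_LEVELS)}

def evaluate_cefr(gold_labels, pred_labels):
    """Strict (exact-match) macro F1 and MAE over ordinal level indices."""
    gold = np.array([LEVEL_TO_IDX[label] for label in gold_labels])
    pred = np.array([LEVEL_TO_IDX[label] for label in pred_labels])
    strict_f1 = f1_score(gold, pred, average="macro")  # credit only exact matches
    mae = np.abs(gold - pred).mean()                   # average distance in CEFR steps
    return strict_f1, mae

# Toy usage: one off-by-one error (B1 predicted as B2)
strict_f1, mae = evaluate_cefr(["A2", "B1", "C1"], ["A2", "B2", "C1"])
print(f"strict F1: {strict_f1:.3f}, MAE: {mae:.2f}")
```

Under this reading, MAE rewards near-miss predictions on the ordinal scale (e.g., B1 predicted as B2 costs 1), whereas strict F1 does not, which is consistent with the abstract's observation that instruction-based LLMs capture overall complexity trends while underperforming in exact classification.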