Getting Close to Cloze: Investigating Language Model and Human Cloze-test Performance in Afrikaans
Proceedings of Resources for African Indigenous Languages (RAIL) 2026 @ LREC 2026
Abstract
Models that can automatically estimate the readability of a text are a valuable resource for any language. However, for many languages such models either do not work well or do not yet exist. In this paper, we lay the groundwork for developing a high-quality readability-assessment application for Afrikaans by having encoder-only language models (LMs) complete a set of cloze tests previously completed by humans. A strong correlation between the cloze-test performance of humans and that of an LM indicates that the LM could serve as a proxy for human participants. We show that the output of models trained on (at least some) Afrikaans data correlates reasonably well with human answers, underscoring the potential of LMs for automatic readability assessment. A more fine-grained analysis confirms that the correlation is not driven by a few strongly correlating word classes but is spread relatively evenly across all word classes. Through a manual evaluation, we further establish that in cases where human and LM performance correlate strongly because both were wrong, LM answers tend to be further off than human answers for the same cloze items. Notably, the model with the best correlation, afRoBERTa (r = 0.62; Spearman’s ρ = 0.62), is neither the most accurate nor the largest model but one trained on Afrikaans only, showing the benefit of small, monolingual LMs over large, multilingual models for specific purposes.
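The abstract reports agreement between human and LM cloze performance as r (presumably Pearson's correlation coefficient) and Spearman's ρ. The following is a minimal sketch of how both coefficients can be computed, assuming that each cloze item has already been reduced to one human score (for example, the proportion of participants who restored the original word) and one LM score (for example, whether the model's top prediction matched). This per-item scoring scheme, the function names, and the example values are illustrative assumptions, not the paper's procedure or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # Correlation is undefined when one of the sequences is constant.
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on (tie-averaged) ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical per-item scores for five cloze items (illustrative only).
    human = [0.9, 0.4, 0.75, 0.1, 0.6]  # share of participants answering correctly
    lm = [1, 0, 1, 0, 1]                # 1 if the LM's top prediction was correct
    print(f"r = {pearson(human, lm):.2f}, rho = {spearman(human, lm):.2f}")
```

Computing Spearman's ρ as Pearson's r over averaged ranks handles ties correctly, which matters when LM scores are binary, as in this hypothetical example.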