Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Fill in the suggested correction value for each field you want to correct.
Provide your name and email so we can contact you if needed.

Paper Information

lrec2026-ws-rail-11

Getting Close to Cloze: Investigating Language Model and Human Cloze-test Performance in Afrikaans

Paper Fields

Abstract

Models that can automatically estimate the readability of a given text are a valuable resource for any language. There are, however, many languages for which such models do not work well or simply do not exist yet. In this paper, we lay the groundwork for developing a high-quality application for Afrikaans by having encoder-only language models (LMs) complete a set of cloze tests already completed by humans. A strong correlation between the cloze-test performance of humans and an LM indicates that the LM could serve as a proxy for human participants. We show that the output of models trained on (some) Afrikaans correlates reasonably well with human answers, underscoring the potential of LMs to be used in automatic readability assessment. A more fine-grained analysis confirms that the correlation is not driven by only a few strongly correlating word classes, but is spread relatively evenly over all word classes. We further establish by means of a manual evaluation that, in cases where the cloze-test performance of humans and an LM correlates strongly because both were wrong, LM answers tend to be further off than human answers for the same cloze items. It is noteworthy that the model with the best correlation, afRoBERTa (r=0.62; Spearman’s ρ=0.62), is neither the most accurate nor the largest model, but a model trained on Afrikaans only, showing the benefit of small, monolingual LMs compared to large, multilingual models for specific purposes.

Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.

Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.
