Reading Time in the Wild: An Assessment of Readability Predictors Based on Naturally-Observed Reading Times

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/56xa82ywv9us

Abstract

Reading time has emerged as a viable proxy for readability and comprehension. However, most studies use reading times obtained in controlled experimental settings with eye-tracking or self-paced reading tasks, which differ from uncontrolled, more naturalistic reading behaviour in the wild. Through a collaboration with a newspaper, we have access to a dataset of Dutch news articles with corresponding clickstream reading times averaged across thousands of readers. To address this gap, we use this dataset to evaluate how well common proxies for readability and comprehension hold on data from online readers. We first group the proxies into four dimensions and, for each dimension, compute the correlation between its proxies and the average reading time per token. We then assess whether the proxies can meaningfully predict reading time per token. The results are surprising: we find no meaningful correlation between any proxy and the average reading time per token, nor can any proxy be used for reliable prediction. Additionally, we rerun the prediction on corresponding, automatically simplified texts and find, contrary to expectation, increased predicted reading times per token. These results imply that clickstream reading time must be treated with caution as a proxy for readability or comprehension.
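The correlation step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the choice of Pearson correlation, the use of average word length as a stand-in proxy, whitespace tokenisation, and the toy articles with their made-up reading times.

```python
from math import sqrt


def reading_time_per_token(avg_reading_time_s: float, n_tokens: int) -> float:
    """Average reading time (seconds) divided by the article's token count."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return avg_reading_time_s / n_tokens


def avg_word_length(text: str) -> float:
    """Hypothetical readability proxy: mean characters per whitespace token."""
    tokens = text.split()
    if not tokens:
        raise ValueError("text contains no tokens")
    return sum(len(t) for t in tokens) / len(tokens)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Toy data, invented for illustration only: (article text, avg reading time in s).
articles = [
    ("de kat zit op de mat", 3.0),
    ("gemeenteraadsverkiezingen leveren onverwachte coalitiemogelijkheden op", 4.5),
    ("het regent vandaag in het hele land", 2.8),
]

proxy_values = [avg_word_length(text) for text, _ in articles]
rt_per_token = [
    reading_time_per_token(seconds, len(text.split())) for text, seconds in articles
]
r = pearson(proxy_values, rt_per_token)
print(f"Pearson r between proxy and reading time per token: {r:.3f}")
```

A rank-based coefficient such as Spearman's could be substituted for `pearson` without changing the rest of the sketch; the abstract does not say which coefficient the authors use.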

Details

Paper ID
lrec2026-main-572
Pages
pp. 7209-7224
BibKey
vaals-etal-2026-reading
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Sijbren van Vaals

  • Rik van Noord

  • Malvina Nissim
