Lexical and Discourse Semantics in a Reading-time Corpus of English

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

We present a novel language resource that combines a psycholinguistic reading-time corpus with rich annotations of lexical, compositional, and discourse meaning. While existing psycholinguistic corpora typically provide morphological and syntactic annotations, no comparable corpus with comprehensive semantic information has been available until now. We enriched the UCL corpus (361 sentences with self-paced reading, eye-tracking, and EEG data) with annotations in the style of the Parallel Meaning Bank (PMB) project, including WordNet synsets, VerbNet thematic roles, Combinatory Categorial Grammar (CCG) parses, and Discourse Representation Theory (DRT) structures. We demonstrate the utility of this resource through two case studies examining (1) encoding interference effects due to gender similarity and (2) integration costs in semantic role assignment. Both studies reveal processing patterns consistent with established psycholinguistic theories, previous findings, or both. This resource fills a significant gap in psycholinguistic research: it enables the evaluation of semantic processing theories on naturalistic corpus data and extends the existing pool of annotated reading-time corpora. It should be useful to psycholinguists as well as to cognitive scientists interested in language processing.