MultiCoS: A Multilingual Dataset of Connective Semantics with Context–Sentence Compatibility
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We present a multilingual dataset of connective semantics. The dataset contains semantic annotations of clausal connectives (e.g., English "and" and "or") from 24 languages, based on our original elicitation data from native speakers. Unlike existing lexica of connectives, the dataset includes systematic evidence for the annotations in the form of context–sentence compatibility judgments, including negative evidence. The paper describes the data collection methodology and the format of the dataset. We also discuss its potential use cases: validating cross-linguistic generalizations, examining potential counterexamples to them, and benchmarking felicity judgments by NLU systems.