A Corpus of Misunderstood Irony on Turkish Social Media

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

We present a new Turkish social media corpus annotated for verbal irony. Candidate ironic posts are identified by a distant supervision method that relies on reports of misunderstood irony on social media platforms. The data collected through this method, together with irony-tagged posts and a random sample of posts, are annotated by three annotators. The resulting corpus of 3000 tweets with high-quality annotations may be useful for linguistic analysis, for training automatic irony detection systems, and for testing the irony understanding of large language models. Since irony interpretation typically involves context, our dataset also includes the conversational context preceding each potentially ironic expression. Besides describing the corpus and the annotation process, this paper presents an analysis of the corpus. Our findings indicate that relying on distant supervision alone may result in suboptimal labels for irony/sarcasm corpora. We also investigate how useful context is to annotators in identifying irony.