
A Corpus of Misunderstood Irony on Turkish Social Media

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3kehaa7yjjqc

Abstract

We present a new Turkish social media corpus annotated for verbal irony. Candidate ironic posts are identified by a distant supervision method that relies on reports of misunderstood irony on social media platforms. The data collected with this method, together with irony-tagged posts and a random sample of posts, are annotated by three annotators. The result is a corpus of 3000 tweets with high-quality annotations that may be useful for linguistic analysis, for training automatic irony detection systems, and for testing the irony understanding of large language models. Since irony interpretation typically involves context, our dataset also includes the conversational context preceding each potentially ironic expression. Besides describing the corpus and the annotation process, this paper presents an analysis of the corpus. Our findings indicate that relying on distant supervision alone may result in suboptimal labels for irony/sarcasm corpora. We also investigate how useful context is to annotators in identifying irony.

Details

Paper ID
lrec2026-main-879
Pages
pp. 11252-11259
BibKey
ltekin-etal-2026-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Çağrı Çöltekin

  • Güliz Güneş
