Can LLMs Understand Punchlines? LLMs' Narrative Understanding Evaluation with Short-shorts

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/4n2p36736i24

Abstract

In this study, we constructed a narrative comprehension benchmark using the works of Shinichi Hoshi to examine the extent to which Large Language Models (LLMs) can understand twist endings, or punchlines, in short-short stories. Specifically, story endings were categorized into six types (such as Revelation, Apocalypse, and Sarcasm), and a classification task was designed in which LLMs were prompted with the story text and asked to select the appropriate ending category. We collected human annotations from eight native Japanese speakers to establish a reference benchmark. We then experimentally compared multiple LLMs (GPT-4, Claude, Gemini, and Grok), assessing their performance against human judgments at both the metric level and the discourse level. The results revealed that although certain models approached human performance in specific categories, overall accuracy remained notably lower than the human baseline. Through quantitative and qualitative analyses, this study highlights the challenges LLMs face in capturing narrative subtleties such as irony, implication, and emotional reversal. The proposed benchmark provides a novel framework for evaluating narrative understanding and the deeper semantic reasoning capabilities of LLMs.
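As a rough illustration of the evaluation setup described in the abstract, the sketch below (in Python) embeds a story in a classification prompt, asks a model to pick one ending category, and scores the choices against the human annotations. This is a minimal sketch, not the authors' code: only three of the six category names appear in the abstract, and `build_prompt`, `query_llm`, and `accuracy` are hypothetical helpers, with `query_llm` a placeholder for whichever model API (GPT-4, Claude, Gemini, or Grok) is being evaluated.

```python
# Hypothetical sketch of the ending-classification task.
# Only three of the six ending categories are named in the abstract;
# the remaining three are given in the paper.
CATEGORIES = ["Revelation", "Apocalypse", "Sarcasm"]  # + three more in the paper


def build_prompt(story_text: str) -> str:
    """Ask the model to classify a short-short story's ending into one category."""
    options = ", ".join(CATEGORIES)
    return (
        "Read the following short-short story and classify its ending "
        f"into exactly one of these categories: {options}.\n\n"
        f"Story:\n{story_text}\n\n"
        "Answer with the category name only."
    )


def query_llm(prompt: str) -> str:
    """Placeholder for a call to any of the evaluated models' APIs."""
    raise NotImplementedError("plug in the API call for the model under test")


def accuracy(stories: list[str], human_labels: list[str]) -> float:
    """Fraction of stories where the model's category matches the human label."""
    correct = sum(
        query_llm(build_prompt(story)).strip() == label
        for story, label in zip(stories, human_labels)
    )
    return correct / len(stories)
```

The paper's per-category and discourse-level comparisons would follow the same loop, tallying model-human agreement separately for each ending type rather than a single overall accuracy.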

Details

Paper ID
lrec2026-main-159
Pages
pp. 2024-2034
BibKey
cheng-etal-2026-can
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Jiashi Cheng
  • Takehito Utsuro
