Artful Writing, Authentic Emotions: Distinguishing Human-Written from LLM-Generated Metaphors by Annotation and Classification

Proceedings of Learning Non-Literal Expressions with Small Data @ LREC 2026

Abstract

We analyze differences between human-written and automatically generated metaphors. Starting from two syntactically standardized datasets of novel metaphors from poetry and science communication, we use LLMs to generate new figurative expressions that describe the same concepts as the human-written texts. Through crowdsourcing, we conduct extensive annotation across multiple dimensions (e.g., writing quality and creativity) and ask annotators to judge whether each metaphor was generated automatically. For the poetry set, we also ask annotators which emotions the metaphor conveys. We find that, consistent with prior work, the authorship of scientific metaphors is difficult to determine. However, our results reveal that human-written poetic metaphors stand out for their capacity to convey emotion. We also analyze which types of metaphors are merely perceived as human. Finally, we show that, while human annotators cannot distinguish human from machine metaphors, automated approaches achieve high accuracy in identifying human writers, which suggests substantial differences in text structure.