DIDECO: An Annotated Dataset for Intent Detection in Digital Communications
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper presents DIDECO, the first annotated dataset specifically designed for detecting both explicit and implicit intents in digital communications. We address a critical gap in cybersecurity research by developing a taxonomy that distinguishes explicit communicative goals (what is requested) from implicit persuasion mechanisms (how compliance is engineered). Grounded in Speech Act Theory and principles of persuasion psychology, the taxonomy comprises 20 intent categories spanning both explicit and implicit intents. Six trained annotators labeled 220 LLM-generated spear-phishing emails under a multi-label protocol, yielding 2,162 intent annotations that reveal the layered structure of malicious communications. Our analysis shows that sophisticated attacks employ multiple concurrent intents, combining explicit communicative goals with implicit persuasion strategies. The dataset supports the development of intent-aware detection systems that identify sophisticated social engineering attacks through semantic analysis.