SynBullying: A Multi-LLM Synthetic Conversational Dataset for Cyberbullying Detection

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4np8biner769

Abstract

We introduce SynBullying, a synthetic multi-LLM conversational dataset for studying and detecting cyberbullying (CB). SynBullying provides a scalable and ethically safe alternative to human data collection by leveraging large language models (LLMs) to simulate realistic bullying interactions. The dataset offers (i) conversational structure, capturing multi-turn exchanges rather than isolated posts; (ii) context-aware annotations, where harmfulness is assessed within the conversational flow considering context, intent, and discourse dynamics; and (iii) fine-grained labeling, covering various CB categories for detailed linguistic and behavioral analysis. We evaluate SynBullying across five dimensions, including conversational structure, lexical patterns, sentiment/toxicity, role dynamics, harm intensity, and CB-type distribution. We further examine its utility by testing its performance as standalone training data and as an augmentation source for CB classification.

Details

Paper ID
lrec2026-main-578
Pages
pp. 7292-7306
BibKey
kazemi-etal-2026-synbullying
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Arefeh Kazemi
  • Hamza Qadeer
  • Joachim Wagner
  • Hossein Hosseini
  • Sri Balaaji Natarajan Kalaivendan
  • Brian Davis
