LREC 2026, main conference

Harnessing Synergy in Context and Emoji for Joint Detection of Harmful Online Content in Multi-turn Conversations

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4rii4qtzbpew

Abstract

Detecting harmful content, such as cyberbullying, self-harm, and grooming, in self-generated content or conversations is an emerging research area with significant potential for positive social impact. However, challenges such as the scarcity of real-world conversational data, labor-intensive annotation processes, and inconsistent content policies hinder the understanding and evaluation of harmful content detection systems. In this study, we use openly available forum data to construct conversation proxies, facilitating the analysis and detection of harmful content. We labeled the conversational data according to a consistent content policy developed by experts, with ten annotators contributing to the labeling process. Our experiments on the impact of context window size found that joint-detection performance improved gradually up to a context window of 16 sentences, after which it plateaued. Additionally, experiments with emojis showed that using a tokenizer capable of decoding emojis yielded the best performance, while either removing emojis or converting them to text performed worse.
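The two preprocessing variables studied above, context window size and emoji handling, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the window definition (the target sentence plus up to k - 1 preceding sentences), the code-point ranges used to detect emojis, the `[SEP]` separator, and all function names are hypothetical assumptions made for illustration.

```python
import unicodedata
from typing import List

# Assumption: a conversation is a chronological list of sentences, and a
# window of size k is the target sentence plus up to k - 1 preceding ones.
def context_window(sentences: List[str], target: int, k: int) -> List[str]:
    start = max(0, target - k + 1)
    return sentences[start:target + 1]

# Rough emoji test via code-point ranges: an approximation, not a full
# implementation of the Unicode emoji specification (hypothetical choice).
_EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]
_JOINERS = {0x200D, 0xFE0F}  # zero-width joiner, variation selector-16

def is_emoji(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _EMOJI_RANGES)

def handle_emojis(text: str, mode: str) -> str:
    """mode: 'keep' (leave for an emoji-aware tokenizer), 'remove', or 'text'."""
    if mode not in {"keep", "remove", "text"}:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "keep":
        return text
    out = []
    for ch in text:
        if ord(ch) in _JOINERS:
            continue  # drop joiners/selectors when stripping or verbalizing
        if is_emoji(ch):
            if mode == "text":
                # Verbalize using the Unicode character name, e.g. ":grinning face:"
                out.append(" :" + unicodedata.name(ch, "emoji").lower() + ": ")
            # mode == "remove": the emoji is simply skipped
        else:
            out.append(ch)
    # Collapse whitespace left behind by removed or verbalized emojis
    return " ".join("".join(out).split())

def build_input(sentences: List[str], target: int, k: int,
                mode: str = "keep", sep: str = " [SEP] ") -> str:
    # Join the (emoji-processed) window into one classifier input string
    window = context_window(sentences, target, k)
    return sep.join(handle_emojis(s, mode) for s in window)
```

For example, `build_input(turns, target=i, k=16)` would produce the 16-sentence window at which the abstract reports performance plateauing, with emojis left intact for an emoji-aware tokenizer; `mode="remove"` and `mode="text"` correspond to the two weaker variants.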

Details

Paper ID
lrec2026-main-791
Pages
pp. 10082-10092
BibKey
hu-etal-2026-harnessing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Feiyan Hu
  • Ciara Anne Byrne
  • Jiang Zhou
  • Rena Maycock
  • Mark Langan
