LREC 2026 (main conference)

Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/48xmoewrde3v

Abstract

The proliferation of conspiracy theories and hateful messages on social media poses significant challenges for content moderation and public discourse. Despite their societal impact, existing datasets for automated conspiracy detection remain limited in scope and language coverage. We present a multilingual dataset of conspiracy content on Telegram comprising 5750 messages in English, Dutch, Italian, Spanish, and Portuguese from 87 channels documented as disseminating conspiracist and extremist content. Domain experts annotated messages for conspiracist tone, population replacement conspiracy theories (PRCT), vaccine conspiracies, and hate speech. We report extensively on the difficulties and caveats of creating and annotating this type of dataset. We establish classification baselines by evaluating six models in a zero-shot setting and fine-tuning three encoder models, achieving F1 scores up to 0.800 for conspiracist tone, 0.846 for PRCT, 0.843 for vaccine-related conspiracy theories, and 0.734 for hate speech. Inter-annotator agreement was moderate, consistent with the complexity documented in similar annotation tasks.

Details

Paper ID
lrec2026-main-582
Pages
pp. 7345-7358
BibKey
laken-etal-2026-mute
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Katarina Laken
  • Erik Bran Marino
  • Paloma Piot
  • Davide Bassi
  • Søren Kirkegaard Fomsgaard
  • Michele Joshua Maggini
  • Renata Vieira
  • Marcos Garcia
  • Sara Tonelli

Links