MindSET: Advancing Mental Health Benchmarking through Large-Scale Social Media Data

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4cdunjq3bziz

Abstract

Social media data has become a vital resource for studying mental health, offering real-time insights into thoughts, emotions, and behaviors that traditional methods often miss. Progress in this area has been facilitated by benchmark datasets for mental health analysis; however, most existing benchmarks have become outdated due to limited data availability, inadequate cleaning, and the inherently diverse nature of social media content (e.g., multilingual and harmful material). We present a new benchmark dataset, MindSET, curated from Reddit using self-reported diagnoses to address these limitations. The dataset contains over 13M annotated posts across seven mental health conditions, more than twice the size of previous benchmarks. To ensure data quality, we applied rigorous preprocessing steps, including language filtering and the removal of Not Safe for Work (NSFW) and duplicate content. We further performed a linguistic analysis using LIWC to examine psychological term frequencies across the eight groups represented in the dataset. To demonstrate the dataset’s utility, we conducted binary classification experiments for diagnosis detection using both fine-tuned language models and Bag-of-Words (BoW) features. Models trained on MindSET consistently outperformed those trained on previous benchmarks, achieving up to an 18-point improvement in F1 for Autism detection. Overall, MindSET provides a robust foundation for researchers exploring the intersection of social media and mental health, supporting both early risk detection and deeper analysis of emerging psychological trends.

Details

Paper ID
lrec2026-main-878
Pages
pp. 11241-11251
BibKey
mankarious-etal-2026-mindset
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Saad Mankarious

  • Edward Kempa

  • Daniel Wiechmann

  • Elma Kerz

  • Yu Qiao

  • Ayah Zirikly

Links