Exploring BERT-Based Classification Models for Detecting Phobia Subtypes: A Novel Tweet Dataset and Comparative Analysis

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/442f44giniib

Abstract

Phobias, characterized by irrational fears of specific objects or situations, can profoundly affect an individual’s quality of life. This research presents a comprehensive investigation into phobia classification, where we propose a novel dataset of 811,569 English tweets from user timelines spanning 102 phobia subtypes over six months, including 47,614 self-diagnosed phobia users. BERT models were leveraged to differentiate non-phobia from phobia users and classify them into 65 specific phobia subtypes. The study produced promising results, with the highest f1-score of 78.44% in binary classification (phobic user or not phobic user) and 24.01% in a multi-class classification (detecting the specific phobia subtype of a user). This research provides insights into people with phobias on social media and emphasizes the capacity of natural language processing and machine learning to automate the evaluation and support of mental health.