RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits
Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)
Abstract
Social media has provided a platform for many individuals to easily express themselves naturally and publicly, and researchers have had the opportunity to utilize large quantities of this data to improve author trait analysis techniques and to improve author trait profiling systems. The majority of the work in this area, however, has been narrowly spent on English and other Western European languages, and generally focuses on a single social network at a time, despite the large quantity of data now available across languages and differences that have been found across platforms. This paper introduces RU-ADEPT, a dataset of Russian authors’ personality trait scores–Big Five and Dark Triad, demographic information (e.g. age, gender), with associated corpus of the authors’ cross-contributions to (up to) four different social media platforms–VKontakte (VK), LiveJournal, Blogger, and Moi Mir. We believe this to be the first publicly-available dataset associating demographic and personality trait data with Russian-language social media content, the first paper to describe the collection of Dark Triad scores with texts across multiple Russian-language social media platforms, and to a limited extent, the first publicly-available dataset of personality traits to author content across several different social media sites.