Back to Main Conference 2022
LREC 2022main

RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/29uucsxbh9hd

Abstract

RED (Romanian Emotion Dataset) is a machine learning-based resource developed for the automatic detection of emotions in Romanian texts, containing single-label annotated tweets with one of the following emotions: joy, fear, sadness, anger and neutral. In this work, we propose REDv2, an open-source extension of RED by adding two more emotions, trust and surprise, and by widening the annotation schema so that the resulted novel dataset is multi-label. We show the overall reliability of our dataset by computing inter-annotator agreements per tweet using a formula suitable for our annotation setup and we aggregate all annotators’ opinions into two variants of ground truth, one suitable for multi-label classification and the other suitable for text regression. We propose strong baselines with two transformer models, the Romanian BERT and the multilingual XLM-Roberta model, in both categorical and regression settings.

Details

Paper ID
lrec2022-main-149
Pages
pp. 1392-1399
BibKey
ciobotaru-etal-2022-red
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 25 June 2022

Authors

  • AC

    Alexandra Ciobotaru

  • MC

    Mihai Vlad Constantinescu

  • LD

    Liviu P. Dinu

  • SD

    Stefan Dumitrescu

Links