
Quality and Agreement in Multilabel Emotion Annotation: A Case Study and Evaluation Framework

Proceedings of Computational Affective Science (CAS) @ LREC 2026

DOI:10.63317/3mad4saen3x8

Abstract

Emotion annotation is inherently subjective, yet most NLP pipelines still assume “gold” labels, typically produced by majority voting, and treat annotator variation as noise. In this paper, we present a multilabel emotion annotation case study and use it to examine how annotator behavior and aggregation choices affect both agreement estimates and downstream emotion classifiers. Rather than collapsing disagreement into a single label, we represent targets as soft vote-share labels (including an intensity-weighted variant) and evaluate models using both thresholded metrics (macro-/micro-F1) and probabilistic alignment (Bernoulli cross-entropy, SoftBCE), alongside data-derived disagreement diagnostics. Across annotation regimes, we show that disagreement is structured and leaves measurable traces in model behavior: hard labels may maximize F1 metrics, while soft supervision yields predictions that better reflect empirical annotator variance and uncertainty. Our results provide practical guidance for designing, aggregating, and evaluating multilabel emotion datasets when multiple interpretations are plausible.
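
As a rough illustration of the ideas named in the abstract, the sketch below builds soft vote-share targets from multilabel annotations, derives majority-vote hard labels from them, and scores predictions with a Bernoulli cross-entropy against the soft targets. It is not the authors' implementation: the label set, the example annotations, the 0.5 majority threshold, and the function names `vote_share`, `majority_vote`, and `soft_bce` are all assumptions made for this example, and the intensity-weighted variant is not covered because the abstract does not define it.

```python
import math

# Hypothetical label set; the paper's actual emotion inventory is not given here.
LABELS = ["joy", "anger", "sadness"]

def vote_share(annotations, labels=LABELS):
    """Fraction of annotators who selected each label (soft target in [0, 1])."""
    n = len(annotations)
    return [sum(label in a for a in annotations) / n for label in labels]

def majority_vote(shares, threshold=0.5):
    """Hard label: 1 if strictly more than `threshold` of annotators chose it."""
    return [1 if s > threshold else 0 for s in shares]

def soft_bce(targets, probs, eps=1e-7):
    """Mean Bernoulli cross-entropy between soft targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)

# Three hypothetical annotators labelling one text; each may pick several emotions.
annotations = [{"joy"}, {"joy", "sadness"}, {"anger"}]

soft = vote_share(annotations)   # per-label vote shares: joy 2/3, anger 1/3, sadness 1/3
hard = majority_vote(soft)       # only "joy" clears the majority threshold

# Predictions that match the vote shares score better against the soft targets
# than confident 0/1 predictions that match the majority-vote labels.
loss_matching_shares = soft_bce(soft, soft)
loss_matching_hard = soft_bce(soft, [float(h) for h in hard])
```

In this toy case the majority vote discards the minority "anger" and "sadness" votes entirely, while the vote-share targets keep them, so a model that outputs near-certain 0/1 predictions is penalized heavily under the soft cross-entropy even though it would score perfectly against the hard labels.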

Details

Paper ID
lrec2026-ws-cas-01
Pages
pp. 1-15
BibKey
ohman-etal-2026-quality
Editors
Christopher Bagdon, Krishnapriya Vishnubhotla, Kristen A. Lindquist, Lyle Ungar, Roman Klinger, Saif M. Mohammad
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Computational Affective Science (CAS) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Emily Sofi Ohman

  • Anna Koufakou