LREC 2026 workshop

Quantifying and Predicting Disagreement in Graded Human Ratings

Proceedings of the fifth edition of NLPerspectives

DOI:10.63317/4qy8nuowzhpy

Abstract

It is increasingly recognized that humans do not always agree, and disagreement is inherent in many annotation tasks. However, not all items in a given task elicit the same level of opinion divergence. In this paper, we study the extent to which item-level annotation variation and variation structure can be captured from text features, focusing on inappropriate language detection, including offensive language, hate speech, and toxic language detection. We model annotation variation to assess whether the degree of annotation divergence can be predicted from item-level textual features. We also propose the Opposition Index, a metric that quantifies the extent of opposing stances among annotators based on their Likert ratings.

Details

Paper ID
lrec2026-ws-nlperspectives-04
Pages
pp. 33-43
BibKey
zhang-etal-2026-quantifying
Editors
Shiran Dudy, Gavin Abercrombie, Valerio Basile, Elisa Leonardelli, Simona Frenda
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the fifth edition of NLPerspectives
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Leixin Zhang

  • Çağrı Çöltekin
