LREC 2026 workshop

SubData: Bridging Heterogeneous Datasets to Enable Theory-Driven Evaluation of Political and Demographic Perspectives in LLMs

Proceedings of the fifth edition of NLPerspectives

DOI:10.63317/2uppkbro3uvq

Abstract

As increasingly capable large language models (LLMs) emerge, researchers have begun exploring their potential for subjective tasks. While recent work demonstrates that LLMs can be aligned with diverse human perspectives, evaluating this alignment on downstream tasks (e.g., hate speech detection) remains challenging because studies use inconsistent datasets. To address this issue, in this resource paper we propose a two-step framework: we (1) introduce SubData, an open-source Python library designed for standardizing heterogeneous datasets to evaluate LLM perspective alignment; and (2) present a theory-driven approach leveraging this library to test how differently-aligned LLMs (e.g., aligned with different political viewpoints) classify content targeting specific demographics. SubData’s flexible mapping and taxonomy enable customization for diverse research needs, distinguishing it from existing resources. We illustrate its usage with an example application and invite contributions to extend our initial release into a multi-construct benchmark suite for evaluating LLM perspective alignment on natural language processing tasks.
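The idea of standardizing heterogeneous datasets through a mapping and a taxonomy can be illustrated with a minimal sketch. It does not use SubData's actual API, which the abstract does not describe: every name below (`TAXONOMY`, `MAPPING`, `standardize`, `subset_by_category`), the dataset names, and the label values are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration only: none of these names come from SubData.

# Taxonomy (assumed): category -> standardized target labels it contains.
TAXONOMY = {
    "religion": {"muslims", "jews"},
    "gender": {"women", "men"},
}

# Mapping (assumed): per-dataset raw label -> standardized target label.
MAPPING = {
    "dataset_a": {"Muslim": "muslims", "Women": "women"},
    "dataset_b": {"islam": "muslims", "female": "women", "jewish": "jews"},
}


def standardize(dataset_name, records):
    """Rewrite each (text, raw_label) pair into the shared label vocabulary."""
    label_map = MAPPING[dataset_name]
    standardized = []
    for text, raw_label in records:
        target = label_map.get(raw_label)
        if target is None:
            continue  # skip records whose raw label has no mapping
        standardized.append(
            {"text": text, "target": target, "source": dataset_name}
        )
    return standardized


def subset_by_category(records, category):
    """Keep only records whose target belongs to the given taxonomy category."""
    targets = TAXONOMY[category]
    return [r for r in records if r["target"] in targets]


# Two toy datasets with inconsistent label schemes.
combined = standardize(
    "dataset_a", [("example 1", "Muslim"), ("example 2", "Women")]
) + standardize(
    "dataset_b", [("example 3", "islam"), ("example 4", "unknown")]
)

religion = subset_by_category(combined, "religion")
```

Because the mapping and the taxonomy are plain data in this sketch, a researcher could swap in a different grouping of targets without touching the standardization logic, which is the kind of customization the abstract points to.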

Details

Paper ID
lrec2026-ws-nlperspectives-09
Pages
pp. 84-97
BibKey
bernardelle-etal-2026-subdata
Editors
Shiran Dudy, Gavin Abercrombie, Valerio Basile, Elisa Leonardelli, Simona Frenda
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the fifth edition of NLPerspectives
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Pietro Bernardelle
  • Leon Froehling
  • Stefano Civelli
  • Gianluca Demartini

Links