From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories

Proceedings of The Second Workshop on Holocaust Testimonies as Language Resources (HTRes)

Abstract

Polarity detection becomes substantially more challenging under domain shift, particularly in heterogeneous long-form narratives with complex discourse structure, such as Holocaust oral histories. This paper presents a corpus-scale diagnostic study of three off-the-shelf, pretrained transformer-based polarity classifiers applied to such testimonies, over a corpus comprising 107,304 utterances and 579,013 sentences. After assembling the model outputs, we introduce an agreement-based stability taxonomy (ABC) that stratifies the data by inter-model output stability. To localize systematic disagreement, we report pairwise percent agreement, Cohen’s κ, Fleiss’ κ, and row-normalized confusion matrices. As an external, convergent descriptive signal, we apply a T5-based emotion classifier to stratified samples from each agreement stratum and compare the resulting emotion distributions across strata. Inter-model agreement is low to moderate overall, and disagreement is driven primarily by boundary decisions around neutrality. The combination of multi-model label triangulation and the ABC taxonomy provides a cautious, interpretable framework for characterizing where and how sentiment models diverge in sensitive historical narratives.
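As an illustration of the agreement statistics named above, the following minimal sketch computes pairwise percent agreement, Cohen’s κ, and Fleiss’ κ for three label sequences, and assigns each item to an agreement stratum. Several things here are assumptions rather than details taken from the abstract: the three-way label set (negative, neutral, positive), the toy labels, the function names, and in particular the mapping of strata to agreement patterns (A for unanimous, B for a two-to-one split, C for three different labels). It should be read as one plausible reading, not as the study's implementation.

```python
from collections import Counter
from itertools import combinations

# Assumed three-way polarity label set (not specified in the abstract).
LABELS = ("negative", "neutral", "positive")


def percent_agreement(a, b):
    """Fraction of items on which two label sequences coincide."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in LABELS) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(rows):
    """Fleiss' kappa for rows of labels, one label per model per item."""
    n_items = len(rows)
    n_raters = len(rows[0])
    totals = Counter()
    p_bar = 0.0
    for row in rows:
        counts = Counter(row)
        totals.update(counts)
        # Per-item agreement: share of agreeing rater pairs.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing / (n_raters * (n_raters - 1))
    p_bar /= n_items
    p_e = sum((totals[label] / (n_items * n_raters)) ** 2 for label in LABELS)
    return (p_bar - p_e) / (1 - p_e)


def abc_stratum(row):
    """Hypothetical ABC mapping: A = unanimous, B = 2-vs-1, C = all differ."""
    distinct = len(set(row))
    return {1: "A", 2: "B"}.get(distinct, "C")


# Toy labels from three hypothetical models over four items.
runs = {
    "model_1": ["positive", "neutral", "negative", "neutral"],
    "model_2": ["positive", "neutral", "negative", "positive"],
    "model_3": ["positive", "negative", "negative", "negative"],
}

rows = list(zip(*runs.values()))
strata = [abc_stratum(row) for row in rows]

for left, right in combinations(runs, 2):
    agreement = percent_agreement(runs[left], runs[right])
    kappa = cohen_kappa(runs[left], runs[right])
    print(f"{left} vs {right}: agreement={agreement:.2f}, kappa={kappa:.2f}")

print(f"Fleiss' kappa: {fleiss_kappa(rows):.2f}")
print("Strata:", strata)
```

Both κ functions are undefined when chance agreement equals 1 (for example, when every model emits a single label throughout), so a real pipeline would need to guard against that case.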