LREC 2026 main

Push and Pull: Training Sentence Encoders with Contrastive Losses for Distance-Based Multi-Label Text Classification

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4fiaod5mcdsr

Abstract

Distance-Based Classification (DBC) assigns labels to text by measuring semantic similarity between the text and the label representations. Despite its potential, it has received very little attention for Multi-Label Text Classification (MLTC). Previous studies have focused on determining optimal thresholds, reaching promising results with contextual sentence encoders. We demonstrate that the performance of these models can be further improved by training them with contrastive losses, i.e., by bringing text representations closer to the corresponding true label representations in an embedding space. Using three supervised contrastive losses and three sentence encoders (Stella, GIST-Large, and BGE), we evaluated our approach on five English datasets (SemEval, BioTech, Reuters, AAPD, and LitCovid) and one Dutch dataset (EventDNA). The results show consistent, substantial improvements over base sentence encoders, thereby narrowing the gap between DBC methods and fine-tuned or zero-shot approaches.
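The following sketch illustrates the general idea described in the abstract; it is not the authors' implementation. It assumes cosine similarity as the similarity measure, a single global threshold of 0.5, and toy 3-dimensional vectors standing in for sentence-encoder outputs. The function names (`cosine_similarity`, `predict_labels`, `push_pull_loss`) are hypothetical, and the loss shown is a generic margin-based objective chosen for illustration, not one of the three supervised contrastive losses evaluated in the paper.

```python
import numpy as np


def cosine_similarity(texts, labels):
    """Pairwise cosine similarity between rows of texts (n, d) and labels (m, d)."""
    texts = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    labels = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    return texts @ labels.T


def predict_labels(text_emb, label_emb, threshold=0.5):
    """Assign every label whose similarity to the text reaches the threshold.

    The threshold value is an assumption made for this example.
    """
    return cosine_similarity(text_emb, label_emb) >= threshold


def push_pull_loss(sim, targets, margin=0.2):
    """Generic illustrative objective (not taken from the paper).

    Pull: penalise low similarity between a text and its true labels.
    Push: penalise similarity above `margin` for labels that do not apply.
    """
    pull = ((1.0 - sim) * targets).sum()
    push = (np.maximum(0.0, sim - margin) * (1.0 - targets)).sum()
    return (pull + push) / sim.size


# Toy vectors standing in for encoded label descriptions and encoded texts.
label_emb = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
text_emb = np.array([[0.9, 0.8, 0.0],   # close to labels 0 and 1
                     [0.0, 0.1, 1.0]])  # close to label 2

predictions = predict_labels(text_emb, label_emb, threshold=0.5)

# Gold multi-label targets for the two toy texts.
targets = np.array([[1.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
sim = cosine_similarity(text_emb, label_emb)
loss_matching = push_pull_loss(sim, targets)
loss_mismatched = push_pull_loss(sim, 1.0 - targets)
```

In this toy setup the first text receives labels 0 and 1 and the second receives label 2. The loss is lower when the embeddings already sit near their true labels, which is the direction contrastive training would push an encoder in.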

Details

Paper ID
lrec2026-main-583
Pages
pp. 7359-7379
BibKey
nooten-etal-2026-push
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Jens Van Nooten

  • Andriy Kosar

Links