AdabEval 2026 Task B: Multi-Label Classification of Arabic Politeness Criteria in Social Media

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

We address the problem of multi-label classification of politeness and impoliteness criteria in Arabic social media posts, as defined in Subtask B of an Arabic politeness shared task. The goal is to assign up to four labels from nine pragmatic categories, including Insult, Criticism, Respect, Prayers, and Hospitality, to each post. We first construct consistent multi-label annotations by mapping heterogeneous criterion strings into the official label set, and we analyze the skewed distribution of the resulting labels. To mitigate severe class imbalance, especially for rare categories such as Hospitality and Racism/Discrimination, we apply targeted oversampling of minority instances. Our modelling pipeline combines a TF–IDF + Logistic Regression baseline with two transformer-based encoders, MARBERT and AraBERT-twitter, trained for multi-label classification with Focal Loss. We then aggregate model outputs through a weighted ensemble and optimize per-class decision thresholds on a held-out validation set to improve macro-averaged F1. Experiments on the shared-task train/validation split show that the ensemble substantially outperforms both the TF–IDF baseline and the individual transformers, particularly on underrepresented categories, while maintaining competitive performance on frequent labels.
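The ensemble and thresholding steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`weighted_ensemble`, `tune_thresholds`, `f1_binary`), the threshold grid, and the simple grid search are assumptions made for illustration. The sketch averages per-model probability matrices with normalized weights and then picks, for each label independently, the threshold that maximizes that label's F1 on validation data. Because macro-F1 is the mean of the per-label F1 scores, tuning each label separately also maximizes macro-F1 over the chosen grid.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 for one label; returns 0.0 when there are no positives at all."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def weighted_ensemble(prob_list, weights):
    """Weighted average of per-model probability matrices.

    prob_list: list of arrays, each of shape (n_samples, n_labels).
    weights:   one non-negative weight per model (normalized here).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(prob_list, axis=0)    # (n_models, n_samples, n_labels)
    return np.tensordot(w, stacked, axes=1)  # (n_samples, n_labels)


def tune_thresholds(y_true, probs, grid=None):
    """Pick one decision threshold per label by grid search on validation data."""
    if grid is None:
        # Assumed search grid: 0.05, 0.10, ..., 0.95
        grid = np.linspace(0.05, 0.95, 19)
    n_labels = y_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        best = -1.0
        for t in grid:
            score = f1_binary(y_true[:, j], (probs[:, j] >= t).astype(int))
            if score > best:  # keep the first threshold reaching the best F1
                best, thresholds[j] = score, t
    return thresholds
```

In use, one would call `weighted_ensemble` on the validation probabilities of each model, pass the result to `tune_thresholds` together with the gold label matrix, and then apply the returned thresholds to the ensemble probabilities at prediction time with `(probs >= thresholds).astype(int)`.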