ADAB: Arabic Dataset for Automated Politeness Benchmarking - a Large-Scale Resource for Computational Sociopragmatics

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/559a7pqqchr2

Abstract

The growing importance of culturally aware natural language processing systems has increased the demand for resources that capture sociopragmatic phenomena across diverse languages. Nevertheless, Arabic-language resources for politeness detection remain severely underdeveloped, despite the rich and complex politeness expressions deeply embedded in Arabic communication. This paper presents ADAB/أدب (Arabic Politeness Dataset), a new annotated Arabic dataset collected from four diverse online platforms spanning social media, e-commerce, and customer service domains, and covering both Modern Standard Arabic (MSA) and multiple dialectal varieties (Gulf, Egyptian, Levantine, and Maghrebi). The dataset underwent a thorough annotation process guided by Arabic linguistic traditions and contemporary pragmatic theory, resulting in a three-way politeness classification: polite, impolite, and neutral. It contains 10,000 samples with detailed linguistic feature annotations across 16 politeness categories and achieves substantial inter-annotator agreement (κ = 0.703). The dataset was benchmarked with 40 model configurations spanning traditional machine learning (12 models), transformer-based architectures (10 models), and large language models (18 configurations), demonstrating both its practical utility and its inherent challenges. This resource aims to bridge the gap in Arabic sociopragmatic NLP and to encourage further research into politeness-aware applications for the Arabic language.
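As a rough illustration of what an agreement figure such as κ = 0.703 measures, the sketch below computes Cohen's kappa for two annotators over the three labels named in the abstract. The abstract does not state which kappa variant or how many annotators were used, so the two-annotator Cohen formulation is an assumption here. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from ADAB.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's marginal label counts.
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up toy annotations (assumption: NOT drawn from the ADAB dataset).
annotator_1 = ["polite", "polite", "neutral", "impolite", "neutral", "polite"]
annotator_2 = ["polite", "neutral", "neutral", "impolite", "neutral", "polite"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.739 for this toy data
```

Kappa discounts the agreement two annotators would reach by chance given how often each uses every label, which is why it is lower than the raw agreement rate (5 of 6 items in the toy data).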

Details

Paper ID
lrec2026-main-244
Pages
pp. 3128-3137
BibKey
alkhalifa-etal-2026-adab
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Hend Al-Khalifa
  • Nadia Ghezaiel
  • Maria Bounnit
  • Hend Hamed Alhazmi
  • Noof Abdullah Alfear
  • Reem Fahad Alqifari
  • Ameera Masoud Almasoud
  • Sharefah Ahmed Al-Ghamdi

Links