ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/363iym4fo7cx

Abstract

We present ArabDiscrim, a decade-long lexical resource and corpus of 293K public Arabic Facebook posts (2014–2024) discussing racism and discrimination. Unlike existing Twitter-centric datasets, ArabDiscrim integrates platform-native engagement signals, including reactions, shares, comments, and page metadata, enabling joint analysis of language and audience response. The resource includes 200 curated terms (100 racism, 100 discrimination) with morphological regex families (13+ inflections per lemma), and 20 discrimination axes capturing identity-based grounds for unequal treatment. It also provides explicit attribution patterns. Released under a restricted research-use license for ethical compliance with platform terms, ArabDiscrim supports weak supervision, axis-aware sampling, and platform ecology research. By bridging lexical depth and ecological validity, it establishes a foundation for fairness-oriented, platform-aware Arabic NLP.
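As an illustration of what a morphological regex family could look like, the following is a minimal sketch and not the released ArabDiscrim patterns. The clitic inventory, the suffix list, the example stem, and the names `build_family`, `tag_post`, and `FAMILIES` are all assumptions made for this example. It compiles one pattern per stem that tolerates common proclitics, the definite article, and a few inflectional suffixes, then uses the pattern as a simple weak-supervision tagger.

```python
import re

# Assumed clitic and suffix inventories, chosen for illustration only;
# the actual ArabDiscrim regex families are not reproduced here.
PROCLITICS = "وفبلك"          # single-letter conjunctions and prepositions
ARTICLE = "(?:ال|ل)"          # definite article, or its reduced form after the preposition ل
SUFFIXES = ["ة", "ين", "ون", "ات"]


def build_family(stem: str, suffixes=SUFFIXES) -> re.Pattern:
    """Compile one pattern covering clitic and suffix variants of a stem."""
    # Longest suffixes first so the alternation tries the fullest form first.
    suffix_alt = "|".join(sorted(map(re.escape, suffixes), key=len, reverse=True))
    pattern = (
        r"(?<!\w)"                 # not preceded by a letter or digit
        rf"[{PROCLITICS}]?"        # optional proclitic
        rf"{ARTICLE}?"             # optional definite article
        rf"{re.escape(stem)}"      # the stem itself
        rf"(?:{suffix_alt})?"      # optional inflectional suffix
        r"(?!\w)"                  # not followed by a letter or digit
    )
    return re.compile(pattern)


def tag_post(text: str, families: dict[str, re.Pattern]) -> set[str]:
    """Return the label of every family with at least one match in the text."""
    return {label for label, pat in families.items() if pat.search(text)}


# Hypothetical single-entry lexicon: one stem standing in for a curated term.
FAMILIES = {"racism": build_family("عنصري")}
```

Calling `tag_post(post_text, FAMILIES)` returns the set of labels whose family matches the post. The boundary lookarounds keep the shorter, unrelated word عنصر ("element") from matching, which is the kind of false positive a plain substring search would produce.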

Details

Paper ID
lrec2026-main-929
Pages
pp. 11874-11884
BibKey
zaghouani-etal-2026-arabdiscrim
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Wajdi Zaghouani

  • Shimaa Amer Ibrahim

  • Mabrouka Bessghaier

  • Houda Bouamor

Links