Nakba Discourse 2025: A Bilingual Social Media Dataset for Collective Trauma Analysis

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

Abstract

We introduce Nakba Discourse 2025, a bilingual, full-year social media dataset capturing Arabic and English discourse about the 1948 Palestinian Nakba on Twitter/X and Facebook from January to December 2025. The corpus contains 70,312 unique posts organized into intersecting sub-corpora by language, sentiment, gender, geography, and platform, with engagement metadata and automatically extracted rhetorical features. Our analyses reveal systematic variation in engagement and framing across communities. Per-post engagement is highest in the Israel and UK subsets (50.62 and 49.08 average likes per post, respectively), while Arabic-language discourse shows markedly lower per-post engagement. The sentiment distribution is strongly skewed: negative posts outnumber positive ones by roughly 11 to 1 (54,424 vs. 4,827 posts). Despite wide variation in absolute engagement levels, virality rates remain nearly constant at approximately 10% across all Twitter/X sub-corpora, regardless of language, gender, or geography, pointing to platform-level amplification regularities. Gender analysis shows that women achieve the same proportional virality as men despite producing roughly one-third the volume of posts. Temporal patterns align with cultural calendars, including Thursday peaks associated with Jumu’ah in the Arabic and female subsets, and Sunday peaks in the English-language subsets that reflect Western media cycles. The dataset will be released for research use and supports multilingual stance detection, virality modeling, rhetorical analysis, and computational studies of digital political memory.