The NakbaEcho Dataset: From Oral Testimonies to a Transcribed Arabic History Corpus

Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

DOI:10.63317/4zvtrpg8sm2s

Abstract

We present NakbaEcho, a dataset derived from Palestinian oral testimonies about the 1948 Nakba. The resource is built by transcribing more than 2,180 hours of recorded interviews gathered through the Palestine Remembered Oral History index and linked to multiple repositories, including the Palestinian Oral History Archive (POHA) and YouTube-hosted interviews. We harmonize interview-level metadata and generate timestamp-aligned transcripts from the original Arabic recordings using an automatic transcription pipeline configured for Palestinian Arabic. The dataset includes speaker-labeled segments and auxiliary annotations designed to support downstream research in Arabic speech processing, natural language processing, digital humanities, and oral-history analysis. NakbaEcho contributes a structured computational resource for studying Palestinian oral testimony and expands the availability of dialectal Arabic materials for speech, text, and social research.

Details

Paper ID: lrec2026-ws-nakbanlp-01
Pages: pp. 1-22
BibKey: balah-etal-2026-nakbaecho
Editors: Mustafa Jarrar, Mo El-Haj, Amal Haddad, Serin Atiani, Shadi Abudalfa, Terry Regier, Paul Rayson, Khalil Sima’an, Camille Mansour
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Batool Najeh Balah
  • Mahmoud Fawzi
  • Houda Elmimouni
  • Walid Magdy

Links