Back to Main Conference 2018
LREC 2018main

You Tweet What You Speak: A City-Level Dataset of Arabic Dialects

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/3wos9hprfv6b

Abstract

Arabic has a wide range of varieties or dialects. Although a number of pioneering works have targeted some Arabic dialects, other dialects remain largely without investigation. A serious bottleneck for studying these dialects is lack of any data that can be exploited in computational models. In this work, we aim to bridge this gap: We present a considerably large dataset of > 1=4 billion tweets representing a wide range of dialects. Our dataset is more nuanced than previously reported work in that it is labeled at the fine-grained level of city. More specifically, the data represent 29 major Arab cities from 10 Arab countries with varying dialects (e.g., Egyptian, Gulf, KSA, Levantine, Yemeni).

Details

Paper ID
lrec2018-main-577
Pages
N/A
BibKey
abdul-mageed-etal-2018-tweet
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 12 May 2018

Authors

  • MA

    Muhammad Abdul-Mageed

  • HA

    Hassan Alhuzali

  • ME

    Mohamed Elaraby

Links