LREC 2014 Main Conference
YouDACC: the Youtube Dialectal Arabic Comment Corpus
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)
Abstract
This paper presents YouDACC, an automatically annotated, large-scale, multi-dialectal Arabic corpus collected from user comments on YouTube videos. The corpus covers five groups of dialects: Egyptian (EG), Gulf (GU), Iraqi (IQ), Maghrebi (MG) and Levantine (LV). We perform an empirical analysis of the crawled corpus and demonstrate that our proposed location-based method is effective for the task of dialect labeling.
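The abstract does not say how a comment's location is obtained or how countries are assigned to dialect groups. The sketch below only illustrates the general idea of location-based dialect labeling. It assumes that each comment already carries a country code (a hypothetical field) and maps that code to one of the five dialect groups through a hypothetical lookup table. The functions `label_comment`, `label_corpus` and `dialect_distribution` are illustrative names, not part of any published YouDACC tooling, and the country grouping is an assumption rather than the authors' exact table.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical country-to-dialect-group table (ISO 3166-1 alpha-2 codes).
# This grouping is an illustrative assumption, not the paper's mapping.
COUNTRY_TO_DIALECT = {
    "EG": "EG",                                      # Egypt -> Egyptian
    "SA": "GU", "KW": "GU", "AE": "GU",              # Gulf states -> Gulf
    "QA": "GU", "BH": "GU", "OM": "GU",
    "IQ": "IQ",                                      # Iraq -> Iraqi
    "MA": "MG", "DZ": "MG", "TN": "MG", "LY": "MG",  # Maghreb -> Maghrebi
    "SY": "LV", "LB": "LV", "JO": "LV", "PS": "LV",  # Levant -> Levantine
}


def label_comment(country_code: Optional[str]) -> Optional[str]:
    """Return the dialect group for an (assumed) country code, or None if unknown."""
    if country_code is None:
        return None
    return COUNTRY_TO_DIALECT.get(country_code.upper())


def label_corpus(comments: Iterable[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Label (text, country_code) pairs, dropping comments whose location is unmapped."""
    labeled = []
    for text, country in comments:
        dialect = label_comment(country)
        if dialect is not None:
            labeled.append((text, dialect))
    return labeled


def dialect_distribution(labeled: List[Tuple[str, str]]) -> Counter:
    """Count how many labeled comments fall into each dialect group."""
    return Counter(dialect for _, dialect in labeled)
```

For example, `label_corpus([("...", "SA"), ("...", "FR"), ("...", "ma")])` keeps the Saudi and Moroccan comments as GU and MG and discards the French one. Dropping comments with an unmapped location trades coverage for label precision. Whether the authors made the same trade-off is not stated in the abstract.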