LREC 2018 (main)

Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/3a4wixcre2vd

Abstract

Automatically identifying the Arabic dialect of a text is a difficult task, especially for Maghreb dialects, which may be written in either Arabic or Latin characters (Arabizi). Such texts are characterized by code-switching: between Modern Standard Arabic (MSA) and Arabic Dialect (AD) in texts written in Arabic script, or between Arabizi and foreign languages in texts written in Latin script. This paper presents the resources and tools we have developed for this purpose, with a focus on the transliteration of Arabizi into Arabic (using dedicated tools for Arabic dialects). It also describes a dictionary-based approach to detecting the dialectal origin of a text, which yields satisfactory results.

Details

Paper ID
lrec2018-main-575
Pages
N/A
BibKey
saadane-etal-2018-automatic
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Houda Saâdane

  • Hosni Seffih

  • Christian Fluhr

  • Khalid Choukri

  • Nasredine Semmar
