Dialects Identification of Armenian Language

Proceedings of the Workshop on Processing Language Variation: Digital Armenian (DigitAm) within the 13th Language Resources and Evaluation Conference

DOI:10.63317/5jx8wsh5qyse

Abstract

The Armenian language has many dialects that differ from each other syntactically, morphologically, and phonetically. In this work, we implement and evaluate models that determine the dialect of a given passage of text. The proposed models are evaluated on the three major varieties of the Armenian language: Eastern, Western, and Classical. Previously, no tools existed for dialect identification in Armenian. The paper presents three approaches: a statistical one that relies on a stop-word dictionary, a modified statistical one that uses a dictionary of the most frequently encountered words, and a third based on Facebook’s fastText language identification neural network model. Two types of neural network models were trained, one using pre-trained word embeddings and one without. The approaches were tested on sentence-level and document-level data. The results show that the neural network-based method performs considerably better than the statistical ones, achieving almost 98% accuracy at the sentence level and nearly 100% at the document level.
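
As a rough illustration of the first, dictionary-based idea, the sketch below scores a passage against one stop-word set per dialect and picks the label with the most matching tokens. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`tokenize`, `identify_dialect`), the tie-breaking rule, and the scoring by raw match counts are all assumptions, and the word lists in the usage example are ASCII placeholders rather than real Armenian stop words.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then split on runs of non-word characters.
    # \w is Unicode-aware in Python 3, so Armenian letters are kept.
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def identify_dialect(text, stop_words):
    """Score `text` against per-dialect stop-word sets.

    `stop_words` maps a dialect label to a set of words (hypothetical
    dictionaries supplied by the caller). Returns (label, scores), where
    label is None when no stop word from any dictionary occurs in the text.
    Ties go to the label listed first in `stop_words` (an assumption).
    """
    counts = Counter(tokenize(text))
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in stop_words.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, scores
    return best, scores


# Placeholder dictionaries: NOT real Armenian stop words.
TOY_STOP_WORDS = {
    "eastern": {"ea1", "ea2"},
    "western": {"we1", "we2"},
    "classical": {"cl1"},
}

label, scores = identify_dialect("ea1 foo we1 ea2", TOY_STOP_WORDS)
print(label, scores)
```

The modified statistical variant described in the abstract could reuse the same scoring function with dictionaries of the most frequent words in place of stop words; how the paper weights or normalizes the counts is not stated here.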

Details

Paper ID
lrec2022-ws-digitam-2
Pages
pp. 8-12
BibKey
avetisyan-2022-dialects
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Workshop on Processing Language Variation: Digital Armenian (DigitAm) within the 13th Language Resources and Evaluation Conference
Date
20-25 June 2022

Authors

  • Karen Avetisyan