
A Deep Neural Network based Approach for Entity Extraction in Code-Mixed Indian Social Media Text

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/4kkwj57kxfv7

Abstract

The rise in accessibility of the web to the masses has led to a spurt in the use of social media, making it a convenient and powerful way for people to express and exchange information in their own language(s). India, being an enormously diverse country, has more than 168 million users on social media. This diversity is also reflected in their scripts, where a majority of users often switch between languages, including their native one, to be more expressive. These linguistic variations make automatic entity extraction both a necessary and a challenging problem. In this paper, we report our work on entity extraction in a code-mixed environment. Entity extraction is a fundamental component in many natural language processing (NLP) applications. The task faces more challenges when dealing with unstructured and informal texts, and the mixing of scripts (i.e., code-mixing) adds further complexity to the process. Our proposed approach is based on the popular deep neural network architecture of Gated Recurrent Units (GRUs), which discover higher-level features from the text automatically. Unlike existing systems, it does not require handcrafted features or rules. To the best of our knowledge, this is the first attempt at entity extraction from code-mixed data using a deep neural network. The proposed system achieves F-scores of 66.04% and 53.85% for the English-Hindi and English-Tamil language pairs, respectively.

Details

Paper ID
lrec2018-main-278
Pages
N/A
BibKey
gupta-etal-2018-deep
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Deepak Gupta
  • Asif Ekbal
  • Pushpak Bhattacharyya
