Back to Main Conference 2018
LREC 2018main

No more beating about the bush : A Step towards Idiom Handling for Indian Language NLP

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/4ywaxwj2mz6o

Abstract

One of the major challenges in the field of Natural Language Processing (NLP) is the handling of idioms; seemingly ordinary phrases which could be further conjugated or even spread across the sentence to fit the context. Since idioms are a part of natural language, the ability to tackle them brings us closer to creating efficient NLP tools. This paper presents a multilingual parallel idiom dataset for seven Indian languages in addition to English and demonstrates its usefulness for two NLP applications - Machine Translation and Sentiment Analysis. We observe significant improvement for both the subtasks over baseline models trained without employing the idiom dataset.

Details

Paper ID
lrec2018-main-048
Pages
N/A
BibKey
agrawal-etal-2018-beating
Editors
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 - 12 May 2018

Authors

  • RA

    Ruchit Agrawal

  • VC

    Vighnesh Chenthil Kumar

  • VM

    Vigneshwaran Muralidharan

  • DS

    Dipti Sharma

Links