LREC 2022 Workshop

Corpus Creation for Sentiment Analysis in Code-Mixed Tulu Text

Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages

DOI:10.63317/59egdevourju

Abstract

Sentiment Analysis (SA) employing code-mixed data from social media helps in gaining insights into the data and in decision making for various applications. One such application is to analyze users’ emotions from comments on YouTube videos. Social media comments do not adhere to the grammatical norms of any language, and they often comprise a mix of languages and scripts. The lack of annotated code-mixed data for SA in a low-resource language like Tulu makes SA a challenging task. To address the lack of annotated code-mixed Tulu data for SA, a gold standard trilingual code-mixed Tulu annotated corpus of 7,171 YouTube comments is created. Further, Machine Learning (ML) algorithms are employed as baseline models to evaluate the developed dataset, and the performance of the ML algorithms is found to be encouraging.

Details

Paper ID
lrec2022-ws-sigul-05
Pages
pp. 33-40
BibKey
hegde-etal-2022-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Date
20 June 2022 to 25 June 2022

Authors

  • Asha Hegde

  • Mudoor Devadas Anusha

  • Sharal Coelho

  • Hosahalli Lakshmaiah Shashirekha

  • Bharathi Raja Chakravarthi
