RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/3uea3f2tush4

Abstract

The education domain has been a popular area of collaboration with NLP researchers for decades. However, many recent breakthroughs, such as large transformer-based language models, have provided new opportunities for solving interesting but difficult problems. One such problem is assigning sentiment to reviews of educators’ performance. We present EduSenti: a corpus of 1,163 Albanian and 624 English reviews of educational instructors’ performance, annotated for sentiment, emotion and educational topic. In this work, we experiment with fine-tuning several language models on the EduSenti corpus and then compare them with an Albanian masked language model trained from the last XLM-RoBERTa checkpoint. We show promising baseline results, which include an F1 of 71.9 in Albanian and 73.8 in English. Our contributions are: (i) a sentiment analysis corpus in Albanian and English, (ii) a large Albanian corpus of crawled data useful for unsupervised training of language models, and (iii) the source code for our experiments.

Details

Paper ID
lrec2024-main-1233
Pages
pp. 14146-14151
BibKey
nuci-etal-2024-roberta
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Krenare Pireva Nuci
  • Paul Landes
  • Barbara Di Eugenio

Links