SentiMalti: A Maltese Sentiment Analysis Dataset and Models

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4kw8df57bza3

Abstract

We present SentiMalti, a new Maltese social media sentiment resource and accompanying baselines. We scrape user‑generated content from YouTube, Reddit, and Facebook, then apply a Maltese‑aware preprocessing pipeline (cleaning, personally identifiable information anonymisation, sentence splitting, and sentence‑level language filtering) to retain Maltese sentences while tolerating realistic code‑switching. The resulting crowdsourced dataset contains 2,327 sentences annotated for positive (39%), negative (31%), and neutral (30%) sentiment. We integrate prior Maltese datasets to create a combined benchmark of 3,772 instances. We evaluate fine‑tuned encoder models (BERTu, Glot500) and few‑shot prompting with instruction‑tuned multilingual LLMs (Aya‑101, Gemma 2 Instruct 9B). On the full test set, five‑shot Aya‑101 attains 68.65 macro‑F1, closely followed by a fine‑tuned BERTu at 68.36 macro‑F1. Error analysis reveals complementary strengths: BERTu better separates polarised classes, while Aya‑101 tends to over‑predict the neutral class. We release the dataset splits, code, and a fine‑tuned BERTu model to facilitate further work in Maltese NLP and sentiment analysis.
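The macro-F1 figures quoted in the abstract average the per-class F1 scores over the three sentiment labels with equal weight, so a model that over-predicts one class (for example, neutral) is penalised on the classes it misses. The following is a minimal sketch of that metric only, not the authors' evaluation script: the function name `macro_f1`, the label strings, and the toy predictions are assumptions made for illustration.

```python
from collections import Counter

# Assumed label names; the released dataset may encode the classes differently.
LABELS = ("positive", "negative", "neutral")


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores, as a percentage."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return 100.0 * sum(scores) / len(scores)


# Toy example (invented data): one negative sentence is mislabelled as neutral.
gold = ["positive", "negative", "neutral", "neutral"]
pred = ["positive", "neutral", "neutral", "neutral"]
score = macro_f1(gold, pred)
```

In this toy case accuracy is 75%, but the missed negative class contributes an F1 of zero, which pulls the macro average down to about 60. This is why macro-F1 is a stricter measure than accuracy when a model favours one class.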

Details

Paper ID
lrec2026-main-630
Pages
pp. 7927-7936
BibKey
caruana-etal-2026-sentimalti
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Ian Caruana
  • Matthew Vella
  • Fabio Zammit
  • Kurt Micallef
  • Claudia Borg

Links