Comparing Approaches to Automatic Summarization in Less-Resourced Languages
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Automatic text summarization has achieved high performance in higher-resourced languages such as English, but summarization in less-resourced languages has received comparatively less attention. This work compares a range of summarization approaches: zero-shot prompting of large and small LLMs; fine-tuning smaller models such as mT5, with and without three data augmentation approaches and multilingual transfer; and an LLM translation pipeline that translates from the source language to English, summarizes, and translates the summary back. Evaluating with five different metrics, we find that LLMs of similar size vary in performance, that our multilingual fine-tuned mT5 baseline outperforms most other approaches, including zero-shot LLM prompting, on most metrics, and that LLM-as-a-judge evaluation may be unreliable for less-resourced languages.
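The translation pipeline mentioned above (translate to English, summarize, translate back) can be illustrated with a minimal sketch. This is not the implementation used in this work: the function name `pivot_summarize` and the `translate` and `summarize` callables are hypothetical placeholders, assumed here to wrap whatever LLM calls one chooses.

```python
from typing import Callable

# Hypothetical interfaces (assumptions, not from the paper):
#   translate(text, source_lang, target_lang) -> translated text
#   summarize(text) -> summary in the same language as the input
Translate = Callable[[str, str, str], str]
Summarize = Callable[[str], str]


def pivot_summarize(
    text: str,
    lang: str,
    translate: Translate,
    summarize: Summarize,
    pivot: str = "en",
) -> str:
    """Summarize `text` (written in `lang`) by pivoting through `pivot`."""
    if lang == pivot:
        # Already in the pivot language: no translation step is needed.
        return summarize(text)
    pivot_text = translate(text, lang, pivot)      # source -> English
    pivot_summary = summarize(pivot_text)          # summarize in English
    return translate(pivot_summary, pivot, lang)   # English -> source
```

Passing the two steps in as callables keeps the sketch independent of any particular model or API, so the same function can be run with different LLMs for comparison. Note that errors from either translation step carry through to the final summary.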