MeteoGalEus: An Iberian Multilingual Weather Dataset in Galician, Euskera, and Spanish
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper introduces MeteoGalEus, a multilingual weather dataset that combines meteorological observations from two Spanish regional agencies, Euskalmet and MeteoGalicia. The dataset contains daily records spanning 4 years and 6 months, with observations aligned across both sources. MeteoGalEus captures key meteorological variables, including temperature, wind, and state of the sky. The data are provided in a structured format that facilitates analysis and integration, and textual forecasts are available in the official languages of each region (i.e., Galician and Spanish for MeteoGalicia; Euskera and Spanish for Euskalmet). By merging and harmonizing data from the two agencies, MeteoGalEus offers a unique resource for cross-regional weather analysis and multilingual climate studies, and it is suited to tasks requiring high-quality, aligned, and standardized weather data across multiple languages and regions. To illustrate the use of MeteoGalEus for natural language generation (NLG), we conducted baseline experiments with LLaMA-based models in both zero-shot and fine-tuned settings. Fine-tuning led to consistent improvements across all metrics: in the best-performing model, BERTScore increased from 0.68 to 0.79, ROUGE from 0.20 to 0.35, and BLEU from 0.02 to 0.17. These experiments show how MeteoGalEus can serve as a benchmark for multilingual and cross-regional NLG tasks.