From Bones to Rocks: A Systematic Evaluation of Specialized Definition Generation for Portuguese

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

This work presents a systematic evaluation of Large Language Models (LLMs) for generating specialized definitions in Portuguese, focusing on the medical and geological domains. We introduce a benchmark and employ a statistically grounded evaluation framework, including 5-fold cross-validation and significance testing, to ensure the reliability and generalizability of our findings. Our experiments with several open-source, decoder-only LLMs explore in-context learning (ICL) under diverse prompting strategies, ranging from zero-shot and few-shot prompting to prompts enriched with contextual information. The evaluated models include multilingual architectures and one model that underwent continued pretraining specifically for Portuguese, allowing us to assess the impact of language adaptation on definition generation quality. The results indicate that most evaluated models perform well on this task, with relatively small performance differences among the top models. Statistical analyses confirm that these differences are not consistently significant, suggesting that several open LLMs, regardless of their size, multilingual capacity, or language specialization, offer comparable effectiveness for Portuguese definition generation. These findings provide practical guidance for selecting and adapting models for specialized NLP tasks in low-resource languages like Portuguese.
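As an illustration of the kind of evaluation protocol the abstract describes, the following minimal sketch splits a set of items into five folds and compares two systems with a paired significance test. The abstract does not name the test or the metric used, so the paired sign-flip permutation test below is an assumption made for illustration only, and the function names `k_fold_indices` and `paired_permutation_test` are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and split them into k near-equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Striding by k yields folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided paired sign-flip permutation test (an assumed choice of test).

    scores_a and scores_b hold per-item scores of two systems on the same
    items, in the same order. Returns an estimated p-value.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

In such a setup, each fold would serve once as the held-out test split, and the per-item scores of two models on the same test items would be passed to the test. A large p-value would be consistent with the observation that differences among the top models are not consistently significant.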