
Beyond Literal Meaning: How LLMs Interpret Yemeni Proverbs

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4hxnxxxq5iu2

Abstract

We present a benchmark dataset of Yemeni proverbs paired with expert-annotated explanations, designed to evaluate the cultural reasoning abilities of large language models (LLMs). Using zero-shot and few-shot prompting, we assess seven LLMs through both automatic and human evaluation. Results show that instruction-tuned models such as GPT-4o and Gemini 1.5 Pro outperform smaller models in both evaluation settings. Few-shot prompting significantly improves performance across all models, underscoring its value for figurative and culturally grounded language tasks. Notably, ALLaM, a bilingual model trained on Arabic and English, achieves competitive results, demonstrating the potential of regionally adapted models for low-resource cultural tasks. LLM-as-a-Judge evaluation correlates strongly with human assessment (Kendall’s τ up to 0.98). Error analysis identifies literal interpretation and cultural misalignment as recurring failure modes.
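The judge–human agreement reported above is measured with Kendall's rank correlation. As a minimal sketch of how such an agreement score is computed, the following pure-Python example implements Kendall's τ (the tie-free τ-a variant) over two hypothetical model rankings; the rankings are illustrative only, not the paper's data.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length rankings without ties."""
    assert len(x) == len(y) and len(x) > 1
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant if both rankings order items i and j the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical rankings of seven models (1 = best), differing by one adjacent swap.
human_rank = [1, 2, 3, 4, 5, 6, 7]
judge_rank = [1, 2, 4, 3, 5, 6, 7]
print(f"Kendall's tau = {kendall_tau(human_rank, judge_rank):.2f}")  # → 0.90
```

A single swapped pair among seven items leaves 20 of the 21 item pairs concordant, giving τ = 19/21 ≈ 0.90; values near 0.98 indicate the judge and human rankings are almost identical.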

Details

Paper ID
lrec2026-main-083
Pages
pp. 1071-1080
BibKey
thmer-etal-2026-beyond
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Nasser Thmer
  • Ali Al-Laith
  • Muhammad Shoaib
