Empathy in Greek Exam-Related Support Conversations: A Comparative Evaluation of LLM Responses
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Recent advances in Large Language Models (LLMs) have significantly enhanced Natural Language Processing (NLP), particularly in generating human-like responses and engaging in social interactions. Research in natural language generation assesses AI-generated text across multiple dimensions, including accuracy, relevance, and robustness. This paper evaluates a Greek-focused LLM and compares it to two multilingual LLMs across four key dimensions: Understanding, Empathy, Harm, and Reasoning. We analyze the models’ responses to expressions of stress and anxiety from teenagers preparing for Greece’s Panhellenic university entrance exams, assessing not only their ability to comprehend, reason, and respond empathetically, but also the unintended harm they may cause, such as reinforcing stress or offering inappropriate advice. To this end, we introduce GEAR (Greek Empathy Assessment Resource), a dataset of student issues and exam-related forum posts paired with LLM-generated empathetic responses. By prompting each model with contextual cues about its role as a recipient of these messages, this research aims to provide insights into the models’ conversational capabilities, emotional intelligence, and ethical implications in sensitive interactions.