
hgkai26 at MEDIQA-EVAL 2026: Automated Evaluation of Visual Medical Question Answering Using LLM-as-a-Judge

Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026

DOI:10.63317/4n9skmf9rive

Abstract

As multimodal large language models (LLMs) are increasingly used for medical response generation, reliable automated evaluation mechanisms are needed to assess the quality of model-generated outputs. The MediQA-Eval 2026 shared task focuses on grading AI-generated dermatology and wound care responses using structured, human-aligned rubrics. In this work, we explore a zero-shot multimodal LLM-as-a-Judge framework that assesses candidate responses across multiple quality dimensions. We evaluate system performance with the official task metrics, which are designed to reflect alignment with human judgments. Our findings provide preliminary insights into the feasibility and limitations of LLM-based evaluators for rubric-guided medical response assessment.

Details

Paper ID
lrec2026-ws-clinicalnlp-29
Pages
pp. 257-261
BibKey
gangavarapu-2026-hgkai26
Editors
Asma Ben Abacha, Steven Bethard, Danielle Bitterman, Tristan Naumann, Kirk Roberts
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Haritha Gangavarapu