Structured Radiology Intelligence: Extracting Structured Data from MRI Reports Using LLMs

Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026

DOI:10.63317/4vob5zztrfso

Abstract

This study presents work on extracting and structuring doctors' notes, specifically Magnetic Resonance Imaging (MRI) reports, into a standardized format using large language models (LLMs). We introduce a novel benchmark dataset comprising 55 clinically relevant variables specified by doctors, making it the first of its kind in the automated processing of unstructured medical texts. The dataset annotations were generated using a systematic prompt-tuning approach that was manually validated. Evaluation then proceeded across three experimental stages: baseline, intermediate, and fine-tuned. Each stage assessed the impact of different prompt strategies on the performance of various LLMs (LLaMA, Qwen, and DeepSeek). Among the models tested, LLaMA 3.1 8B Instruct consistently achieved the highest composite score in both the intermediate and final phases, resulting in an 18.42% improvement in performance.

Details

Paper ID
lrec2026-ws-cl4health-24
Pages
pp. 268-280
BibKey
marimuthu-etal-2026-structured
Editors
Deepak Gupta, Paul Thompson, Sophia Ananiadou, Dina Demner-Fushman
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Sushvin Marimuthu
  • Parameswari Krishnamurthy
  • Dipti Misra Sharma
  • Goldwin H
  • Anu Eapen
  • Betty Simon
  • Anuradha Chandramohan
