
DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5ntn3gnmv9cy

Abstract

Large language models (LLMs) show strong reasoning abilities, but full retraining for the medical domain is often infeasible because of limited data or compute resources. We present DeepICD-R1, a framework for efficient medical reasoning fine-tuning that unites hierarchical rewards with distilled supervision. We reformulate ICD-10-CM prediction as a reinforcement learning problem and design a hierarchical outcome-based reward that reflects the ICD code structure across chapter, category, and full-code levels. In parallel, we publish a large-scale distilled dataset of over 90k reasoning traces derived from MIMIC-IV admission notes, integrating clinical validation and official coding guidelines. Fine-tuning smaller instruction-tuned LLMs with this data and GRPO reinforcement yields consistent gains in diagnostic accuracy and reasoning coherence. Extensive ablations confirm that hierarchical supervision and verifiable outcome rewards enable competitive, domain-specialized reasoning models without additional pretraining, providing a reproducible foundation for clinical NLP research.

Keywords: Clinical NLP, Large Reasoning Model, GRPO, Supervised Fine-Tuning
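The abstract describes a hierarchical outcome-based reward over chapter, category, and full-code levels but gives no formula. The following is a minimal sketch of one way such a reward could be computed, not the authors' implementation. The function names, the cumulative weights (0.2, 0.3, 0.5), and the rule of treating the first three characters as the category are all assumptions made for illustration. The chapter ranges follow the ICD-10-CM tabular list as commonly published and should be checked against the official release in use.

```python
from typing import Optional

# Chapter boundaries as (first category, last category).
# Assumption: taken from the commonly published ICD-10-CM tabular list.
CHAPTER_RANGES = [
    ("A00", "B99"), ("C00", "D49"), ("D50", "D89"), ("E00", "E89"),
    ("F01", "F99"), ("G00", "G99"), ("H00", "H59"), ("H60", "H95"),
    ("I00", "I99"), ("J00", "J99"), ("K00", "K95"), ("L00", "L99"),
    ("M00", "M99"), ("N00", "N99"), ("O00", "O9A"), ("P00", "P96"),
    ("Q00", "Q99"), ("R00", "R99"), ("S00", "T88"), ("V00", "Y99"),
    ("Z00", "Z99"), ("U00", "U85"),
]


def normalize(code: str) -> str:
    """Upper-case the code and drop whitespace and the decimal point."""
    return code.strip().upper().replace(".", "")


def chapter_index(code: str) -> Optional[int]:
    """Return the position of the chapter range containing the code, or None."""
    category = normalize(code)[:3]
    if len(category) < 3:
        return None  # too short to be a valid category
    for index, (low, high) in enumerate(CHAPTER_RANGES):
        # Three-character categories compare correctly as plain strings.
        if low <= category <= high:
            return index
    return None


def hierarchical_reward(
    predicted: str,
    gold: str,
    w_chapter: float = 0.2,   # hypothetical weight
    w_category: float = 0.3,  # hypothetical weight
    w_full: float = 0.5,      # hypothetical weight
) -> float:
    """Cumulative reward: each deeper level only counts if the level above matched."""
    pred, ref = normalize(predicted), normalize(gold)
    pred_chapter = chapter_index(pred)
    if pred_chapter is None or pred_chapter != chapter_index(ref):
        return 0.0
    reward = w_chapter
    if pred[:3] != ref[:3]:
        return reward
    reward += w_category
    if pred == ref:
        reward += w_full
    return reward


if __name__ == "__main__":
    for guess in ("I50.9", "I50.1", "I10", "J18.9"):
        print(guess, hierarchical_reward(guess, "I50.9"))
```

Under these assumed weights, a prediction earns partial credit for landing in the right chapter or category even when the full code is wrong, which is the kind of graded signal an outcome-based reward for GRPO-style training could use.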

Details

Paper ID
lrec2026-main-843
Pages
pp. 10764-10775
BibKey
rhr-etal-2026-deepicd
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Tom Röhr
  • Thomas Maximilian Josef Steffek
  • Roman Teucher
  • Keno Bressem
  • Alexei Figueroa
  • Paul Grundmann
  • Peter Troeger
  • Felix Alexander Gers
  • Alexander Löser

Links