TabMedQA: From Structured Data to Question-Answer Datasets in Early Clinical Decision-Making
Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026
Abstract
The rising adoption of Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) in clinical general practice demands datasets that capture realistic early-stage clinical decision-making, where experts must decide on follow-up actions based on sparse, structured patient data. Existing medical Question–Answering (QA) resources primarily address post-diagnostic or specialist settings and rarely reflect how General Practitioners (GPs) document and justify early decisions that are based on clinical observations from Electronic Health Records (EHRs) and grounded in clinical guidelines. We present TabMedQA, a framework for synthesizing QA collections that emulate how GPs formulate and document decisions in encounter notes during early patient assessments. TabMedQA leverages instruction-tuned LLMs, guided by disease-specific clinical guidelines, to generate full encounter notes, each composed of a guideline-grounded justification and a corresponding follow-up recommendation, directly from structured EHR inputs. The framework further supports RAG-based evaluation, simulating how GPs might consult previous patient encounters to inform new consultations. We demonstrate the application of TabMedQA, and the use of the resulting resource, on prostate cancer using the publicly available PI-CAI collection, and we release the resulting PI-CAI QA collection, the resource generation templates, and the TabMedQA code. To the best of our knowledge, TabMedQA provides the first open framework for creating guideline-grounded, EHR-based QA collections that enable the generation and holistic evaluation of LLM-produced clinical encounter notes, bridging decision-making accuracy with clinical encounter quality in general practice.
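To make the described data flow concrete, the following is a minimal Python sketch of how a structured record, a guideline excerpt, and retrieved prior encounters could be combined into a single generation prompt. It is not the released TabMedQA code. The `Encounter` fields, the prompt wording, the function names, and the nearest-neighbour retriever (a simple stand-in for a real RAG retriever) are all assumptions made for illustration, and the numeric values are invented placeholders.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Encounter:
    """Hypothetical structured record plus the note written for it."""
    age: int
    psa_ng_ml: float
    note: str = ""


def retrieve(query: Encounter, history: list[Encounter], k: int = 1) -> list[Encounter]:
    """Toy retriever: the k prior encounters closest in raw feature space."""
    def dist(e: Encounter) -> float:
        return sqrt((e.age - query.age) ** 2 + (e.psa_ng_ml - query.psa_ng_ml) ** 2)
    return sorted(history, key=dist)[:k]


def build_prompt(record: Encounter, guideline: str, prior: list[Encounter]) -> str:
    """Serialize the record, guideline text, and retrieved notes into one prompt."""
    context = "\n".join(f"- {e.note}" for e in prior) or "- (none)"
    return (
        f"Guideline excerpt:\n{guideline}\n\n"
        f"Previous encounters:\n{context}\n\n"
        f"Patient: age={record.age}, PSA={record.psa_ng_ml} ng/mL\n"
        "Write an encounter note with a justification and a follow-up recommendation."
    )


# Invented placeholder data, for illustration only.
history = [Encounter(62, 3.1, "note A"), Encounter(70, 9.8, "note B")]
new = Encounter(68, 8.5)
prompt = build_prompt(new, "<guideline text>", retrieve(new, history))
```

The resulting `prompt` string would then be passed to an instruction-tuned LLM. In a realistic setup the toy distance function would likely be replaced by an embedding-based or lexical retriever over the stored encounter notes.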