tt501 at ArchEHR-QA 2026: Few-Shot Prompting with Retrieval-Augmented Generation for Grounded Clinical EHR Question Answering
Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026
Abstract
We present team tt501’s system for the ArchEHR-QA 2026 shared task, which addresses evidence identification (Subtask 2), answer generation (Subtask 3), and evidence alignment (Subtask 4) over electronic health record (EHR) notes. Our approach relies entirely on prompt engineering with xAI’s Grok models, without any task-specific fine-tuning or external knowledge. For evidence identification, we compare a hybrid of BM25 retrieval and a large language model (LLM) reranker against a full-context chain-of-thought ensemble with a refinement step, and find that full-note reasoning yields higher recall and F1. For answer generation, we implement a retrieval-augmented generation pipeline that conditions on the predicted evidence sentences and on few-shot examples, which improves lexical and semantic faithfulness over a zero-shot baseline. For evidence alignment, we design a recall-oriented few-shot prompt enriched with explicit rationales that teach the model how to map each answer sentence back to its supporting note sentences. We report official shared task results and analyse the impact of these design choices across the three subtasks.
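To make the first stage of the hybrid evidence identification baseline concrete, the following is a minimal sketch of BM25 scoring over note sentences, written in plain Python. It is an illustration under stated assumptions, not the system's actual code: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, the cutoff `top_k`, and the function names `tokenize` and `bm25_rank` are all hypothetical choices, and the LLM reranking stage that would follow is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(question, sentences, k1=1.5, b=0.75, top_k=3):
    """Return indices of the top_k sentences ranked by BM25 score for the question.

    k1, b, and top_k are illustrative defaults, not values from the paper.
    """
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    if n == 0:
        return []

    # Average sentence length in tokens; fall back to 1.0 if every sentence is empty.
    avg_len = sum(len(d) for d in docs) / n or 1.0

    # Document frequency: number of sentences containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = set(tokenize(question))
    scored = []
    for idx, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with length normalisation.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scored.append((score, idx))

    # Highest score first; ties broken by original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


if __name__ == "__main__":
    note = [
        "The patient was started on metoprolol for atrial fibrillation.",
        "Family visited in the afternoon.",
        "Metoprolol dose was increased after persistent tachycardia.",
    ]
    print(bm25_rank("Why was metoprolol given?", note, top_k=2))
```

In a hybrid setup of this kind, the returned candidate indices would be passed to an LLM prompt that decides which candidates are actually relevant. Because lexical retrieval can only surface sentences that share terms with the question, it can miss paraphrased evidence, which is consistent with the reported recall advantage of full-note reasoning.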