TRUMEDIQA: A Modular Trustworthy RAG Pipeline for Multilingual Medical Question Answering

Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026

Abstract

Medical question answering systems must balance usefulness with safety, particularly in low-resource linguistic settings, where models are less robust and hallucinations can cause harm. We present TRUMEDIQA, a reproducible multilingual medical QA pipeline for Moroccan Darija, Arabic, French, and English, deployed on WhatsApp with support for text and voice interactions. TRUMEDIQA makes its decisions in three layers: (i) language identification; (ii) a pre-retrieval intent router that maps each query to one of 38 clinical FAQ categories to constrain retrieval; and (iii) post-retrieval LLM-based re-ranking that either selects the best candidate answer or returns a null decision, which triggers a safe fallback (abstention). Answers are retrieved from a curated FAQ knowledge base validated by medical professionals. We evaluate TRUMEDIQA in a study in which 21 participants submitted 290 questions across the four languages. An expert annotator labels each interaction as relevant, acceptable, or irrelevant, and we also measure correct abstentions when no suitable answer exists in the knowledge base. An ablation study shows that, relative to a naïve retrieval baseline, routing and re-ranking together improve the weighted relevance score from 0.25 to 0.94 and precision from 0.53 to 0.98, while increasing correct abstention on unanswerable queries from 4.38% to 69.77%.
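The layered control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the TRUMEDIQA implementation. The names `FAQEntry`, `answer_query`, and `FALLBACK_MESSAGE`, the callable signatures, and the fallback wording are all hypothetical, and the language identifier, intent router, retriever, and LLM re-ranker are passed in as plain functions so that no specific model or library is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical fallback wording; the text actually sent to users is not given here.
FALLBACK_MESSAGE = "No validated answer is available for this question."


@dataclass(frozen=True)
class FAQEntry:
    """One curated question/answer pair (hypothetical schema)."""
    category: str
    question: str
    answer: str


def answer_query(
    query: str,
    kb: Sequence[FAQEntry],
    identify_language: Callable[[str], str],
    route_intent: Callable[[str, str], str],
    retrieve: Callable[[str, Sequence[FAQEntry]], List[FAQEntry]],
    rerank: Callable[[str, Sequence[FAQEntry]], Optional[int]],
) -> str:
    """Return a curated answer, or the fallback message when abstaining."""
    # (i) Language identification.
    language = identify_language(query)
    # (ii) Pre-retrieval routing: restrict retrieval to one FAQ category.
    category = route_intent(query, language)
    in_category = [entry for entry in kb if entry.category == category]
    candidates = retrieve(query, in_category)
    # (iii) Post-retrieval re-ranking: an index into candidates, or None
    # (the null decision). No candidates is also treated as a null decision.
    chosen = rerank(query, candidates) if candidates else None
    if chosen is None:
        return FALLBACK_MESSAGE  # safe fallback (abstention)
    return candidates[chosen].answer
```

In this sketch, abstention is an explicit return path rather than a score threshold applied after the fact: the re-ranker is allowed to reject every candidate, and an empty candidate set is handled the same way. Whether the deployed system treats an empty candidate set as an abstention is an assumption made here for simplicity.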