Datasets for a Chatbot for Clinical Trial Search

Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026

Abstract

Matching patients to clinical trials is a critical bottleneck, hindered by complex eligibility criteria. While conversational AI offers a promising solution, its safe deployment depends on high-quality, domain-specific data. This paper introduces three benchmark datasets designed to support the development and evaluation of conversational agents for clinical trial pre-screening. First, a manually annotated paired-criterion dataset provides a gold standard for structuring raw criteria, which we used to objectively group 12,596 criteria. Second, we curated a human-authored question benchmark to validate the clinical fidelity and patient-centric clarity of questions generated by a medical LLM, ensuring that the AI’s dialogue is accurate and understandable. Third, we constructed a human-validated assessment corpus of criterion-question-answer tuples with human-labeled outcomes to evaluate criterion classification based on a patient’s answer to a generated question. Together, these datasets provide a foundation for developing and evaluating the key components of a chatbot for clinical trial search.