LREC 2026 Workshop

Datasets for a Chatbot for Clinical Trial Search

Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026

DOI:10.63317/3pk9cracmxy7

Abstract

Matching patients to clinical trials is a critical bottleneck, hindered by complex eligibility criteria. While conversational AI offers a promising solution, its safe deployment depends on high-quality, domain-specific data. This paper introduces three benchmark datasets designed to support the development and evaluation of conversational agents for clinical trial pre-screening. First, a manually annotated paired-criterion dataset provides a gold standard for structuring raw criteria, which we used to objectively group 12,596 criteria. Second, we curated a human-authored question benchmark to validate the clinical fidelity and patient-centric clarity of questions generated by a medical LLM, ensuring that the AI’s dialogue is accurate and understandable. Third, we constructed a human-validated assessment corpus of criterion-question-answer tuples with human-labeled outcomes, used to evaluate how well a criterion is classified from a patient’s answer to a generated question. The primary contribution of this work is a foundational set of benchmark datasets designed to support and evaluate key components of a chatbot for clinical trial search.

Details

Paper ID
lrec2026-ws-cl4health-13
Pages
pp. 139-148
BibKey
yang-etal-2026-datasets
Editors
Deepak Gupta, Paul Thompson, Sophia Ananiadou, Dina Demner-Fushman
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Yumeng Yang
  • Ethan Ludmir
  • Kirk Roberts
