Overview of the CT-DEB’26 Shared Task on Predicting Dosing Errors in Interventional Clinical Trials

Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026

Abstract

Dosing errors represent an important source of medication-related risk in interventional clinical trials, potentially affecting both participant safety and the validity of study outcomes. Despite their importance, systematic methods for predicting dosing error risk from trial design information remain largely unexplored. To address this gap, we organized the Clinical Trial Dosing Error Benchmark 2026 (CT-DEB’26) shared task, hosted at the CL4Health workshop at LREC 2026. The task focuses on predicting the risk of dosing errors in interventional clinical trials using heterogeneous information extracted from ClinicalTrials.gov, including structured protocol metadata and long-form textual descriptions. The released benchmark dataset contains over 42,000 clinical trial records spanning multiple study phases and therapeutic areas, annotated with binary labels indicating a significantly high rate of dosing errors. Participants were asked to develop machine learning (ML) models capable of estimating trial-level dosing error risk, evaluated primarily using the ROC-AUC metric under strong class imbalance. The shared task was conducted in two phases and attracted 15 submissions in the development phase and 4 submissions in the final evaluation phase. This paper provides an overview of the shared task, describing the dataset construction, evaluation protocol, and participating systems. In addition, we present a schema-aware CatBoost baseline that leverages structured trial metadata and simple textual statistics, achieving ROC-AUC scores of 0.8606 and 0.8624 on the Phase 1 and Phase 2 leaderboards, respectively. We further summarize the approaches proposed by participating teams, which explore both feature-engineering pipelines and transformer-based text representations. The results highlight the importance of structured trial design variables and hybrid modeling strategies combining tabular and textual information. Finally, we discuss limitations of the benchmark and outline future directions for applying natural language processing and ML to improve medication safety in clinical trial design.
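
To make the primary metric concrete, the following is a minimal sketch of ROC-AUC computed through its rank-sum (Mann-Whitney U) formulation. It is not the official scoring script of the shared task; the function name `roc_auc` and the toy labels and scores are assumptions chosen purely for illustration. The sketch shows why the metric suits imbalanced labels: it depends only on how positive trials are ranked relative to negative ones, so a constant predictor scores 0.5 regardless of how rare the positive class is.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum formulation, using average ranks for ties.

    Illustrative helper only (hypothetical name, not the task's scorer).
    """
    n = len(labels)
    if n != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC is undefined when only one class is present")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U for the positive class, normalized by the number of
    # positive/negative pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy, made-up example: 8 low-risk trials (0) and 2 high-risk trials (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95]

auc_model = roc_auc(labels, scores)            # 15 of 16 pairs ranked correctly
auc_constant = roc_auc(labels, [0.5] * 10)     # uninformative predictor
```

On this toy data the ranking-based score is 0.9375, whereas the constant predictor obtains 0.5 even though always predicting the majority class would reach 80% accuracy.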