HomeLREC 2026WorkshopsCL4HEALTHlrec2026-ws-cl4health-30
Back to CL4HEALTH 2026
LREC 2026workshop

CaresAI at CT-DEB’26: Detecting Dosing Errors In Clinical Trials Using Domain-Specific Transformer Embeddings and Classification Models

Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026

DOI:10.63317/4nqo2dmobwrq

Abstract

Medication errors, particularly dosing errors in clinical trials (CT), can lead to patient harm, adverse drug events and worse patient outcomes. Dosing errors are preventable, and early identification can improve trial integrity and mitigate subsequent clinical and financial burden. This study aims to detect dosing errors within CT protocols by evaluating text representations of trial information using transformer-based language models trained on biomedical corpora. CT textual data was encoded using several models, including ClinicalBERT, PubMedBERT, BioBERT, and MedCPT, and integrated with categorical features. These text embeddings were used as input to classical machine learning models and neural network architectures within an experimental framework. Performance was primarily assessed using ROC-AUC with respect to predicting dosage error. Under a logistic regression baseline, BioBERT consistently outperformed alternative encoders, achieving an ROC-AUC of 0.794, a 3.95 % improvement over the ClinicalBERT baseline. Combining multiple embeddings did not yield improvements, indicating that domain alignment outweighs representational stacking. Gradient boosting models, support vector classifiers, logistic regression, and residual neural networks achieved the strongest performance for predicting dosage error, achieving ROC-AUCs: 0.821 to 0.853. Overall, the integration of domain-specific transformer embeddings with structured metadata enables discrimination of trials meeting a predefined elevated dosing error risk criterion, advancing safety monitoring and supporting informed regulatory decision-making.

Details

Paper ID
lrec2026-ws-cl4health-30
Pages
pp. 344-361
BibKey
hamnett-etal-2026-caresai
Editors
Deepak Gupta, Paul Thompson, Sophia Ananiadou, Dina Demner-Fushman
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • LH

    Leon Hamnett

  • FI

    Favour Igwezeke

  • JA

    Joseph Itopa Abubakar

  • MA

    Mary Adetutu Adewunmi

Links