A Cheap Lunch: Synthetic Annotation With Reduced Human Effort for Medical Text Mining

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/43rc447ycaeu

Abstract

Electronic Health Records are rich sources of patient information, including knowledge about the functioning of patients as defined by the WHO in the International Classification of Functioning (ICF). However, these patient notes have yet to be explored, because the knowledge is packaged in the sometimes cryptic language exchanged between caretakers. Recent research has started to use NLP techniques to extract this knowledge, but these techniques often require laborious annotation. In this paper, we report on how the annotation can (partly) be done by a generative LLM, both for ICF categories that were previously manually annotated and for new ICF categories for which no annotation existed. We show that a domain-specific encoder finetuned with both manual and synthetic annotations outperforms one finetuned with only the manual annotations on a dedicated test set that was adapted for the new categories with minimal manual effort. We also assess the quality of the synthetic annotations of the training data. Our process shows how competitive text classifiers for medical text mining can be developed and extended to new categories with minimal manual effort by experts.

Details

Paper ID
lrec2026-main-813
Pages
pp. 10353-10364
BibKey
chen-etal-2026-cheap
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Shutao Chen
  • Piek T.J.M. Vossen
