Temporal Expression Recognition in Legal Transcripts
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Before clinical text data can be used, any personal information in the clinical reports must be masked, removed, or substituted. Such information may include named entities, contact details, and biographical information, all of which could directly identify an individual. However, in certain scenarios clinical documentation cannot be fully anonymized, for example when it concerns a rare disease. Such records contain information such as mentions of genetic peculiarities or the name of the treating physician. At first glance, this information does not appear to identify individuals, but it can. In this paper, we address the task of predicting whether a medical report (or a sentence therein) refers to a rare disease. Records of rare diseases may contain references to relatives and certain indications that can help reveal whether a rare disease is present. We design a pattern-based approach and a TF-IDF-based predictor, and conduct two supervised learning experiments (one at document level and one at sentence level), achieving an F1-score of up to 98%. This work is a first step towards a larger effort to provide automated support for experts who document medical narratives of rare diseases.