SocialStep: Fast Prediction of Social Determinants of Health
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Given thousands of medical documents, how can we automatically uncover patients’ social risk factors? Social Determinants of Health (SDoH) constitute a growing class of non-clinical risk factors that shape patient trajectories. Although SDoH are clinically significant, their automatic detection from free text remains understudied because training data are scarce and imbalanced. Current approaches often rely on monolithic large language models (LLMs). We present SocialStep, a two-step hybrid pipeline that first uses a lightweight classifier to triage sentences and then applies an LLM to the relevant subset for multilabel classification. On the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, SocialStep improves macro F1 by 5 points over the state-of-the-art baseline while running 12.2× faster. These findings demonstrate that integrating compact neural encoders with LLMs provides a scalable and highly accurate framework for clinical NLP tasks, including SDoH extraction. Notably, we also observe some unexpected patterns in LLM performance. SocialStep offers a practical blueprint for hybrid model deployment that identifies critical social risk factors without prohibitive computational cost.
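The triage-then-classify control flow described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration only: the label set in `SDOH_KEYWORDS`, the keyword-based `keyword_triage` standing in for the lightweight classifier, and `stub_labeler` standing in for the LLM call. None of these reflect the paper's actual models, prompts, or SDoH taxonomy; the sketch shows only how the expensive second step is restricted to sentences that pass the first.

```python
from typing import Callable, Dict, List, Set

# Hypothetical label set and keywords, for illustration only.
SDOH_KEYWORDS: Dict[str, List[str]] = {
    "housing": ["homeless", "shelter"],
    "employment": ["unemployed", "laid off"],
    "substance_use": ["alcohol", "smokes"],
}


def keyword_triage(sentence: str) -> bool:
    """Stand-in for the lightweight classifier (step 1).

    Returns True when the sentence may mention a social risk factor.
    A real system would use a trained compact encoder here.
    """
    text = sentence.lower()
    return any(kw in text for kws in SDOH_KEYWORDS.values() for kw in kws)


def stub_labeler(sentence: str) -> Set[str]:
    """Stand-in for the LLM multilabel step (step 2).

    A real system would prompt a language model; this stub matches keywords.
    """
    text = sentence.lower()
    return {
        label
        for label, kws in SDOH_KEYWORDS.items()
        if any(kw in text for kw in kws)
    }


def two_step_pipeline(
    sentences: List[str],
    triage: Callable[[str], bool],
    labeler: Callable[[str], Set[str]],
) -> Dict[int, Set[str]]:
    """Run the costly labeler only on sentences that the triage step flags."""
    results: Dict[int, Set[str]] = {}
    for index, sentence in enumerate(sentences):
        if triage(sentence):  # cheap filter applied to every sentence
            results[index] = labeler(sentence)  # expensive call on the subset
    return results


# Invented example sentences, not drawn from MIMIC-III.
notes = [
    "Patient is currently homeless and drinks alcohol daily.",
    "Blood pressure 120/80, afebrile.",
    "Recently laid off from warehouse job.",
]
predictions = two_step_pipeline(notes, keyword_triage, stub_labeler)
```

In this toy run the second sentence never reaches the labeler, which is the mechanism by which a two-step design can reduce the number of LLM calls; the actual speedup depends on how many sentences the real triage model filters out.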