ALBA: An Automated Framework for Benchmarking Clinical Language Biomarkers against Standardized Corpora

Proceedings of the Sixth Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments in cooperation with the MENTAL.ai consortium

DOI:10.63317/449fwsyjuzbp

Abstract

Patients with diverse neurocognitive conditions frequently exhibit measurable language deficits that can serve as biomarkers for differential diagnosis and therapeutic decision-making. Discourse analysis offers reliable, ecologically valid measures of human communication, yet performing it manually is cumbersome. Recent advances in automated analysis software allow raw language metrics to be extracted quickly and easily in the clinic. Nevertheless, transforming these measures into actionable clinical insights remains a significant challenge. This paper presents the Automated Language Biomarker Application (ALBA), an integrated framework developed within the Open Brain AI ecosystem to bridge the gap between feature extraction and clinical interpretation. ALBA provides clinicians with a robust statistical infrastructure for benchmarking individual patient measures against standardized, large-scale clinical corpora. Because user-provided data pass through the same elicitation and processing pipeline as the reference corpora, they are directly comparable to population norms for aphasia, Mild Cognitive Impairment (MCI), dementia, and other neurological conditions. The system implements adaptive statistical logic, employing one-sample t-tests and robust non-parametric alternatives to provide real-time significance testing and dynamic visualizations (box, bar, and violin plots). By automating the comparison of "Language Signatures" against healthy controls and specific clinical phenotypes, ALBA facilitates rapid, evidence-based decision-making in both research and rehabilitation contexts.
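The abstract does not specify how ALBA decides between the parametric and non-parametric tests. As a minimal sketch only, assuming that a set of user-provided values for one language metric is compared against a reference mean taken from a normative corpus, the adaptive logic could look like the following. The names `benchmark`, `skew_limit`, and `large_n` are hypothetical; the skewness screen is a hypothetical stand-in for whatever normality check ALBA actually applies; and the exact sign test is a simple non-parametric alternative chosen for illustration, not necessarily the one ALBA uses. Only the Python standard library is used, so the t-distribution p-value is computed through the regularized incomplete beta function.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last multiplicative update is ~1: converged
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_sample_t(values, mu):
    """Two-sided one-sample t-test of the sample mean against a norm mean mu."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    if sd == 0.0:
        return (0.0, 1.0) if mean == mu else (math.copysign(math.inf, mean - mu), 0.0)
    t = (mean - mu) / (sd / math.sqrt(n))
    df = n - 1
    # Two-sided p-value of Student's t: I_{df/(df+t^2)}(df/2, 1/2).
    return t, _betai(df / 2.0, 0.5, df / (df + t * t))


def sign_test(values, mu):
    """Exact two-sided sign test of the median against mu (ties with mu dropped)."""
    above = sum(1 for v in values if v > mu)
    below = sum(1 for v in values if v < mu)
    n = above + below
    if n == 0:
        return 0.0, 1.0
    k = min(above, below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return float(above), min(1.0, 2.0 * tail)


def sample_skewness(values):
    """Moment-based sample skewness g1 (0.0 for constant data)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    if m2 == 0.0:
        return 0.0
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def benchmark(values, norm_mean, alpha=0.05, skew_limit=1.0, large_n=30):
    """Hypothetical adaptive rule: t-test for large or roughly symmetric samples, else sign test."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    if len(values) >= large_n or abs(sample_skewness(values)) <= skew_limit:
        name, (stat, p) = "one-sample t-test", one_sample_t(values, norm_mean)
    else:
        name, (stat, p) = "sign test", sign_test(values, norm_mean)
    return {"test": name, "statistic": stat, "p_value": p, "significant": p < alpha}
```

For example, `benchmark([1, 1, 1, 1, 1, 10], 0)` falls back to the sign test because the sample is strongly right-skewed, whereas `benchmark([1, 2, 3, 4, 5], 3)` uses the t-test. The fallback is a design choice under these assumptions: small, skewed samples violate the normality the t-test relies on, while the sign test needs only that observations fall above or below the norm value.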