The Icelandic Language Biobank: Data Collection through a Clinical Analysis Platform
Proceedings of the Sixth Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments in cooperation with the MENTAL.ai consortium
Abstract
Recent work on clinical applications of language technology shows considerable potential for people with speech and language symptoms and disorders, both for the diagnosis and monitoring of diseases and disorders and for the development of novel communication aids. This has made a variety of digital health tools available, including personalized automatic speech recognition for disordered speech and the monitoring of disease progression in neurodegeneration through language samples. Currently, these tools are almost exclusively accessible to speakers of high-resource languages. A major hurdle for small, lower-resourced language communities in this context is the creation of clinical language corpora. We describe ongoing efforts to build the necessary infrastructure for clinical speech and language data collection in Iceland through the Icelandic Language Biobank, a resource that counters data scarcity through collaboration with clinicians and robust, linguistically informed data collection.