Back to Main Conference 2024
LREC-COLING 2024main

MedQA-SWE - a Clinical Question & Answer Dataset for Swedish

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4ckcc2pepdd3

Abstract

Considering the rapid improvement of large generative language models, it is important to measure their ability to encode clinical domain knowledge in order to help determine their potential utility in a clinical setting. To this end we present MedQA-SWE – a novel multiple choice, clinical question & answering (Q&A) dataset in Swedish consisting of 3,180 questions. The dataset was created from a series of exams aimed at evaluating doctors’ clinical understanding and decision making and is the first open-source clinical Q&A dataset in Swedish. The exams – originally in PDF format – were parsed and each question manually checked and curated in order to limit errors in the dataset. We provide dataset statistics along with benchmark accuracy scores of seven large generative language models on a representative sample of questions in a zero-shot setting, with some models showing impressive performance given the difficulty of the exam the dataset is based on.

Details

Paper ID
lrec2024-main-0975
Pages
pp. 11178-11186
BibKey
hertzberg-lokrantz-2024-medqa
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • NH

    Niclas Hertzberg

  • AL

    Anna Lokrantz

Links