Back to Main Conference 2010
LREC 2010main

A Swedish Scientific Medical Corpus for Terminology Management and Linguistic Exploration

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/22teor9emrgn

Abstract

This paper describes the development of a new Swedish scientific medical corpus. We provide a detailed description of the characteristics of this new collection as well results of an application of the corpus on term management tasks, including terminology validation and terminology extraction. Although the corpus is representative for the scientific medical domain it still covers in detail a lot of specialised sub-disciplines such as diabetes and osteoporosis which makes it suitable for facilitating the production of smaller but more focused sub-corpora. We address this issue by making explicit some features of the corpus in order to demonstrate the usability of the corpus particularly for the quality assessment of subsets of official terminologies such as the Systematized NOmenclature of MEDicine - Clinical Terms (SNOMED CT). Domain-dependent language resources, labelled or not, are a crucial key components for progressing R&D in the human language technology field since such resources are an indispensable, integrated part for terminology management, evaluation, software prototyping and design validation and a prerequisite for the development and evaluation of a number of sublanguage dependent applications including information extraction, text mining and information retrieval.

Details

Paper ID
lrec2010-main-030
Pages
N/A
BibKey
kokkinakis-gerdin-2010-swedish
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • DK

    Dimitrios Kokkinakis

  • UG

    Ulla Gerdin

Links