LREC 2012 (main conference)

A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/2thim8no8g7m

Abstract

Uncertainty language permeates biomedical research and is fundamental to the computational interpretation of unstructured text. Yet, to our knowledge, no coherent, cognitively based theory exists to interpret Uncertainty language and guide Natural Language Processing. The aim of our project was therefore to detect and annotate Uncertainty markers, which play a significant role in building knowledge or beliefs in readers' minds, in a biomedical research corpus. Our corpus includes 80 manually annotated articles from the British Medical Journal, randomly sampled from a 168-year period. Uncertainty markers were classified according to a theoretical framework that combines linguistic and cognitive theory, and the corpus was manually annotated according to these principles. We performed preliminary experiments to assess the manually annotated corpus and to establish a baseline for the automatic detection of Uncertainty markers. The results of the experiments show that most Uncertainty markers can be recognized with good accuracy.

Details

Paper ID
lrec2012-main-489
Pages
pp. 2009-2014
BibKey
bongelli-etal-2012-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Ramona Bongelli
  • Carla Canestrari
  • Ilaria Riccioni
  • Andrzej Zuczkowski
  • Cinzia Buldorini
  • Ricardo Pietrobon
  • Alberto Lavelli
  • Bernardo Magnini
