Exploiting Multiply Annotated Corpora in Biomedical Information Extraction Tasks

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/2bv8vga4yhbj

Abstract

This paper discusses the problem of utilising multiply annotated data in training biomedical information extraction systems. Two corpora, annotated with entities and relations, and containing a number of multiply annotated documents, are used to train named entity recognition and relation extraction systems. Several methods of automatically combining the multiple annotations to produce a single annotation are compared, but none produces better results than simply picking one of the annotated versions at random. It is also shown that adding extra singly annotated documents produces faster performance gains than adding extra multiply annotated documents.

Details

Paper ID
lrec2008-main-072
Pages
N/A
BibKey
haddow-alex-2008-exploiting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28–30 May 2008

Authors

  • Barry Haddow

  • Beatrice Alex