Annotating Logical Forms for EHR Questions

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/5abhwgueouzg

Abstract

This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is to provide the training data necessary for semantic parsers to automatically convert EHR questions into a structured query. A layered annotation strategy is used which mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are found containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.

Details

Paper ID: lrec2016-main-598
Pages: pp. 3772-3778
BibKey: roberts-demner-fushman-2016-annotating
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 978-2-9517408-9-1
Conference: Tenth International Conference on Language Resources and Evaluation
Location: Portorož, Slovenia
Date: 23-28 May 2016

Authors

  • Kirk Roberts

  • Dina Demner-Fushman
