LREC 2016, Main Conference

A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/5cak39rrjyjy

Abstract

We present a corpus of virtual patient dialogues to which we have added manually annotated gold standard word alignments. Since each question asked by a medical student in the dialogues is mapped to a canonical, anticipated version of the question, the corpus implicitly defines a large set of paraphrase (and non-paraphrase) pairs. We also present a novel process for selecting the most useful data to annotate with word alignments and for ensuring consistent paraphrase status decisions. In support of this process, we have enhanced the earlier Edinburgh alignment tool (Cohn et al., 2008) and revised and extended the Edinburgh guidelines, in particular adding guidance intended to ensure that the word alignments are consistent with the overall paraphrase status decision. The finished corpus and the enhanced alignment tool are made freely available.

Details

Paper ID
lrec2016-main-506
Pages
pp. 3174-3179
BibKey
gokcen-etal-2016-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Ajda Gokcen

  • Evan Jaffe

  • Johnsey Erdmann

  • Michael White

  • Douglas Danforth

Links