Summary of the paper

Title Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues
Authors Svetlana Stoyanchev and Paul Piwek
Abstract We describe the construction of the CODA corpus, a parallel corpus of monologues and expository dialogues. The dialogue part of the corpus consists of expository, i.e., information-delivering rather than dramatic, dialogues written by several acclaimed authors. The monologue part of the corpus is a paraphrase in monologue form of these dialogues by a human annotator. The annotator-written monologue preserves all information present in the original dialogue and does not introduce any new information that is not present in the original dialogue. The corpus was constructed as a resource for extracting rules for automated generation of dialogue from monologue. Using authored dialogues allows us to analyse the techniques used by accomplished writers for presenting information in the form of dialogue. The dialogues are annotated with dialogue acts and the monologues with rhetorical structure. We developed annotation and translation guidelines together with a custom-developed tool for carrying out translation, alignment and annotation of the dialogues. The final parallel CODA corpus consists of 1000 dialogue turns that are tagged with dialogue acts and aligned with monologue that expresses the same information and has been annotated with rhetorical structure relations.
Topics Corpus (creation, annotation, etc.), Dialogue, Natural Language Generation
Bibtex @InProceedings{STOYANCHEV10.127,
  author = {Svetlana Stoyanchev and Paul Piwek},
  title = {Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}