LREC 2022 main

The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/5jxttmawhqw2

Abstract

We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts. PDNC contains annotations for 35,978 quotations across 22 full-length novels, and is by an order of magnitude the largest corpus of its kind. Each quotation is annotated for the speaker, addressees, type of quotation, referring expression, and character mentions within the quotation text. The annotated attributes allow for a comprehensive evaluation of models of quotation attribution and coreference for literary texts.

Details

Paper ID
lrec2022-main-628
Pages
pp. 5838-5848
BibKey
vishnubhotla-etal-2022-project
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Krishnapriya Vishnubhotla

  • Adam Hammond

  • Graeme Hirst
