LREC 2022 main

DDisCo: A Discourse Coherence Dataset for Danish

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/3i3bkg2bme7q

Abstract

To date, there has been no resource for studying discourse coherence on real-world Danish texts. Discourse coherence has mostly been approached with the assumption that incoherent texts can be represented by coherent texts in which sentences have been shuffled. However, incoherent real-world texts rarely resemble such shuffled texts. We thus present DDisCo, a dataset including text from the Danish Wikipedia and Reddit annotated for discourse coherence. We choose to annotate real-world texts instead of relying on artificially incoherent text for training and testing models. We then evaluate the performance of several methods, including neural networks, on the dataset.

Details

Paper ID
lrec2022-main-260
Pages
pp. 2440-2445
BibKey
flansmose-mikkelsen-etal-2022-ddisco
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Linea Flansmose Mikkelsen
  • Oliver Kinch
  • Anders Jess Pedersen
  • Ophélie Lacroix
