LREC-COLING 2024 (main)

CLAUSE-ATLAS: A Corpus of Narrative Information to Scale up Computational Literary Analysis

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/34wjjnqt76mt

Abstract

We introduce CLAUSE-ATLAS, a resource of 19th- and 20th-century English novels annotated automatically. This corpus, which contains 41,715 labeled clauses, makes it possible to study stories as sequences of eventive, subjective and contextual information. We use it to investigate whether recent large language models, in particular gpt-3.5-turbo with 16k tokens of context, constitute promising tools for annotating large amounts of data for literary studies (we show that this is the case). Moreover, by analyzing the annotations so collected, we find that our clause-based approach to literature captures structural patterns within books, as well as qualitative differences between them.

Details

Paper ID
lrec2024-main-0292
Pages
pp. 3283-3296
BibKey
troiano-vossen-2024-clause
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Enrica Troiano
  • Piek T.J.M. Vossen
