LREC 2022 main

Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/5af6dqunqpfv

Abstract

In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written between 1840 and 1920. The corpus is being built to test various distant reading methods and tools, with the aim of rethinking European literary history. We present the steps that led to the production of the Serbian sub-collection: novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization, and named entity recognition. The Serbian sub-collection has been published on several platforms to make it freely available to a range of users. Several usage examples show that this sub-collection is useful for both close and distant reading approaches.

Details

Paper ID
lrec2022-main-356
Pages
pp. 3337-3345
BibKey
stankovic-etal-2022-distant
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 to 25 June 2022

Authors

  • Ranka Stanković

  • Cvetana Krstev

  • Branislava Šandrih Todorović

  • Dusko Vitas

  • Mihailo Skoric

  • Milica Ikonić Nešić

Links