
ELTE Poetry Corpus: A Machine Annotated Database of Canonical Hungarian Poetry

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/448tnsvas4fq

Abstract

ELTE Poetry Corpus is a database that stores canonical Hungarian poetry with automatically generated annotations of the poems’ structural units, grammatical features and sound devices, i.e. rhyme patterns, rhyme pairs, rhythm, alliterations and the main phonological features of words. The corpus has an open access online query tool with several search functions. The paper presents the main stages of the annotation process and the tools used for each stage. The TEI XML format of the different versions of the corpus, each of which contains an increasing number of annotation layers, is presented as well. We have also specified our own XML format for the corpus, slightly different from TEI, in order to make it easier and faster to execute queries on the corpus. We discuss the results of a manual evaluation of the quality of automatic annotation of rhythm, as well as the results of an automatic evaluation of different rule sets used for the automatic annotation of rhyme patterns. Finally, the paper gives an overview of the main functions of the online query tool developed for the corpus.

Details

Paper ID
lrec2022-main-372
Pages
pp. 3471-3478
BibKey
horvath-etal-2022-elte
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Péter Horváth
  • Péter Kundráth
  • Balázs Indig
  • Zsófia Fellegi
  • Eszter Szlávich
  • Tímea Borbála Bajzát
  • Zsófia Sárközi-Lindner
  • Bence Vida
  • Aslihan Karabulut
  • Mária Timári
  • Gábor Palkó
