LREC 2022, Main Conference

AGILe: The First Lemmatizer for Ancient Greek Inscriptions

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/4qe92nog7w74

Abstract

To facilitate corpus searches by classicists and to reduce data sparsity when training models, we focus on the automatic lemmatization of ancient Greek inscriptions, which have received less attention in this respect than literary texts. We show that existing lemmatizers for ancient Greek, trained on literary data, perform poorly on epigraphic data because of major linguistic differences between the two types of text. We therefore train the first inscription-specific lemmatizer, which achieves above 80% accuracy, and make both the models and the lemmatized data available to the community. We also provide a detailed error analysis of the peculiarities of inscriptions, which further underlines the need for a lemmatizer dedicated to them.

Details

Paper ID
lrec2022-main-571
Pages
pp. 5334-5344
BibKey
de-graaf-etal-2022-agile
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Evelien de Graaf
  • Silvia Stopponi
  • Jasper K. Bos
  • Saskia Peels-Matthey
  • Malvina Nissim