LREC 2022 main

HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/4z3ue9xxdiv6

Abstract

The paper presents a tool for automatically marking up quantifying expressions, their semantic features, and their scopes. We explore the idea of using a BERT-based neural model for the task (in this case HerBERT, a model trained specifically for Polish). The tool is trained on the recent manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present the results of automatically annotating a 300-million-word sub-corpus of the National Corpus of Polish. Our results show that language models can effectively recognise the semantic category of quantification and identify key semantic properties of quantifiers, such as monotonicity. Furthermore, the algorithm we have developed can be used to build semantically annotated quantifier corpora for other languages.
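The abstract compares the tool's output with human annotation but does not state how the comparison is scored. As a minimal sketch, assuming the tagger emits one BIO label per token and that agreement is measured as exact span match (both are assumptions, not necessarily the authors' setup), such a comparison could look like the code below. The tag name `Q` and the helpers `bio_to_spans` and `span_prf` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

# A span is (first token index, index one past the last token, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Turn a BIO tag sequence into a set of labelled spans.

    The BIO scheme and the label names are illustrative assumptions,
    not the annotation format used in the paper.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A sentinel "O" at the end closes any span still open after the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues_span = tag.startswith("I-") and tag[2:] == label
        if continues_span:
            continue
        # Any other tag ends the currently open span (if there is one).
        if start is not None and label is not None:
            spans.add((start, i, label))
        start, label = None, None
        # "B-X" opens a span; a stray "I-X" is treated as if it were "B-X".
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return spans


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 of predictions against gold."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative sentence: "Każdy student przeczytał kilka książek"
# ("Every student read several books"); the tags are made up for the example.
gold = bio_to_spans(["B-Q", "O", "O", "B-Q", "I-Q"])
pred = bio_to_spans(["B-Q", "O", "O", "B-Q", "O"])
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact span matching is strict: the prediction that stops one token early on the second quantifying expression counts as both a miss and a false positive. A partial-overlap score would be a reasonable alternative if boundary disagreements matter less than detection.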

Details

Paper ID
lrec2022-main-773
Pages
pp. 7140-7146
BibKey
wolinski-etal-2022-herbert
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20–25 June 2022

Authors

  • Marcin Woliński
  • Bartłomiej Nitoń
  • Witold Kieraś
  • Jakub Szymanik
