HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

Abstract

The paper presents a tool for automatically marking up quantifying expressions, their semantic features, and their scopes. We explore the use of a BERT-based neural model for the task, in this case HerBERT, a model trained specifically for Polish. The tool is trained on the recent, manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present the results of automatically annotating a 300 million sub-corpus of the National Corpus of Polish. Our results show that language models can effectively recognise the semantic category of quantification and identify key semantic properties of quantifiers, such as monotonicity. Furthermore, the algorithm we have developed can be used to build semantically annotated quantifier corpora for other languages.

Details

Paper ID

lrec2022-main-773

Pages

pp. 7140-7146

DOI

10.63317/4z3ue9xxdiv6

BibKey

wolinski-etal-2022-herbert

Editors

Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-38-2

Conference

Thirteenth Language Resources and Evaluation Conference

Location

Marseille, France

Date

20 - 25 June 2022

Authors

Marcin Woliński
Bartłomiej Nitoń
Witold Kieraś
Jakub Szymanik
