Back to Main Conference 2022
LREC 2022main

Pre-training and Evaluating Transformer-based Language Models for Icelandic

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/2kxqgdxtipbi

Abstract

In this paper, we evaluate several Transformer-based language models for Icelandic on four downstream tasks: Part-of-Speech tagging, Named Entity Recognition. Dependency Parsing, and Automatic Text Summarization. We pre-train four types of monolingual ELECTRA and ConvBERT models and compare our results to a previously trained monolingual RoBERTa model and the multilingual mBERT model. We find that the Transformer models obtain better results, often by a large margin, compared to previous state-of-the-art models. Furthermore, our results indicate that pre-training larger language models results in a significant reduction in error rates in comparison to smaller models. Finally, our results show that the monolingual models for Icelandic outperform a comparably sized multilingual model.

Details

Paper ID
lrec2022-main-804
Pages
pp. 7386-7391
BibKey
gudnason-loftsson-2022-pre
Editors
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis2020
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 - 25 June 2022

Authors

  • JD

    Jón Friðrik Daðason

  • HL

    Hrafn Loftsson

Links