Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

Abstract

We present GPT-SW3, a 3.5 billion parameter autoregressive language model trained on a newly created 100 GB Swedish corpus. This paper provides insights with regard to data collection and training, while highlighting the challenges of proper model evaluation. The results of quantitative evaluation through perplexity indicate that GPT-SW3 is a competent model in comparison with existing autoregressive models of similar size. Additionally, we perform an extensive prompting study which reveals the good text generation capabilities of GPT-SW3.

Details

Paper ID

lrec2022-main-376

Pages

pp. 3509-3518

DOI

10.63317/2u9szjhk3jwh

BibKey

ekgren-etal-2022-lessons

Editors

Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-38-2

Conference

Thirteenth Language Resources and Evaluation Conference

Location

Marseille, France

Date

20 - 25 June 2022

Authors

Ariel Ekgren
Amaru Cuba Gyllensten
Evangelia Gogoulou
Alice Heiman
Severine Verlinden
Joey Öhman
Fredrik Carlsson
Magnus Sahlgren
