LREC 2022, main conference

LaVA – Latvian Language Learner corpus

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/3ro4ucjsuczp

Abstract

This paper presents the Latvian Language Learner Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia. The LaVA corpus contains 1015 essays (190k tokens and 790k characters, excluding whitespace) written by foreign students at Latvian higher education institutions who are learning Latvian as a foreign language in their first or second semester and have reached the A1 (possibly A2) proficiency level in Latvian. The corpus is annotated both morphologically and for learner errors. The paper also presents an error analysis and statistics for the LaVA corpus. The corpus is publicly available at: http://www.korpuss.lv/id/LaVA.

Details

Paper ID
lrec2022-main-077
Pages
pp. 727-731
BibKey
dargis-etal-2022-lava
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20–25 June 2022

Authors

  • Roberts Darģis
  • Ilze Auziņa
  • Inga Kaija
  • Kristīne Levāne-Petrova
  • Kristīne Pokratniece
