Back to Main Conference 2006
LREC 2006main

The ASK Corpus - a Language Learner Corpus of Norwegian as a Second Language

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4uhdtbhi6d2g

Abstract

In our paper we present the design and interface of ASK, a language learner corpus of Norwegian as a second language which contains essays collected from language tests on two different proficiency levels as well as personal data from the test takers. In addition, the corpus also contains texts and relevant personal data from native Norwegians as control data. The texts as well as the personal data are marked up in XML according to the TEI Guidelines. In order to be able to classify “errors” in the texts, we have introduced new attributes to the TEI corr and sic tags. For each error tag, a correct form is also in the text annotation. Finally, we employ an automatic tagger developed for standard Norwegian, the “Oslo-Bergen Tagger”, together with a facility for manual tag correction. As corpus query system, we are using the Corpus Workbench developed at the University of Stuttgart together with a web search interface developed at Aksis, University of Bergen. The system allows for searching for combinations of words, error types, grammatical annotation and personal data.

Details

Paper ID
lrec2006-main-342
Pages
N/A
BibKey
tenfjord-etal-2006-ask
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 26 May 2006

Authors

  • KT

    Kari Tenfjord

  • PM

    Paul Meurer

  • KH

    Knut Hofland

Links