Back to Main Conference 2018
LREC 2018main

The LIA Treebank of Spoken Norwegian Dialects

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/4jthgd5nvfw3

Abstract

This article presents the LIA treebank of transcribed spoken Norwegian dialects. It consists of dialect recordings made in the period between 1950--1990, which have been digitised, transcribed, and subsequently annotated with morphological and dependency-style syntactic analysis as part of the LIA (Language Infrastructure made Accessible) project at the University of Oslo. In this article, we describe the LIA material of dialect recordings and its transcription, transliteration and further morphosyntactic annotation. We focus in particular on the extension of the native NDT annotation scheme to spoken language phenomena, such as pauses and various types of disfluencies, and present the subsequent conversion of the treebank to the Universal Dependencies scheme. The treebank currently consists of 13,608 tokens, distributed over 1396 segments taken from three different dialects of spoken Norwegian. The LIA treebank annotation is an on-going effort and future releases will extend on the current data set.

Details

Paper ID
lrec2018-main-710
Pages
N/A
BibKey
ovrelid-etal-2018-lia
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 12 May 2018

Authors

  • Lilja Øvrelid

  • AK

    Andre Kåsen

  • KH

    Kristin Hagen

  • AN

    Anders Nøklestad

  • PS

    Per Erik Solberg

  • JJ

    Janne Bondi Johannessen

Links