
NorGramBank: A ‘Deep’ Treebank for Norwegian

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/47fb65th6mxw

Abstract

We present NorGramBank, a treebank for Norwegian with highly detailed LFG analyses. It is one of many treebanks made available through the INESS treebanking infrastructure. NorGramBank was constructed as a parsebank, i.e. by automatically parsing a corpus, using the wide-coverage grammar NorGram. One part consisting of 350,000 words has been manually disambiguated using computer-generated discriminants. A larger part of 50 million words has been stochastically disambiguated. The treebank is dynamic: by global reparsing at certain intervals it is kept compatible with the latest versions of the grammar and the lexicon, which are continually developed further in interaction with the annotators. A powerful query language, INESS Search, has been developed for search across formalisms in the INESS treebanks, including LFG c- and f-structures. Evaluation shows that the grammar provides about 85% of randomly selected sentences with good analyses. Agreement among the annotators responsible for manual disambiguation is satisfactory, but also suggests desirable simplifications of the grammar.

Details

Paper ID
lrec2016-main-565
Pages
pp. 3555-3562
BibKey
dyvik-etal-2016-norgrambank
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Helge Dyvik
  • Paul Meurer
  • Victoria Rosén
  • Koenraad De Smedt
  • Petter Haugereid
  • Gyri Smørdal Losnegaard
  • Gunn Inger Lyse
  • Martha Thunes

Links