LREC-COLING 2024 (main)

NB Uttale: A Norwegian Pronunciation Lexicon with Dialect Variation

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/5nz3oqvhjqcd

Abstract

We present a Norwegian pronunciation lexicon with Bokmål orthographic word forms and up to eight alternate phonological transcriptions per word form. The lexicon covers dialectal variation for five geographical areas, as well as pronunciation variation for spontaneous and manuscript-read speech. It is based on the NST Bokmål lexicon for East Norwegian, whose original phonological transcriptions were corrected before being converted with dialect-specific regular expression rules. To evaluate the quality and consistency of the new, rule-generated transcriptions, we trained grapheme-to-phoneme (G2P) models and report results as word error rate (WER) and phoneme error rate (PER). We found that the G2P models trained on the lexica for Southwest and West Norwegian close-to-written transcriptions have the lowest WER, and that all error-corrected, close-to-written lexica yield lower WER than the original NST lexicon. The lexicon is available under an open license and can be used for various language technology applications and in linguistic research.
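The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common G2P conventions: WER is the share of words whose predicted transcription differs from the reference in any way, and PER is the total phoneme-level edit distance divided by the total number of reference phonemes. The paper's exact scoring setup may differ; the function names and the toy phoneme symbols below are hypothetical and not taken from the lexicon.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_error_rates(refs: List[Sequence[str]],
                    hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (WER, PER) for parallel lists of reference and predicted transcriptions."""
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and of equal length")
    total_phonemes = sum(len(r) for r in refs)
    if total_phonemes == 0:
        raise ValueError("reference transcriptions contain no phonemes")
    # A word counts as wrong if its transcription is not an exact match.
    wrong_words = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return wrong_words / len(refs), edits / total_phonemes


if __name__ == "__main__":
    # Toy data with made-up phoneme symbols (assumption, not from the lexicon).
    refs = [["h", "u", "s"], ["b", "i", "l"]]
    hyps = [["h", "u", "s"], ["b", "e", "l"]]
    wer, per = g2p_error_rates(refs, hyps)
    print(f"WER={wer:.3f} PER={per:.3f}")
```

Under these definitions a single wrong phoneme makes the whole word count as an error for WER, so WER is always at least as strict as PER on the same predictions.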

Details

Paper ID
lrec2024-main-1056
Pages
pp. 12087-12092
BibKey
rosok-dale-2024-nb
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Marie Iversdatter Røsok

  • Ingerid Løyning Dale
