LREC 2022 workshop

Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments

Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages

DOI:10.63317/5hkhy5rqn8cy

Abstract

This paper presents a work-in-progress report on an open-source speech technology project for the indigenous Sami languages. A less detailed description of this work is given in a more general paper about the whole GiellaLT language infrastructure, submitted to the LREC 2022 main conference. At this stage, we have designed and collected a text corpus specifically for developing speech technology applications, namely text-to-speech (TTS) and automatic speech recognition (ASR), for the Lule and North Sami languages. We have also piloted and experimented with different speech synthesis technologies using a miniature speech corpus, and we have developed tools for effective processing of large spoken corpora. Additionally, we discuss effective and mindful use of the speech corpus, as well as the possibility of using found or archival materials to train an ASR model for these languages.

Details

Paper ID
lrec2022-ws-sigul-22
Pages
pp. 169-175
BibKey
hiovain-asikainen-moshagen-2022-building
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Location
Marseille, France
Date
20–25 June 2022

Authors

  • Katri Hiovain-Asikainen

  • Sjur Moshagen
