Back to Main Conference 2018
LREC 2018main
Researching Less-Resourced Languages – the DigiSami Corpus
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract
Increased use of digital devices and data repositories has enabled a digital revolution in data collection and language research, and has also led to important activities supporting speech and language technology research for less-resourced languages. This paper describes the DigiSami project and its research results, focussing on spoken corpus collection and speech technology for the Fenno-Ugric language North Sami. The paper also discusses multifaceted questions on ethics and privacy related to data collection for less-resourced languages and indigenous communities.