Malayalam Speech Corpus: Design and Development for Dravidian Language

Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation

DOI:10.63317/3ds5rwe37yiw

Abstract

A well-designed and well-developed corpus is essential for bridging the gap between theory and application in language technology, in text, speech, and several other areas. Several problems are encountered while developing a corpus, especially for low-resource languages. To the best of our knowledge, the Malayalam Speech Corpus (MSC) is one of the first open speech corpora for Automatic Speech Recognition (ASR) research. It consists of 250 hours of agricultural speech data. We provide a transcription file, a lexicon, and annotated speech along with the audio segments. The corpus will be made available for public use upon request at “www.iiitmk.ac.in/vrclc/utilities/ml_speechcorpus”. This paper details the collection and development process for an agricultural-domain speech corpus in the Malayalam language.

Details

Paper ID
lrec2020-ws-wildre-05
Pages
pp. 25-28
BibKey
k-r-etal-2020-malayalam
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation
Date
11 May 2020 to 16 May 2020

Authors

  • Lekshmi K R

  • Jithesh V S

  • Elizabeth Sherly

Links