
CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2awnajgan2bn

Abstract

This paper describes the recording, processing and annotation of a new speech corpus, CoRuSS (Corpus of Russian Spontaneous Speech), which is based on connected communicative speech recorded from 60 native Russian male and female speakers of different age groups (from 16 to 77). Some Russian speech corpora available at the moment contain plain orthographic texts and provide some limited annotation, but there are no corpora providing detailed prosodic annotation of spontaneous conversational speech. This corpus contains 30 hours of high-quality recorded spontaneous Russian speech, half of which has been transcribed and prosodically labeled. The recordings consist of dialogues between two speakers, monologues (speakers' self-presentations) and readings of a short phonetically balanced text. Since the corpus is labeled for a wide range of linguistic (phonetic and prosodic) information, it provides a basis for empirical studies of various spontaneous speech phenomena, as well as for comparison with those observed in prepared read speech. Since the corpus is designed as an open-access resource of speech data, it will also make it possible to advance corpus-based analysis of spontaneous speech across languages, as well as speech technology development.

Details

Paper ID
lrec2016-main-309
Pages
pp. 1949-1954
BibKey
kachkovskaia-etal-2016-coruss
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Tatiana Kachkovskaia

  • Daniil Kocharov

  • Pavel Skrelin

  • Nina Volskaya

Links