Back to Main Conference 2012
LREC 2012main

A Corpus of Spontaneous Multi-party Conversation in Bosnian Serbo-Croatian and British English

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/3mfu3oup8d7m

Abstract

In this paper we present a corpus of audio and video recordings of spontaneous, face-to-face multi-party conversation in two languages. Freely available high quality recordings of mundane, non-institutional, multi-party talk are still sparse, and this corpus aims to contribute valuable data suitable for study of multiple aspects of spoken interaction. In particular, it constitutes a unique resource for spoken Bosnian Serbo-Croatian (BSC), an under-resourced language with no spoken resources available at present. The corpus consists of just over 3 hours of free conversation in each of the target languages, BSC and British English (BE). The audio recordings have been made on separate channels using head-set microphones, as well as using a microphone array, containing 8 omni-directional microphones. The data has been segmented and transcribed using segmentation notions and transcription conventions developed from those of the conversation analysis research tradition. Furthermore, the transcriptions have been automatically aligned with the audio at the word and phone level, using the method of forced alignment. In this paper we describe the procedures behind the corpus creation and present the main features of the corpus for the study of conversation.

Details

Paper ID
lrec2012-main-282
Pages
pp. 1323-1327
BibKey
kurtic-etal-2012-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 27 May 2012

Authors

  • EK

    Emina Kurtić

  • BW

    Bill Wells

  • GB

    Guy J. Brown

  • TK

    Timothy Kempton

  • AA

    Ahmet Aker

Links