Back to Main Conference 2000
LREC 2000main

POSCAT: A Morpheme-based Speech Corpus Annotation Tool

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/4k2w62rpbxsa

Abstract

As more and more speech systems require linguistic knowledge to accommodate various levels of applications, corpora that are tagged with linguistic annotations as well as signal-level annotations are highly recommended for the development of today’s speech systems. Among the linguistic annotations, POS (part-of-speech) tag annotations are indispensable in speech corpora for most modern spoken language applications of morphologically complex agglutinative languages such as Korean. Considering the above demands, we have developed a single unified speech corpus annotation tool that enables corpus builders to link linguistic annotations to signal-level annotations using a morphological analyzer and a POS tagger as basic morpheme-based linguistic engines. Our tool integrates a syntactic analyzer, phrase break detector, grapheme-to-phoneme converter and automatic phonetic aligner together. Each engine automatically annotates its own linguistic and signal knowledge, and interacts with the corpus developers to revise and correct the annotations on demand. All the linguistic/phonetic engines were developed and merged with an interactive visualization tool in a client-server network communication model. The corpora that can be constructed using our annotation tool are multi-purpose and applicable to both speech recognition and text-to-speech (TTS) systems. Finally, since the linguistic and signal processing engines and user interactive visualization tool are implemented within a client-server model, the system loads can be reasonably distributed over several machines.

Details

Paper ID
lrec2000-main-171
Pages
N/A
BibKey
kim-etal-2000-poscat
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • BK

    Byeongchang Kim

  • JL

    Jin-seok Lee

  • JC

    Jeongwon Cha

  • GL

    Geunbae Lee

Links