LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title POSCAT: A Morpheme-based Speech Corpus Annotation Tool
Authors Kim Byeongchang (Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, 790-784, South Korea, bckim@nlp.postech.ac.kr)
Cha Jeongwon (Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, 790-784, South Korea, jwcha@nlp.postech.ac.kr)
Lee Geunbae (Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, 790-784, South Korea, gblee@nlp.postech.ac.kr)
Lee Jin-seok (Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, 790-784, South Korea, wolfpack@nlp.postech.ac.kr)
Keywords Client-Server Model, Linguistic Annotation, Morpheme-Based Annotation, Signal-Level Annotation, Speech Corpus Annotation Tool
Session Session SO3 - Speech Synthesis
Full Paper 224.ps, 224.pdf
Abstract As more speech systems require linguistic knowledge to support applications at various levels, corpora tagged with linguistic annotations as well as signal-level annotations are highly desirable for developing today’s speech systems. Among the linguistic annotations, POS (part-of-speech) tags are indispensable in speech corpora for most modern spoken language applications in morphologically complex agglutinative languages such as Korean. To meet these demands, we have developed a single unified speech corpus annotation tool that enables corpus builders to link linguistic annotations to signal-level annotations, using a morphological analyzer and a POS tagger as the basic morpheme-based linguistic engines. Our tool integrates a syntactic analyzer, a phrase break detector, a grapheme-to-phoneme converter, and an automatic phonetic aligner. Each engine automatically produces its own linguistic or signal-level annotations and interacts with corpus developers so that they can revise and correct the annotations on demand. All the linguistic and phonetic engines were developed and merged with an interactive visualization tool in a client-server network communication model. Corpora constructed with our annotation tool are multi-purpose and applicable to both speech recognition and text-to-speech (TTS) systems. Finally, because the linguistic and signal processing engines and the interactive visualization tool are implemented within a client-server model, the system load can be distributed over several machines.