AusKidTalk: Developing Transcription Guidelines for Continuous Australian English Child Speech

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

Guidelines are required for accurate and consistent transcription of speech corpora, especially when they contain more challenging, e.g. spontaneous or under-resourced speech. This paper presents a workflow and guidelines for transcribing spontaneous and under-resourced child speech in AusKidTalk, the first Australian English child corpus. Speech samples were elicited using a story-telling task and are 3.5 minutes long per child on average. Orthographic transcriptions were generated using automatic speech recognition (ASR) tools and corrected manually. A novel hand-correction protocol consisting of guidelines, hand-correction interface, and ground truth transcriptions together with consistency metrics were developed. Nine annotators submitted hand-corrections for 261 children’s story-telling task, and 25 ground truth tasks. Manual correction was 11-fold of speech time with a 3.5-minute-long story-telling task corrected in approximately 40 minutes. Efficiency is attributed to the quality of automatic transcription with 23% word error rate. Manual correction was accurate with annotators achieving consistent results on 15/25 ground truth submissions. Most inconsistent ground truth submissions were caused by a single, challenging ground truth task. These results show that our workflow yields efficient and accurate transcriptions, although transcriptions of potentially more challenging narrative tasks (e.g., elicited from younger children) might require further corrections.