LREC-COLING 2024 workshop

Long-Form Recordings to Study Children’s Language Input and Output in Under-Resourced Contexts

Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024

DOI:10.63317/2m7ikarcj8iy

Abstract

A growing body of research suggests that young children’s early speech and language exposure is associated with later language development (including delays and diagnoses), school readiness, and academic performance. Over the last decade, educators, economists, and developmental psychologists have increasingly used child-worn devices to collect long-form audio recordings. The most commonly used system for analyzing this data is LENA, which was trained on North American English child-centered data and generates estimates of children’s speech-like vocalization counts, adult word counts, and child-adult turn counts. Recently, cheaper, open-source non-LENA alternatives with multilingual training have been proposed. Both kinds of systems have been employed in under-resourced, sometimes multilingual contexts, including in Africa, where access to printed or digital linguistic resources may be limited. In this paper, we describe each kind of system (LENA, non-LENA), provide information on audio data collected with them that is available for reuse, review evidence of the accuracy of extant automated analyses, and note potential strengths and shortcomings of their use in African communities.

Details

Paper ID
lrec2024-ws-rail-03
Pages
pp. 20-31
BibKey
coffey-cristia-2024-long
Publisher
European Language Resources Association (ELRA) and ICCL
Workshop
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Date
20-25 May 2024

Authors

  • Joseph R. Coffey

  • Alejandrina Cristia

Links