LREC-COLING 2024 workshop

Compiling a List of Frequently Used Setswana Words for Developing Readability Measures

Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024

DOI:10.63317/5kh2heqpogg9

Abstract

This paper addresses the pressing need for improved readability assessment in Setswana by creating a list of frequently used Setswana words. The end goal is to integrate this list into Setswana adaptations of traditional readability measures that rely on frequently used words, such as the Dale-Chall index. Our initial list is developed with corpus-based methods, using frequency lists obtained from five sets of corpora, and is then refined manually. The analysis section examines the challenges encountered in developing the final list, including non-Setswana words, proper names, unexpected terms, and spelling variations. The decision-making process is explained, highlighting crucial choices such as retaining contemporary terms and accepting diverse spelling variations. These decisions reflect a balance between linguistic authenticity and readability. This paper contributes to the discourse on text readability in indigenous Southern African languages. Moreover, it lays a foundation for tailored literacy initiatives and serves as a starting point for adapting traditional frequency-list-based readability measures to Setswana.
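The abstract describes two steps: pooling frequency lists from several corpora into one list of frequent words, and using that list as the "familiar word" list of a Dale-Chall-style formula. The sketch below illustrates both under stated assumptions and is not the author's procedure. The merge rule (keep words attested in at least `min_corpora` corpora, rank them by summed relative frequency), the function names, the letter-only tokenisation, and the toy counts are all hypothetical. The coefficients come from the original English Dale-Chall formula: 0.1579 × percentage of difficult words + 0.0496 × average sentence length, plus 3.6365 when difficult words exceed 5%. They have not been recalibrated for Setswana.

```python
import re
from collections import Counter


def merge_frequency_lists(corpora, min_corpora=3, top_n=3000):
    """Hypothetical merge rule: keep words found in at least `min_corpora`
    corpora and rank them by summed relative frequency across corpora."""
    corpus_count = Counter()  # number of corpora each word occurs in
    rel_freq = Counter()      # summed relative frequency per word
    for freqs in corpora:
        total = sum(freqs.values())
        for word, count in freqs.items():
            corpus_count[word] += 1
            rel_freq[word] += count / total
    candidates = [w for w in rel_freq if corpus_count[w] >= min_corpora]
    # Highest relative frequency first; ties broken alphabetically.
    candidates.sort(key=lambda w: (-rel_freq[w], w))
    return candidates[:top_n]


def dale_chall_score(text, familiar_words):
    """Original (English) Dale-Chall formula, with `familiar_words`
    standing in for the Dale list of familiar words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text.lower())  # letter-only tokens (assumption)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    familiar = {w.lower() for w in familiar_words}
    pdw = 100 * sum(w not in familiar for w in words) / len(words)  # % difficult words
    asl = len(words) / len(sentences)                               # average sentence length
    score = 0.1579 * pdw + 0.0496 * asl
    if pdw > 5:
        score += 3.6365
    return score


# Toy counts for illustration only; these are not data from the paper.
TOY_CORPORA = [
    Counter({"le": 50, "go": 40, "ke": 30, "motho": 5}),
    Counter({"le": 20, "go": 25, "ke": 10, "ntlo": 3}),
    Counter({"le": 30, "go": 10, "ke": 15, "motho": 2}),
]
familiar = merge_frequency_lists(TOY_CORPORA, min_corpora=2)
score = dale_chall_score("Ke motho. Le ntlo go ntlo.", familiar)
```

Because these coefficients were fitted to English texts and grade levels, a Setswana adaptation would presumably need its own calibration. The sketch also does not model the manual refinement the abstract describes, such as removing non-Setswana words and proper names or handling spelling variants.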

Details

Paper ID
lrec2024-ws-rail-05
Pages
pp. 37-44
BibKey
sibeko-2024-compiling
Publisher
European Language Resources Association (ELRA) and ICCL
Workshop
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Date
20-25 May 2024

Authors

  • Johannes Sibeko
