Compilation of an Arabic Children’s Corpus

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2aavb6t8cc53

Abstract

Inspired by the Oxford Children's Corpus, we have developed a prototype corpus of Arabic texts written and/or selected for children. Our Arabic Children's Corpus of 2950 documents and nearly 2 million words was collected manually from the web during a 3-month project. It is of high quality and covers a range of children's genres, determined by the sources we located, including classic tales from The Arabian Nights and popular fictional characters such as Goha. We anticipate that the current and subsequent versions of our corpus will lead to interesting studies in text classification, language use, and ideology in children's texts.

Details

Paper ID
lrec2016-main-285
Pages
pp. 1808-1812
BibKey
al-sulaiti-etal-2016-compilation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 to 28 May 2016

Authors

  • Latifa Al-Sulaiti

  • Noorhan Abbas

  • Claire Brierley

  • Eric Atwell

  • Ayman Alghamdi
