LREC 2018 main

A Leveled Reading Corpus of Modern Standard Arabic

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/2vxe57mwjr49

Abstract

We present a reading corpus in Modern Standard Arabic to enrich the sparse collection of resources that can be leveraged for educational applications. The corpus consists of textbook material from the curriculum of the United Arab Emirates, spanning all 12 grades (1.4 million tokens), and a collection of 129 unabridged works of fiction (5.6 million tokens), all annotated with reading levels from Grade 1 to Post-secondary. We examine reading progression in terms of lexical coverage, and compare the two sub-corpora (curricular, fiction) to others from clearly established genres (news, legal/diplomatic) to measure how well each represents its genre.
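As a rough illustration of what "lexical coverage" across reading levels can mean, the sketch below computes, for each grade, the fraction of tokens already seen in earlier grades. This is not the authors' method: the function names, the toy English data, and the whitespace tokenization are all assumptions made for the example, and real Arabic text would need proper tokenization and likely lemmatization.

```python
def lexical_coverage(known_vocab, tokens):
    """Fraction of `tokens` whose word form appears in `known_vocab`."""
    if not tokens:
        return 0.0
    covered = sum(1 for tok in tokens if tok in known_vocab)
    return covered / len(tokens)


def coverage_by_grade(corpus):
    """Map each grade to the coverage of its tokens by all earlier grades.

    `corpus` is a hypothetical dict: grade number -> list of tokens.
    """
    seen = set()
    result = {}
    for grade in sorted(corpus):
        tokens = corpus[grade]
        result[grade] = lexical_coverage(seen, tokens)
        seen.update(tokens)  # vocabulary accumulates as grades progress
    return result


# Invented toy data, not drawn from the corpus described above.
toy_corpus = {
    1: "the cat sat on the mat".split(),
    2: "the cat saw a bird on the mat".split(),
    3: "a bird sang while the cat slept".split(),
}

for grade, cov in coverage_by_grade(toy_corpus).items():
    print(f"grade {grade}: {cov:.3f}")
```

The first grade always scores 0.0 here because nothing precedes it; a smooth reading progression would show later grades largely covered by the vocabulary of earlier ones.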

Details

Paper ID
lrec2018-main-366
Pages
N/A
BibKey
al-khalil-etal-2018-leveled
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Muhamed Al Khalil

  • Hind Saddiki

  • Nizar Habash

  • Latifa Alfalasi
