LREC 2026 (main)

Corruption-Based Data Augmentation for Arabic Essay Scoring: A Preliminary Study on the Organization Trait

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5kt6mmyaumav

Abstract

Despite significant advances in Automated Essay Scoring (AES), progress in Arabic AES remains limited by the scarcity and imbalance of publicly available datasets. Manually curating such data is labor-intensive and does not scale. To address this, we introduce COrE, a corruption-based data augmentation method that targets the organization trait of Arabic essays. COrE generates synthetic essays by intentionally disrupting the organization of well-written essays through controlled, distance-aware sentence swapping. We conduct our experiments on TAQAE, a dataset of 620 essays across four writing prompts. We evaluate COrE with two widely adopted pre-trained models, AraBERTv2 and CAMeLBERT-mix. Both models improve with COrE, achieving gains of 9–17% over the no-augmentation baseline. These results highlight the potential of trait-specific augmentation to address data scarcity and enhance AES performance for low-resource languages.
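The abstract does not spell out the corruption procedure, so the following is only a minimal sketch of what distance-aware sentence swapping could look like. It assumes that "distance" means the gap between the positions of the two swapped sentences and that a single pair is swapped per synthetic essay; the function names (`split_sentences`, `swap_at_distance`, `corrupt_essay`) and the naive punctuation-based splitter are hypothetical and are not taken from the paper.

```python
import random
import re


def split_sentences(text):
    """Naive splitter on Latin and Arabic sentence-final punctuation (an assumption)."""
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


def swap_at_distance(sentences, distance, rng):
    """Return a copy with one pair of sentences swapped, `distance` positions apart."""
    n = len(sentences)
    if not 1 <= distance < n:
        raise ValueError("distance must be between 1 and len(sentences) - 1")
    # Pick the left index so that the right index stays inside the essay.
    i = rng.randrange(n - distance)
    corrupted = list(sentences)
    corrupted[i], corrupted[i + distance] = corrupted[i + distance], corrupted[i]
    return corrupted


def corrupt_essay(text, distance, seed=0):
    """Hypothetical entry point: split, swap one pair, and rejoin with spaces."""
    rng = random.Random(seed)
    sentences = split_sentences(text)
    return " ".join(swap_at_distance(sentences, distance, rng))
```

Under these assumptions, a larger `distance` moves sentences further from their original position and so might be read as a stronger disruption of organization. How corrupted essays are then scored or labeled is not stated in the abstract, so the sketch leaves that step out.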

Details

Paper ID
lrec2026-main-825
Pages
pp. 10525-10531
BibKey
bashendy-etal-2026-corruption
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • May Saed Bashendy

  • Tamer Elsayed

Links