Arabic ChartSumm: An English-to-Arabic Benchmark for Metadata-to-Text Summarization
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Generating summaries from chart metadata in Arabic poses distinct challenges at the intersection of cross-lingual transfer and data-to-text generation. Chart-to-text benchmarks have advanced English-language research, yet Arabic still lacks a comparable resource, reflecting its continued underrepresentation in NLP. To fill this gap, we construct the first Arabic ChartSumm benchmark by translating chart metadata and reference summaries from English into Modern Standard Arabic (MSA). Two machine translation systems with contrasting architectures are employed: NLLB-200-distilled-600M, a model designed for broad low-resource language coverage, and Qwen2.5-1.5B, an open large language model with general multilingual capabilities. A central contribution of this work is a translation quality evaluation that systematically compares both systems using BLEU, chrF, COMET_ref, and COMET_QE, with Google Translate Arabic output serving as the pivot reference. The results show that NLLB achieves markedly higher lexical and semantic fidelity than Qwen2.5. Building on this foundation, we fine-tune two models, mT5 (multilingual) and CAMeL-Lab’s AraBART (Arabic-specific), to generate Arabic summaries from structured chart metadata. Experimental results show that AraBART trained on NLLB translations outperforms all other configurations, achieving ROUGE-L = 63.8 and BLEU = 33.1. These results highlight the strong dependence of downstream summarization quality on translation accuracy and demonstrate AraBART’s superior capacity for Arabic generation.
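ROUGE-L, the headline summarization metric above, scores a generated summary by the longest common subsequence (LCS) of tokens it shares with the reference. The sketch below is a minimal illustration of a sentence-level ROUGE-L F1 and is not the authors' scoring script. It assumes plain whitespace tokenization, F-measure with equal weighting of precision and recall, and no Arabic-specific normalization (such as unifying alef variants). The function names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            if x == y:
                curr.append(prev[j] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j + 1], curr[j]))  # best of skipping a token
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 (assumption: whitespace tokens, beta = 1)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: a candidate omitting two words of the reference.
    cand = "ارتفعت المبيعات في عام 2020"
    ref = "ارتفعت المبيعات بشكل كبير في عام 2020"
    print(round(rouge_l_f1(cand, ref), 3))  # LCS = 5, P = 5/5, R = 5/7
```

Here the candidate's five tokens all appear in order in the seven-token reference, so precision is 1 and recall is 5/7, giving an F1 of 10/12 (about 0.833). Reported benchmark scores such as ROUGE-L = 63.8 are conventionally scaled by 100 and averaged over a test set, typically with a library implementation rather than this sketch.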