CANVAS: A Multimodal Dataset of Chinese Textbook Images for Bias and Representation Analysis
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Social biases in educational materials can subtly shape students’ perceptions of social roles and participation. However, most existing bias benchmarks for Chinese language models focus on text or isolated images, overlooking the multimodal scenes common in textbooks. To address this gap, we introduce CANVAS (Chinese ANnotated Visual And Social scenes), a multimodal dataset constructed from Chinese elementary science textbooks and annotated along multiple social dimensions. CANVAS provides fine-grained labels for each depicted character’s demographics, social roles, interactions, and power-related attributes within visual scenes. The dataset is built with a semi-automated pipeline in which a vision–language model generates preliminary structured annotations that human annotators then verify and refine. The current release covers the Grade 6 science subset and constitutes the initial annotated version of the dataset. Using this subset, we present an illustrative case study showing how the scene-level and interactional annotations in CANVAS can be used to analyze gender representation in textbook images. By extending bias analysis to full educational scenes, CANVAS offers a new resource for studying representation and fairness in multimodal educational materials and supports future research in NLP, computer vision, and education.