LREC 2026 Workshop

SHEINfer: Implicit Product Category Inference from Arabic E-commerce Reviews

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

DOI: 10.63317/4t8cfnk4ki9a

Abstract

We introduce SHEINfer, a novel task and dataset for inferring product categories from Arabic e-commerce reviews without explicit product mentions. Unlike traditional product classification that relies on product titles or descriptions, our task requires models to deduce product types solely from customer review text, which often contains implicit references through dialectal expressions, quality assessments, and contextual clues. We present a dataset of 801 Arabic reviews from the SHEIN e-commerce website, dual-annotated across 11 product categories with 515 agreed samples achieving moderate inter-annotator agreement (Cohen’s κ = 0.60). Given the relatively small dataset size, we employ 5-fold stratified cross-validation for all models to ensure robust performance estimates. Our experiments compare traditional machine learning approaches (TF-IDF with SVM and Logistic Regression), Arabic transformer models (AraBERT, CAMeLBERT, MARBERT), and large language models (GPT-4o-mini) in zero-shot and few-shot settings. Results show that MARBERT achieves the highest accuracy (0.586 ± 0.026), while TF-IDF with Logistic Regression achieves the best macro F1 (0.417 ± 0.056), indicating better performance across minority categories. GPT-4o-mini demonstrates poor zero-shot performance (0.064 accuracy) with modest improvement in 3-shot settings (0.186 accuracy), indicating that implicit product inference from dialectal Arabic text remains challenging for general-purpose LLMs. Our findings highlight the unique challenges of implicit product classification in Arabic e-commerce and establish benchmarks for future research in this underexplored area.
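The abstract reports Cohen's κ for annotator agreement and notes that the model with the highest accuracy is not the one with the best macro F1. The sketch below is an illustration only, not the authors' evaluation code: it implements both metrics in plain Python and runs them on an invented two-class toy set. The labels `dress` and `shoes` and all predictions are hypothetical (the abstract does not list the 11 categories). It shows how a majority-class predictor can win on accuracy while losing on macro F1 when categories are imbalanced.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label lists of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label rates.
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    scores = []
    for c in sorted(set(gold)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy data: 8 "dress" reviews and 2 "shoes" reviews.
gold = ["dress"] * 8 + ["shoes"] * 2

# Predictor A always outputs the majority class.
pred_majority = ["dress"] * 10

# Predictor B finds both minority items but mislabels three majority items.
pred_balanced = ["dress"] * 5 + ["shoes"] * 3 + ["shoes"] * 2

for name, pred in [("majority", pred_majority), ("balanced", pred_balanced)]:
    print(name, round(accuracy(gold, pred), 3), round(macro_f1(gold, pred), 3))
```

Because macro F1 weights every class equally, a class that is never predicted contributes an F1 of zero regardless of how rare it is. That is one plausible reading of why a model can lead on accuracy while trailing on macro F1 in an 11-category setting with minority categories.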

Details

Paper ID
lrec2026-ws-osact-10
Pages
pp. 81-87
BibKey
alkhalifa-2026-sheinfer
Editors
Hend Al-Khalifa, Mo El-Haj, Saad Ezzini
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Hend Al-Khalifa
