STAR-IL: A Dataset for Style-Aware Machine Translation of Product Reviews in Indian Languages

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

Product reviews on e-commerce platforms are a critical form of user-generated content that influences consumer decisions. However, these reviews are predominantly in English, creating a significant accessibility barrier for users who are not fluent in English. When current models translate these reviews into major Indian languages, the outputs often fail to capture domain-specific features and colloquial style, resulting in stylistically unnatural text. To address this gap, we introduce **STAR-IL**, a human-annotated, multilingual, parallel corpus for style-aware translation of product reviews. We evaluate the performance of several state-of-the-art models on our dataset for the task of product review translation. Our experiments show that models fine-tuned on STAR-IL achieve a significant average performance gain of **5.77** points in BLEU and **3.78** points in COMET over their baselines across all languages. Our dataset provides a valuable benchmark for future research in style-aware product review translation. The STAR-IL dataset is publicly available at https://github.com/ltrc/STAR-IL-Corpus.