IndEuph-170: Benchmarking Cultural Pragmatics through Euphemism Detection in Indian English

Proceedings of the 8th Workshop on Indian Language Data: Resources and Evaluation

Abstract

Large Language Models (LLMs) have shown remarkable proficiency on standard English benchmarks, yet their ability to navigate the sociopragmatic cues of non-Western English varieties remains underexplored. This paper introduces IndEuph-170, a novel benchmark dataset focused on Indian English (IndE) euphemisms: expressions rooted in local social hierarchies, politeness norms, and cultural taboos (e.g., "setting," "loose character," "suitable boy"). IndEuph-170 comprises 170 curated IndE sentences, on which two distinct architectures were evaluated: a fine-tuned BART model and GPT-4. The findings reveal a significant "cultural gap." GPT-4 achieves 82.5% accuracy but struggles with authoritative and punitive nuances, whereas BART achieves only 55.3% accuracy and exhibits a high false-positive rate, over-classifying general Indianisms as euphemisms. The paper argues that current benchmarks such as MME (Fu et al., 2025) and GLUE (Wang et al., 2018) fail to capture these dialectal pragmatics, and that a culturally aware evaluation framework for Global Englishes is necessary.