LREC 2026 workshop

Cross-Domain Evaluation of Transformer-Based Models for Punjabi Speech Emotion Recognition

Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)

DOI:10.63317/3ao5m9p6ni9x

Abstract

Speech Emotion Recognition (SER) is an important part of human–computer interaction, but most existing research focuses on high-resource languages, and very little work addresses regional languages such as Punjabi. This paper focuses on detecting emotions from Punjabi speech using machine learning and deep learning techniques. We curated our own Punjabi speech emotion dataset from volunteer recordings and real-world sources, covering four emotion classes: angry, happy, sad, and neutral. The data was preprocessed for consistency and evaluated with a multi-strategy framework (E1–E4) designed to test domain generalization. Three models were evaluated: a CNN, ResNet-34, and the transformer-based Wav2Vec 2.0. Of these, ResNet-34 performed best under the combined-domain strategy (E4), reaching a test accuracy of 96%. While the cross-corpus evaluations (E2, E3) exposed difficulties in generalizing to the neutral class, ResNet-34 achieved perfect scores for the happy and sad classes in E4. These results demonstrate the effectiveness of residual networks and combined-domain training for emotion recognition in low-resource languages and highlight the potential for further work on Punjabi SER.
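The multi-strategy framework is only summarized in the abstract. As a rough illustration of how a cross-domain evaluation grid of this kind is commonly organized, the Python sketch below assigns each strategy a set of training and test domains and reports overall accuracy alongside per-class recall. The domain names (`volunteer`, `real_world`), the `fold` field, the reading of E1 as an in-domain baseline, and the choice of recall as the per-class score are all assumptions made for illustration; this is not the authors' protocol or code.

```python
from collections import Counter

# Emotion classes named in the abstract.
EMOTIONS = ("angry", "happy", "sad", "neutral")

# Hypothetical (train domains, test domains) for each strategy. The abstract
# only says E2/E3 are cross-corpus and E4 is combined-domain; treating E1 as
# an in-domain baseline and the two domain names are assumptions.
STRATEGIES = {
    "E1": ({"volunteer"}, {"volunteer"}),
    "E2": ({"volunteer"}, {"real_world"}),
    "E3": ({"real_world"}, {"volunteer"}),
    "E4": ({"volunteer", "real_world"}, {"volunteer", "real_world"}),
}


def split(samples, strategy):
    """Return (train, test) lists for one strategy.

    Each sample is a dict with hypothetical keys 'domain', 'label' and 'fold'
    ('train' or 'test'). Filtering on a pre-assigned fold keeps train and test
    disjoint even when a domain appears on both sides (E1, E4).
    """
    train_domains, test_domains = STRATEGIES[strategy]
    train = [s for s in samples if s["domain"] in train_domains and s["fold"] == "train"]
    test = [s for s in samples if s["domain"] in test_domains and s["fold"] == "test"]
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred, labels=EMOTIONS):
    """Recall per emotion; classes absent from y_true are skipped."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in labels if total[c] > 0}
```

For example, with reference labels `["happy", "sad", "neutral", "angry", "neutral"]` and predictions `["happy", "sad", "angry", "angry", "neutral"]`, `accuracy` returns 0.8 while `per_class_recall` gives 1.0 for angry, happy and sad but 0.5 for neutral, the kind of class-level gap that a single accuracy figure can hide.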

Details

Paper ID
lrec2026-ws-chipsal-07
Pages
pp. 59-67
BibKey
tuzahra-etal-2026-cross
Editors
Kengatharaiyer Sarveswaran, Ashwini Vaidya
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Fatima Tu Zahra
  • Kulsoom Asim
  • Sandesh Kumar
  • Abdul Samad
