Fine-tuning Whisper with Spontaneous Persian Speech (SPS)

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI:10.63317/2ca2yoj8fzgd

Abstract

This paper introduces the Spontaneous Persian Speech (SPS) dataset, designed for automatic speech recognition (ASR), together with a methodology that lays the groundwork for addressing the shortage of spontaneous speech data. The corpus aims to support research on natural, conversational Persian, which remains under-represented in current ASR resources. The dataset consists of 694 minutes of audio from 65 speakers (34 male and 31 female) and contains 526,585 tokens. The audio segmentation step produces intervals of 1.24 to 3.25 seconds, each containing 3 to 9 words. The recordings cover a variety of environments, from inside cars to homes and shopping areas, in both busy and quiet settings. Fine-tuning Whisper on the SPS dataset improves performance significantly for both the small and medium models, as measured by Word Error Rate (WER). This work could serve as a first step toward building domain-oriented datasets for specific ASR tasks.
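For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code. The function name `word_error_rate` is hypothetical, and the text normalization that a Persian evaluation would normally apply before scoring is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference. Whitespace tokenization is an assumption made here for brevity.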

Details

Paper ID

lrec2026-ws-sigul-26

Pages

pp. 263-269

DOI

10.63317/2ca2yoj8fzgd

BibKey

namdarzadeh-etal-2026-fine

Editors

Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Behnoosh Namdarzadeh
Nicolas Ballier
