LREC 2026 Workshop

Evaluating Automatic Speech Recognition for Holocaust Testimonies: A Large-Scale Analysis of Whisper Performance on the Fortunoff Video Archive

Proceedings of The Second Workshop on Holocaust Testimonies as Language Resources (HTRes)

DOI:10.63317/5gtr6wvyxg7y

Abstract

Holocaust testimonies are key primary sources documenting survivors’ experiences, yet many remain inaccessible due to the labor-intensive nature of manual transcription. This paper presents a comprehensive evaluation of OpenAI’s Whisper automatic speech recognition (ASR) system on 1,847 testimonies from the Fortunoff Video Archive for Holocaust Testimonies at Yale University. We assess transcription quality across multiple languages including English, French, German, Hebrew, Yiddish, Ladino, Slovak, and American Sign Language (with English voice-over), using human-reviewed captions as ground truth. Our analysis reveals a mean Word Error Rate (WER) of 15.28%, with 90.9% of testimonies achieving "Fair" or better quality (WER ≤25%). We identify systematic error patterns including challenges with disfluencies, interrupted speech, and language-specific orthographic conventions, particularly in Ladino, where Whisper’s normalization to modern Spanish orthography creates systematic divergences from traditional Judeo-Spanish spelling. For Hebrew and Yiddish, we evaluate specialized models from ivrit-ai and find promising results for heritage language preservation. Our findings demonstrate that current ASR technology can substantially accelerate Holocaust testimony transcription while highlighting the need for domain-specific fine-tuning and post-processing for optimal results.
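As an illustration of the metric reported above, the following is a minimal sketch of a word-level WER computation together with the abstract's "Fair" or better cutoff (WER ≤25%). It assumes the standard definition of WER (word-level edit distance divided by reference length) and a simple lowercase, whitespace-split normalization; the paper's actual text normalization and scoring tooling are not described here, and the names `word_error_rate`, `FAIR_WER_THRESHOLD`, and `is_fair_or_better` are hypothetical.

```python
# Sketch only: standard word-level WER, not necessarily the authors' pipeline.
# Normalization (lowercasing + whitespace split) is an assumption.

FAIR_WER_THRESHOLD = 0.25  # "Fair" or better cutoff stated in the abstract


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def is_fair_or_better(wer: float) -> bool:
    """Apply the WER <= 25% cutoff from the abstract."""
    return wer <= FAIR_WER_THRESHOLD


if __name__ == "__main__":
    # Illustrative strings, not data from the archive.
    reference = "we were taken to the station in the morning"
    hypothesis = "we were taken to the station in morning"
    wer = word_error_rate(reference, hypothesis)
    print(f"WER = {wer:.2%}, fair or better: {is_fair_or_better(wer)}")
```

Because WER is normalized by the reference length, insertions can push it above 100%, and orthographic differences (such as the Ladino spelling divergences noted above) count as substitutions even when the spoken word was recognized correctly.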

Details

Paper ID
lrec2026-ws-htres-10
Pages
pp. 84-92
BibKey
mattingly-etal-2026-evaluating
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of The Second Workshop on Holocaust Testimonies as Language Resources (HTRes)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • William J.B. Mattingly

  • Christy Bailey-Tomecek
