LREC 2026 (main)

Transcription Accuracy in the Icelandic Gigaword Corpus: Evaluating Automatic and Manual Annotation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4f2rpzig5h8p

Abstract

This paper compares automatically extracted and manually corrected annotation data in the Icelandic Gigaword Corpus. We focus on the variable use of Stylistic Fronting (SF) in Icelandic, an optional movement of a word or phrase that signals a more formal style. Examining SF rates over time, we find that manual coding yields slightly lower SF rates than automatic coding. This difference can be explained by the different sources used in the coding process: automatic coding relies on written transcripts compiled by parliament employees, whereas manual correction relies on audio recordings of the parliamentary speeches. Importantly, both types of coding are well suited to tracing changes in SF over a span of 16 years, suggesting that automatic feature extraction reliably reflects the transcribed speeches.
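The comparison described above amounts to computing, for each year and each coding type, the proportion of SF-eligible contexts in which fronting actually occurs, and then comparing the two rates year by year. The following is a minimal Python sketch under stated assumptions: the `Token` record, the `"automatic"`/`"manual"` labels, and the functions `sf_rates` and `rate_gap` are hypothetical illustrations, not the corpus format or the authors' pipeline, and the toy data are invented and do not reproduce the paper's figures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One SF-eligible context (hypothetical schema, not the corpus format)."""
    year: int
    coding: str    # assumed labels: "automatic" or "manual"
    fronted: bool  # True if Stylistic Fronting applied in this context


def sf_rates(tokens):
    """Return {(coding, year): SF rate}, where rate = fronted / eligible."""
    counts = defaultdict(lambda: [0, 0])  # key -> [fronted, eligible]
    for t in tokens:
        c = counts[(t.coding, t.year)]
        c[0] += int(t.fronted)
        c[1] += 1
    return {key: fronted / eligible for key, (fronted, eligible) in counts.items()}


def rate_gap(rates):
    """Per year, manual SF rate minus automatic SF rate.

    Years that were not coded both ways are skipped.
    """
    years = sorted({year for (_, year) in rates})
    return {
        y: rates[("manual", y)] - rates[("automatic", y)]
        for y in years
        if ("manual", y) in rates and ("automatic", y) in rates
    }


if __name__ == "__main__":
    # Invented toy data for illustration only.
    toy = [
        Token(2010, "automatic", True), Token(2010, "automatic", True),
        Token(2010, "automatic", False), Token(2010, "automatic", False),
        Token(2010, "manual", True), Token(2010, "manual", False),
        Token(2010, "manual", False), Token(2010, "manual", False),
    ]
    print(rate_gap(sf_rates(toy)))  # {2010: -0.25}
```

A negative gap means the manual coding has the lower SF rate for that year. Keeping one record per eligible context, rather than pre-aggregated counts, makes it easy to restrict the comparison to years (or speeches) that were coded both ways.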

Details

Paper ID
lrec2026-main-373
Pages
pp. 4757-4764
BibKey
mechler-etal-2026-transcription
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Johanna Mechler
  • Lilja Björk Stefánsdóttir
  • Anton Karl Ingason
