Back to Main Conference 2022
LREC 2022main

LeSpell - A Multi-Lingual Benchmark Corpus of Spelling Errors to Develop Spellchecking Methods for Learner Language

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/4a8whkaxno7j

Abstract

Spellchecking text written by language learners is especially challenging because errors made by learners differ both quantitatively and qualitatively from errors made by already proficient learners. We introduce LeSpell, a multi-lingual (English, German, Italian, and Czech) evaluation data set of spelling mistakes in context that we compiled from seven underlying learner corpora. Our experiments show that existing spellcheckers do not work well with learner data. Thus, we introduce a highly customizable spellchecking component for the DKPro architecture, which improves performance in many settings.

Details

Paper ID
lrec2022-main-073
Pages
pp. 697-706
BibKey
bexte-etal-2022-lespell
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 25 June 2022

Authors

  • MB

    Marie Bexte

  • RL

    Ronja Laarmann-Quante

  • AH

    Andrea Horbach

  • TZ

    Torsten Zesch

Links