
Spell Checking in Spanish: The Case of Diacritic Accents

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/36otbx59r95t

Abstract

This article presents the problem of diacritic restoration (or diacritization) in the context of spell-checking, with the focus on an orthographically rich language such as Spanish. We argue that despite the large volume of work published on the topic of diacritization, currently available spell-checking tools have still not found a proper solution to the problem in those cases where both forms of a word are listed in the checker's dictionary. This is the case, for instance, when a word form exists with and without diacritics, such as continuo ‘continuous’ and continuó ‘he/she/it continued’, or when different diacritics make other word distinctions, as in continúo ‘I continue’. We propose a very simple solution based on a word bigram model derived from correctly typed Spanish texts and evaluate the ability of this model to restore diacritics in artificial as well as real errors. The case of diacritics is only meant to be an example of the possible applications for this idea, yet we believe that the same method could be applied to other kinds of orthographic or even grammatical errors. Moreover, given that no explicit linguistic knowledge is required, the proposed model can be used with other languages provided that a large normative corpus is available.
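The abstract does not give the model's scoring function, so the following is only a minimal sketch of how a word bigram model can choose among dictionary forms that differ only in diacritics (continuo, continuó, continúo). It assumes, without confirmation from the paper, that candidates are the corpus forms sharing the same accentless spelling; that each candidate is scored by the product of its add-one-smoothed left and right bigram counts; and that artificial errors are produced by stripping all accents from correct text. The names `strip_accents`, `BigramAccentRestorer` and `accuracy_on_stripped` are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

# Assumption: only the acute accent (U+0301) and diaeresis (U+0308) count as
# restorable diacritics; the tilde of "ñ" (U+0303) is kept because it marks a
# distinct Spanish letter rather than an accent.
ACCENT_MARKS = {"\u0301", "\u0308"}


def strip_accents(word: str) -> str:
    """Remove acute accents and diaereses, e.g. 'continúo' -> 'continuo'."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", kept)


class BigramAccentRestorer:
    """Hypothetical bigram-based restorer trained on correctly typed text."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        # Accentless key -> every accented form attested in the corpus.
        self.variants = defaultdict(set)
        for sent in sentences:
            tokens = ["<s>"] + [t.lower() for t in sent] + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
            for tok in tokens[1:-1]:
                self.variants[strip_accents(tok)].add(tok)

    def _score(self, prev, cand, nxt):
        # Add-one smoothing keeps an unseen bigram from zeroing a candidate;
        # the candidate's corpus frequency breaks ties.
        left = self.bigrams[(prev, cand)] + 1
        right = self.bigrams[(cand, nxt)] + 1
        return (left * right, self.unigrams[cand])

    def restore(self, tokens):
        words = [t.lower() for t in tokens]
        padded = ["<s>"] + words + ["</s>"]
        out = []
        for i, word in enumerate(words, start=1):
            candidates = self.variants.get(strip_accents(word))
            if not candidates:
                out.append(word)  # out-of-vocabulary: leave as typed
                continue
            prev = out[-1] if out else "<s>"  # left context: already restored
            nxt = padded[i + 1]  # right context: as typed
            best = max(sorted(candidates), key=lambda c: self._score(prev, c, nxt))
            out.append(best)
        return out


def accuracy_on_stripped(restorer, gold_sentences):
    """Token accuracy after stripping all accents from gold text (artificial errors)."""
    correct = total = 0
    for sent in gold_sentences:
        gold = [t.lower() for t in sent]
        guess = restorer.restore([strip_accents(t) for t in gold])
        correct += sum(g == h for g, h in zip(gold, guess))
        total += len(gold)
    return correct / total if total else 0.0


if __name__ == "__main__":
    corpus = [
        ["el", "movimiento", "continuo", "de", "la", "máquina"],
        ["ella", "continuó", "hablando"],
        ["yo", "continúo", "aquí"],
    ]
    restorer = BigramAccentRestorer(corpus)
    print(restorer.restore(["ella", "continuo", "hablando"]))
    # prints ['ella', 'continuó', 'hablando']
```

The sketch lowercases everything and reads the right-hand context as typed, so an unaccented neighbour can weaken the evidence for the correct form. A fuller system would need case handling and a much larger normative corpus, as the abstract notes. No language-specific rules are involved beyond the choice of which combining marks count as diacritics, in line with the claim that the approach transfers to other languages.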

Details

Paper ID
lrec2012-main-440
Pages
pp. 737-742
BibKey
atserias-etal-2012-spell
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21–27 May 2012

Authors

  • Jordi Atserias
  • Maria Fuentes
  • Rogelio Nazar
  • Irene Renau
