LREC 2026, Main Conference

CorSpell: Introducing a Semiautomatic Tool for Spelling Normalization in Brazilian Portuguese

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/22uey2oj2w49

Abstract

With the growing availability of large text collections, efficient tools for corpus annotation and normalization have become increasingly important in linguistic and computational research. This paper presents CorSpell, a semiautomatic tool developed to support the spelling normalization of Brazilian Portuguese texts within the CorCel project—a corpus comprising over 15,000 handwritten exam responses from the Celpe-Bras proficiency test. Given the corpus scale, manual normalization is impractical; CorSpell streamlines this process by enabling users to visualize, select, and replace tokens directly through an intuitive web interface. The tool integrates automatic suggestions from PT-BR dictionaries with human validation, providing an interface for users to access and manipulate the texts. CorSpell significantly reduces annotation time, minimizes errors, and facilitates collaborative work, providing a practical and scalable solution for corpus normalization and a foundation for LLM-based modeling of Portuguese proficiency.

Details

Paper ID
lrec2026-main-130
Pages
pp. 1659-1667
BibKey
schoffen-etal-2026-corspell
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Juliana Schoffen
  • Dennis Giovani Balreira
  • Elisa Marchioro Stumpf
  • Larissa Goulart
  • Tanara Zingano Kuhn
  • Rafael Oleques Nunes
  • Gabriel Ricci Pazzinato
  • Isadora Dahmer Hanauer
  • José Henrique de Souza Silva
  • Luiza Sarmento Divino
  • Marine Matte

Links