The Use of Text Alignment in Semi-Automatic Error Analysis: Use Case in the Development of the Corpus of the Latvian Language Learners

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

This article presents an alternative method for creating error-annotated corpora. The approach consists of four steps: text correction, automated morphological analysis, automated text alignment, and error annotation. Error annotation can be semi-automated with a rule-based system similar to the one used in this paper. Text correction can also be semi-automated, using a rule-based system or machine learning. Combining text correction with word and letter alignment enables more in-depth analysis of error types and provides opportunities for quantitative research. The proposed method has been tested in the development of the corpus of Latvian language learners, in which spelling, punctuation, grammatical, syntactic and lexical errors are annotated. Text that is not understandable is marked as unclear for additional analysis. The method can be adapted for the development of error corpora in other languages with relatively free word order. The gain is expected to be highest for highly inflected languages with rich morphology.
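To illustrate the word and letter alignment step, here is a minimal sketch that aligns a learner sentence with its corrected version. It is not the authors' implementation: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever alignment the paper uses. The helper names `diff_spans` and `align_sentence` are hypothetical, and the example sentence is invented rather than taken from the corpus.

```python
from difflib import SequenceMatcher


def diff_spans(original, corrected):
    """Return (tag, original_span, corrected_span) for every non-matching span."""
    matcher = SequenceMatcher(a=original, b=corrected, autojunk=False)
    return [
        (tag, original[i1:i2], corrected[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def align_sentence(original, corrected):
    """Align words first, then letters inside each pair of replaced words."""
    result = []
    for tag, orig_words, corr_words in diff_spans(original.split(), corrected.split()):
        if tag == "replace" and len(orig_words) == len(corr_words):
            # Equal-length replacement: pair the words positionally.
            for o, c in zip(orig_words, corr_words):
                result.append((o, c, diff_spans(o, c)))
        else:
            # Insertions, deletions, uneven replacements: keep at word level.
            result.append((" ".join(orig_words), " ".join(corr_words), []))
    return result


# Invented example sentence (assumption), not taken from the corpus.
for entry in align_sentence("Es dzivoju Rīga", "Es dzīvoju Rīgā"):
    print(entry)
```

For this input, each changed word is paired with its correction and the letter-level difference inside it (`i` to `ī`, `a` to `ā`). A rule-based annotator of the kind the abstract mentions could then take such pairs, together with morphological analysis, to decide on an error label.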

Details

Paper ID

lrec2018-main-648

Pages

N/A

DOI

10.63317/32pow6apdbqv

BibKey

dargis-etal-2018-use

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7 - 12 May 2018

Authors

Roberts Darģis
Ilze Auziņa
Kristīne Levāne-Petrova
