Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2bcphssjucvk

Abstract

Text complexity analysis is a useful task in education. For example, it can help teachers select texts appropriate to their students' educational level. This task requires analysing several text features (e.g. syntactic complexity, word variety) that people currently assess mostly by hand. In this paper, we present Coh-Metrix-Esp, a tool for complexity analysis. It is the Spanish version of Coh-Metrix and calculates 45 readability indices. We analyse how these indices behave in a corpus of “simple” and “complex” documents, and also use them as features in a binary complexity classifier for Spanish texts. In experiments with machine learning algorithms, we obtained an F-measure of 0.9 on a corpus of tales for children and adults, and 0.82 on a corpus of texts written for students of Spanish as a foreign language.

Details

Paper ID
lrec2016-main-745
Pages
pp. 4694-4698
BibKey
quispesaravia-etal-2016-coh
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Andre Quispesaravia

  • Walter Perez

  • Marco Sobrevilla Cabezudo

  • Fernando Alva-Manchego

Links