SuperCAT: The (New and Improved) Corpus Analysis Toolkit

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/32935nt5a7p2

Abstract

This paper presents SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure; that is, they tend towards infinity. In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly the characteristics of lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.
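The notion of closure in the abstract can be illustrated with a type-growth curve: if the number of distinct word types stops rising as more tokens are read, the sample is tending towards closure. The following is a minimal sketch under stated assumptions, not SuperCAT's implementation. The function names `type_growth_curve` and `late_growth_rate`, the lowercasing step, and the half-sample split are all hypothetical choices made for illustration.

```python
from typing import Iterable, List, Tuple


def type_growth_curve(tokens: Iterable[str], step: int = 1) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_types_seen) pairs, sampled every `step` tokens.

    Hypothetical helper: lowercasing is an assumed normalization, not taken
    from the paper.
    """
    seen = set()
    curve = []
    count = 0
    for count, token in enumerate(tokens, start=1):
        seen.add(token.lower())
        if count % step == 0:
            curve.append((count, len(seen)))
    # Record the final point so the curve always ends at the full sample size.
    if count and (not curve or curve[-1][0] != count):
        curve.append((count, len(seen)))
    return curve


def late_growth_rate(tokens: List[str]) -> float:
    """New types per token in the second half of the sample.

    A value near 0 suggests a tendency towards closure; a value that stays
    high suggests the vocabulary is still growing. The half/half split is an
    arbitrary illustrative choice.
    """
    half = len(tokens) // 2
    second_half_len = len(tokens) - half
    if second_half_len == 0:
        return 0.0
    first_half_types = {t.lower() for t in tokens[:half]}
    all_types = first_half_types | {t.lower() for t in tokens[half:]}
    return (len(all_types) - len(first_half_types)) / second_half_len


if __name__ == "__main__":
    closed_sample = "a b c a b c a b c a b c".split()   # vocabulary saturates early
    open_sample = [f"w{i}" for i in range(12)]          # every token is a new type
    print(type_growth_curve(closed_sample, step=4))
    print(late_growth_rate(closed_sample), late_growth_rate(open_sample))
```

On real corpora the curve would be computed over far larger samples, and comparing its shape across corpora is only one of several possible ways to describe lexical distributions quantitatively.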

Details

Paper ID
lrec2016-main-442
Pages
pp. 2784-2788
BibKey
cohen-etal-2016-supercat
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • K. Bretonnel Cohen

  • William A. Baumgartner Jr.

  • Irina Temnikova