Summary of the paper

Title Certification and Cleaning up of a Text Corpus: Towards an Evaluation of the “Grammatical” Quality of a Corpus
Authors Cyril Grouin
Abstract We present in this article the methods we used for obtaining measures to ensure the quality and well-formedness of a text corpus. These measures allow us to determine the compatibility of a corpus with the treatments we want to apply on it. We called this method “certification of corpus”. These measures are based upon the characteristics required by the linguistic treatments we have to apply on the corpus we want to certify. Since the certification of corpus allows us to highlight the errors present in a text, we developed modules to carry out an automatic correction. By applying these modules, we reduced the number of errors. In consequence, it increases the quality of the corpus making it possible to use a corpus that a first certification would not have admitted.
Language Single language
Topics Corpus (creation, annotation, etc.), Lexicon, lexical database, Grammars
Full paper Certification and Cleaning up of a Text Corpus: Towards an Evaluation of the “Grammatical” Quality of a Corpus
Slides -
Bibtex @InProceedings{GROUIN08.280,
  author = {Cyril Grouin},
  title = {Certification and Cleaning up of a Text Corpus: Towards an Evaluation of the “Grammatical” Quality of a Corpus},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }

Powered by ELDA © 2008 ELDA/ELRA