Summary of the paper

Title Children’s Oral Reading Corpus (CHOREC): Description and Assessment of Annotator Agreement
Authors Leen Cleuren, Jacques Duchateau, Pol Ghesquière and Hugo Van hamme
Abstract Within the scope of the SPACE project, the CHildren’s Oral REading Corpus (CHOREC) is developed. This database contains recorded, transcribed and annotated read speech (42 GB or 130 hours) of 400 Dutch-speaking elementary school children with or without reading difficulties. Analyses of inter- and intra-annotator agreement are carried out in order to investigate the consistency with which reading errors are detected, orthographic and phonetic transcriptions are made, and reading errors and reading strategies are labeled. Percentage agreement scores and kappa values both show that agreement between annotations, and therefore the quality of the annotations, is high. Taking all double or triple annotations (for 10% and 30% of the corpus, respectively) together, percentage agreement varies between 86.4% and 98.6%, whereas kappa varies between 0.72 and 0.97, depending on the annotation tier that is being assessed. School type and reading type seem to account for systematic differences in percentage agreement, but these differences disappear when kappa values, which correct for chance agreement, are calculated. To conclude, an analysis of the annotation differences with respect to the ’*s’ label (i.e. a label that is used to annotate indistinguishable spelling behaviour), phoneme labels, reading strategy and error labels is given.
Language Single language
Topics Corpus (creation, annotation, etc.), Speech resource/database, Other
Bibtex @InProceedings{CLEUREN08.254,
  author = {Leen Cleuren and Jacques Duchateau and Pol Ghesquière and Hugo {Van hamme}},
  title = {Children’s Oral Reading Corpus (CHOREC): Description and Assessment of Annotator Agreement},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
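
The abstract contrasts percentage agreement with kappa, which corrects for chance agreement. The following is a minimal sketch of that contrast for two annotators, assuming Cohen's kappa (the abstract does not say which kappa variant was used, and triple annotations would need a multi-rater variant). The function names and the "ok"/"err" labels are hypothetical and are not taken from CHOREC.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators assign the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical word-level judgements: "ok" = read correctly, "err" = reading error.
annotator_1 = ["ok"] * 8 + ["err"] * 2
annotator_2 = ["ok"] * 7 + ["err"] * 3

print(f"percentage agreement: {percent_agreement(annotator_1, annotator_2):.3f}")
print(f"Cohen's kappa:        {cohen_kappa(annotator_1, annotator_2):.3f}")
```

In this toy data most items carry the same label, so much of the raw agreement is expected by chance and kappa comes out lower than the percentage score. The same mechanism can let differences in percentage agreement between conditions shrink once chance agreement is taken into account.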
