Summary of the paper

Title How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese.
Authors Jerid Francom, Amy LaCross and Adam Ussishkin
Abstract In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can serve as an evaluation metric for resources for low-density languages, drawing on well-established psycholinguistic factors. Using the low-density language Maltese as a test case, we highlight the challenges facing researchers who develop resources for languages with sparsely available data, and we identify a key empirical link between corpus and psycholinguistic research as a tool for evaluating corpus resources. Specifically, we compare two robust variables identified in the psycholinguistic literature: word frequency (as measured in a corpus) and word familiarity (as measured in a rating task). We then apply statistical methods to evaluate the extent to which familiarity ratings predict corpus frequency for verbs in the Maltese corpus from three angles: 1) token frequency, 2) frequency distributions, and 3) morpho-syntactic type (binyan). This research provides a multidisciplinary approach to corpus development and evaluation, in particular for less-resourced languages that lack wide access to diverse language data.
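The abstract does not say which statistical methods were used, so the following is only an illustrative sketch of the first angle (token frequency): a Spearman rank correlation between familiarity ratings and corpus counts. The verb labels, counts, ratings, and function names are all hypothetical placeholders, not data or code from the paper.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical toy data (NOT from the paper): token counts in a corpus and
# mean familiarity ratings on an assumed 1-7 scale for five placeholder verbs.
corpus_counts = {"verb_a": 1200, "verb_b": 310, "verb_c": 45, "verb_d": 45, "verb_e": 3}
familiarity = {"verb_a": 6.8, "verb_b": 6.1, "verb_c": 5.9, "verb_d": 4.2, "verb_e": 2.5}

verbs = sorted(corpus_counts)
counts = [corpus_counts[v] for v in verbs]
ratings = [familiarity[v] for v in verbs]

rho = spearman(ratings, counts)
print(f"Spearman rho between familiarity and corpus frequency: {rho:.3f}")
```

A rank-based measure is used here because raw frequency counts are typically heavily skewed; a value of rho near 1 would suggest that the corpus ordering of verbs agrees with speakers' familiarity judgments, while a low value could indicate that the corpus is skewed toward a specialized register.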
Topics Validation of LRs, Cognitive methods, Corpus (creation, annotation, etc.)
Bibtex @InProceedings{FRANCOM10.666,
  author = {Jerid Francom and Amy LaCross and Adam Ussishkin},
  title = {How Specialized are Specialized Corpora? Behavioral Evaluation of Corpus Representativeness for Maltese.},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}