LREC 2008 (main conference)
Cleaneval: a Competition for Cleaning Web Pages
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)
Abstract
Cleaneval is a shared task and competitive evaluation on cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus in linguistic and language technology research and development. The first exercise took place in 2007. We describe how it was set up, its results, and the lessons learnt.