A Self-Expanding Corpus Based on Newspapers on the Web
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)
Abstract
A Unix-based system is presented that automatically collects newspaper articles from the web, converts the texts, and adds them to a newspaper corpus. The corpus can be searched from a web browser. It currently contains 70 million words and grows by 4 million words each month.
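The collect, convert, and include steps described above can be illustrated with a minimal sketch. This is not the system presented in the paper: the names `TextExtractor`, `html_to_text`, and `Corpus` are hypothetical, the HTML-to-text conversion is a simple stand-in for whatever conversion the authors use, and fetching pages from the web is left out so that the example runs offline.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Convert one HTML page to plain text (assumed conversion step)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


class Corpus:
    """Hypothetical in-memory corpus keyed by article URL."""

    def __init__(self):
        self.articles = []   # list of (url, text) pairs
        self._seen = set()   # URLs already included

    def add(self, url, html):
        """Convert and include an article; skip URLs seen before."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self.articles.append((url, html_to_text(html)))
        return True

    def word_count(self):
        """Total number of whitespace-separated tokens in the corpus."""
        return sum(len(text.split()) for _, text in self.articles)

    def search(self, word):
        """Return URLs of articles containing the word (case-insensitive)."""
        target = word.lower()
        return [url for url, text in self.articles
                if target in text.lower().split()]


corpus = Corpus()
page = "<html><body><h1>News</h1><p>Hello world</p><script>var x=1;</script></body></html>"
corpus.add("http://example.org/a", page)
print(corpus.word_count(), corpus.search("hello"))
```

Skipping URLs that have already been seen is one simple way to let such a corpus grow from repeated collection runs without duplicating articles; a real system would also need persistent storage and an index for searching at the scale of tens of millions of words.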