A Self-Expanding Corpus Based on Newspapers on the Web

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/497eeuzhtrvq

Abstract

A Unix-based system is presented which automatically collects newspaper articles from the web, converts the texts, and includes these texts in a newspaper corpus. This corpus can be searched from a web browser. The corpus currently contains 70 million words and grows by 4 million words each month.

Details

Paper ID
lrec2000-main-268
Pages
N/A
BibKey
hofland-2000-self
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

    Knut Hofland

Links