Categorizing Web Pages as a Preprocessing Step for Information Extraction
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
Information systems that combine crawling and information extraction (IE) technologies currently attract considerable research and industrial interest. In this paper, we present an algorithm that uses techniques for unsupervised IE pattern acquisition to help identify web pages containing information relevant to the IE task.