Experiments in Topic Detection
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)
Abstract
Dividing documents into topically coherent units and discovering their topics has many potential uses. We present a system that proceeds in two steps: (1) the input text is segmented at places where a topic shift is probable, and (2) lexical chains are extracted from each segment as indicators of its topic. Two implementations based on public-domain resources are presented: one based on WordNet and the other on Roget's thesaurus. An evaluation of the algorithm shows that lexical chains are acceptable topic indicators, with 44.5% precision and 63.8% recall.
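To make the two-step procedure concrete, the following is a minimal sketch, not the system described in the paper. It assumes a tiny hand-made thesaurus (the hypothetical `TOY_THESAURUS`) in place of WordNet or Roget's thesaurus, and it uses a deliberately crude boundary rule: a topic shift is assumed wherever two adjacent sentences share no thesaurus concept. The function names `concepts`, `segment`, and `lexical_chains` are illustrative only.

```python
import re

# Hypothetical toy thesaurus mapping a word to a concept label.
# It stands in for WordNet or Roget's thesaurus and is an assumption.
TOY_THESAURUS = {
    "bank": "finance", "loan": "finance", "interest": "finance", "money": "finance",
    "river": "water", "boat": "water", "fish": "water", "lake": "water",
}


def concepts(sentence):
    """Return the set of thesaurus concepts mentioned in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {TOY_THESAURUS[w] for w in words if w in TOY_THESAURUS}


def segment(sentences):
    """Step 1: open a new segment when a sentence shares no concept
    with the sentence before it (an assumed, simplistic shift test)."""
    segments = []
    for sentence in sentences:
        if segments and concepts(sentence) & concepts(segments[-1][-1]):
            segments[-1].append(sentence)
        else:
            segments.append([sentence])
    return segments


def lexical_chains(segment_sentences):
    """Step 2: collect the segment's words into one chain per concept."""
    chains = {}
    for sentence in segment_sentences:
        for word in re.findall(r"[a-z]+", sentence.lower()):
            concept = TOY_THESAURUS.get(word)
            if concept is not None:
                chains.setdefault(concept, []).append(word)
    return chains


text = [
    "The bank approved the loan.",
    "Interest on the money was low.",
    "The boat crossed the river.",
    "Fish swim in the lake.",
]
segments = segment(text)
chains_per_segment = [lexical_chains(s) for s in segments]
```

On this toy input the sketch yields two segments, with a "finance" chain for the first and a "water" chain for the second. A real implementation would need word-sense handling and a more robust shift detector; those details are not given in the abstract and are left out here.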