
Term-based Identification of Sentences for Text Summarisation

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/3n4mvtufeet3

Abstract

This paper describes a methodology for automatic summarisation of Greek texts that combines terminology extraction with sentence spotting. Because generating abstracts has proven a hard NLP task of questionable effectiveness, the paper focuses on producing a special kind of abstract, called an extract: a set of sentences taken from the original text. These sentences are selected according to the amount of information they carry about the subject content. The proposed corpus-based, statistical approach exploits several heuristics to determine the summary-worthiness of sentences. Specifically, it uses statistical occurrences of terms (the TF·IDF formula) and several cue phrases to calculate sentence weights, and then extracts the top-scoring sentences, which form the extract.
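
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting, so the tokenizer, the logarithmic IDF, the additive cue-phrase bonus, and all function and parameter names (`idf_table`, `score_sentences`, `extract`, `cue_bonus`) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase word characters (Unicode-aware, so Greek works).
    return re.findall(r"\w+", text.lower())


def idf_table(corpus_docs):
    # Inverse document frequency from a reference corpus: log(N / df).
    n = len(corpus_docs)
    df = Counter()
    for doc in corpus_docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def score_sentences(sentences, idf, cue_phrases=(), cue_bonus=1.0):
    # Term frequency is counted over the whole document being summarised.
    tf = Counter(t for s in sentences for t in tokenize(s))
    scores = []
    for s in sentences:
        # Sum TF*IDF over the distinct terms of the sentence;
        # terms unseen in the reference corpus contribute 0 (an assumption).
        weight = sum(tf[t] * idf.get(t, 0.0) for t in set(tokenize(s)))
        # Hypothetical heuristic: a fixed bonus per cue phrase found.
        lowered = s.lower()
        weight += cue_bonus * sum(1 for c in cue_phrases if c.lower() in lowered)
        scores.append(weight)
    return scores


def extract(sentences, idf, k=2, cue_phrases=()):
    # Keep the k top-scoring sentences, returned in original document order.
    scores = score_sentences(sentences, idf, cue_phrases)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in their original order, rather than by score, keeps the extract readable as running text; that choice is also an assumption of this sketch.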

Details

Paper ID
lrec2000-main-080
Pages
N/A
BibKey
georgantopoulos-piperidis-2000-term
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Byron Georgantopoulos

  • Stelios Piperidis

Links