Re-using High-quality Resources for Continued Evaluation of Automated Summarization Systems
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
In this paper we present a method for re-using the human judgements of summary quality produced in the DUC evaluations. The score awarded to an automatic summary is computed as a function of the scores assigned manually to the most similar evaluated summaries of the same document. This approach enhances the standard n-gram based evaluation of automatic summarization systems by establishing similarities between {\it extractive} (as opposed to {\it abstractive}) summaries and by taking advantage of the large number of evaluated summaries available from DUC. The utility of the method is illustrated by the improvements achieved on a headline production system.
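To make the scoring idea concrete, the following is a minimal sketch of one plausible reading of the abstract: a candidate summary is scored by the similarity-weighted average of the manual scores of its nearest evaluated summaries for the same document. All names (ngram_overlap, score_candidate, scored_pool) and the choice of bigram overlap as the similarity measure are illustrative assumptions, not the authors' implementation.

```python
def ngrams(text, n=2):
    """Return the list of word n-grams in a text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_overlap(a, b, n=2):
    """Fraction of n-grams of `a` also found in `b` (a crude similarity proxy)."""
    a_grams, b_grams = ngrams(a, n), set(ngrams(b, n))
    if not a_grams:
        return 0.0
    return sum(1 for g in a_grams if g in b_grams) / len(a_grams)


def score_candidate(candidate, scored_pool, k=3):
    """Score a candidate summary as the similarity-weighted average of the
    manual scores of its k most similar already-evaluated summaries."""
    ranked = sorted(
        ((ngram_overlap(candidate, text), score) for text, score in scored_pool),
        reverse=True,
    )[:k]
    total_sim = sum(sim for sim, _ in ranked)
    if total_sim == 0.0:
        return 0.0
    return sum(sim * score for sim, score in ranked) / total_sim


# scored_pool: (summary text, manual quality score) pairs for one document;
# the texts and scores below are made up for illustration only.
scored_pool = [
    ("the president announced a new economic plan", 0.9),
    ("economic plan announced by the president", 0.8),
    ("sports results from the weekend", 0.1),
]
print(score_candidate("president announces new economic plan", scored_pool))
```

A nearest-neighbour scheme like this re-uses the DUC judgements without requiring new human evaluation, which is the core appeal of the approach; the actual similarity function and aggregation are design choices the paper itself specifies.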