LREC 2022 main

Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/4qmkoss8kux4

Abstract

Neural text summarization has shown great potential in recent years. However, current state-of-the-art summarization models are limited by their maximum input length, posing a challenge to summarizing longer texts comprehensively. As part of a layered summarization architecture, we introduce PureText, a simple yet effective pre-processing layer that removes low-quality sentences in articles to improve existing summarization models. When evaluated on popular datasets like WikiHow and Reddit TIFU, we show up to 3.84 and 8.57 point ROUGE-1 absolute improvement on the full test set and the long article subset, respectively, for state-of-the-art summarization models such as BertSum and BART. Our approach provides downstream models with higher-quality sentences for summarization, improving overall model performance, especially on long text articles.
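
The layered idea in the abstract (filter sentences first, then hand the remainder to a length-limited summarizer) can be illustrated with a minimal sketch. This is not the PureText implementation: the abstract does not say how sentence quality is scored, so `toy_score`, the `threshold` value, and the `max_words` budget below are all hypothetical stand-ins chosen only to make the example runnable.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_score(sentence: str) -> float:
    """Hypothetical quality score (NOT the paper's model):
    the fraction of tokens longer than three characters."""
    words = sentence.split()
    if not words:
        return 0.0
    return sum(len(w) > 3 for w in words) / len(words)


def filter_sentences(
    text: str,
    score: Callable[[str], float] = toy_score,
    threshold: float = 0.5,   # assumed cutoff, for illustration only
    max_words: int = 512,     # assumed input budget of the downstream model
) -> str:
    """Drop low-scoring sentences, then keep the survivors in their
    original order until the word budget is used up."""
    kept: List[str] = []
    budget = max_words
    for sentence in split_sentences(text):
        if score(sentence) < threshold:
            continue  # treated as low quality, removed before summarization
        n_words = len(sentence.split())
        if n_words > budget:
            break  # the downstream model could not read past this point
        kept.append(sentence)
        budget -= n_words
    return " ".join(kept)


article = "Um ok so yeah. Preheat the oven before mixing ingredients. lol no."
filtered = filter_sentences(article)
```

In a real pipeline, `filtered` would be passed to a summarizer such as BertSum or BART in place of the raw article, and the scorer would be a learned model rather than a word-length heuristic.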

Details

Paper ID: lrec2022-main-033
Pages: pp. 313-318
BibKey: mei-etal-2022-learning
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 79-10-95546-38-2
Conference: Thirteenth Language Resources and Evaluation Conference
Location: Marseille, France
Date: 20-25 June 2022

Authors

  • Alex Mei
  • Anisha Kabir
  • Rukmini Bapat
  • John Judge
  • Tony Sun
  • William Yang Wang

Links