LREC 2022 main

Surfer100: Generating Surveys From Web Resources, Wikipedia-style

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/2oz2pmxyiqwp

Abstract

Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently-introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys. This is the first study on utilizing web resources for long Wikipedia-style summaries to the best of our knowledge.
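The two-stage idea in the abstract (select content first, then rewrite it) can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not name the models used, so the extractive stage below is a simple term-overlap ranker and the abstractive stage is a pluggable callable. All function names (`extract`, `generate_lead`, `identity_rewriter`) are hypothetical.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract(topic: str, documents: List[str], k: int = 3) -> List[str]:
    """Stage 1 (extractive, toy stand-in): keep up to k sentences that
    share the most tokens with the topic; drop sentences with no overlap."""
    topic_terms = set(tokenize(topic))
    scored = []
    for doc in documents:
        for sentence in split_sentences(doc):
            overlap = sum(1 for tok in tokenize(sentence) if tok in topic_terms)
            scored.append((overlap, sentence))
    # Stable sort: ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in scored[:k] if score > 0]


def generate_lead(
    topic: str,
    documents: List[str],
    abstractive: Callable[[str], str],
    k: int = 3,
) -> str:
    """Stage 2 (abstractive): pass the selected sentences to a rewriter.
    In a real system this callable would wrap a pretrained summarizer."""
    selected = extract(topic, documents, k)
    return abstractive(" ".join(selected))


def identity_rewriter(text: str) -> str:
    """Placeholder for an abstractive model: returns its input unchanged."""
    return text


if __name__ == "__main__":
    web_pages = [
        "Graph neural networks operate on graphs. The weather is nice today.",
        "Neural networks learn representations.",
    ]
    print(generate_lead("graph neural networks", web_pages, identity_rewriter, k=2))
```

Keeping the abstractive stage as a callable means the selection logic can be tested on its own, and a pretrained model can be swapped in later without changing the first stage.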

Details

Paper ID
lrec2022-main-576
Pages
pp. 5388-5392
BibKey
li-etal-2022-surfer100
Editors
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 - 25 June 2022

Authors

  • Irene Li
  • Alex Fabbri
  • Rina Kawamura
  • Yixin Liu
  • Xiangru Tang
  • Jaesung Tae
  • Chang Shen
  • Sally Ma
  • Tomoe Mizutani
  • Dragomir Radev

Links