From Noise to Signal: When Outliers Seed New Topics

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5c6zvq4nbjdq

Abstract

Outliers in dynamic topic modeling are often discarded as noise, yet some act as early signals of emerging topics. We introduce a temporal taxonomy of news document trajectories that distinguishes anticipatory outliers, documents that appear before a topic forms but later integrate into it, from those that reinforce existing topics or remain isolated. This taxonomy bridges weak-signal detection and dynamic topic modeling, clarifying how individual articles anticipate, initiate, or drift within evolving clusters. We implement it within a cumulative clustering framework using document embeddings from eleven state-of-the-art language models and apply it retrospectively to HydroNewsFr, a French news corpus on the hydrogen economy curated for this study. Inter-model agreement on anticipatory outliers indicates that a small high-agreement subset yields robust confidence estimates. Complementary qualitative case studies further demonstrate their potential value as early indicators of emerging narratives. All reproducibility materials and results are available at https://anonymous.4open.science/status/lrec_from_noise_to_signal-B721.
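
As an illustration only, the distinction the abstract draws between anticipatory, reinforcing, and isolated documents can be sketched as a rule over a document's cluster labels across successive cumulative clustering windows. This is a simplified three-way split and not the paper's full taxonomy or implementation: the `-1` outlier label, the function names, and the per-model agreement measure (the fraction of models that flag a document) are all assumptions made for this sketch.

```python
from collections import Counter

OUTLIER = -1  # assumed convention: -1 means "not assigned to any topic"


def classify_trajectory(labels):
    """Classify one document from its labels over successive windows.

    `labels` lists the cluster label the document receives in each
    cumulative window, starting with the window in which it first appears.
    """
    if not labels:
        raise ValueError("a trajectory needs at least one window")
    if all(label == OUTLIER for label in labels):
        return "isolated"      # never joins a topic
    if labels[0] == OUTLIER:
        return "anticipatory"  # starts as an outlier, later joins a topic
    return "reinforcing"       # belongs to a topic from its first window


def agreement(per_model_categories, category="anticipatory"):
    """Fraction of models that assign `category` to each document.

    `per_model_categories` maps a model name to a {doc_id: category} dict.
    Documents that no model flags are omitted from the result.
    """
    n_models = len(per_model_categories)
    counts = Counter()
    for categories in per_model_categories.values():
        for doc_id, cat in categories.items():
            if cat == category:
                counts[doc_id] += 1
    return {doc_id: count / n_models for doc_id, count in counts.items()}


# Hypothetical toy data: two embedding models, two documents.
toy = {
    "model_a": {"doc1": classify_trajectory([-1, -1, 3]),
                "doc2": classify_trajectory([-1, -1, -1])},
    "model_b": {"doc1": classify_trajectory([-1, 5, 5]),
                "doc2": classify_trajectory([-1, 2, 2])},
}
scores = agreement(toy)
```

Under these assumptions, thresholding `scores` at a high value would retain only the documents that most models agree are anticipatory, which is one way to read the "small high-agreement subset" mentioned above.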

Details

Paper ID
lrec2026-main-596
Pages
pp. 7523-7533
BibKey
zve-etal-2026-noise
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Evangelia Zve

  • Gauvain Bourgne

  • Benjamin Icard

  • Jean-Gabriel Ganascia

Links