How I Met Your Snowclone: Unsupervised Discovery of Snowclone Patterns in Large Datasets

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5iuorx8jxpiw

Abstract

Snowclones are a type of Multiword Expression (MWE) pattern that includes open slots, i.e., positions that can be filled with various words. For example, in the phrase "May the X be with you," the slot X can be replaced with virtually any noun. A key feature of snowclones is that the original MWE remains recognizable, carrying its meaning into the new form. However, previous work has not shown whether such substitutions are limited to fixed positions. In practice, variations such as "May the force bee with you" are also possible. In this paper, we propose using Locality Sensitive Hashing (LSH) to automatically extract snowclone patterns from the non-commercial IMDb dataset. This process results in the creation of the FROST lexicon, comprising 29,011 pattern candidates and 991,626 snowclone candidates across 29 languages. We then annotate 1,500 discovered patterns and 1,000 snowclones from the FROST lexicon to assess its quality. Our findings suggest that (i) most substitutions in snowclones occur at consistent positions and (ii) snowclones can be reliably discovered at scale using LSH and similarity-based metrics. This work provides the first large-scale lexicon of snowclone-based MWEs and a method that can support future research on MWE and snowclone discovery.
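
To illustrate the general idea of LSH-based discovery, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not say which LSH variant or parameters were used, so MinHash with banding is assumed here as one common scheme. The function names, the signature length, the band size, and the sample title "May the cheese be with you" are all hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import List, Optional, Set, Tuple

NUM_HASHES = 64  # signature length (assumed value, not from the paper)
ROWS = 2         # rows per band, giving 32 bands (assumed value)


def token_hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(tokens: Set[str]) -> List[int]:
    """MinHash signature: per seed, the smallest hash over the token set.

    Assumes the token set is non-empty.
    """
    return [min(token_hash(t, seed) for t in tokens) for seed in range(NUM_HASHES)]


def candidate_pairs(titles: List[str]) -> Set[Tuple[int, int]]:
    """Return index pairs whose signatures agree on at least one full band."""
    buckets = defaultdict(list)
    for idx, title in enumerate(titles):
        sig = minhash(set(title.lower().split()))
        for band in range(0, NUM_HASHES, ROWS):
            buckets[(band, tuple(sig[band:band + ROWS]))].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def to_pattern(a: str, b: str) -> Optional[str]:
    """Mark positions where two equal-length titles differ with the slot X."""
    ta, tb = a.lower().split(), b.lower().split()
    if len(ta) != len(tb):
        return None  # this sketch only aligns titles of equal length
    return " ".join(x if x == y else "X" for x, y in zip(ta, tb))


titles = [
    "May the force be with you",
    "May the cheese be with you",      # hypothetical snowclone instance
    "An unrelated documentary title",  # hypothetical distractor
]
for i, j in sorted(candidate_pairs(titles)):
    print(titles[i], "|", titles[j], "->", to_pattern(titles[i], titles[j]))
```

Titles that share most of their words are very likely to collide in at least one band, so only those pairs need a closer comparison. The position-wise alignment in `to_pattern` is a deliberate simplification: it handles substitutions at a single aligned slot, but not insertions, deletions, or character-level variants such as "bee" for "be".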

Details

Paper ID
lrec2026-main-622
Pages
pp. 7829-7844
BibKey
bezanon-etal-2026-how
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Julien Bezançon
  • Gaël Lejeune
  • Marceau Hernandez
