Contrastively Pre-trained Event Embeddings with Schema-free LLM Annotations
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Event extraction is a notoriously challenging problem, due in part to the scarcity of suitable training data. Moreover, event-centric knowledge bases are not available for most domains, making traditional distant supervision strategies difficult to implement. In this paper, we evaluate the potential of LLM-generated annotations as an alternative distant supervision signal. Specifically, we create a synthetically labelled event extraction corpus, using an LLM to identify event triggers and arguments and to provide corresponding free-text descriptions. We then pre-train event embedding models on this corpus with a contrastive loss, before fine-tuning them in the standard supervised fashion. We empirically demonstrate the effectiveness of this approach.
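To make the pre-training objective concrete, the following is a minimal NumPy sketch of a symmetric InfoNCE-style contrastive loss of the kind commonly used for such pre-training. It pairs each event embedding with the embedding of its LLM-generated free-text description, treating other in-batch items as negatives. The function name, batch-pairing scheme, and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def info_nce_loss(event_emb, desc_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    event_emb, desc_emb: (batch, dim) arrays; row i of each forms a
    positive pair, and all other rows in the batch act as negatives.
    """
    # L2-normalise so dot products are cosine similarities
    e = event_emb / np.linalg.norm(event_emb, axis=1, keepdims=True)
    d = desc_emb / np.linalg.norm(desc_emb, axis=1, keepdims=True)
    logits = e @ d.T / temperature        # (batch, batch) similarity matrix
    idx = np.arange(len(logits))          # positives sit on the diagonal

    def xent(lg):
        # cross-entropy of the diagonal (positive) entries
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average over both directions: event -> description and back
    return 0.5 * (xent(logits) + xent(logits.T))
```

With well-aligned pairs the diagonal similarities dominate and the loss approaches zero; mismatched pairings yield a higher loss, which is what drives the embeddings of an event and its description towards each other during pre-training.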