MUC-4 Revisited: Document-level Event Analysis beyond Span-based Arguments

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

Automatically predicting structured representations of events has long been a central goal in information extraction, yet most contemporary work remains limited to identifying contiguous text spans as event arguments. This span-centric formulation fails to capture higher-level aspects of real-world events, such as actor identities, temporal scope, and aggregated outcomes, that many event-centred applications depend on. While commonly treated as a standard extractive benchmark, MUC-4 originally combined span-based arguments with normalised, inferred, and categorical fields, reflecting a richer, application-driven design. In this paper, we revisit MUC-4 in its full original formulation, casting it as an abstractive event analysis task that connects traditional event extraction goals with modern generative and document-level paradigms. We provide the first systematic evaluation of fine-tuned generative models on this extended formulation of MUC-4, examining how post-training stages and model size affect performance across both span-based and higher-level, semantically grounded event information. An extensive error analysis highlights practical challenges and directions for future work.