Joint Identification and Induction of Semantic Frames with Scalable Semi-Supervised Graph Clustering
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Current methods for automatically assigning frames to the words that evoke them fall into two groups: frame identification and frame induction. In frame identification, frame names from a labeled dataset are assigned to unseen instances, a classical supervised labeling task. However, the training datasets are known to cover real-world frames only incompletely, so unseen instances may evoke frames that have no label in the training data. In frame induction, instances are clustered according to the frames they evoke, a classical unsupervised clustering task. However, existing training data is not used to identify known frames. To overcome these shortcomings, we propose semi-supervised clustering for combined frame identification and frame induction. Constrained clustering with hard constraints derived from labeled data ensures that all labeled instances within a resulting cluster share the same label. Thus, frame names can be easily assigned. We show for English and German datasets that semi-supervised clustering improves the quality of frame induction compared to unsupervised clustering methods and results in notably good performance in frame identification.
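To make the role of the hard constraints concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes a greedy single-linkage agglomerative procedure, a hypothetical function name `constrained_clustering`, toy 2D points standing in for instance embeddings, an arbitrary distance threshold, and frame names chosen only for illustration. The one rule it enforces is the one stated above: two clusters are never merged if they contain labeled instances with different labels.

```python
from itertools import combinations
from math import dist


def constrained_clustering(points, labels, threshold):
    """Greedy single-linkage clustering with a hard cannot-link rule.

    points:    coordinate tuples (toy stand-ins for instance embeddings)
    labels:    frame name per instance, or None if the instance is unlabeled
    threshold: stop once the closest mergeable pair is farther apart than this
    """
    # Each cluster is a pair: (set of member indices, frame label or None).
    clusters = [({i}, labels[i]) for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(dist(points[i], points[j]) for i in a[0] for j in b[0])

    def compatible(a, b):
        # Hard constraint: clusters carrying different labels never merge.
        return a[1] is None or b[1] is None or a[1] == b[1]

    while True:
        candidates = [
            (linkage(a, b), ia, ib)
            for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
            if compatible(a, b)
        ]
        if not candidates:
            break
        d, ia, ib = min(candidates)
        if d > threshold:
            break
        a, b = clusters[ia], clusters[ib]
        # The merged cluster inherits whichever label is present, if any.
        merged = (a[0] | b[0], a[1] if a[1] is not None else b[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return clusters


# Toy data (assumed for illustration): two labeled seeds, five unlabeled instances.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1),
          (10.0, 0.0), (10.1, 0.2)]
labels = ["Motion", None, None, "Commerce_buy", None, None, None]

clusters = constrained_clustering(points, labels, threshold=1.0)
for members, frame in clusters:
    # A cluster without a label corresponds to an induced, unnamed frame.
    print(sorted(members), frame if frame is not None else "<induced frame>")
```

In this sketch, a cluster that contains a labeled instance inherits that frame name (identification), while a cluster with no labeled instance is treated as a newly induced frame (induction).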