SENSEI-ASG: A Challenging Dataset for Argument Summary Graph Parsing
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We create and publicly release SENSEI-ASG, a novel dataset for the task of Argument Summary Graph Parsing (ASGP), built by annotating a subset of the SENSEI corpus. Given an argumentative dialogue, such as might be found in a social media exchange, ASGP is the task of creating an Argument Summary Graph (ASG), a data structure consisting of nodes that contain summaries of the arguments in the dialogue and edges that show the argumentative relations between them. We find that the only existing ASG dataset, Debatabase-ASG, is not representative of online debates in language use, dialogue length, or graph complexity. In contrast to Debatabase-ASG, which was built from a curated debate collection, SENSEI-ASG contains examples of spontaneous debates arising in the comments sections of an online newspaper (namely, The Guardian). We achieve moderate inter-annotator agreement on the dataset, with a Cohen's kappa of κ = 0.57, reflecting the inherent difficulty of distinguishing argumentative from non-argumentative text. We propose baselines for the new dataset by fine-tuning Llama-3 for the ASGP task, using the two ASGP datasets and an additional out-of-domain argument mining dataset, the AAEC.
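To make the description of an Argument Summary Graph concrete, the following is a minimal sketch of one way such a structure could be represented in Python. It is an illustration only, not the dataset's actual format: the class names, the field names, the `responses_to` helper, and the relation labels "support" and "attack" are all assumptions, and the three example arguments are invented rather than taken from SENSEI-ASG.

```python
from dataclasses import dataclass, field


@dataclass
class ArgumentNode:
    node_id: str
    summary: str  # short summary of one argument made in the dialogue


@dataclass
class ArgumentEdge:
    source: str    # node_id of the responding argument
    target: str    # node_id of the argument being responded to
    relation: str  # assumed label set for illustration: "support" or "attack"


@dataclass
class ArgumentSummaryGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> ArgumentNode
    edges: list = field(default_factory=list)  # list of ArgumentEdge

    def add_node(self, node_id, summary):
        self.nodes[node_id] = ArgumentNode(node_id, summary)

    def add_edge(self, source, target, relation):
        # Edges may only connect nodes that are already in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append(ArgumentEdge(source, target, relation))

    def responses_to(self, node_id):
        # All (source, relation) pairs whose edge points at node_id.
        return [(e.source, e.relation) for e in self.edges if e.target == node_id]


# Invented toy dialogue, not drawn from any dataset.
graph = ArgumentSummaryGraph()
graph.add_node("a1", "Cycle lanes reduce city traffic.")
graph.add_node("a2", "Cycle lanes take road space away from buses.")
graph.add_node("a3", "Cities with more cycle lanes report fewer car journeys.")
graph.add_edge("a2", "a1", "attack")
graph.add_edge("a3", "a1", "support")

for source, relation in graph.responses_to("a1"):
    print(source, relation, "a1")
```

Under these assumptions, an ASGP system would take the raw dialogue as input and produce a graph of this kind as output, so both the node summaries and the edge relations are part of the prediction.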