LoveHate: Stance Detection and Generation for Multiple Topics in User-generated Comments in Russian and English
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper introduces LoveHate, a new multi-topic corpus of user-generated arguments in Russian, collected from the historical data of the debate platform lovehate.ru. The dataset contains nearly 19,000 posts spanning 16 socially and politically relevant topics, each mapped to a binary pro or con stance. We test multiple approaches to stance detection and stance generation on Russian and English data, including translated variants, using both classifier-based (RoBERTa, RuRoberta) and instruction-tuned generative (Llama, Qwen) models. The results show that language-specific pretraining yields the strongest performance for stance classification (F1 = 0.892 with RuRoberta), while multilingual generative models, when fine-tuned on sufficient data, can effectively generate stance in Russian without explicit Russian pretraining. Cross-domain experiments show that English datasets generalize better across corpora, whereas the Russian data capture language- and culture-specific argumentation but are less effective for training generalizable models. Generating topics remains a more challenging task for both Russian and English data. The dataset and accompanying results contribute to multilingual stance research and provide a valuable new resource for argument mining in Russian.