Incivility and Rigidity: Evaluating the Risks of Fine-Tuning LLMs for Political Argumentation
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Incivility on platforms such as Twitter (now X) and Reddit complicates the development of AI systems that can support productive, rhetorically sound political argumentation. We present experiments with GPT-3.5 Turbo fine-tuned on two contrasting datasets of political discourse: high-incivility Twitter replies to members of the U.S. Congress and low-incivility posts from Reddit’s r/ChangeMyView. Our evaluation examines how data composition and prompting strategies affect the rhetorical framing and deliberative quality of model-generated arguments. Results show that models fine-tuned on Reddit data generate safer but rhetorically rigid arguments, while cross-platform fine-tuning amplifies adversarial tone and toxicity. Prompt-based steering reduces overt toxicity (e.g., personal attacks) but cannot fully offset the influence of noisy training data. We introduce a rhetorical evaluation rubric covering justification, reciprocity, alignment, and authority, and provide implementation guidelines for authoring, moderation, and deliberation-support systems.