Disentangling Approaches to Conversation Disentanglement: Fine-Tune or Learn from Scratch?
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Conversation disentanglement is the task of segmenting a stream of messages or utterances into separate conversations, or "threads", that can be more easily understood and processed. We compare the performance of GPT-4o and GPT-4o Mini with that of deep learning models trained from scratch for this task. We show that, given the same amount of training data, out-of-the-box GPT-4o performs poorly, while fine-tuning GPT-4o Mini yields performance comparable to that of small models trained from scratch on standard hand-crafted features for this task, reaching a 74.4% F1-score for predicting links between messages and a 45.3% F1-score for predicting perfectly matched conversations. However, the fine-tuned GPT-4o Mini model underperforms compared to models that exploit complex structural information. We also introduce a new method for fine-grained analysis of model successes and failures, together with a new visualization technique.