Linking Rationale to Decision on Internet Standards: A Retrieval-Based Approach Using Synthetic Data
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
The Internet Engineering Task Force (IETF) develops Internet-Drafts (I-Ds) and Requests for Comments (RFCs) as formal specifications for Internet protocols. While these documents capture finalized technical standards, the design rationale and deliberation that shape them are often buried in informal mailing-list discussions. These discussions are rarely linked explicitly to the specifications they inform, which makes it difficult to trace the origin of a specific design decision. We address this gap by generating synthetic data that explicitly links discussion threads to their corresponding RFC/I-D sections, producing roughly 350,000 aligned instances. This data enables training a semantic, embedding-based information retrieval (IR) system that, given an email discussion, retrieves the most relevant specification content. Our experiments show that this synthetic supervision helps models learn associations between informal discourse and formal documentation, though the task remains challenging because the links are implicit and context-dependent.
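To make the retrieval setting concrete, the following is a minimal sketch of ranking specification sections against an email by cosine similarity. It is not the system described in this paper: the `embed` function is a toy bag-of-words stand-in for a trained embedding model, and the section identifiers, section texts, and email are hypothetical examples invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a trained embedding model: L2-normalised term counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (equals cosine similarity)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def retrieve(email, sections, k=1):
    """Return the ids of the k sections most similar to the email, best first."""
    query = embed(email)
    scored = [(cosine(query, embed(body)), sec_id)
              for sec_id, body in sections.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sec_id for _, sec_id in scored[:k]]


# Hypothetical specification sections and email, for illustration only.
SECTIONS = {
    "sec-congestion": "The sender reduces its congestion window when packet loss is detected.",
    "sec-handshake": "The client and server exchange keys during the connection handshake.",
}
EMAIL = "I think we should shrink the congestion window faster after loss."

TOP = retrieve(EMAIL, SECTIONS, k=1)
```

In this sketch the email shares several terms with the congestion section, so lexical overlap is enough to rank it first. The links targeted in this work are often implicit, which is the motivation for replacing the toy `embed` with a model trained on the aligned instances.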