Issue Detection and Category Classification in Domain-Specific Technical Logbooks
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Operating large-scale research infrastructures such as free-electron lasers produces vast amounts of operator-authored documentation that records daily observations, anomalies, and maintenance actions. These logbooks and incident reports contain valuable operational knowledge but often remain underexplored due to their unstructured, domain-specific language. While large language models (LLMs) generalize well on general-domain text, their effectiveness on such technical operator text has, to the best of our knowledge, not been systematically assessed. We introduce two new English datasets from real-world laser operations: (i) a logbook dataset annotated for binary issue detection (does an entry describe or report an actionable fault?), and (ii) an operator ticket dataset annotated for multi-class issue categorization (assigning each ticket to one of 13 technical categories). The corpora comprise 2,979 logbook entries and 758 tickets from 2022–2024; both are cleaned, anonymized, and suitable for benchmarking classification performance. We evaluate four open LLMs (LLaMA-3, Mistral-Small, Qwen-3-30B, GPT-OSS-120B) under zero-shot, few-shot, and chain-of-thought (CoT) prompting, using multiple semantically equivalent prompt variants per setting to assess robustness. Across both tasks, few-shot prompting is consistently strongest, with top systems reaching F1 ≈ 0.84 for logbook issue detection and Macro-F1 ≈ 0.42 for operator ticket categorization. These results suggest that incorporating a handful of in-domain examples can substantially improve performance on operator-authored technical text, even without fine-tuning.
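
To make the few-shot setting concrete, the following minimal Python sketch assembles an instruction-plus-demonstrations prompt for the binary issue-detection task. The instruction wording, demonstration entries, and labels below are hypothetical illustrations and are not drawn from the released datasets or the paper's actual prompt variants.

```python
# Minimal sketch of few-shot prompt construction for binary issue detection.
# All demonstration entries and the instruction text are invented for
# illustration; they are not taken from the released corpora.

FEW_SHOT_EXAMPLES = [
    ("Amplifier output dropped by 30% after warm-up; beam unstable.", "issue"),
    ("Routine alignment check completed; all diagnostics nominal.", "no_issue"),
    ("Chiller flow alarm triggered twice during the evening shift.", "issue"),
]

INSTRUCTION = (
    "You are classifying free-electron laser logbook entries. "
    "Answer with exactly one label: 'issue' if the entry describes or "
    "reports an actionable fault, otherwise 'no_issue'."
)

def build_prompt(entry: str, shots=FEW_SHOT_EXAMPLES) -> str:
    """Assemble instruction, labeled demonstrations, and the query entry."""
    lines = [INSTRUCTION, ""]
    for text, label in shots:
        lines.append(f"Entry: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Entry: {entry}")
    lines.append("Label:")  # the model is expected to complete the label
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_prompt("Pump diode temperature reading is erratic; "
                       "suspect sensor fault."))
```

The zero-shot variant of such a prompt would simply omit the demonstration block, and the robustness evaluation described above would swap in semantically equivalent rewordings of the instruction.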