Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
Towards Knowledge Graph-Grounded Evaluation of Agentic LLMs on Cybersecurity Capture-the-Flag Challenges
Paper Fields
Click the edit button next to a field to report a correction.
Towards Knowledge Graph-Grounded Evaluation of Agentic LLMs on Cybersecurity Capture-the-Flag Challenges
Evaluating Large Language Model (LLM) agents on complex multi-step cybersecurity tasks requires structured, reproducible evaluation rubrics. We present BraceGreen, a framework that formalizes Capture-the-Flag (CTF) attack paths as knowledge graphs and uses them as gold-standard rubrics for agentic LLM evaluation. Each node in our knowledge graphs represents an attack step annotated with MITRE ATT&CK tactics, goals, commands, expected outputs, and semantic outcomes, while edges encode prerequisites, dependencies, and alternative paths. Our LangGraph-based evaluation workflow employs LLM-as-judge with chain-of-thought reasoning to semantically compare agent predictions against knowledge graph-encoded alternatives. We contribute a benchmark of 7 CTF machines with knowledge graph annotations, three evaluation modes (command prediction, goal inference, anticipated result), and integration with live machine infrastructure via virtual machines and a MCP server. Our approach bridges the gap between unstructured CTF writeups and graph-structured evaluation rubrics.
Authors
Expand an author to correct their information. Use the remove button to request author removal, or add a new author.
PDF Attachment
You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.
Your Information
Author Declaration *
Select at least one field to correct using the edit buttons above.