What Researchers Learned About Building an LLM Security Workflow
Help Net Security, Monday, May 4th, 2026
Researchers found that LLM accuracy in security alert triage depends more on workflow structure than on model capability.
Researchers from the University of Oslo and the Norwegian Defence Research Establishment tested how large language models (LLMs) perform on security alert triage. Given alerts and log summaries directly, four popular LLMs failed completely to identify malicious activity, scoring zero percent accuracy.
However, when the same models were wrapped in a structured workflow with constrained tools, predefined SQL queries, and a defined investigative process, accuracy rose to an average of 93 percent. The study demonstrates that, for AI security products, the system architecture and guardrails around the model matter more than the underlying model itself. This has implications for how security operations center (SOC) teams should evaluate and deploy LLM-based security tools.
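To make the idea of a constrained workflow concrete, here is a minimal Python sketch. It is not the researchers' implementation: the table names, query names, sample data, and the `stub_decide` function (a stand-in for a model call) are all hypothetical. It only illustrates the general pattern of an allowlist of predefined, parameterized SQL queries run in a fixed investigative order, so that the model never writes SQL of its own.

```python
import sqlite3

# Hypothetical allowlist: the only queries the workflow may run.
# The caller supplies parameters, never raw SQL.
PREDEFINED_QUERIES = {
    "failed_logins_by_host": (
        "SELECT username, COUNT(*) FROM auth_log "
        "WHERE host = ? AND success = 0 "
        "GROUP BY username ORDER BY username"
    ),
    "connections_by_host": (
        "SELECT dest_ip, dest_port FROM net_log "
        "WHERE host = ? ORDER BY dest_ip"
    ),
}

# Hypothetical fixed investigative process: each alert goes through
# the same steps in the same order.
INVESTIGATION_STEPS = ["failed_logins_by_host", "connections_by_host"]


def build_demo_db():
    """Create an in-memory database with made-up log rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE auth_log (host TEXT, username TEXT, success INTEGER);
        CREATE TABLE net_log (host TEXT, dest_ip TEXT, dest_port INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO auth_log VALUES (?, ?, ?)",
        [
            ("web01", "root", 0),
            ("web01", "root", 0),
            ("web01", "root", 0),
            ("web01", "alice", 1),
            ("db01", "bob", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO net_log VALUES (?, ?, ?)",
        [("web01", "203.0.113.7", 4444)],
    )
    return conn


def run_tool(conn, name, params):
    """Run one allowlisted query; reject anything not predefined."""
    if name not in PREDEFINED_QUERIES:
        raise ValueError(f"tool not allowed: {name}")
    return conn.execute(PREDEFINED_QUERIES[name], params).fetchall()


def investigate(conn, host, decide):
    """Collect evidence step by step, then ask `decide` for a verdict."""
    evidence = {}
    for step in INVESTIGATION_STEPS:
        evidence[step] = run_tool(conn, step, (host,))
    return decide(evidence), evidence


def stub_decide(evidence):
    """Stand-in for a model call; a real system would prompt an LLM here."""
    failed = sum(count for _name, count in evidence["failed_logins_by_host"])
    has_outbound = bool(evidence["connections_by_host"])
    return "malicious" if failed >= 3 and has_outbound else "benign"


if __name__ == "__main__":
    conn = build_demo_db()
    verdict, evidence = investigate(conn, "web01", stub_decide)
    print(verdict, evidence)
```

In this sketch the decision step sees only the evidence the fixed steps collected, and any request for a tool outside the allowlist raises an error. That is one plausible reading of "constrained tools" and "predefined SQL queries"; the study's actual design may differ.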