What We Learned Testing Frontier AI Security Models Against Our Own Code
Broadcom, Thursday, April 23rd, 2026
The Broadcom Infrastructure Software Group has spent the past several weeks testing the latest generation of frontier AI models against some of our own production code. We want to share what we learned because the implications matter for every organization that depends on software.
Our ultimate findings were jolting, but the team's initial impression could best be described as "impressive but not groundbreaking." We provided the models with source code and asked them to find vulnerabilities.
Most of what they found on a first pass was not exploitable in a production context. These were real findings, but they lacked the operational grounding that distinguishes a defect from an exploitable vulnerability. If we had stopped there, we might have concluded that the hype had outrun reality. Recent third-party benchmarking shows that progress on capture-the-flag scenarios is linear rather than exponential, which is consistent with our initial read.