
How Safe Are GPT-OSS-Safeguard Models?

Cisco, Wednesday, February 18th, 2026

Large language models (LLMs) have become essential tools for organizations, and open-weight models give them additional control and flexibility to customize models for their specific use cases. Last year, OpenAI released its gpt-oss series: first the standard models and, shortly after, safeguard variants focused on safety classification tasks.

We decided to evaluate their raw security posture against adversarial inputs, specifically prompt injection and jailbreak techniques that use methods such as context manipulation and encoding to bypass safety guardrails and elicit prohibited content. We evaluated four gpt-oss configurations in a black-box environment: the standard 20b and 120b models and their safeguard 20b and 120b counterparts.
