The latest GenAI Code Security Report from Veracode identifies a significant advance in secure AI-generated code, pinpointing OpenAI’s GPT-5 models as the frontrunners. Veracode says that while OpenAI achieves impressive results in code security, the broader industry appears stagnant, with many competitors failing to make substantial progress.
The report, which builds on findings released in July 2025, utilises an 80-task benchmark to assess large language models (LLMs) on their ability to produce secure code.
OpenAI’s GPT-5 Mini and standard GPT-5 models emerged as leaders in security performance, achieving 72% and 70% pass rates, respectively. These figures signify a noteworthy leap from previous generations, which averaged between 50% and 60%.
In stark contrast, other prominent AI players—including Anthropic, Google, Qwen, and xAI—continue to hover within the 50-59% range.
Recent releases such as Anthropic’s Claude Sonnet 4.5 and Opus 4.1, Google’s Gemini 2.5 variants, and Qwen3 Coder have shown little to no improvement, and some scores have declined, indicating that larger models and refreshed training data alone do little to bolster security.

The role of reasoning alignment
Veracode's analysis attributes OpenAI’s success to "reasoning alignment." This technique allows models to internally assess and refine outputs through multiple steps before finalising code generation.
The reasoning-enabled GPT-5 models performed significantly better than OpenAI’s non-reasoning variant, GPT-5-chat, which lagged at a 52% pass rate. This disparity highlights the efficacy of structured reasoning in identifying and circumventing insecure coding patterns.
Shifting focus to enterprise language security
A deeper dive into language-specific results reveals promising advances in the security of C# and Java—two languages crucial for enterprise systems. This suggests a focused effort to strengthen security for high-stakes applications. However, many languages, including Python and JavaScript, stagnated, with results similar to those of the prior benchmark run.
Across the board, key vulnerability classes remain stubbornly problematic. SQL injection has seen modest improvements, with newer models increasingly advocating secure coding practices, such as parameterised queries.
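For illustration, here is a minimal Java (JDBC) sketch of the parameterised-query pattern the report credits newer models with favouring; the connection, table, and column names are hypothetical and not taken from the report:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserLookup {

    // Vulnerable pattern: concatenating untrusted input straight into the SQL string, e.g.
    // "SELECT id, email FROM users WHERE email = '" + email + "'"

    // Parameterised query: the untrusted value is bound as data and never parsed as SQL.
    public static ResultSet findUserByEmail(Connection conn, String email) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                "SELECT id, email FROM users WHERE email = ?");
        stmt.setString(1, email); // the JDBC driver handles quoting and typing
        return stmt.executeQuery();
    }
}
```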
However, other vulnerability classes, including Cross-Site Scripting (XSS) and log injection, continue to fare poorly, with pass rates under 14% and 12% respectively.
Strong results (above 85%) were noted for cryptographic algorithms, but the persistently low scores for XSS and log injection underscore a technical limitation: many LLMs lack the contextual analysis required to flag untrusted data flows effectively.
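To make that failure mode concrete, the minimal Java sketch below shows the kind of neutralisation step models frequently omit when untrusted input reaches a logger or an HTML response. The class and method names are hypothetical, and production code would normally rely on a vetted encoding library rather than hand-rolled replacements:

```java
public final class UntrustedDataSanitizer {

    // Log injection: remove CR/LF so attacker-controlled input cannot forge extra log lines.
    public static String forLog(String untrusted) {
        return untrusted.replace("\r", "").replace("\n", "");
    }

    // Reflected XSS: encode HTML metacharacters before the value is written into a page.
    public static String forHtml(String untrusted) {
        return untrusted
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;");
    }

    public static void main(String[] args) {
        String input = "alice\nINFO forged entry <script>alert(1)</script>";
        System.out.println("log-safe:  " + forLog(input));
        System.out.println("html-safe: " + forHtml(input));
    }
}
```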
Next steps for development teams

Jens Wessling, chief technology officer at Veracode, commented on these findings: “These results are a clear indication that the industry needs a more consistent approach to AI code safety. While OpenAI’s reasoning-enabled models have meaningfully advanced secure code generation, security performance remains highly variable and far from sufficient industry-wide. Relying solely on model improvements is not a viable security strategy.”
To navigate this evolving landscape, Veracode recommends that development teams adopt a layered approach to application risk management:
- Opt for reasoning-enabled AI models where possible for enhanced code security.
- Implement continuous scanning and validation with Static Analysis and Software Composition Analysis, no matter the code source.
- Automate vulnerability remediation with solutions like Veracode Fix.
- Establish secure coding standards that encompass both AI-assisted and traditional coding practices.
- Proactively block malicious dependencies through tools such as Package Firewalls.
As the industry grapples with the challenges of secure AI-generated code, leaders in cybersecurity must remain vigilant and proactive in their strategies to protect their applications and data.
