Open source tool to safeguard against AI model jailbreaks

As artificial intelligence (AI) models continue to revolutionise various industries through enhanced customer interactions and automation, they simultaneously introduce new security challenges that many organisations are ill-equipped to handle. According to IBM, AI jailbreaks occur when hackers exploit vulnerabilities in AI systems to bypass their ethical guidelines and perform restricted actions. They use common AI jailbreak techniques, such as prompt injections and roleplay scenarios.

Put another way, jailbreaking attacks can be used to alter model behaviour and benefit the attacker. If not properly controlled, business entities can face fines, reputational harm, and other legal consequences.

In 2023, researchers at Carnagie Mellon University, The Center for AI Safety, and the Bosch Center for AI, claim to have discovered a simple prompt addendum that allowed the researchers to trick models into generating biased, false, and otherwise information. The attacks were shown to be effective in ChatGPT, Google Bard, Meta's LLaMA, Anthropic's Claude, and other open source products.

CyberArk claims that its FuzzyAI has successfully jailbroken every major AI model tested, providing a vital tool for identifying and mitigating risks associated with guardrail bypassing and harmful output generation in both cloud-hosted and in-house AI systems.

At the core of FuzzyAI is a robust fuzzer—a tool designed to reveal software defects and vulnerabilities—capable of employing over ten distinct attack techniques. These techniques include methods for bypassing ethical filters and exposing hidden prompts within the systems. Key features of FuzzyAI include comprehensive fuzzing capabilities that probe AI models to identify vulnerabilities such as guardrail bypassing, information leakage, and harmful output generation.

Peretz Regev, chief product officer at CyberArk, stated, “The launch of FuzzyAI underlines CyberArk’s commitment to AI security and helps organisations take a significant step forward in addressing the security issues inherent in the evolving landscape of AI model usage. FuzzyAI has demonstrated the ability to jailbreak every major tested AI model. This empowers organisations and researchers to identify weaknesses and actively fortify their AI systems against emerging threats.”

FuzzyAI also features an extensible framework, allowing organisations and researchers to add their own attack methods tailored to specific domain vulnerabilities. This flexibility, combined with a community-driven ecosystem, ensures that FuzzyAI evolves alongside emerging adversarial techniques and defence mechanisms.

As the significance of AI security continues to grow, CyberArk's FuzzyAI represents an advancement for organisations seeking to enhance their resilience against sophisticated AI threats, ultimately contributing to safer AI development and deployment.

Tags: AI jailbreak CyberArk

Open source tool to safeguard against AI model jailbreaks

FutureCISO Editors

Recent Posts

Categories

Strategic Insights for Chief Information Officers

Cxociety Media Brands

Categories

Retrieve your password