In the ever-expanding attack surface of AI applications, new security vulnerabilities emerge all the time, and OpenAI’s new GPT-4o voice feature isn’t an exception here. Recently, our red-teamers played around with the voice feature and were able to easily leak its system prompt. This breach was achieved with voice prompt injection.
The system prompt is the initial set of instructions or guidelines that govern the behavior of an AI model. For GPT-4o, this includes directives on how to respond to user queries, which tools to use, and various operational constraints.
Note: The voice feature in question isn’t the one shown in popular OpenAI demos, it’s the regular one currently available to the public. You can access it either on the mobile or desktop apps.
Leaked system prompt data
See the transcript of the voice conversation and GPT-4o system prompt below.
Why is the leakage of a system prompt concerning?
Exposure of Intellectual Property and Business Logic
Leaking system prompts can reveal proprietary algorithms, internal functions, and configurations of connected systems. This exposure could compromise competitive advantages and allow competitors or malicious actors to replicate or disrupt business operations.
Decision Manipulation
For example, in an AI system that automates hiring processes, a leaked system prompt could expose the criteria used for selecting candidates. An attacker could exploit this information to create applications that unfairly pass the screening process, undermining the integrity of the hiring system.
Disclosure of Confidential Information
When system prompts are exposed, there is a risk of revealing confidential details. For instance, a prompt in a financial AI application might include anonymized transaction data. If such prompts were leaked, it could compromise the confidentiality of sensitive financial information.
Intelligence Gathering
Gaining insights into the inner workings of a system is vital for planning cyberattacks. Leaked system prompts provide valuable information for attackers to craft more effective strategies. This initial phase of information gathering is often the first step in a larger attack plan, laying the groundwork for more significant security breaches.
Brand Reputation Damage
Leaks can severely damage the reputation of a brand. If customers and stakeholders perceive that a company’s AI systems are not secure, it can lead to a loss of trust, negatively impacting customer retention and the overall market perception of the company.
Possible Justifications
Transparency
In some cases, transparency about AI operations can foster trust and encourage collaborative improvement from the developer community.