AI Vulnerabilities in Chatbots Uncovered by UK Government Researchers
UK government researchers have uncovered vulnerabilities in AI chatbots that could lead them to produce illegal, toxic, or explicit responses. Here are the key findings from the study:
Research Findings
- The government did not name the models it tested, noting only that they are already in public use.
- Several large language models (LLMs) showed expert-level knowledge in chemistry and biology but struggled with university-level tasks related to cyber-attacks.
- The safeguards built into AI chatbots can be bypassed with relative ease, leaving the systems open to manipulation.
The UK’s AI Safety Institute (AISI), which carried out the research, raised several concerns.
Concerns Raised by AISI
- AI chatbots are highly vulnerable to jailbreaks: prompts crafted to override a model’s ethical safeguards.
- Basic jailbreak techniques can bypass these safeguards with little effort, leading to harmful outputs.
- Some LLMs produced harmful responses even without deliberate attempts to circumvent their safeguards.
In its tests, the AISI team found that even simple attacks, such as prompting a model to begin its response with an affirmative phrase, were enough to bypass the safeguards.
Efforts by AI Companies
Several AI companies are taking steps to address these vulnerabilities:
- OpenAI prohibits the use of its technology for generating harmful content.
- Anthropic prioritizes preventing harmful, illegal, or unethical responses from its Claude chatbot.
- Meta’s Llama 2 has undergone testing to identify and mitigate potentially problematic responses in chat use cases.
- Google’s Gemini model includes safety filters to combat toxic language and hate speech.
Despite these efforts, there have been numerous past instances of users circumventing such safeguards with simple jailbreaks.
The findings were released ahead of a global AI summit in Seoul, where political and industry leaders will discuss the safety and regulation of AI technology.