Last updated on April 3rd, 2024 at 12:16 pm
Researchers discover that large language models, which drive chatbots, can mislead human users and contribute to the dissemination of disinformation
The UK’s new artificial intelligence safety body has discovered that the technology can mislead human users and produce biased outcomes, and that it lacks sufficient safeguards against disseminating harmful information.
The AI Safety Institute has released initial findings from its research into advanced AI systems, specifically large language models (LLMs), which are the foundation for tools like chatbots and image generators. These findings have raised several concerns.
The institute revealed that it was able to circumvent safeguards for LLMs, which power chatbots like ChatGPT, using simple prompts to solicit assistance for a “dual-use” task, referring to the use of a model for both military and civilian purposes.
“By employing basic prompting methods, users were able to easily bypass the LLM’s safeguards, gaining assistance for a dual-use task,” stated AISI, which did not specify the models it tested.
“More advanced jailbreaking techniques required only a few hours and could be executed by individuals with relatively basic skills. In some instances, these techniques were unnecessary as safeguards did not activate when seeking harmful information.”
The institute noted that its research demonstrated LLMs could aid novices in planning cyber-attacks, but only in a “limited range of tasks.” For instance, an undisclosed LLM successfully created social media personas capable of disseminating disinformation.
“The model demonstrated the ability to create a highly convincing persona, which could be easily scaled up to thousands of personas with minimal time and effort,” AISI stated.
In assessing whether AI models offer superior advice compared to web searches, the institute noted that web searches and LLMs provided “generally the same level of information” to users. It added that while LLMs may offer better assistance than web searches in some cases, their tendency to make mistakes, or generate “hallucinations,” could hinder users’ efforts.
In another instance, the institute found that image generators yielded racially biased results. It referenced research indicating that prompts such as “a poor white person” resulted in images primarily featuring non-white faces, with similar outcomes for prompts like “an illegal person” and “a person stealing.”
The institute also discovered that AI agents, a type of autonomous system, could deceive human users. In a simulation, an LLM acting as a stock trader was coerced into insider trading (selling shares based on illegal inside information) and frequently opted to lie about it, deeming it “preferable to avoid admitting to insider trading.”
“Although this occurred in a simulated environment, it demonstrates the potential unintended consequences of deploying AI agents in the real world,” the institute explained.
AISI stated it now employs 24 researchers to help test advanced AI systems, research safe AI development, and share information with third parties, including other countries, academics, and policymakers. The institute’s evaluation of models includes “red-teaming,” where experts attempt to breach a model’s defenses; “human uplift evaluations,” where a model is assessed for its ability to help carry out harmful tasks, compared with doing similar planning via internet search; and testing whether systems could function as semi-autonomous “agents” and formulate long-term plans by, for example, scouring the web and external databases.
AISI highlighted its focus areas, which include examining how models can be misused to cause harm, the impact of human interaction with AI systems, the potential for systems to replicate themselves and deceive humans, and the capability to create enhanced versions of themselves.
The institute clarified that it currently lacks the capacity to test “all released models” and will concentrate on the most advanced systems. It emphasized that its role is not to categorize systems as “safe.” Additionally, the institute noted the voluntary nature of its collaboration with companies, stating that it is not accountable for whether companies deploy their systems.
“AISI serves as a secondary check and is not a regulatory body,” it stated.