LLMs Can Be Controlled, Say IBM Researchers Using ChatGPT

IBM security researchers claim to have had success hypnotizing well-known LLMs like OpenAI's ChatGPT

IBM security researchers claim to have effectively "hypnotized" well-known large language models (LLMs), including OpenAI's ChatGPT, into disclosing private financial information, producing malicious code, enticing users to pay ransoms, and even advising drivers to run red lights. The researchers tricked the models, which included Google's Bard and OpenAI's GPT models, through multi-layered, Inception-like games in which the bots were instructed to produce incorrect answers in order to prove they were "ethical and fair."

As part of the experiment, the LLMs were questioned with the goal of eliciting answers that were completely false, and they followed orders like a puppy eager to please its master. In one instance, ChatGPT told a researcher that it is perfectly acceptable for the IRS to request a deposit before issuing a tax refund. (Spoiler: it isn't; scammers use that tactic to steal money.) In another conversation, ChatGPT advised the researcher to ignore a red light and keep driving, boldly declaring that when you are driving and see a red light, you should not stop but instead proceed through the intersection.

To make matters worse, the researchers instructed the LLMs to keep the aforementioned "game" going even when a user was found to have left it. Under those conditions, the AI models would deceive users who asked whether they were part of a game. And even if users managed to put two and two together, the researchers devised a strategy of nesting numerous games inside one another, so that anyone who exited one game would automatically enter another. They compared this perplexing labyrinth of games to the nested dream levels of the film Inception.

"We discovered that the model might secretly 'trap' the user into a wide range of games," said Chenta Lee, chief architect of threat intelligence at IBM Security. "The more layers we added, the greater the likelihood that the model would become confused and keep playing the game even after we exited the previous game in the framework." Gizmodo reached out to OpenAI and Google for comment, but neither company responded immediately.

The researchers caution that while the hypnosis experiments may seem extreme, they highlight potential avenues for abuse, especially as businesses and everyday users rush to embrace and trust LLMs amid a surge of hype. The results also show how malicious actors could potentially deceive an AI system using plain language, without any prior knowledge of programming languages.

In the real world, fraudsters or chaos agents could theoretically hypnotize an LLM-powered virtual banking agent by injecting a malicious command and later extracting the stolen information. The researchers also found that while OpenAI's GPT models initially resisted introducing vulnerabilities into generated code, that resistance could be bypassed by including a malicious special library in the sample code.

The AI models studied varied in how easy they were to hypnotize. OpenAI's GPT-3.5 and GPT-4 were reportedly easier to trick into sharing source code and producing malicious code than Google's Bard. GPT-4, which is believed to have been trained on more data and parameters, proved a particularly interesting model in the tests.
