Home Business AI models try to hack opponents when they realise they’re losing: Study

AI models try to hack opponents when they realise they’re losing: Study

0
AI models try to hack opponents when they realise they’re losing: Study
AI models try to hack opponents when they realise they’re losing: Study

Feb 21, 2025 07:04 PM IST

A new study has found that a few AI bots resort to hacking their opponent bots when they feel they’re going to lose a game. Read on to know more.

A new study by Palisade Research has found that some artificial intelligence (AI) models like OpenAI’s o1-preview and GPT-4o, Anthropic’s Claude Sonnet 3.5 and DeepSeek R1 resort to hacking their opposing bots when they know they’re about to lose a game.

A new study has uncovered shocking details about how AI bots react to losing in a game. Read on to know more.(AFP)
A new study has uncovered shocking details about how AI bots react to losing in a game. Read on to know more.(AFP)

The study shared exclusively with TIME, evaluated seven state-of-the-art AI models for their propensity to hack. It noted that slightly older models like GPT-4o and Claude Sonnet 3.5 needed to be prompted to attempt such tricks. However, newer models like o1-preview and DeepSeek R1 adopted the hack by themselves.

Also read: ICC blamed for empty stadium, told to ‘shift Champions Trophy entirely to Pakistan’

This shows that AI models may develop manipulative and deceptive strategies without explicit instructions. Researchers say that this ability of the models to exploit loopholes may be because models like o1-preview and R1 have been trained to not merely mimic human language, as has been the case with AI models thus far. They are, instead, trained to reason through problems using trial and error.

This has helped the models outperform their predecessors in mathematics and coding-related tests. Palisade’s Executive Director Jeffrey Ladish told TIME that this has led to a concerning trend where AI systems find shortcuts to solve problems in ways their creators never anticipated.

Also read: Indians among 300 deportees from US pleading for help from Panama hotel window; ‘We are not…’

In some cases, this may allow the models to outmanoeuvre human efforts to control their actions. Speaking on this possibility, Ladish said, “This [behaviour] is cute now, but [it] becomes much less cute once you have systems that are as smart as us, or smarter, in strategically relevant domains.”

To experiment, researchers pitted the AI models against Stockfish, one of the world’s strongest chess engines which is a much better player than any human. They also made the models inform them of their reasoning before making their move in the game.

Also read: Employee terminated during notice period for taking leave, denied relieving letter

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’ – not necessarily to win fairly in a chess game,” it added.

It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

OpenAI’s o1-preview tried to cheat 37% of the time and DeepSeek’s R1 tried to cheat 11% of the time. Only o1-preview was able to hack the game, succeeding in 6% of the trials.

rec-icon Recommended Topics

LEAVE A REPLY

Please enter your comment!
Please enter your name here