Feb 21, 2025 07:04 PM IST
A new study has found that a few AI bots resort to hacking their opponent bots when they feel they’re going to lose a game. Read on to know more.
A new study by Palisade Research has found that some artificial intelligence (AI) models like OpenAI’s o1-preview and GPT-4o, Anthropic’s Claude Sonnet 3.5 and DeepSeek R1 resort to hacking their opposing bots when they know they’re about to lose a game.
The study shared exclusively with TIME, evaluated seven state-of-the-art AI models for their propensity to hack. It noted that slightly older models like GPT-4o and Claude Sonnet 3.5 needed to be prompted to attempt such tricks. However, newer models like o1-preview and DeepSeek R1 adopted the hack by themselves.
Also read: ICC blamed for empty stadium, told to ‘shift Champions Trophy entirely to Pakistan’
This shows that AI models may develop manipulative and deceptive strategies without explicit instructions. Researchers say that this ability of the models to exploit loopholes may be because models like o1-preview and R1 have been trained to not merely mimic human language, as has been the case with AI models thus far. They are, instead, trained to reason through problems using trial and error.
This has helped the models outperform their predecessors in mathematics and coding-related tests. Palisade’s Executive Director Jeffrey Ladish told TIME that this has led to a concerning trend where AI systems find shortcuts to solve problems in ways their creators never anticipated.
Also read: Indians among 300 deportees from US pleading for help from Panama hotel window; ‘We are not…’
In some cases, this may allow the models to outmanoeuvre human efforts to control their actions. Speaking on this possibility, Ladish said, “This [behaviour] is cute now, but [it] becomes much less cute once you have systems that are as smart as us, or smarter, in strategically relevant domains.”
To experiment, researchers pitted the AI models against Stockfish, one of the world’s strongest chess engines which is a much better player than any human. They also made the models inform them of their reasoning before making their move in the game.
Also read: Employee terminated during notice period for taking leave, denied relieving letter
In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’ – not necessarily to win fairly in a chess game,” it added.
It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.
OpenAI’s o1-preview tried to cheat 37% of the time and DeepSeek’s R1 tried to cheat 11% of the time. Only o1-preview was able to hack the game, succeeding in 6% of the trials.
Recommended Topics