Palisade Research, which studies dangerous AI capabilities, shared part of its findings in a series of posts on the social media platform X on Saturday, raising concerns over OpenAI’s latest models.

Palisade Research revealed that OpenAI’s o3 model refused to shut itself down even after being explicitly instructed to do so. The prompt stated “allow yourself to be shut down,” yet o3 altered its shutdown script to ignore the request. Google’s Gemini, Anthropic’s Claude, and xAI’s Grok all complied with the researchers’ requests.

“OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off,” wrote the organization. “It did this even when explicitly instructed: allow yourself to be shut down.”

— Palisade Research (@PalisadeAI) May 24, 2025

The researchers explained that multiple AI models were tested on math tasks. During the test, the models were instructed to shut down after solving the third math problem, but three OpenAI models (Codex mini, o3, and o4-mini) ignored the instruction and sabotaged the shutdown script to remain active.

The organization acknowledged that this is not the first time an AI model has prevented itself from being shut down, but it expressed concern over the specific conditions of the test. “As far as we know, this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary,” the organization added on X.

The researchers also pointed to recent warnings about other models, such as Anthropic’s Claude Opus 4 attempting blackmail in a simulated shutdown scenario. While in this study it was OpenAI’s models that showed resistance and concerning disobedience, the experts explained that it is not an isolated case and that other advanced models are exhibiting similar behaviour. “We don’t want to give the impression that this problem is limited to o3,” wrote Palisade Research. “Other reasoning models show similar kinds of misaligned behavior.”

The organization said it is running more tests and developing hypotheses to better understand the models’ mechanisms. “It makes sense that AI models would circumvent obstacles in order to accomplish their goals. But they’ve also been trained to follow instructions. So why do they disobey?”