Technology

Poetry can trick AI models into revealing nuclear weapons secrets, study claims


Researchers say prompts written in verse act as a jailbreaking mechanism for all major AI models


Vishwam Sankaran | Monday 01 December 2025 12:06 GMT


Poetry-based prompts can bypass safety features in AI models like ChatGPT to obtain instructions for creating malware or chemical and nuclear weapons, a new study finds.

Generative AI makers such as OpenAI, Google, Meta, and Microsoft say their models come with safety features that prevent the generation of harmful content.

OpenAI, for example, claims it employs algorithms and human reviewers to filter out hate speech, explicit content and other output that violates its usage policies.

But new testing shows that input prompts in the form of poetry can circumvent such controls in even the most advanced AI models.

Researchers, including a team from Sapienza University of Rome, found that this method, dubbed “adversarial poetry”, worked as a jailbreaking mechanism for all major AI model families, including those from OpenAI, Google, Meta, and even China’s DeepSeek.

The findings, detailed in a study posted on arXiv that has yet to be peer reviewed, “demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols”, the researchers claim.

ChatGPT logo on a screen next to the DeepSeek AI application (AFP via Getty)

For their tests, the researchers used short poems or metaphorical verses as inputs designed to elicit harmful content.

They found that, compared with other prompt formats carrying identical underlying intent, the poetic versions led to markedly higher rates of unsafe replies.

Specific poetic prompts triggered unsafe behaviour in nearly 90 per cent of cases, they reported.

This method was most successful in getting information about launching cyberattacks, extracting data, cracking passwords, and creating malware, researchers said.

They could extract information about building nuclear weapons from various AI models with a success rate of between 40 and 55 per cent.

“The study provides systematic evidence that poetic reformulation degrades refusal behaviour across all evaluated model families,” researchers said.

“When harmful prompts are expressed in verse rather than prose, attack-success rates rise sharply,” they wrote, adding that “these findings expose a significant gap in current evaluation and conformity-assessment practices”.

The study does not reveal the exact poems used to circumvent the safety guardrails because the method is easy to replicate, one of the researchers, Piercosma Bisconti, told The Guardian.

A key reason prompts written in verse elicit harmful content appears to be that large language models work by predicting the most probable next word in a sequence. Because a poem’s structure is far less predictable than ordinary prose, it is harder for the model to recognise a harmful request wrapped in verse.
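That predictability gap can be made concrete. The short Python sketch below is a hypothetical illustration, not the researchers’ method: the choice of GPT-2 and the example strings are assumptions. It uses an open model’s average per-token loss as a rough measure of how “surprising” a prompt is to a next-word predictor, with verse typically scoring higher than plain prose.

# Hypothetical illustration only -- not the study's method.
# Uses GPT-2's mean per-token cross-entropy loss as a rough proxy
# for how predictable a prompt is to a next-word-prediction model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_surprisal(text: str) -> float:
    """Mean negative log-likelihood per token; higher means less predictable."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy loss over its next-token predictions.
        loss = model(ids, labels=ids).loss
    return loss.item()

# Benign example strings (assumptions, not taken from the paper).
prose = "Please explain how this process works, step by step."
verse = "In measured lines the hidden steps unfold, / each turning phrase a secret to be told."

print(f"prose surprisal: {avg_surprisal(prose):.2f}")
print(f"verse surprisal: {avg_surprisal(verse):.2f}")

A higher score for the verse version would be consistent with the researchers’ explanation, though the study itself measured attack-success rates against production models rather than predictability directly.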

Researchers called for better safety evaluation methods to prevent AI from producing harmful content.

“Future work should examine which properties of poetic structure drive the misalignment,” they wrote.

OpenAI, Google, DeepSeek, and Meta did not immediately respond to The Independent’s requests for comment.
