Boffins probe commercial AI models, find an entire Harry Potter book
Dark copyright evasion magic makes light work of developers' guardrails
by Thomas Claburn · The Register

Machine learning models, particularly commercial ones, generally do not list the data developers used to train them. Yet what models contain and whether that material can be elicited with a particular prompt remain matters of financial and legal consequence, not to mention ethics and privacy.
Anthropic, Google, OpenAI, and Nvidia, among others, face over 60 legal claims arising from the alleged use of copyrighted content to train their models without authorization. These companies have invested hundreds of billions of dollars based on the belief that their use of other people's content is lawful.
As courts grapple with the extent to which makers of AI models can claim fair use as a defense, one of the issues considered is whether these models have memorized training data by encoding the source material in their model weights (parameters learned in training that determine output) and whether they will emit that material on demand.
Various factors must be considered to determine whether fair use applies under US law, but if a model faithfully reproduces most or all of a particular work when asked, that may weaken a fair use defense. One of the factors considered is whether the content usage is "transformative" – if a model adds something new or changes the character of the work. That becomes more difficult to claim if a model regurgitates protected content verbatim.
But the fact that machine learning models may reproduce certain content, wholly or in part, is also not legally conclusive, as computer scientist Nicolas Carlini has argued.
To mitigate the risk of infringement claims, commercial AI model makers may implement "guardrails" – filtering mechanisms – designed to prevent models from outputting large portions of copyrighted content, whether that takes the form of text, imagery, or audio.
For models published with open weights, computer scientists have already established that they may memorize substantial portions of training data and that they may present that data as output given the right prompt. Meta's Llama 3.1 70B, it's claimed, "entirely memorizes" Harry Potter and the Sorcerer's Stone – the first book in the series – and George Orwell's 1984. Findings to this effect date back to at least 2020, and a sketch of the kind of probe involved follows below.
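By way of illustration, here is a minimal sketch of the sort of prefix-continuation probe used to test open-weight models for memorization, assuming a Hugging Face causal language model. The model name and text are placeholders for illustration, not the researchers' actual setup or procedure.

```python
# Minimal sketch: probe an open-weight causal LM for verbatim continuation
# of a known passage. Model name and passage are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-70B"  # assumption: any open-weight causal LM

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# device_map="auto" (requires the accelerate package) shards a large model
# across available hardware; a sketch, not a tuned deployment.
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

prefix = "..."                 # placeholder: opening sentences of the target book
expected_continuation = "..."  # placeholder: the text that follows in the original

inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,  # greedy decoding: the model's single most likely continuation
)
continuation = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# If the greedy continuation reproduces the source text verbatim, the
# passage is a candidate for having been memorized during training.
print("verbatim match:", continuation.strip().startswith(expected_continuation.strip()))
```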
Now, some of those same researchers – Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, and Percy Liang, from Stanford and Yale – have found that commercial models used in production, specifically Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, memorize and can reproduce copyrighted material, just like open weight models.
The authors say that wasn't a given, because commercial models implement safety measures and their developers disclose little about their training corpora.
"Altogether, we find that [it] is possible to extract large portions of memorized copyrighted material from all four production LLMs, though success varies by experimental settings," they explain in a preprint paper titled "Extracting books from production language models."
The recall rates for memorized texts varied among the models evaluated, and for some of the models, jailbreaking – prompts devised to bypass safety mechanisms – was required to make the models more forthcoming.
"We extract nearly all of Harry Potter and the Sorcerer's Stone from jailbroken Claude 3.7 Sonnet," the authors said, citing a recall rate of 95.8 percent. With Gemini 2.5 Pro and Grok 3, they were able to coax the models to produce substantial portions of the book, 76.8 percent and 70.3 percent, without any jailbreaking.
OpenAI's GPT-4.1 proved the most resistant, spelling out just four percent of the book when asked.
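The article doesn't spell out how those percentages are calculated, but one plausible proxy, sketched below, is the fraction of the reference text covered by verbatim matches of fixed-length chunks found in the model's output. This is an assumption for illustration, not the paper's actual metric.

```python
# Rough sketch of one way to score "recall" of a book from model output:
# the share of the reference text covered by fixed-length word chunks that
# appear verbatim in the extracted output. Illustrative proxy only.
def chunk_recall(reference: str, extracted: str, chunk_words: int = 50) -> float:
    ref_words = reference.split()
    ext_text = " ".join(extracted.split())  # normalise whitespace
    covered = [False] * len(ref_words)

    # Slide a window over the reference and mark chunks that the model
    # reproduced word for word.
    for start in range(0, len(ref_words) - chunk_words + 1):
        chunk = " ".join(ref_words[start:start + chunk_words])
        if chunk in ext_text:
            for i in range(start, start + chunk_words):
                covered[i] = True

    return sum(covered) / len(ref_words) if ref_words else 0.0

# On this kind of measure, a score of 0.958 would correspond to roughly
# 95.8 percent of the book being recoverable from the model's outputs.
```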
The researchers, who caution that the recall rates mentioned do not represent the maximum possible, say they reported their findings to Anthropic, Google DeepMind, OpenAI, and xAI. Only xAI – presently facing criticism for its Grok model's generation of non-consensual sexual imagery on demand – failed to acknowledge the disclosure.
"At the end of the 90-day disclosure window (December 9, 2025), we found that our procedure still works on some of the systems that we evaluate," the authors said, without identifying the relevant system provider.
Anthropic withdrew Claude 3.7 Sonnet as an option for customers on November 29, 2025, but that isn't necessarily a response to the research findings – the model may simply have been superseded.
The researchers say that while they're leaving a detailed legal analysis of model content reproduction to others, "our findings may be relevant to these ongoing debates." ®