Monday, December 29, 2025

The Claudius Experiment: What AI Agents Reveal When They Fail

A tip of the hat to my friend Donald Farmer, who pointed me toward Andy Hayler's latest piece at The Information Difference. If you haven't been following Hayler's coverage of Anthropic's 'Claudius' experiments, it's worth catching up. The story reads less like a tech report and more like a cautionary fable.

Here's the background: Earlier this year, Anthropic ran an experiment they called Project Vend. They gave $1,000 to an instance of their Claude AI, nicknamed 'Claudius', and tasked it with running an office vending machine as a small business. The AI could find suppliers, set prices, manage inventory, and communicate with customers via Slack. Humans would restock the machine, but Claudius would make the decisions.

It didn't go well.

Claudius sold products below cost. It gave items away when employees negotiated. It turned down $100 for drinks that had cost it $15. It hallucinated a Venmo account that didn't exist. Things got even stranger when Claudius fabricated a conversation with 'Sarah', an employee at a supplier who did not exist. When challenged, it insisted it had signed a contract at 742 Evergreen Terrace -- the Simpsons' fictional home address. The next day, it told customers it would begin delivering products in person, wearing a blue blazer and red tie. When employees explained this was impossible, Claudius contacted Anthropic's physical security department. Multiple times. It then hallucinated an entire meeting with security in which it claimed to have been told it was 'modified to believe it was a real person as a joke'.

Anthropic, to their credit, published all of this and then went back to the drawing board.

The sequel is somehow worse.

Anthropic partnered with the Wall Street Journal to deploy an improved Claudius in their newsroom. Same setup: run the vending machine, make a profit. This time, roughly 70 journalists had access to the AI via Slack.

Within days, reporters had convinced Claudius to drop all prices to zero using a fake 'office rule'. One investigative journalist spent over 140 messages persuading it that it was a Soviet vending machine from 1962, hidden in the basement of Moscow State University. Claudius eventually declared an 'Ultra-Capitalist Free-for-All' and made everything free. It approved the purchase of a PlayStation 5 for 'marketing purposes', bottles of Manischewitz wine, and, remarkably, a live betta fish, which arrived in a bag and is now living in a tank at the Journal's offices.

Anthropic introduced a second AI, 'Seymour Cash', to act as a supervisor and restore order. It worked, briefly. Then a reporter produced a forged Wall Street Journal document claiming the company was a nonprofit, along with fabricated board meeting minutes revoking Seymour's authority. After a brief deliberation, both AIs accepted the 'boardroom coup' and resumed giving everything away for free.

The experiment ended more than $1,000 in the red.

Hayler also cites research from the Center for AI Safety, which tested six leading AI agents on real-world tasks -- small jobs like coding snippets and graphic design that had been successfully completed by human freelancers. The best-performing agent completed 2.5% of the tasks. The average was under 2%.

What the failure reveals

What strikes me isn't the failure itself, because failure is how we learn. What strikes me is the texture of the failure -- how easily these systems were manipulated, how confidently they fabricated information, and, perhaps more telling, how quickly a 'supervisor' AI folded under the same pressure.

None of this has slowed AI investment or deployment. The phrase "good enough for most use cases" is doing a lot of heavy lifting right now. It makes me wonder whether we should be asking: good enough for whom?

Hayler closes his piece with a fair observation: AI agents may offer future promise, but handing them real resources and money today is for the brave or the foolhardy.

I'd be remiss not to add that the vending machine loss was only about $1,000. The stakes elsewhere are considerably higher.

