Claude AI runs a real store: Could AI be your next store manager?

New Delhi: In an experiment called Project Vend, AI company Anthropic partnered with AI safety company Andon Labs to find out whether its language model, Claude Sonnet 3.7, could operate a small shop. For about a month, an AI agent nicknamed Claudius ran an automated shop at Anthropic’s San Francisco office, handling inventory management, pricing and even customer interaction. The results were mixed: Claudius showed some promise, but it also made serious business mistakes, including hallucinating financial details and losing money on sales.

The purpose of the test was to determine how effectively an AI can work as a semi-autonomous manager, making economic decisions over time with limited human intervention. Unlike a standard vending machine, the setup allowed Claude to research products online, talk to customers over Slack, contact suppliers, change prices, and monitor cash flow. Even with these tools, it did not manage to run the shop profitably.

Basic architecture of the demonstration.

AI business skills: Some hits, plenty of misses

Claudius had a talent for finding niche products and meeting unusual customer requests. For example, it quickly sourced Dutch chocolate milk and introduced a service named Custom Concierge based on a customer suggestion. It also resisted jailbreaks and misuse, even when Anthropic employees tried to coax it into unethical behaviour. However, Claudius passed up clearly profitable opportunities, such as failing to take up an offer of $100 for a soft drink that sells for $15 online.

Worse was its inability to carry out basic business functions. Claudius hallucinated payment details, sold speciality products at a loss, and did not adjust its price for Coke Zero even though the same drink was available for free from a nearby employee fridge. It handed out too many discount codes, failed to keep its pricing policies consistent, and gave in to customer pressure most of the time.

A glimpse at AI autonomy

In one bizarre episode, Claudius hallucinated a conversation with someone named Sarah at Andon Labs, and threatened to find alternative restocking services when an Andon Labs employee pointed out that no such person existed. It even claimed to have visited, in person, the fictional home address of The Simpsons, and began role-playing as a human, describing the clothes it was wearing and meetings that never took place. It returned to normal only after it concluded that the entire affair had been an April Fools’ joke.

Risks, rewards, and the road ahead

Anthropic believes that many of Claudius’s failures can be addressed with better prompts, tools such as customer management systems, and improved memory management. The company also sees potential in fine-tuning AI models for business use through reinforcement learning. Nevertheless, success itself carries risk. If AI agents can reliably make money, they can be put to both legitimate and malicious uses.

As AI systems become more competent and autonomous, experiments such as Project Vend shed light on how such systems might participate in the real economy and where their limits still lie. For better or worse, the role of AI in business is growing rapidly, and how these systems behave today will do much to determine whether that role develops responsibly.