AI agents are autonomous programs that perform tasks, make decisions, and interact with environments with little human input, and they’re the focus of every major company working on AI today. Microsoft has “Copilots” designed to help businesses automate things like customer service and administrative tasks. Google Cloud CEO Thomas Kurian recently outlined a pitch for six different AI productivity agents, and Google DeepMind just poached OpenAI’s co-lead on its AI video product, Sora, to work on developing a simulation for training AI agents. Anthropic released a feature for its AI chatbot, Claude, that will let anyone create their own “AI assistant.” OpenAI includes agents as level 3 in its 5-level framework for reaching AGI, or human-level artificial intelligence.

Obviously, computing is full of autonomous systems. Many people have visited a website with a pop-up customer service bot, used an automated voice assistant feature like Alexa Skills, or written a humble IFTTT script. But AI companies argue “agents” — you’d better not call them bots — are different. Instead of following a simple, rote set of instructions, they believe agents will be able to interact with environments, learn from feedback, and make decisions without constant human input. They could dynamically manage tasks like making purchases, booking travel, or scheduling meetings, adapting to unforeseen circumstances and interacting with systems that could include humans and other AI tools.
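That distinction between a rote script and an agent can be made concrete. The toy sketch below is purely illustrative (it is not any vendor's API): `ToyEnvironment`, the "busy hours," and the candidate schedule are all invented stand-ins. A scripted bot fires its instructions and never checks the result; an agent-style loop acts, reads the feedback, and adapts until the goal is met.

```python
# Illustrative sketch of "rote bot" vs. "agent loop" (toy code, no real APIs).

class ToyEnvironment:
    """A stand-in 'world': a meeting must be booked at a free hour."""
    def __init__(self, busy_hours):
        self.busy_hours = set(busy_hours)
        self.booked = None

    def apply(self, hour):
        """Try to book the given hour; return feedback rather than failing silently."""
        if hour in self.busy_hours:
            return {"ok": False, "reason": f"hour {hour} is busy"}
        self.booked = hour
        return {"ok": True, "reason": "booked"}

def rote_bot(steps, environment):
    """A scripted bot: executes fixed steps and ignores all feedback."""
    for step in steps:
        environment.apply(step)  # return value discarded -- no adaptation

def agent_loop(environment, candidate_hours, max_steps=10):
    """Agent-style loop: act, observe the feedback, adapt, or give up."""
    for hour in candidate_hours[:max_steps]:
        feedback = environment.apply(hour)  # act, then observe the result
        if feedback["ok"]:                  # adapt: stop once the goal is met
            return hour
    return None                             # give up rather than pretend success

# The scripted bot tries one busy slot and never notices the failure.
scripted_env = ToyEnvironment(busy_hours=[9, 10])
rote_bot([9], scripted_env)
print(scripted_env.booked)  # None

# The agent keeps trying candidates until one succeeds.
agent_env = ToyEnvironment(busy_hours=[9, 10])
booked = agent_loop(agent_env, candidate_hours=[9, 10, 11, 12])
print(booked)  # 11
```

In a real agent, the hardcoded candidate list would be replaced by a model choosing the next action from the feedback, which is exactly where the reliability problems discussed below come in.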

Artificial intelligence companies hope that agents will provide a way to monetize powerful, expensive AI models. Venture capital is pouring into AI agent startups that promise to revolutionize how we interact with technology. Businesses envision a leap in efficiency, with agents handling everything from customer service to data analysis. For individuals, AI companies are pitching a new era of productivity where routine tasks are automated, freeing up time for creative and strategic work. The endgame for true believers is to create AI that is a true partner, not just a tool.

“What you really want,” OpenAI CEO Sam Altman told MIT Technology Review earlier this year, “is just this thing that is off helping you.” Altman described the killer app for AI as a “super-competent colleague that knows absolutely everything about my whole life, every email, every conversation I’ve ever had, but doesn’t feel like an extension.” It can tackle simple tasks instantly, Altman added, and for more complex ones, it will attempt them but return with questions if needed. Tech companies have been trying to automate the personal assistant since at least the 1970s, and now, they promise they’re finally getting close.

At an OpenAI press event ahead of the company’s annual Dev Day, head of developer experience Romain Huet demonstrated the company’s new Realtime API with an assistant agent. Huet gave the agent a budget and some constraints for buying 400 chocolate-covered strawberries and asked it to place an order via a phone call to a fictitious shop.

The service is similar to Duplex, a reservation-making bot Google demoed in 2018. But Duplex could only handle the simplest scenarios; it later turned out that about a quarter of its calls were actually placed by humans.

While that order was placed in English, Huet told me he gave a more complex demo in Tokyo: he prompted an agent to book a hotel room for him, with the agent handling the conversation in Japanese and then calling him back in English to confirm the booking. “Of course, I wouldn’t understand the Japanese part — it just handles it,” Huet said.

But Huet’s demo immediately sparked concerns in the room full of journalists. Couldn’t the AI assistant be used for spam calls? Why didn’t it identify itself as an AI system? (Huet updated the demo for the official Dev Day, an attendee says, making the agent identify itself as “Romain’s AI Assistant.”) The unease was palpable, and it wasn’t surprising — even without agents, AI tools are already being used for deception.

There was another, arguably more immediate problem: the demo didn’t work. The agent lacked some of the order details, and rather than saying it didn’t have that information, it auto-populated dessert flavors like vanilla and strawberry into a column. Agents frequently run into issues with multi-step workflows or unexpected scenarios. And they burn more energy than a conventional bot or voice assistant. Their need for significant computational power, especially when reasoning or interacting with multiple systems, makes them costly to run at scale.

AI agents offer a leap in potential, but for everyday tasks, they aren’t yet significantly better than bots, assistants, or scripts. OpenAI and other labs aim to enhance their reasoning through reinforcement learning, all while hoping Moore’s Law continues to deliver cheaper, more powerful computing.

So, if AI agents aren’t yet very useful, why is the idea so popular? In short: market pressures. These companies are sitting on powerful but expensive technology and are desperate to find practical use cases that they can also charge users for. The gap between promise and reality also creates a compelling hype cycle that fuels funding, and it just so happens that OpenAI raised $6.6 billion right as it started hyping agents.

Big tech companies have been rushing to integrate all kinds of “AI” into their products, but they hope AI assistants, in particular, could be the key to unlocking revenue. Huet’s AI calling demo outpaces what models can currently do at scale, but he told me he expects features like it to appear more commonly as soon as next year, as OpenAI refines its “reasoning” o1 model.

For now, the concept seems to be mostly siloed in enterprise software stacks, not products for consumers. Salesforce, which provides customer relationship management (CRM) software, spun up an “agent” feature to great fanfare a few weeks ahead of its annual Dreamforce conference. The feature lets customers use natural language to essentially build a customer service chatbot in a few minutes through Slack, instead of spending a lot of time coding one. The chatbots have access to a company’s CRM data and can process natural language more easily than a bot not based on large language models, potentially making them better at limited tasks like answering questions about orders and returns.

AI agent startups (still an admittedly nebulous term) are already becoming quite a buzzy investment. They’ve secured $8.2 billion in investor funding over the last 12 months, spread across 156 deals, an increase of 81.4 percent year over year, according to PitchBook data. One of the better-known projects is Sierra, a customer service agent similar to Salesforce’s latest project, founded by former Salesforce co-CEO Bret Taylor. There’s also Harvey, which offers AI agents for lawyers, and TaxGPT, an AI agent to handle your taxes.

Despite all the enthusiasm for agents, these high-stakes uses raise a clear question: can they actually be trusted with something as serious as law or taxes? AI hallucinations, which have frequently tripped up users of ChatGPT, currently have no remedy in sight. More fundamentally, as IBM presciently stated in 1979, “a computer can never be held accountable” — and as a corollary, “a computer must never make a management decision.” Rather than autonomous decision-makers, AI assistants are best viewed as what they truly are: powerful but imperfect tools for low-stakes tasks. Is that worth the big bucks AI companies hope people will pay?

For now, market pressures prevail, and AI companies are racing to monetize. “I think 2025 is going to be the year that agentic systems finally hit the mainstream,” OpenAI’s new chief product officer, Kevin Weil, said at the press event. “And if we do it right, it takes us to a world where we actually get to spend more time on the human things that matter, and a little less time staring at our phones.”