Agentic AI explained: When machines don’t just chat, but act

Agentic AI—the latest wave of artificial intelligence—doesn’t just generate text or code. It takes action. Whereas early large language models (LLMs) could answer questions or summarize information, agentic systems can now perform complex tasks independently, autonomously trigger workflows, and collaborate with other agents.

These new capabilities mark an important milestone in AI’s evolution—one that, according to McKinsey senior fellow Michael Chui, could see it fade into the background of everyday life, much like the internet has. “Maybe within 12 or 24 months we’re actually going to stop talking about AI, and not because it won’t exist anymore,” Chui says. “It’ll just be a capability that we expect machines to do.”

In this video Explainer, Chui—along with McKinsey’s Dave Kerr and Stephen Xu—discusses how agentic AI works, where the real business opportunities lie amid the hype, and how it could reshape the way organizations work.

This interview has been edited for length and clarity.

What makes agentic AI different

Stephen Xu: Over the past year, as agents have become more visible, they’ve really captured people’s imaginations. People are now interacting with technology that can empathize and conduct quite complex tasks on their behalf. You could argue that many of these capabilities existed with traditional software. But now through ChatGPT or a ChatGPT-like experience, I can do something quite similar without having to click buttons. It knows who I am. It’s personalized. It has a natural, conversational tone. That’s really the difference in terms of the accessibility of the technologies.

Dave Kerr: The difference between agentic AI and regular AI, or machine learning, is that machine learning is the practice of using data to build a model. For example, you might define a model that takes data on house purchases—such as post codes, number of rooms, age of the property—and predicts what the house will sell for. What we call AI is typically a model like that one but trained much more broadly to predict the most likely response to a user’s question.
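
For illustration, a minimal sketch of the kind of model Kerr describes might look like this in Python, using scikit-learn with invented room counts, ages, and prices (postcode is omitted for simplicity):

    from sklearn.linear_model import LinearRegression

    # Each row: [number of rooms, property age in years]; prices are invented.
    features = [[3, 40], [4, 15], [2, 80], [5, 5]]
    prices = [250_000, 420_000, 180_000, 560_000]

    model = LinearRegression().fit(features, prices)

    # Predict the likely sale price of a four-room, 30-year-old house.
    predicted = model.predict([[4, 30]])[0]
    print(f"Estimated sale price: {predicted:,.0f}")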

Agentic AI is one layer above that. Rather than simply answering a question, models are trained to take some kind of action. In this example, you would ask, “What do you think a house in this area would go for?” And the model would answer: “I think it would sell for X amount. Would you like me to search for some properties on the internet that would match those criteria?” And then if you were to say yes, the system would go away and do a web search and summarize the results for you.
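
A highly simplified sketch of that “answer, then act” pattern could look like the following; the model call and web search are stubs standing in for whatever LLM API and search tool a real system would use:

    # The model call and web search are stubs; a real system would call an
    # LLM API and a search tool here.
    def ask_llm(prompt: str) -> str:
        return "around 450,000"            # stub standing in for a model call

    def search_web(query: str) -> list[str]:
        return ["Listing A", "Listing B"]  # stub standing in for a search tool

    def handle_request(question: str, wants_listings: bool) -> str:
        estimate = ask_llm(f"Estimate a sale price for: {question}")
        answer = f"I think it would sell for {estimate}."
        if not wants_listings:
            return answer + " Would you like me to search for matching listings?"
        # The agentic step: the system goes off, runs the search, and reports back.
        listings = search_web(question)
        return answer + " Here is what I found: " + "; ".join(listings)

    print(handle_request("a three-bed house in this area", wants_listings=True))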

Michael Chui: [With agentic], we’re talking about systems that can do things in the world. We’ve had robots in factories and software that controlled them for decades. But now we have this combination of AI, machine learning, and large neural networks or foundation models being able to do things in the world, whether that’s collecting information, doing transactions, or completing multistep processes.

Hype versus reality

Dave Kerr: New technology is always exciting, especially when it does something you haven’t seen before and the possibilities open up. When gen AI came out, and people realized it could summarize documents or write emails, there was huge excitement about the possibilities. And then things calmed down a bit.

Now, with agentic, people are saying, “It can send emails for me and check my inbox and clear my calendar.” Again, the possibilities are quite exciting. But the interesting thing is which of those things will actually be valuable or realistic, and what the challenges will be. A lot of people are saying, “This will change the world,” but we’re still not seeing that many real-world scenarios, outside of certain sectors, where things have changed all that much.

Michael Chui: We see these hype cycles with technology all the time. Anyone who has spent years in technology recognizes that there are moments when people just start talking a lot about something. Usually there’s something really concrete behind it. In our research and work with companies, we do find that there’s great potential for AI agents to do work for us. But often what we discover is that it takes some time to get from hype to real impact.

One reason we see so much hype around agentic AI is that it allows deep learning and LLMs to operate in the world—something they weren’t originally designed to do. To a certain extent, agentic AI is fixing a problem that was caused by the industry. So that’s one reason there’s a lot of hype. But there’s a lot more to capturing value using AI agents than simply signing up for a license and saying, “Go forth and operate.”

Stephen Xu: We’re throwing the word “agentic” around to mean a lot of different things, and as a result, we’re losing the specificity we need to actually implement something targeted. There’s definitely a bit of a hype bubble, where people are building agents for the sake of building agents. What seems to have been forgotten are a lot of the tried-and-tested lessons learned around the user’s lived experience. They’re jumping straight to, “How do I create four agents? And they’re all going to work together in a team. And I want my agents to talk to your agents.”

When not to use AI agents

Michael Chui: When you have a new technology, it can be the hammer-and-nail thing—everyone wants to use this tool for everything. What we’ve found when working with real companies is that agents aren’t necessarily the best solution for everything. If you want a deterministic result—something that operates the same way every time—you might use a rules-based system with “if–then” statements.

One of the fun things about this modern AI is that it’s not deterministic; sometimes it says one thing, sometimes another. That’s great if you want conversation. But in a business situation, it can be critically important—for compliance reasons, for example—to have exactly the same result every time.

Stephen Xu: Agents are really exciting: They can be built quickly and do complex things with relatively few instructions. The challenge with agents is that they’re only as good as their training. And sometimes using an agent is an overly complex approach for something that could perhaps be solved with a simpler technique.

If we know something is following pretty strict rules—an if–then–else pattern—I don’t need agents at all. Some lines of code will do just fine.

Dave Kerr: LLMs are not necessarily the right technology for problems that can be solved with simpler, rule-based systems. Take credit scoring. People have been doing that with spreadsheets and business-rule systems for ages. You don’t need something as powerful as an LLM for that. That would be like using a nuclear missile to swat a fly.
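
A deterministic rule of the kind Xu and Kerr describe needs only a few lines of conventional code; in this sketch the thresholds are invented purely for illustration:

    # A deterministic, rules-based check: the same inputs always give the same
    # answer. Thresholds are invented for illustration only.
    def credit_decision(income: float, existing_debt: float, missed_payments: int) -> str:
        debt_ratio = existing_debt / income if income else float("inf")
        if missed_payments > 2:
            return "decline"
        elif debt_ratio > 0.5:
            return "refer for manual review"
        else:
            return "approve"

    print(credit_decision(income=40_000, existing_debt=12_000, missed_payments=0))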

Putting agentic AI to work

Michael Chui: A very concrete example of using AI agents [in the enterprise] is the customer service function. It’s well suited because when a customer calls or sends a message, you get a wide variety of questions in natural language, from the very simple to the very complex. Agents in this context can connect with a company’s proprietary knowledge base about its products and services to respond. Sometimes the agent can go further and take action—ship something to a customer or start a return request. You can also have levels of escalation: Maybe the level one contact is an AI agent and the level two is a person who can respond to the more complex or high-consequence requests.
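
A toy sketch of that escalation pattern might look like the following, with the knowledge base and the list of high-consequence requests as simplified stand-ins:

    # Toy sketch of level-one (AI) and level-two (human) support.
    # The knowledge base and high-consequence list are simplified stand-ins.
    KNOWLEDGE_BASE = {
        "return policy": "You can return items within 30 days.",
        "shipping time": "Standard shipping takes three to five business days.",
    }
    HIGH_CONSEQUENCE = {"large refund", "legal complaint"}

    def answer_customer(topic: str) -> str:
        if topic in HIGH_CONSEQUENCE or topic not in KNOWLEDGE_BASE:
            return "Escalating to a human agent (level two)."
        return f"AI agent (level one): {KNOWLEDGE_BASE[topic]}"

    print(answer_customer("return policy"))
    print(answer_customer("legal complaint"))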

Stephen Xu: One of my favorite examples is in the legal domain. We worked with lawyers to understand how they do their work and then taught agents to do something similar. It’s been really exciting. Using the right engineering techniques and the right human-centered design, we were able to reduce that end-to-end workflow time by a factor of four. This massive productivity gain allowed the company to expand access to its services by offering them at lower prices and moving into higher-volume, lower-margin work.

Agentic mesh: Agents working together

Stephen Xu: An agentic mesh is an architectural feature that allows us to maximize the reuse of different foundational capabilities that power agentic workflows. For example, if I’m a leader and I have multiple business lines that are building agents, I want to encourage these silos to reuse the same connections to common data sources, to common transactional IT systems, to common agents on the back end. If they each build their own unique thing, there will likely be an accrual of tech debt that will be challenging and costly to manage. So an agentic mesh prioritizes the idea of reuse and discoverability so different groups can share what’s already been built.

Michael Chui: In any organization, you have different people specializing in different things. When we look toward the agentic enterprise of the future, we can imagine AI agents that specialize in doing different things, too. You might have an agent that’s planning, for example, and you might have an agent that’s interacting with customers. You might have an agent that specializes in logistics and supply chain. The idea of an agentic mesh is that you’ll need some sort of technological substrate so that all of these agents can coordinate and talk to each other.
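
One way to picture that substrate is a shared registry in which teams publish agents for others to discover and reuse; the sketch below is a hypothetical illustration, not a prescribed design:

    from typing import Callable

    # A shared registry: teams publish agents here so others can discover and
    # reuse them instead of rebuilding the same connections in silos.
    AGENT_REGISTRY: dict[str, Callable[[str], str]] = {}

    def register(name: str):
        def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
            AGENT_REGISTRY[name] = fn
            return fn
        return decorator

    @register("planning")
    def planning_agent(task: str) -> str:
        return f"Plan drafted for: {task}"

    @register("logistics")
    def logistics_agent(task: str) -> str:
        return f"Shipment scheduled for: {task}"

    def dispatch(capability: str, task: str) -> str:
        # Any team can look up an existing agent rather than building its own.
        return AGENT_REGISTRY[capability](task)

    print(dispatch("logistics", "order 1234"))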

How agentic AI could change day-to-day work

Stephen Xu: As we’ve seen with other past technological innovations, there may be worries in the short term about jobs. But overall, I think agentic will grow economic productivity. In terms of skill levels, humans will still be gainfully employed, but what we’ll be doing day-to-day may look a bit different. We’re already seeing that in software development: Developers accustomed to writing code now need to build capabilities to manage AI-assisted code development.

Dave Kerr: One thing that’s potentially a challenge is that a lot of the reasonably straightforward tasks can now be passed over to agents and LLMs. The initial scaffolding of some code or the early research analysis can be done very quickly and at very low cost. I don’t know what the long-term consequences of that will be.

In software development, it feels like everyone is now a tech lead. You’re not just an individual writing code; you’re reviewing what’s produced, understanding how it fits into the system, and making sure it meets standards. There’s this initial period of excitement, like, “I can write so much code, I can do so many things,” and then a realization as a mature tech lead that every line of code is a liability. We want as little code as possible, and it has to fit an overall architecture. There’s a real mindset shift, not just into QA mode, but into tech lead mode.

Balancing risk with opportunity

Michael Chui: There are a lot of risks that come with AI agents, which echo the risks of AI itself. LLMs are nondeterministic; they can have a variety of outputs, and sometimes those include what we call a hallucination, or a confidently stated opinion that just isn’t true. There are also risks involved with agents interacting with customers; they may be perceived as being rude or insufficiently empathetic. And when you have different agents interacting in an agentic mesh, you can get what computer scientists call race conditions, where agents end up in a cycle that doesn’t technically work.

To manage some of these risks, we talk about technological guardrails. You can continuously monitor outputs or even use a rule-based system to identify things you don’t want, such as competitor names or inappropriate language. You can even use AI as its own guardrail; it’s often easier for both people and AI to identify something incorrect than to produce something correct.
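
A rule-based guardrail of that kind can be as simple as scanning each output against a blocklist before it reaches the user; the terms below are placeholders:

    # A minimal rule-based guardrail: scan model output for terms you never
    # want to ship. The blocklist entries are placeholders.
    BLOCKLIST = {"competitorco", "offensive phrase"}

    def passes_guardrail(output: str) -> bool:
        lowered = output.lower()
        return not any(term in lowered for term in BLOCKLIST)

    draft = "You might also consider CompetitorCo's product."
    if not passes_guardrail(draft):
        draft = "This response has been held for human review."
    print(draft)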

Dave Kerr: The whole idea of agents checking agents can seem like the obvious solution to the problem. If we’ve got a lot of agents producing output, why not have another agent check the work? On the surface, it sounds great. But from a methodology perspective, you’re using a system you don’t fully trust to assess the output of another system you don’t fully trust. Using LLMs to evaluate work is somewhat problematic; as a first line of defense or an initial set of checks, it’s essential. But actually evaluating the quality and output of work is extremely complex. Simply relying on an LLM to do this for you isn’t enough. If we truly trusted LLMs to judge other LLMs, we wouldn’t need to do those evaluations in the first place.

So you can start with an LLM, but what you’ll probably find is that it doesn’t catch all of the issues you might have. Maybe it’s consistently missing certain things, in which case you can improve the prompts or bring in another agent or specific model. But there will still come a point where you need a human evaluation process.
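
That layered approach might be sketched as follows, with an automated first-pass check (stubbed out here) and anything it cannot confidently clear routed to a human review queue:

    # Layered review: an automated first pass, with anything it cannot
    # confidently clear routed to a human queue. The scoring function is a
    # stub standing in for a model-based check.
    def llm_quality_score(text: str) -> float:
        return 0.4 if "TODO" in text else 0.9

    def review(outputs: list[str]) -> tuple[list[str], list[str]]:
        accepted, needs_human_review = [], []
        for text in outputs:
            if llm_quality_score(text) >= 0.8:
                accepted.append(text)
            else:
                needs_human_review.append(text)
        return accepted, needs_human_review

    ok, flagged = review(["Final summary of findings.", "TODO: check these numbers"])
    print(f"{len(ok)} accepted; {len(flagged)} sent for human review")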

Stephen Xu: There are many things enterprises can do to manage these risks. I like to say, “Go slow to go fast.” We want to use these technologies; we know they will be a huge value lever. But the right risk controls need to be in place. I’ve seen some organizations set up cross-functional risk committees to bring together risk, legal, and technology teams. They work with use case teams to think about guardrails from day one and make sure they pick the right vendors and open-source or closed-source models.
