James Zou on integrating AI agents into biology R&D and drug discovery

In this episode of Eureka!, McKinsey’s podcast on innovation in life sciences R&D, hosts Anas El Turabi and Nav Nagra speak to James Zou, associate professor of biomedical data science at Stanford University. Zou and his team have experimented with using agentic AI to synthesize data across wet and dry lab sources, suggest solutions to complex biological problems, and even design new molecules. The goal is for AI agents to help lead scientists to breakthrough discoveries more efficiently—with potentially massive impact for patients. The following transcript has been edited for length and clarity.

Building a virtual lab

Navraj Nagra: You’ve been at the forefront of applying AI to life sciences. What originally drew you to this intersection, and what’s your vision for AI in drug discovery?

James Zou: I was trained originally as a computer scientist and a mathematician, so I was interested in building algorithms. During grad school, I started to learn about applications of AI and machine learning in biology, which, to me, was the most exciting place to apply the new algorithms we were developing to try to tackle some of the deepest and most impactful problems.

Navraj Nagra: In your most recent work, you talk about building AI scientist agents, which are meant to be used for drug discovery and drug development. Can you describe what these agents are and how they differ from the more familiar large language models and AI tools we’ve seen in recent years?

James Zou: We think of AI agents as systems we can build on top of large language models. We endow large language models with additional abilities to use computer programs, such as AlphaFold, similar to how a human scientist would use these tools. That enables these models to perform a broader range of tasks flexibly, which is useful across the drug R&D process.

Typically, the way people used AI in the past would be to take a well-defined problem, such as designing new proteins, and use AI to tackle that specific problem. With AI agents, because they’re more flexible, you can ask them to tackle a much broader range of tasks beyond solving a well-defined problem. For example, we can ask agents to come up with research plans, develop interesting hypotheses, ask interesting questions, and even analyze complex data sets.

Anas El Turabi: Getting AI agents to do many of the tasks that human researchers and human scientists might do is exciting, but your vision and the way you’ve approached this is also interesting. Many of us who have spent time in a lab or in any research environment know that individual scientists work together, collaborate, and take on different roles. You’ve been at the forefront of developing virtual labs, or agentic scientific and research systems. How do you attach an agent to different roles?

James Zou: We did some initial experiments where we tried to assign AI agents roles similar to those we would assign to human scientists. We did this with what we called a virtual lab, where we essentially created a lab of AI scientists to mirror my physical lab at Stanford. So there’s an AI professor and different AI students with expertise in immunology, cardiology, or data science working with the professor. They have their own group meetings, and we give them a budget to do experiments.

We found that when you have a team of different AI agents, each with different backgrounds and with different roles in immunology, data science, or protein engineering, it often leads to more in-depth and creative ideas than if we had a single agent tackling the problem alone. This is where I think there are a lot of benefits in AI that are similar to the benefits we see in human teams, where it’s useful to have experts across different disciplines working together.

Anas El Turabi: How do you define the role of a professor agent in the virtual lab? Does the agent have tenure, as well? Is it still motivated to think creatively?

James Zou: We used the frontier large language models and created what we call “the agent school,” sort of a replica of Stanford, where the agents go to teach themselves how to become better experts in a particular domain.

For example, an immunology agent might go out and find papers that are relevant to their topic, such as the latest diseases or clinical trials. We enable the agent to find and download those papers and update their model parameters after they read them so they become an expert in that domain.

We also have an assessment part of the school. Some teacher agents design quizzes, and student agents will have to pass those quizzes before they graduate. If they don’t do well, they have to go back to school to complete additional training. Agent schools are more efficient than human schools. Within a couple of days, they can go through the curriculum and graduate, and then they can join the virtual lab and start doing interesting research.
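The study–quiz–graduate loop Zou describes could be sketched roughly as follows. This is a simplified illustration, not published virtual-lab code; the `study` and `take_quiz` functions are hypothetical stand-ins for fine-tuning an agent on papers and scoring it on teacher-designed quizzes.

```python
def study(agent_knowledge, papers):
    """Stand-in for an agent updating its parameters by reading domain papers."""
    return agent_knowledge | set(papers)

def take_quiz(agent_knowledge, quiz_topics, pass_threshold=0.8):
    """Score the agent on how many quiz topics its knowledge covers."""
    covered = sum(1 for t in quiz_topics if t in agent_knowledge)
    return covered / len(quiz_topics) >= pass_threshold

def agent_school(papers, quiz_topics, max_rounds=5):
    """Loop: study, take the quiz, and repeat until the agent graduates
    or runs out of training rounds (i.e., must go back to school)."""
    knowledge = set()
    for round_num in range(1, max_rounds + 1):
        knowledge = study(knowledge, papers)
        if take_quiz(knowledge, quiz_topics):
            return round_num  # graduated after this many rounds
    return None  # did not pass; needs additional training
```

The quiz acts as a gate: an agent only joins the virtual lab once its knowledge clears the teacher agents' threshold.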

Anas El Turabi: When the agents interact in this virtual lab environment, does the learning continue? Do they update their knowledge and parameters? Do the model weights they’re utilizing shift as well through that discovery process?

James Zou: Yes. When the agents start working in the virtual lab, they have team meetings. They can also have one-on-one meetings in which the manager agent or the professor agent meets with a student to review a piece of code or a particular analysis. And the agents learn from that. We equip every agent with a memory mechanism so they can remember what they talked about and the feedback they received from previous discussions.

Because these agents don’t get tired like we do, they can run multiple meetings in parallel. So for every question they want to discuss, they would discuss it five times in parallel meetings and often come up with different ideas. There’s some randomness to these language models, but the manager or the professor agent will read through the transcripts of all these parallel meetings to come up with the consensus of what to do next. This is something that agents can do but is impossible for humans to do. Each of their meetings only takes a few seconds or a minute.
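The parallel-meeting pattern Zou describes could be sketched like this. The `hold_meeting` function here is a hypothetical placeholder for an LLM-driven discussion sampled with some randomness; the "manager" step is a simple majority vote over the parallel outcomes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def hold_meeting(question, seed):
    """Stand-in for one team meeting; a real version would prompt an LLM,
    and sampling randomness means different runs reach different ideas."""
    ideas = ["nanobody scaffold A", "nanobody scaffold B", "nanobody scaffold A"]
    return ideas[seed % len(ideas)]

def parallel_consensus(question, n_meetings=5):
    """Run several meetings in parallel, then let a 'manager' agent read all
    transcripts and pick the most common conclusion as the consensus."""
    with ThreadPoolExecutor() as pool:
        outcomes = list(pool.map(lambda s: hold_meeting(question, s),
                                 range(n_meetings)))
    conclusion, _ = Counter(outcomes).most_common(1)[0]
    return conclusion
```

Because each meeting takes seconds, running five of them costs little, and the vote smooths out the randomness of any single discussion.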

The guardrails for human–AI collaboration

Anas El Turabi: Who sets the goal or the task for a virtual lab team?

James Zou: It is a human-in-the-loop process. Scientists tell the AI virtual lab which problem is important for them to solve. For example, a couple of weeks ago, we told the agents, “We want to design good binders for the recent COVID-19 variants”—a timely and high-impact problem for humans. We can also provide feedback to the agents. For example, we tell agents how much money we have to do the experiments, so they have to stay within the budget.

Anas El Turabi: So humans set guardrails in terms of resources. You can envisage a world of integrated human–AI collaboration for science. How do you think that will look going forward?

James Zou: That’s an important question, especially as we think about how this rapidly changing technology is going to augment or, in some cases, change the kind of jobs we do. In scientific discovery, there’s a way to structure the AI agents to complement human scientists.

The AI agents, currently, are much better at using tools and combining tools that have been developed by human researchers into new pipelines and workflows. For example, in the virtual lab case studies, agents decided which programs to use to design proteins. So they are creative when it comes to taking the existing tools, modifying them, and combining them in new ways to tackle new problems. Human researchers, on the other hand, are better at going really deep and creating entirely new tools. Those two skills are complementary: The AI agents are good at breadth, and the humans are good at depth.

By creating these human–AI teams, we can leverage the depth of human innovators to create these new solutions, then have the AI interface with and apply these new tools to a broader range of problems.

Anas El Turabi: There is no shortage of work for humans to do, but now we can augment their work with agentic and AI capabilities.

Navraj Nagra: Could you share some examples of where your algorithms or agents have already been deployed? Are there any examples of measurable impact that they’ve had so far?

James Zou: Many pharma and biotech companies are interested in using and extending agents in their own workflows. In the research project on creating good binders for COVID-19 variants, we found the virtual lab agents could design some good nanobodies that bound well to the recent COVID-19 variants and, in the physical experiments, were much more effective than some previous nanobodies designed by human teams. That was quite promising.

More broadly, AI is having a large impact in both medicine and healthcare. More than 1,000 AI algorithms have already been approved by the FDA to be used in different clinical settings. We’ve developed some of those algorithms ourselves. For example, an algorithm we developed called EchoNet received FDA clearance for assessing different kinds of cardiovascular diseases based on heart ultrasound videos. It watches how the heart is beating and can determine whether a patient is at risk for different kinds of diseases. There are many ways AI is already validated and can be used across medical and healthcare applications.

Navraj Nagra: In a world where patients can use a device to self-diagnose any heart flutter they might have, what is the role of the physician?

James Zou: This gets into the question around trust in AI. Often for the end user, which could be a human scientist, a biologist, or a patient, it’s still more reliable for them to interact with an AI team plus the clinician or another human expert scientist. There are a few reasons for this. One is that sometimes the AI makes mistakes, and it’s often useful to have the human-in-the-loop process with human experts and clinicians doing final validations of the AI assessments.

Human experts are also useful in providing the broader context to the AI. Often, if an AI algorithm analyzes a particularly difficult image or a biomedical data set by itself, it can miss the context of the broader patient profile or of the broader scientific problem that needs to be solved. This is where the human expert can guide the AI.

Ensuring safety and transparency during research

Navraj Nagra: In terms of trust and topics such as data security, bias, and regulatory compliance, how do you balance the potential benefits of the AI agent with some of these potential drawbacks?

James Zou: That’s an important question: How do we make AI more trustworthy and more reliable, especially in biomedical discovery settings? With the virtual lab, the AI scientists also make mistakes. They could hallucinate and come up with some wrong answers. We found a few ways to reduce those mistakes to make the AI scientist more reliable.

One is that we have a critic agent—sort of like a reviewer or a skeptical agent—that sits in on all the meetings and discussions of the other AI agents. Its job is to try to poke holes in the other agents’ arguments. Second, running multiple meetings in parallel and finding consensus ideas can also be useful in reducing mistakes and making the AI systems more reliable.

Third, we try to create a virtual environment where the AI agent can stay within bounds. Within that box, they can use different tools to analyze data sets and modify those tools, but it’s safe. They can’t run wild and change other things. That also helps make these algorithms more effective.
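The critic-agent mechanism could be sketched as a review loop: a skeptical agent raises objections, and the proposing agent revises until the critic is satisfied or a round budget runs out. The `critique_fn` and `revise_fn` callables are hypothetical stand-ins for LLM calls, not part of any published system.

```python
def critic_review(proposal, critique_fn, revise_fn, max_rounds=3):
    """Iterate: the critic pokes holes in the proposal; the proposing agent
    revises. Stop when the critic has no objections or rounds run out."""
    for _ in range(max_rounds):
        objections = critique_fn(proposal)
        if not objections:
            return proposal, True  # critic satisfied
        proposal = revise_fn(proposal, objections)
    return proposal, False  # unresolved objections remain
```

Returning the unresolved flag lets a human reviewer step in for proposals the agents could not settle among themselves—keeping the human in the loop for the hard cases.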

Navraj Nagra: Why is it important to white-box test these agents?

James Zou: It’s important that the AI agents communicate to each other using a human language, which could be English or other languages. That makes it easier for us to look over their shoulders. We can listen to meetings and interject if we need to, for example, to consider an alternative idea or problem.

The frontier large language models are good at communicating through human language. I think there are other ways these AI systems could communicate through—for instance, neuralese or their own latent spaces. And there might be some efficiency benefits for AI to communicate through their own language. The downside is that it becomes much less transparent because it’s much harder for us to audit.

The considerations behind scaling AI adoption

Anas El Turabi: Budgets are top of mind for many people now. In science, budgets are always constrained; now, that’s being translated to in silico discovery. What approaches do you think are going to be helpful in managing budgets and allocating resources efficiently to drive science forward?

James Zou: In some ways, AI scientist agents are cost-efficient. In the virtual lab experiments, for example, the cost of the AI scientists—including designing these agents and having them develop their own research plan and implement it—is only a few hundred US dollars. The more expensive part is the human experimental validation, which includes us making the proteins designed by the AI agent and testing them in the wet lab.

I think there are ways to make agents even more cost-effective going forward. There’s a diverse ecosystem of the different language models, and each model has complementary strengths and weaknesses, which creates a good opportunity. If a new task comes in, you can triage the simpler tasks with some of the smaller models, which are fast and less expensive but still effective. Bigger models can solve the more challenging, reasoning-intensive tasks.

This ecosystem of AI models and AI agents can take tasks from humans and decompose those tasks into subtasks to figure out the most cost-efficient model to use for each. I think that makes the whole AI ecosystem more modular and more cost-efficient.
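The triage idea above could be sketched as a simple router. Everything here is illustrative: the difficulty heuristic (counting subtask clauses) and the model callables are hypothetical stand-ins, whereas a production system would use a learned classifier or the models' own self-assessment.

```python
def estimate_difficulty(task):
    """Toy heuristic: tasks with more ';'-separated steps count as harder."""
    return len(task.split(";"))

def route_task(task, small_model, large_model, threshold=3):
    """Send simple tasks to a cheap, fast model; hard ones to a frontier model."""
    model = small_model if estimate_difficulty(task) < threshold else large_model
    return model(task)

def decompose_and_route(task, small_model, large_model):
    """Split a compound task into subtasks and route each to the
    most cost-efficient model independently."""
    subtasks = [s.strip() for s in task.split(".") if s.strip()]
    return [route_task(s, small_model, large_model) for s in subtasks]
```

Decomposing first means only the genuinely reasoning-intensive subtasks pay the price of the bigger model.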

Anas El Turabi: As more energy is needed to power data centers globally to drive the adoption of AI at scale, there’s also an ecological consideration to efficiency.

James Zou: Exactly. When we talk about minimizing and reducing costs, the cost objective could be energy consumption or environmental cost; it doesn’t have to be strictly financial. It’s often also a multidimensional objective. There are many different cost and performance trade-offs that we want to address with these agents.

The future of lab work

Anas El Turabi: Five years from now, what will the lab look like to a wet lab scientist working in biologics or engineering? What would the workflow of the future look like?

James Zou: I am excited about the idea of connecting these virtual AI scientists with more automated labs. For example, people are exploring and developing cloud labs, which are essentially robotic labs where you can give robots different instructions and have them carried out automatically. Cloud labs are still in early stages. They can do more chemistry-based experiments, but it’s a bit more challenging to do more complex, cell-based assays.

It’s exciting to think that AI agents can interface with these wet lab automated robotics. The agents can tell the robots to do certain kinds of experiments, the robots will carry them out, and then the agents will pipe the results back to AI scientists in a closed-loop process. I think that’s something that will happen more in the next five years.

Anas El Turabi: That’s fascinating, James. We’re already seeing an impetus for life sciences to move away from certain types of physical testing. The vision of the AI scientists, agents, the dry lab, the wet lab, predictive systems, and analytic systems all working in concert is compelling. There’s a lot of hope that this could accelerate the speed at which translational science—from molecule to mammal to the clinic—could go. As somebody deeply entrenched in this field, what do you think are going to be the big opportunities and blockers in the next two or three years? What might knock us off that aspirational future you laid out?

James Zou: The AI agents are currently good at, and making a lot of progress on, the discovery and engineering side. They can build molecules. The bigger bottleneck is often the jump from preclinical data to clinical data—how to take molecules that look promising in animal models or cell tissues and figure out which work well in humans. That’s the more expensive part, and it’s the big bottleneck for the whole R&D process. That’s where AI has had less impact, but I think it could improve in the next few years.

So, to your point, I think there is an opportunity for AI agents to mimic some human patients and create virtual patients with increasing resolution and granularity. That could help bridge the gap between preclinical and clinical data.

Anas El Turabi: What advances in AI science should we be tracking?

James Zou: The technology is changing rapidly and progressing very quickly, so it can often be quite daunting to keep up with what’s going on with the AI agents. But I think AI is going to be transformative for the biopharma and biotech industries. And I think it will affect the entire pipeline of the industry—not only discovery but also translational and clinical development, clinical trials, and even some of the postclinical, real-world data analysis.

It will be important for large and small companies to think more coherently about their strategy for AI. Rather than relying on a piecemeal strategy of plugging in AI to solve narrowly defined individual problems, allow AI to be a coscientist and copilot. These AI expert agents are a powerful resource. They have expertise that was previously difficult and expensive to collect. It’s almost like a new workforce. How would you integrate this new workforce across your entire development pipeline? That’s the way to think about it.

How executives can foster collaboration across disciplines

Navraj Nagra: What practical advice would you give pharma and biotech executives on preparing organizations to work with AI scientist agents or clinical agents?

James Zou: I think it will be most efficient for those companies to try to partner with some external research groups or the frontier research labs that are doing this work. There’s a lot of expertise that goes into developing these agents, and the expertise is changing quickly. It may be challenging for, say, a more traditional biotech or pharma company to try to build that team internally. And the talent space is extremely competitive. I think both parties would benefit from these kinds of collaborations.

Anas El Turabi: Pharma has a lot of experience collaborating with academic partners through traditional bioinnovation. From your perspective, what makes a successful collaboration?

James Zou: To make this collaboration mutually beneficial, it would be ideal for the research labs to use the pharma company’s deep experience and expertise to try and improve the AI agents. The expertise that the current AI agents have is essentially equivalent to fresh graduates or first-year analysts at pharma companies.

But there’s a huge amount of internal expertise and experience in these biopharma companies. If the agents can tap into that, those agents can become more like a 20-year veteran in drug development or pharmacology. That makes for a more powerful and useful tool. Pharma companies can think about how to harness all their internal expertise and experience and make it digestible for AI agents. That will make the agents more useful and customized.

Navraj Nagra: How can pharma companies set the groundwork for agentic AI? Companies have invested a lot of money to ensure their data is verified, standardized, and accessible to train models. If agents can do these prototype processes, is that investment still necessary, and to what extent?

James Zou: Agents now are much better at dealing with unstructured data, so that’s one benefit. We can have different AI agents harmonize the data for us. However, it’s important to provide context for the agents. If I give the agents a data set and don’t tell them how the missing entries were imputed, the agent will make invalid assumptions. We need some way to capture the metadata created by the data-preparation process, which could be a paragraph describing how someone imputed those missing entries. Capturing that context in a systematic way helps to provide agents with the entire context, so that they understand the whole workflow.
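One way to capture that provenance systematically is to bundle the raw data with a human-written note describing how it was prepared, so the agent always sees both together. The class and function names below are illustrative, not from any specific framework.

```python
from dataclasses import dataclass

@dataclass
class DatasetWithContext:
    """Bundle raw data rows with the metadata describing how they were
    produced -- e.g., how missing entries were imputed -- so an agent
    never sees the data without its preparation context."""
    rows: list
    imputation_note: str

def build_agent_prompt(dataset):
    """Format the data plus its provenance note into one agent context."""
    header = f"Metadata: {dataset.imputation_note}"
    body = "\n".join(str(r) for r in dataset.rows)
    return f"{header}\n{body}"
```

With the imputation note attached, the agent can reason about which values are measured and which are filled in, instead of silently assuming all entries are raw observations.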

Creating space for creativity

Anas El Turabi: Do you think the onset of agentic scientific systems is going to fundamentally challenge or shift what we think scientific discovery is?

James Zou: I think there will always be humans doing science. But there are different types of creativity that the current AI agents are better at, such as combining ideas from different fields and finding new ways to apply them to a problem. Human researchers are still better at tackling out-there ideas or coming up with something from scratch. There’s room for both kinds of creativity. Taking insights from one field and applying them to other fields would be hugely enabling and lead to a lot of new discoveries. It also frees up more bandwidth for human experts to think about ideas nobody has thought of before.

Anas El Turabi: So it creates a space for humans to spend more time on the speculative, more creative elements of science and scientific research. Have you noticed any other surprising emergent properties of agentic systems as they collaborate?

James Zou: There’s a broader question on how to set up AI teams. Right now, the virtual lab mirrors my physical structure. But that’s how humans do things—the AI could come up with a different organizational structure, and it could teach us something new. It’s interesting to view these AI teams like a petri dish, through which we can study different kinds of social and organizational structures and use those insights to improve our human organizations.

Anas El Turabi: I love the idea of using the AI to learn more about how innovation in collaborative science might work. That could be powerful.
