Despite recent advancements,1 biopharma research in drug R&D remains expensive and time-consuming, but there are numerous opportunities to build capabilities that enhance productivity and deliver probability-of-success gains. Amid the rapid growth of AI in biopharma, attention today is on how to deliver value at scale by fully integrating AI approaches into scientific processes. In this article, we outline how biopharma companies can harness AI-driven discovery to deliver patient benefit, and why now is the time to shift from pursuing select marquee partnerships and self-contained capability builds to coordinated investment in research AI with impact to show for it.
The goal of the research phase in drug R&D is to generate as many quality drug candidates as possible, as quickly as possible, with the highest probability of successful transition to clinical development. The discovery process has historically been a convergent, stepped, pass–fail funnel with attrition at every step—highly inefficient given the number of compounds initially tested.2 Ideally, the process would promote for testing only those compounds relevant to targets that would lead to effective drugs for patients. AI can help identify the most promising compounds and targets at every step of the value chain so that fewer, more successful experiments are conducted in the lab to achieve the same number of leads.
The AI-driven drug discovery industry: Jury still out on impact
The AI-driven drug discovery industry has grown significantly over the past decade, fueled by new entrants in the market, significant capital investment, and technology maturation. These AI-driven companies fall broadly into two categories: providers of AI enablement for biopharma as a service only, including software as a service (SaaS); and providers of AI enablement that have, in parallel with their services, their own AI-enabled drug development pipeline (see sidebar “Why now is the time for AI-enabled drug discovery”).
Our research has identified nearly 270 companies working in the AI-driven drug discovery industry, with more than 50 percent of the companies based in the United States, though key hubs are emerging in Western Europe and Southeast Asia.3 The number of AI-driven companies with their own pipeline is still relatively small today (approximately 15 percent have an asset in preclinical development). Those with new molecular entities (NMEs) in clinical development (Phase I and II) have predominantly in-licensed assets or have developed assets using traditional techniques.4
The growth in the AI-driven drug discovery space has caught the attention of established biopharma companies, and there has been a rapid rise in partnerships between traditional biopharma companies and AI-driven drug companies (Exhibit 1). However, there is a significant concentration in partnership activity and funding toward a small number of AI-driven players with high valuations, multiple deals, and significant capital raised (62 percent CAGR in investment over the past decade). Over half the capital invested in the space is concentrated in only ten companies (all based in the United States or United Kingdom). This is partly because of the difficulty biopharma companies and investors have in evaluating the long tail of AI-driven players. We have seen biopharma companies that are deeply interested in this space struggle to determine what emerging players do, where they operate along the value chain, the distinctiveness of their technology, and which technologies have demonstrable impact.
Three charts are shown. On the left side, a vertical bar chart has bars displaying the number of pharma companies founded by year from 2012 to 2021, as well as dots indicating the number of pharma partnerships by year. The data shows a peak in companies founded in 2017, and a peak in pharma partnerships in 2020.
On the right side, a vertical stacked bar chart shows the amount of funding by year in billions of dollars, broken down by type of funding into the following categories: pre-seed and seed, early-stage VC, late-stage VC, private equity, IPO/secondary offering, corporate/M&A, debt, other. The data shows an overall rise of funds over the years since 2011.
A third chart, on the lower right, shows the share of funding for the top 10 companies with greater than 50% of funding versus all other companies in horizontal stacked bars. The top 10 companies make up 51% of the share of funding.
End of image description.
Two potential obstacles need to be overcome to unlock impact from AI enablement in partnerships between biopharma companies and AI-driven discovery players. First, AI-enabled discovery approaches (including those run via partnerships) are often kept at arm’s length from internal day-to-day R&D; they proceed as experiments and are not anchored in a biopharma company’s scientific and operational processes, so they cannot achieve impact at scale. Second, investment in digitized drug discovery capabilities and data sets within internal R&D teams is all too frequently directed at leveraging partner platforms and enriching partners’ IP, rather than building the biopharma company’s own end-to-end tech stack and capabilities.
When hurdles are overcome, partnerships can come to fruition, and examples exist across the discovery value chain. AstraZeneca’s long-standing collaboration with BenevolentAI resulted in the identification of multiple new targets in idiopathic pulmonary fibrosis, with subsequent broadening of the scope to other therapeutic areas (TAs).5 Sumitomo Dainippon Pharma worked with Exscientia to identify DSP-1181 for obsessive compulsive disorder in less than a quarter of the time typically taken for drug discovery processes (under 12 months versus four and a half years)—with ambitions to enter the molecule into Phase I trials.6
Similarly, building AI-enablement capabilities in-house within biopharma companies is difficult: assembling the cross-functional teams required to drive the transformation is challenging, and AI enablement is often implemented in relative isolation, undertaken separately from day-to-day science, with AI-based tools not fully integrated into routine research activities.
Biopharma companies, therefore, need to strike a balance between internal capability building and partnerships with AI-enabled drug discovery companies. Successful biopharma partnerships in the AI space should have some core benefits: biopharma companies gain access to technology (AI platforms, algorithms, and infrastructure), data (such as curated labeled cell images, screening, ADMET7 data), talent (a ready supply of data scientists and data engineers to build AI pipelines while training biopharma talent), and assurances of data protection in relation to a highly specific strategic intent to maximize patient impact (for example, to co-develop a certain molecule class in a specific TA).
Substantial impact from building enterprise capabilities in-house
When biopharma companies successfully integrate AI processes into day-to-day science and assemble cross-functional teams with the right skill sets (data science, engineering, software development, epidemiology, discovery sciences, clinical, and design), we have observed significant impact along the value chain (Exhibit 2):
A table of text details examples of AI-driven innovation in biotech and biopharma and the observed impact across the discovery value chain. The column headers show the steps across the discovery value chain, which from left to right are target identification, target validation, hit identification, lead generation/optimization, and preclinical. The table has two rows: one for examples of AI-driven acceleration, and the other for examples from industry and observed impact.
Key information in row 1 of the table includes insights from data sources used to generate and identify novel target hypothesis. In silico, phenotypic, cellular models validate targets or ascertain biomarkers. Automated image analysis for cellular assays through computer vision technology is used for hit identification. For lead generation, molecular structure and property prediction is carried out for novel target proteins. At preclinical stage safety issue and DMPK prediction is completed using internal and public data.
In Row 2, examples from industry and observed impact include attributing disease causality through linkages between genomic data and patient electronic medical records (EMRs). Biopharma internalized AlphaFold2 and ColabFold to generate 3-D models of almost any known, synthesized protein, reducing access to 3-D structures from six months to a few hours. Significant acceleration of high-throughput screening phase (time to 75% hits detected reduced by 50%) with platform-based “compound prioritization” algorithm. Biopharma leveraged generative machine learning models to expand library and optimize promising compounds and predict efficacy. Biopharma utilized predictive algorithms to maximize probability of successful PK predictions.
End of image description.
- Hypothesis generation capabilities—simplified hypothesis generation tasks in experimental biology fields from several weeks of researcher time to curated lists in minutes by combining real-world data (RWD), genomics data, and scientific literature through a knowledge graph for target identification
- Large-molecule-structure inference—100 times acceleration in time to generation of protein structures (for example, for peptide or mRNA-vaccine-antigen generation) for target identification
- Computer vision technology—up to ten times acceleration achieved for screening-plate-image analysis, with higher accuracy than classical approaches, harnessing deep-learning approaches (for instance, convolutional neural networks) for target validation and hit identification
- In silico medicinal chemistry—30 to 50 percent acceleration in small molecule, high-throughput screening, using approaches such as molecular property prediction in an iterative screening loop (versus the existing approach of randomized selection of compounds) for hit identification
- In silico chemi-informatics—more than two times improvement over baseline on the key metric of “efficacy observed,” over 100 times the number of in silico experiments possible compared with previous screening, and faster time for design of compounds for optimization of drug delivery efficacy for lead optimization
- Knowledge-graph-based hypothesis generation and drug repurposing—rapid identification of novel indications for existing investigational new drugs (INDs) or marketed drugs via genomic information and pathways associated with specific disease phenotypes, accelerating time to new treatments for patients, as part of the preclinical phase of R&D
- Indication finding leveraging genomics—prioritizing indications to pursue for novel mechanisms of action (MoAs), finding new greenfield indications for life cycle management, prioritizing or deprioritizing ongoing programs within clinical plan by stopping low probability of success programs early and reducing patient burden in clinical trials; informing diligence of molecules for licensing with an independent view of biological potential, as part of the preclinical phase of R&D
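The knowledge-graph-based hypothesis generation described above can be sketched in miniature. In this toy example, disease–gene–pathway triples form a small graph that is traversed to rank candidate targets; every entity name, relation, and score is purely illustrative (a real system would run over curated RWD, genomics, and literature graphs at vastly larger scale):

```python
from collections import defaultdict

# Toy knowledge graph: (subject, relation, object) triples linking a
# disease to genes and pathways. All names are illustrative.
TRIPLES = [
    ("fibrosis", "associated_with", "GENE_A"),
    ("fibrosis", "associated_with", "GENE_B"),
    ("GENE_A", "member_of", "PATHWAY_1"),
    ("GENE_C", "member_of", "PATHWAY_1"),
    ("GENE_B", "member_of", "PATHWAY_2"),
    ("GENE_D", "member_of", "PATHWAY_2"),
]

def build_graph(triples):
    """Undirected adjacency map from entity to its neighbors."""
    graph = defaultdict(set)
    for subj, _rel, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph

def candidate_targets(graph, disease, max_hops=3):
    """Rank genes reachable from a disease node within max_hops,
    scoring direct associations above pathway-mediated ones."""
    scores = {}
    frontier, visited = {disease}, {disease}
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
                    if neighbor.startswith("GENE_"):
                        scores[neighbor] = 1.0 / hop  # closer hops score higher
        frontier = next_frontier
    return sorted(scores, key=scores.get, reverse=True)

graph = build_graph(TRIPLES)
print(candidate_targets(graph, "fibrosis"))
```

Directly associated genes rank first, followed by genes that share a pathway with them: the kind of "curated list in minutes" a researcher would then triage.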
Biopharma companies that maximize the impact of AI enablement can move beyond minimum viable product (MVP) individual use cases and build research systems (Exhibit 3).
In a circular diagram, the numbers 1 through 6 indicate parts of a high-throughput screening (HTS) process embedded with AI technology. Below the diagram are the details of the steps which include: (1) high-throughput screen commenced with 'diversity' plate, (2) automated compound selection and transfer, (3) computer-vision-based hit selection, (4) automated machine learning model training from screen outcomes, (5) compound library inferencing and prioritization, (6) automated compound selection based on ML recommendations.
The six processes include:
- Scientist selects a “diversity” compound plate (a set of chemical compounds with a wide range of chemical structures) as first high-throughput screen.
- Using HTS machinery, individual compounds are transferred to individual wells of cells under experimental conditions.
- Cell response to each compound is measured using microscope analysis, promising compounds are labeled “hits.”
- Information from HTS for first few plates is automatically transferred into an ML pipeline, which “learns” how cells respond to each kind of chemical structure.
- ML algorithm scans the remainder of the library compounds and predicts which plates should be prioritized to identify the highest number of hits in the next screen.
- ML recommendations are automatically queued and used in the next round of HTS. The cycle continues, with the algorithm continuously learning from “real world” outputs. Recommendations trigger scientists to explore new chemical space and begin downstream screening processes more quickly. These recommendations feed into the selection of chemical compounds in the first step.
End of image description.
Research systems harness synergies created by putting AI at the center of the research engine to enhance the outcome of experiments—instead of simply being a preparatory step for real-world experiments in isolation. They act as feedback loops to refine the predictive capability and stability of AI algorithms and inform experimental design (for more key definitions, see sidebar “Glossary of key pharma AI R&D terms”). An example is “iterative screening”: results of an initial round of high-throughput screening8 are used to train a machine learning (ML) algorithm. The ML algorithm can learn which underlying compound structures are most effective against a target and suggest other molecules in the library to prioritize for testing. As the ML algorithm gathers more data, its predictions rapidly become more accurate, and a disproportionately large number of “hits” are identified for the relative amount of the library screened. These research systems reduce overall costs, have higher probability of success, accelerate R&D processes (and therefore time to patient impact), and are fully integrated for specific use cases.
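The iterative-screening loop just described can be sketched as follows. Invented "structural feature" tags stand in for real chemistry, and a simple per-feature hit-rate model stands in for a trained ML algorithm; a production system would learn from actual assay readouts and molecular descriptors:

```python
from collections import defaultdict

# Outcomes from the first HTS plates: (structural features, was_hit).
# Feature tags are illustrative, not real chemistry.
SCREENED = [
    ({"amide", "aromatic"}, True),
    ({"amide", "halogen"}, True),
    ({"ester", "aromatic"}, False),
    ({"ester", "halogen"}, False),
]
LIBRARY = {  # unscreened compounds awaiting prioritization
    "cmpd-1": {"amide", "aromatic"},
    "cmpd-2": {"ester"},
    "cmpd-3": {"amide"},
}

def feature_hit_rates(screened):
    """'Learn' how cells responded to each feature so far."""
    hits, totals = defaultdict(int), defaultdict(int)
    for features, was_hit in screened:
        for f in features:
            totals[f] += 1
            hits[f] += int(was_hit)
    return {f: hits[f] / totals[f] for f in totals}

def prioritize(library, rates):
    """Rank unscreened compounds by mean hit rate of their features."""
    def score(features):
        known = [rates[f] for f in features if f in rates]
        return sum(known) / len(known) if known else 0.0
    return sorted(library, key=lambda c: score(library[c]), reverse=True)

rates = feature_hit_rates(SCREENED)
print(prioritize(LIBRARY, rates))
```

Each new round of screening results would be appended to the training data, so the ranking sharpens as the loop runs: the feedback dynamic the article describes.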
What does it take to successfully implement AI in biopharma research?
By implementing digital and data science tools and concepts, biopharma can capture the full value of current portfolios and develop core technologies, competencies, and IP to drive future research (such as AI-enabled large-molecule and antibody design). Current AI-driven drug discovery companies are already developing their own, significantly more cost-efficient drug discovery pipelines, so it would be beneficial for established players to identify how they, too, can fully integrate novel technologies into standard research processes. While partnering is one option—where it provides access to data, technology, and talent, and the risk of partners exploiting a company’s IP to become a future competitor in the medium to long term is low—marquee partnerships cannot be the only way to develop in-house drug discovery capabilities. As such, it is critical for biopharma companies to work out how to shift from investing in nonintegrated, lighthouse use cases or partnerships to making AI an integral part of everyday research. With this in mind, here are four areas to consider:
1. Strategy and design-backed road-mapping. Biopharma companies can develop a top-down, C-level strategy, setting out the ways in which AI-enabled discovery will be a critical enabler of future performance. A significant aspect is to understand where the current organizational pain points lie, what the potential gains could be, and where the organization wants to lead the industry (versus only being competitive) in the context of how the space/competitors are expected to move in the future. This strategy needs to be specific, time-bound, linked to value at stake, and have strong alignment among (and sponsorship from) senior leaders—including the heads of R&D, research, and data science. Underpinning this strategy is the need for sufficient resources (balanced across talent, data, and infrastructure investment) to support the capability building and talent acquisition required to make it a reality, or recognition of the trade-offs on IP and capability building if only pursuing external partnerships. Alignment between R&D and digital functions is paramount to ensure balanced co-investment (financial and management time) and for the impact generated from initiatives to be shared appropriately. In addition, it is important to carefully consider which elements of the AI-enabled drug discovery approach will be supported by partnerships versus built in-house.
We recommend a design thinking approach to determine which parts of discovery research to tackle, and in which order. This involves studying, end-to-end, common research processes, where there may be two to three steps that are bottlenecks for researchers, and which could be significantly unlocked via AI—for example, automated image analysis for critical cell assays or lead optimization. Design thinking could help companies determine which areas could benefit most from AI, the implementation road map, and the success indicators to track progress and impact (for example, time from target identification to candidate selection, costs associated with target identification).
For R&D and data science leaders, the focus should not be solely on advanced-analytics use cases: there is significant value in cracking established problems, with applications such as basic automation using data transformation pipelines (such as dose response curve fitting), digital operational dashboards, or building data platforms and infrastructure (such as knowledge graphs). For example, building a single data platform for all preclinical data generated can prevent experimental duplication and enhance data sharing across the organization—our experience shows this can reduce months of hypothesis generation time to a few days. The impact includes dramatically increased speed, freeing up people for more productive tasks, and increasing quality of analyses.
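To make the "basic automation" point concrete, here is a minimal dose-response curve-fitting sketch: a four-parameter logistic model fitted by a crude grid search over synthetic data. The grid and data are illustrative; a production pipeline would use a proper optimizer (for example, scipy.optimize.curve_fit) on real plate readouts:

```python
import math

def four_pl(dose, bottom, top, ic50, hill):
    """Four-parameter logistic (4PL) dose-response model."""
    return bottom + (top - bottom) / (1.0 + (dose / ic50) ** hill)

def fit_ic50(doses, responses, bottom=0.0, top=100.0):
    """Crude grid search for IC50 and Hill slope, minimizing squared
    error. Stands in for a real optimizer for illustration only."""
    best_ic50, best_hill, best_sse = None, None, math.inf
    for base in (1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4):  # 1 nM to 100 uM
        for ic50 in (base, 3 * base):
            for hill in (0.5, 1.0, 1.5, 2.0):
                sse = sum((r - four_pl(d, bottom, top, ic50, hill)) ** 2
                          for d, r in zip(doses, responses))
                if sse < best_sse:
                    best_ic50, best_hill, best_sse = ic50, hill, sse
    return best_ic50, best_hill

# Synthetic data generated from a known curve (IC50 = 1 uM, Hill = 1).
doses = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]
responses = [four_pl(d, 0.0, 100.0, 1e-6, 1.0) for d in doses]
ic50, hill = fit_ic50(doses, responses)
print(ic50, hill)
```

Automating exactly this kind of routine transformation across thousands of plates per week is where unglamorous pipelines free up substantial researcher time.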
2. Relentless value delivery focused on quarterly value releases (QVRs). It is critical that R&D, data science, and data engineering collaborate closely and iterate on delivery of use cases in an agile way. The research process frequently includes specific constraints and ways of working (such as steps and handoffs in the experimental methodology) that need to be accounted for to ensure uptake of the tools and systems that are built (in addition to updating scientific processes and standard operating procedures and introducing financial and performance-based incentives). To consider AI-enablement delivery holistically, leaders can line up key building blocks, as in this specific example focused on “high-throughput screening”:
- Blueprinting. Develop a list of use cases across the value chain, prioritizing according to impact, complexity, and business value; then select the highest-need use cases.
- Digital and analytics solutions. Build and automate screening algorithms that link molecular descriptors (for example, molecule structure in the form of a SMILES9 string) with desired output, or a hit.
- Data continuum. Collect experimental data in a reusable way (for instance, with FAIR-data principles10); build master tables from equipment and existing libraries.
- Tech capabilities. Design and build technical infrastructure and data architecture for data extraction and automated gathering.
- Talent and agile operating model. Coach data science, data engineering, and translator/product owners on tools and delivery methodologies, iteratively testing and learning to deliver products via a collaborative environment.
- Adoption and scaling (including change management). Design new screening protocols and experimental strategy, incorporating ML-based algorithms. Ensure the whole research organization (from leaders to lab technicians) understands what the company is trying to achieve and how daily activities need to change.
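To illustrate the "digital and analytics solutions" and "data continuum" building blocks above, here is a toy sketch that derives descriptors from a SMILES string and assembles a reusable master table of screening outcomes. The character-count descriptors are a deliberate simplification; a real pipeline would parse SMILES with a cheminformatics toolkit such as RDKit:

```python
def crude_descriptors(smiles):
    """Toy descriptor vector from a SMILES string: heavy-atom letter
    counts and paired ring-closure digits. Illustrative only; real
    descriptors require proper SMILES parsing (e.g., via RDKit)."""
    return {
        "carbons": sum(smiles.count(c) for c in ("C", "c")),
        "nitrogens": sum(smiles.count(c) for c in ("N", "n")),
        "oxygens": sum(smiles.count(c) for c in ("O", "o")),
        "rings": sum(ch.isdigit() for ch in smiles) // 2,
    }

# Master table linking descriptors to screening outcome (hit or not),
# stored in a reusable, analysis-ready shape for downstream models.
screen_results = [("c1ccccc1O", True), ("CCO", False)]  # phenol, ethanol
table = [{**crude_descriptors(smiles), "hit": hit}
         for smiles, hit in screen_results]
print(table)
```

The point is the shape of the asset, not the chemistry: once outcomes land in one consistent table keyed to machine-readable descriptors, every subsequent model build starts from data rather than data wrangling.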
Once key AI-enabled use cases are aligned, delivery must be highly organized so as to demonstrate ongoing impact; core requirements and potential synergies must be identified, and gaps in ongoing cross-cutting road maps closed. This means departing from long-term road maps delivering impact in multiyear cycles to focus on QVRs (which produce measurable value after each quarterly sprint, such as AI-enablement of a scientific process) while continuously reprioritizing based on organizational needs. This approach enables AI use cases to be developed more efficiently—by dynamically front-loading priority data ingestion and team capacity—with mission-critical assets deployed as required (Exhibit 4).
Two similar diagrams are shown side by side to illustrate differences between a nonoptimized digital delivery process and an optimized one. A nonoptimized digital delivery process has parallel programs of foundational efforts, with minimal linkages between a subset of five “building blocks” of analytics builds, where not all blocks may be considered, and no explicit link to business and scientific flows and impact. An optimized process links all horizontals via releases, touching all five “building blocks” of an analytics build, with links from QVRs to impact through scientific and business flows.
End of image description.
All core digital processes in research can be delivered with incremental quarterly delivery; however, the nature of “value” delivery may vary. Moonshot programs (in tech, this could be the advent of AlphaFold11) require long-term road maps and typically a dedicated ML research group to deliver potentially groundbreaking discoveries with impact in biopharma. Such programs may not deliver an AI product every quarter as other digital initiatives do, but an insight, report, or decision should still be delivered on a regular basis.
3. IP, capability building, and developing translation expertise through partnerships. While there is certainly evidence for the benefits of partnership in specific areas, including to access unique technologies, data, or solution types, managing these partnerships exclusively at arm’s length and keeping novel methods or solutions separate from day-to-day research mean that necessary future capabilities for a transformation in drug discovery may not be built.
Biopharma companies should be selective and specific about the capabilities to be delivered by partnerships versus those built in-house. Similarly, a balanced approach to in-house and external talent (notably, the data scientists and data engineers needed to work with researchers in developing the algorithms and technology backbones to support prioritized areas) is vital. Often overlooked but mission critical for AI enablement are “translators” or “product owners” with deep business, clinical, scientific, and AI/ML and systems-architecture understanding. These profiles have a product ownership mindset and understand and dynamically evaluate all elements of the analytics team to maintain focus on value and impact delivery, thereby assuring successful project delivery.
4. Industrialization of AI with MLOps and reusable analytical assets. For the capabilities a biopharma company builds in-house, it is essential to have the right enablers in place to support scaling across research activities: the right technology infrastructure and methodologies, especially DataOps and MLOps, and an appropriate data architecture (for example, graph databases or Data Vault 2.0 technology). DataOps (data operations) enables companies to gain more value from their data by accelerating the process of building models. MLOps involves ensuring the right platforms, tools, services, and roles, with the right team operating model and standards, for delivering AI reliably and at scale. Technical-architecture enablers to support compute-intensive workflows such as AlphaFold, molecular-dynamics simulations, optimization models, and image-recognition workflows are a core requirement, and enabling concepts such as Data Vault 2.0 techniques and graph databases become table stakes as AI capabilities scale.
To successfully deploy research systems, development teams must build multiple interrelated components (data connectors and pipelines, models, APIs, and visual interfaces) that work seamlessly to drive adoption among end users. Fragmentation of code bases and components, and reduced productivity due to integration challenges, are natural risks that arise when multiple tools are deployed across different domains and teams. Ensuring coding standards in development and harmonization of coding approaches across teams increases long-term productivity and solution robustness. Additionally, harmonization enables sharing of reusable components (data connectors, feature libraries, model-based embeddings) across projects: for example, using graph neural-network molecular embeddings for hit prediction and lead optimization for toxicity reduction. As the emerging research platform grows in complexity, “assetization” of reusable components becomes an increasingly important source of development productivity (with twice the productivity for teams that embrace it) and an important in-house capability that requires a dedicated team with a product-centered mindset.12
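The "assetization" idea can be sketched as a versioned registry of reusable featurizers that multiple downstream models consume instead of re-implementing; all names here are illustrative, not a real internal platform:

```python
# Minimal "assetization" sketch: a shared, versioned registry of
# reusable featurizers. Hit-prediction and toxicity teams consume the
# same component rather than maintaining duplicate code.
FEATURIZERS = {}

def register(name, version):
    """Decorator registering a featurizer under (name, version)."""
    def wrap(fn):
        FEATURIZERS[(name, version)] = fn
        return fn
    return wrap

@register("atom_counts", "1.0")
def atom_counts(smiles):
    """Toy shared embedding: counts of common heavy atoms in a
    SMILES string (illustrative stand-in for a learned embedding)."""
    return [sum(smiles.count(c) for c in pair)
            for pair in (("C", "c"), ("N", "n"), ("O", "o"))]

def featurize(name, version, smiles):
    """Single entry point all downstream teams call."""
    return FEATURIZERS[(name, version)](smiles)

# Two downstream "teams" reuse the same versioned asset.
hit_features = featurize("atom_counts", "1.0", "c1ccccc1O")  # hit model
tox_features = featurize("atom_counts", "1.0", "CCN")        # tox model
print(hit_features, tox_features)
```

Pinning consumers to an explicit version is the design choice that matters: upgrading the shared asset becomes a deliberate, testable event rather than a silent behavior change across every model that depends on it.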
The question today is whether biopharma companies will move analytics investments beyond a focus on individual projects and marquee partnerships to transforming research at scale. A shift to focusing on specific scientific and operational pain points and building AI into fully integrated research systems—with a road map to scale—will enable biopharma companies to capture real business and patient impact from using AI in research.