Self-learning: The dawn of a new biomedical R&D paradigm


Advances in cell and gene therapies (such as treatments based on chimeric antigen receptor T cells and mRNA) are just some of the early signs of the potential power of carefully designed, targeted interventions to treat diseases. This potential could soon be harnessed to design precise therapies to treat and prevent countless more diseases at speeds that would have seemed unimaginable until recently.

The fuel powering this progress is the confluence of breakthroughs in biological sciences with advances in the harnessing of data, automation, computing power, and AI. These are already driving developments at each stage of the biomedical R&D value chain, from discovery to clinical testing to real-world use. But taking a step further to bring automation and AI to bear in connecting the data and insights generated at each stage of the process could begin an entirely new R&D paradigm.

Today’s reductionist approach—where scientists zoom in on a single component or function of disease biology and meticulously test hypotheses through trial and error (often against ill-defined, phenotypic disease states) before being able to develop and validate new therapies—would be a thing of the past. Instead, against much more precise disease states, therapies would be systematically designed for success in a circular process propelled by data feedback loops among the various stages of the R&D value chain. Data and insights gained at one stage would inform others up and down the value chain—and strengthen the understanding of other diseases too.

The result would be exponential drug innovation. There would be fewer failed clinical candidates and many more novel, highly efficacious, safe treatments. There would be more preventative approaches and interventions that respond to early signals of a disease, not just late-stage symptoms. And treatments would be carefully targeted at different subpopulations.

We believe that the world is nearing an inflection point in drug R&D where such a paradigm becomes possible—a Bio Revolution. This article examines some of the exciting tech-enabled research innovations afoot in laboratories around the world at each stage of the R&D value chain that are driving toward that point. And it inspects what biopharmaceutical companies might do to accelerate progress by integrating the data and insights across the value chain.

Biopharma companies will need new data and tech infrastructures to make those connections. They will also need to consider organizing themselves differently, as the new R&D paradigm requires far more collaboration than exists today, not only between researchers and machines but also among researchers themselves and with external partners.

Emerging solutions and approaches in biomedical R&D

Five elements of new biomedical R&D: Signals and enablers

The confluence of new technologies and biological breakthroughs is driving progressive changes across all elements of the biomedical R&D value chain. Some of these novel approaches have already been adopted by industry pioneers, others are only crystallizing as potential use cases across academia and biotech start-ups, and some might never materialize, or could be replaced by different solutions that aren’t yet visible on the horizon.

Here’s a set of select signals and enablers (not exhaustive, because of the richness of emerging innovation ideas) that, when fully materialized, can bring the biomedical industry closer to a step change and reenvisioning of the R&D value chain. When combined, these and other innovations could give rise to a more interconnected and tech-enabled, five-element biomedical R&D paradigm.

Disease understanding

Omics-driven disease insights. Multiomic data sets start to inform diagnosis and therapeutic approaches effectively, with ultralarge genomic data banks enabling more robust elucidation of gene–disease relationships. Additionally, structural and functional mapping of organs and diseases leads to novel, reliable biomarkers and disease models.
High-throughput data collection and insight generation. Novel experimental approaches, such as disease models based on patient-derived induced pluripotent stem cells, high-throughput cell-painting assays, and next-generation microscopy techniques (for example, cryoelectron tomography), enable rapid, at-scale, and parallel comparative analyses of healthy and diseased states.
Improved disease taxonomies. Health and disease are seen as a continuum, and the focus shifts from symptoms to underlying root causes.
In silico systems biology modeling. Tech allows the simulation of individual pathways, cells, and organs and, eventually, of entire organisms.

Therapeutic-hypothesis generation

Better underlying data. Greater availability of actual or predicted structural biology data facilitates the exploration of new molecular targets. Furthermore, an increase in the depth of available data and better knowledge graphs enable more refined hypotheses.
Tech enablers. More traditional computational power increases the feasibility of complex in silico modeling, and novel algorithms offer new ways to interrogate data and derive insights. Novel approaches, such as quantum computing, help explore broader ranges of conformational options or molecule–molecule interactions at a time.
More methods and applications. A growing repertoire of methods, such as deep-learning-assisted molecular docking, increases the variety of hypotheses generated. Hypothesis automation, such as at-scale repurposing and biology-driven high-throughput target identification, potentially broadens the initial funnel even further.

Therapeutic-modality innovation

Regenerative therapies. Researchers learn how to identify the right stem cells for each patient, indication, and organ or tissue and activate their differentiation accordingly for optimal outcomes. Bioactive scaffolds activate extracellular repair mechanisms.
Cell and tissue engineering. Gene-modified cell therapies and engineered immune cells are more routinely adopted as therapeutics and diagnostics. An ability to print tissue, organs, and cartilage opens new horizons in transplantation.
Genetic code design platforms. New gene therapy and gene-editing platforms, such as meganuclease-based, TALEN, CRISPR-Cas9, CRISPR-Cpf1, and nucleic-acid-based (for example, based on antisense and mRNA) modalities, expand the therapeutic and research tool kit.
Improved delivery, durability, and immunogenicity. Next-generation gene therapies offer improved delivery, durability, and regulation. The ability to predict and adapt immunogenicity increases the versatility of antibody, cell, and gene therapies. For example, selective coating of the gastrointestinal tract allows targeted oral delivery or prolonged half-life of drugs.
Targeted and multifunctional modalities. New approaches to known molecular targets, such as proteolysis-targeting chimeras, autophagy-targeting chimeras, lysosome-targeting chimeras, and molecular-glue degraders, expand the therapeutic repertoire. Next-generation antibodies, including multispecific, minimized, and conjugated antibodies, improve the targeting and safety of biologic therapies.
Better combinations. Synergistic drug combinations are predicted in silico based on literature and multiomic data. New combinations of surgery, radiation, psycho- and physiotherapy, electrical stimulation, virtual reality, and novel therapeutics help improve patient outcomes.
Microorganism engineering and synthetic biology. Biological manufacturing enables more sustainable and versatile production. The exploration of little-known cellular phenomena, such as biomolecular condensates, introduces new avenues of therapeutic intervention. Self-amplifying or reprogrammable molecules enable lasting immunity with mRNA vaccination. Microbiomes can be precisely modulated with synthetic microorganisms to counteract or prevent diseases.

In silico and in vitro validation methods

In-silico-first optimization. Predictive in silico modeling of both molecular properties and biological activity derisks development and helps prioritize the most relevant experiments.
Improved human-biology-based validation models. Rapid testing models using cell on chip, organ on chip, or even patient on chip replicate the genetic or proteomic makeup of an individual patient or the cellular environment of a disease. Organoids not only simulate what happens in a single cell but can re-create the entire 3-D environment of a human organ.
Biomarkers, biosensors, and assays. A growing repertoire of biosensors, assays, and biomarker platforms increases the speed of in vitro validation.
End-to-end automation. Seamlessly integrated, AI-guided robotics enable fully automated ultrahigh-throughput screening and chaining of experiments together.

Clinical and real-world evidence feedback

Simulated trials. Pretrial simulations predict the risks of side effects in clinical trials and identify the best responders and the ideal treatment plan for each patient.
Better collection and use of data. Improved use of traditional and digital biomarkers (including real-time biomarkers) enables disease preemption and strengthens data on disease treatment outcomes to better inform research efforts. In addition, mining of unstructured medical data through natural language processing deepens the understanding of the links among treatments, symptoms, outcomes, and unmet needs. AI helps revisit medications already on the market to strengthen the standard of care and explore new potential applications.
Precise treatment design. Precision diagnostics leveraging novel tech, such as rapid sequencing, help improve disease taxonomies and treatment paradigms. In a more distant future, tailored individual patient therapeutic plans are preassessed on individualized organoids or chips.

When considering the most exciting developments of a new biomedical R&D value chain, we identified five elements: disease understanding, therapeutic-hypothesis generation, therapeutic-modality innovation, in silico and in vitro validation methods, and clinical and real-world evidence feedback. These elements convey the extent of the advancement being made. Some are well established; others haven’t yet been adopted broadly or fully validated, and it isn’t yet clear which will have the most impact. Nevertheless, their collective power is indisputable (see sidebar “Five elements of new biomedical R&D: Signals and enablers”).

Disease understanding

The main factor impeding faster progress in developing more and better therapies for treating and preventing diseases is limited understanding of the mechanisms that underlie health and all the various manifestations of a disease. The mapping of the human genome 20 years ago was an important step forward, opening up many new avenues of research in human biology. However, genes are only a part of the broader puzzle of health and disease, and they don’t provide a complete-enough picture on their own to tackle most ailments.

Important advances today include novel experimental approaches, such as cell painting for the generation of vast amounts of in vitro data that AI can analyze, population-wide multiomic measurements (especially transcriptomics and proteomics), and anonymized electronic health records. These can help the healthcare industry simulate and better define both healthy and diseased states in humans more accurately, considering comorbidities, disease progression, and differences among individuals.

Therapeutic-hypothesis generation

A more holistic, data-driven understanding of disease paves the way for the systematic and scalable generation of therapeutic hypotheses. Scientists today tend to explore individual cell types or pathways related to a specific disease or biomarker in search of a breakthrough. Progress being made on three fronts will likely change this, facilitating the rapid exploration of data to unearth previously unknown biological interdependencies relevant to a disease and the rapid generation of hypotheses:

Better access to more data. Not only are there greater volumes of disease data and a greater variety of disease data, but there is also often good access to those data. This is thanks, in large measure, to the emergence of open-access databases. Genomic data, the structural and functional data of biomolecules, and screening data are all available on open-access databases, for example. Such data, if used carefully, can help scientists test hypotheses for repurposing existing drugs for known targets and for designing new ones.
Tech enablers. Cheap and abundant computing power, the emergence of quantum computing, and machine-learning methods are among the tools that help solve increasingly complex analytical tasks in biomedicine.
Automation of in silico hypothesis generation. Automating the generation of in silico hypotheses facilitates the high-throughput exploration of previously unconsidered correlations, not only between diseases and pathways but between diseases and a host of other factors, such as genes, nutrition, and behavior. It can also help debias hypotheses and improve those used in more established areas of science.

Therapeutic-modality innovation

Better disease understanding and advancement in scientific tech can lead to code-like therapeutics that are tailored specifically to a disease or patient. Several emerging modality platforms, including CRISPR-Cas9, mRNA, and RNAi, target the genetic code, for example. These types of therapeutics have the added advantage of being able to translate a biological problem into a biological model or a drug candidate quickly and thus accelerate inception.

Because of mRNA’s linear, code-like sequence, it’s easier to design and synthesize for testing of its effect on cancer, for example, than to identify and synthesize targeted antibodies or small molecule inhibitors. Engineered cells (such as immune and regenerative therapies), multifunctional modalities (such as antibody drug conjugates and proteolysis-targeting chimeras), and synthetic microorganisms (such as those that rebalance the gut microbiome) are among a list of many other emerging modalities. Modality innovation isn’t restricted to biologics—improved computational methods, for example, can lead to more precisely designed small molecules.

Meanwhile, advances in areas such as material science and synthetic biology will further improve existing modalities (through better delivery, more durability, or less immunogenicity, for example) or help develop new ones. And in the not-too-distant future, it might be possible to design, develop, and test personalized combinations of interventions that physicians today often only explore once patients are responding poorly to standard treatments. Such combinations—perhaps AI-assisted surgery followed by a prescribed drug, a digital therapeutic, and microbiome transplantation and an app connected to a wearable device to monitor the condition—would be carefully designed to maximize the synergies among them.

In silico and in vitro validation methods

Scientists today can generate rapid and high-throughput cell-on-chip or organ-on-chip testing models that replicate the genetic makeup of a patient or represent the cellular environment of a disease. Similarly, organoids re-create the 3-D environment of a human organ, potentially leading to more accurate outcomes than those achieved with animal models, which can’t account for all the biological differences among species, and with standardized cell lines, which don’t consider the broader environment of an organ.

The more scientists learn about a disease through in vitro models, the easier it becomes to design a predictive in silico model that reflects it. There could soon come a time when scientists will have sufficient data to train in silico models to predict not only molecular properties (such as toxicity, absorption, distribution, metabolism, and excretion) but immunogenicity and drug–microbiome interactions too. With time, the preclinical filtering of drug candidates could be increasingly performed in silico rather than in animal or in vitro models, leading to higher throughput and lowering the risk associated with therapeutic development.

Clinical and real-world evidence feedback

Tech facilitates the generation and collection of vast amounts of data. The more data about a disease that are accumulated through clinical trials of drug candidates, the more focused and precise future hypothesis generation and validation are likely to become. The same is true of data captured in electronic health records and other real-world data, such as broad biomarker measurements and readings from wearables that can generate data 24/7. Such measurement can lead to more robust patient characterization, for instance, which can lead to more nuanced disease models. The maturation of computational methods, such as natural language processing, ensures that unstructured patient data from literature, not only new data, can be mined.

A new biomedical R&D paradigm

Today’s biomedical R&D value chain is often represented as a linear one, with a series of chevrons pointing forward to indicate how information gained at one stage in the chain informs subsequent ones in pursuit of a specific new treatment for a specific disease. Information does flow backward, too, and the research can have wider applications—the emergence of platform tech such as CRISPR, a versatile tool for validating research hypotheses and exploring disease biology that can serve directly as a therapeutic modality, being an example. But in the new paradigm, the process takes broader aim and is supercharged.

Tech not only uncovers insights at each stage of the new paradigm that the human brain alone might struggle to detect but also identifies interdependencies among them. It also ensures that data and insights flow automatically up and down the value chain much more freely and rapidly than is the case today. The process is far more iterative and circular than one that relies on humans to agree on and initiate each iteration. The traditional, linear R&D process would be replaced by one that’s far more interconnected: a series of spinning wheels constantly feeding information rapidly back and forth (exhibit).

Future biomedical R&D will be an iterative process in which the insights from each step improve other cycles.

Ultimately, the goal of the new paradigm would be to feed and connect every data point captured and every insight gained into a single data vault. Algorithms could draw from that vault to improve understanding and treatment of many different diseases.

Better disease understanding, if not a redefinition of specific disease states, leads to far more accurate, scalable therapeutic hypotheses, which lead to many more highly tailored therapeutic modalities. A large share of the initial testing of those hypotheses can then be automated in silico and in vitro. At the same time, the backward flow of information reinforces progress, as lessons from each step in the process can directly improve all previous steps. The large volumes of data generated in vitro and in the real world and analyzed in silico rapidly inform disease understanding, generate new hypotheses, and help develop new modality platforms.
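The circular flow described above can be made concrete with a minimal sketch. It is an illustration only: the `DataVault` class, the stage names, and the `run_cycle` function are hypothetical and are not drawn from any existing system. It shows each of the five elements publishing to one shared store and reading what every other element has published, whether upstream or downstream.

```python
from dataclasses import dataclass, field

# The five elements of the value chain, in their traditional linear order.
STAGES = [
    "disease_understanding",
    "hypothesis_generation",
    "modality_innovation",
    "validation",
    "clinical_feedback",
]


@dataclass
class DataVault:
    """Hypothetical shared store that every stage writes to and reads from."""

    records: list = field(default_factory=list)

    def publish(self, stage, insight):
        self.records.append((stage, insight))

    def insights_for(self, stage):
        # A stage sees what every other stage has published, upstream or downstream.
        return [insight for source, insight in self.records if source != stage]


def run_cycle(vault, cycle):
    """One turn of the wheel: each stage reads the vault, then publishes an insight."""
    for stage in STAGES:
        seen = len(vault.insights_for(stage))
        vault.publish(stage, f"cycle {cycle}: {stage} insight informed by {seen} others")


vault = DataVault()
for cycle in range(1, 3):
    run_cycle(vault, cycle)

for stage, insight in vault.records:
    print(insight)
```

In the first cycle, information only flows forward, as in the linear model. From the second cycle onward, disease understanding already draws on what validation and clinical feedback published earlier, which is the backward flow the new paradigm depends on.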

Connecting the data: Use case for a new biomedical R&D paradigm

Imagine a drug development effort to treat a certain cardiovascular disease. In vitro tests to measure multiomics in different cell types—a routine process for any drug candidate to predict potential side effects early on—show that one drug candidate downregulates the expression or activity of several proteins in a pathway not previously considered. In parallel, web scraping by natural language processing reveals that this same pathway is correlated with cognitive functions, including memory. An algorithm finds and links these pieces of information and analyzes automatically scanned literature to ascertain whether the current understanding of dementia identifies any direct links between this pathway and neurodegenerative diseases that lead to dementia and, if so, whether inhibiting this protein via a drug or a gene knockdown alleviates any dementia-related symptoms.

No such link is discovered. However, data from clinical trials for cardiovascular disease targeting structurally similar proteins report common off-target effects, namely headaches. This prompts further exploration, and AI suggests a series of targeted CRISPR modifications in a brain organoid model as the quickest approach to validating or disproving the hypothesis that the novel pathway can be modulated to counteract neurodegeneration and to identifying the pathway protein most likely to be an effective drug target for dementia.

Upon successful validation, a molecular docking algorithm generates suggestions for selective antagonists against the prioritized protein that are then synthesized and investigated for treating different forms of dementia. This leads to a breakthrough in the fight against neurodegenerative diseases.

Elements of this approach are already in use, and enhancing feedback loops among them could create a self-learning model that can help address unmet patient needs faster and with fewer risks.
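The step in which an algorithm "finds and links these pieces of information" can be sketched as a search over a small evidence graph. This is a minimal illustration under stated assumptions: the node names, the edges, and the evidence labels are invented for this example and merely echo the scenario above. A production system would more likely rely on a curated knowledge graph and far richer scoring.

```python
from collections import deque

# Hypothetical evidence links, loosely modeled on the scenario above.
# Each edge records the kind of evidence that produced it.
EDGES = [
    ("candidate_A", "pathway_X", "in vitro multiomics"),
    ("pathway_X", "memory", "literature mining (NLP)"),
    ("memory", "dementia", "disease taxonomy"),
    ("candidate_A", "cardiovascular_disease", "original indication"),
]


def build_graph(edges):
    """Build an adjacency list; links are treated as undirected."""
    graph = {}
    for source, target, evidence in edges:
        graph.setdefault(source, []).append((target, evidence))
        graph.setdefault(target, []).append((source, evidence))
    return graph


def find_link(graph, start, goal):
    """Breadth-first search returning the shortest chain of evidence, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, evidence in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, evidence, neighbor)]))
    return None


graph = build_graph(EDGES)
chain = find_link(graph, "candidate_A", "dementia")
for source, evidence, target in chain:
    print(f"{source} -> {target}  [{evidence}]")
```

Keeping the evidence type on every edge means that a proposed link can be traced back to the experiment or publication that supports it, which matters when the result is used to justify a new round of testing.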

AI evaluation of the data might automatically suggest another round of in vitro testing, too, with refined experimental parameters or optimized therapeutic candidates. AI might even initiate the execution of those tests. For example, if in vitro testing showed that a drug candidate had weak binding affinity to a target, AI might compare the structure of the drug candidate with the target structure and come up with several ways in which the candidate could be improved, then go on to pick the most promising improvements, synthesize them, and test them in a simulated clinical trial enabled by real-world data. With extensive use of AI and automation, the new R&D value chain could accelerate medical breakthroughs (see sidebar “Connecting the data: Use case for a new biomedical R&D paradigm”).
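The propose, select, and retest loop in this example can be sketched in a few lines. The sketch is deliberately a toy: candidates and targets are plain strings, `predicted_affinity` simply counts matching positions, and all function names are hypothetical. Real binding prediction would need a structural or learned model, which is assumed here rather than shown.

```python
def predicted_affinity(candidate: str, target: str) -> int:
    """Toy stand-in for a binding model: counts positions where the strings match."""
    return sum(a == b for a, b in zip(candidate, target))


def propose_variants(candidate: str, alphabet: str = "ABCD"):
    """Yield every single-position substitution of the candidate."""
    for i in range(len(candidate)):
        for letter in alphabet:
            if letter != candidate[i]:
                yield candidate[:i] + letter + candidate[i + 1:]


def refine(candidate: str, target: str, threshold: int, max_rounds: int = 10):
    """Repeatedly pick the most promising variant until the threshold is met."""
    history = [(candidate, predicted_affinity(candidate, target))]
    for _ in range(max_rounds):
        current, score = history[-1]
        if score >= threshold:
            break
        best = max(propose_variants(current), key=lambda v: predicted_affinity(v, target))
        history.append((best, predicted_affinity(best, target)))
    return history


history = refine("AAAA", target="ABCD", threshold=4)
for candidate, score in history:
    print(candidate, score)
```

The `max_rounds` cap is a design choice worth keeping in any real version of such a loop: an automated cycle that can trigger synthesis and testing needs an explicit stopping rule.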

A new organization model for a new biomedical R&D paradigm

Companies are already working on projects that connect some of the different elements of the new biomedical R&D value chain with the help of biological advances, AI, and automation. However, no company, as far as we are aware, has systems in place that connect them all, making it possible to find and use relevant data wherever they might lie. Doing so will require new tech infrastructure to ensure that data are interconnected and machine legible and that quality improvement mechanisms (such as identification of false positives) are present. Yet organizational change will be required too.

In many organizations, early and late R&D are kept separate, with evidence from clinics and the real world only slowly feeding back to researchers. Researchers are often siloed in business units that are focused on a single therapeutic area, and there are often many parallel systems and taxonomies for different biology models. In the new paradigm, such rigid divisions may need to be softened to reflect a more connected R&D process, affecting the way that teams are constructed, the capabilities that they encompass, and the company’s innovation model.

Teams will likely need much broader scope to benefit from the fast exchange of information. They will be borderless, with capabilities that span every element of the R&D process. And while the teams will include subject specialists with deep expertise, they will also need multidiscipline experts able to understand the whole value chain and to harness the potential of both biology and tech—choosing the most reliable scientific approaches, for example, and assuring high-quality data.

New governance mechanisms will likely be required to allow R&D teams to move swiftly. Teams will need the authority to advance promising therapeutic candidates to the next stage (if not done automatically), identify and prioritize the ideas with the highest breakthrough potential, and determine budget allocations, for example. Slower, centralized decision-making processes could counter the gains made by automation and AI. The teams will also need the authority to draw on external expertise and capacity, as success will depend not on proprietary drugs, tech, and modality platforms alone but on algorithms, data sets, and digital solutions too.

Some of the necessary elements will be open-source assets; broadly accessible resources such as the AlphaFold Protein Structure Database and various omics data sets are early examples of a trend toward open sourcing. But other assets will be owned by health-tech and data and analytics companies, forcing closer collaboration and partnerships. The extent of the expertise required across so many fields suggests that any single pharma company might struggle to develop all the required capabilities and tech in house.

Given this situation and the fast pace of change within R&D, companies may find that what works best is an open-architecture biomedical R&D innovation model—one where components such as data, algorithms, and validation methods can be seamlessly plugged in as required. It would be a creative innovation model—one that gives companies the flexibility to invest and deploy the best methods and the best solutions at the right point in the R&D process and at the right point in time.

The paradigm shift under way in biomedical R&D is of similar magnitude to that seen in the early 2000s with the introduction of the learn-and-confirm approach.¹ Before then, drug discovery and development ideas often weren’t systematically prioritized and validated until late-stage trials, leading to poor success rates. The learn-and-confirm model introduced more rigor into the process and higher-quality pipelines. The pipeline funnel didn’t fundamentally change, however. It remained sequential, with only a limited amount of the learning gained at later stages in the funnel informing earlier ones the next time.

Current tech advances are now disrupting that approach, shaping a less serendipitous, more deterministic, circular biomedical R&D value chain that’s propelled at speed by data feedback loops. The new paradigm is still evolving, and the end game is unclear. The one we describe is only one potential way forward. However, it’s evident that the marriage of biological sciences with advances in data, automation, computing power, and AI will improve the traditional, reductionist approach to biomedical R&D and so improve patient outcomes. It’s a future worth preparing for.
