Pathways for Successful AI Adoption in Drug Development

Patterns and Principles to Guide Pharmaceutical Leaders


Szczepan W. Baran, VeriSIM Life, San Francisco, CA USA

Peter V. Henstock, Pfizer, Boston, MA, USA


The road to developing new drug treatments is paved with risk. Pharmaceutical companies working to bring novel therapies to market navigate a complex and expensive landscape – one recent survey tallied the cost of developing a single drug at $2.8B with average timelines clocking in at more than a decade. (Debleena Paul, 2021)[1] In addition to these hefty time and cost considerations, preclinical research doesn’t guarantee success in clinical settings – failure rates are hovering somewhere in the neighborhood of 90%. (Mullard, 2016)[2]

However, digital solutions in drug development have transformative potential when it comes to pharma research and development (R&D), especially in areas adopting advanced computational and modeling technologies like artificial intelligence (AI) and machine learning (ML). Though these are sometimes perceived to be in their early stages, AI-enabled applications and algorithms are already impacting drug discovery and development, with the power to find patterns missed by conventional data analysis – accelerating molecule design and testing, streamlining essential processes, improving chances of clinical success, and reducing costs throughout the development pipeline.

While we aren’t yet seeing a full industry-wide AI revolution in drug development, significant growth is already underway (global top-15 pharma companies have increased R&D investment in AI by nearly a quarter in the last 3 years), and interest from drugmakers has gained momentum as AI technologies are reshaping the notoriously complex and convoluted drug discovery landscape.

Figure 1: High preclinical failure rates and drug costs[3] could be materially offset by AI innovation.[4] Big pharma R&D investment has grown significantly.[5] Anticipated revenue generated by emerging AI companies will eclipse $2B in 2022.[6]

In this white paper, we will outline some of the ways biotechs and pharmas can refocus their efforts and evolve existing protocols to better advance the successful adoption of AI in drug discovery and development within their organizations. Specifically, we will explore:

- Closing the translational gap between the bench and clinic through AI and ML

- Understanding the barriers to AI and ML adoption in biopharma settings

- Identifying better AI benchmarks for the drug development process

Part 1: AI/ML Applications in Drug Development

AI/ML refers to a class of technologies that can identify patterns, find relationships, and learn to predict values from data. Deep learning for image understanding and natural language processing (NLP) for extracting information from texts are common AI technologies. Today, such solutions help bridge the translational gap between the bench and the clinic to address many of the challenges that persist in drug development – streamlining trial design and operations, improving accuracy of drug efficacy and safety, enhancing speed to market, and reducing costs along the way.

AI in Discovery & Development

As defined by the FDA, the drug development process from concept to market has five distinct phases: discovery & development; preclinical research; clinical research; FDA review; and FDA post-market drug safety monitoring.[7] Translation in drug development – the act of taking basic scientific findings from a laboratory setting and translating them as treatments into human applications in a clinical setting – routinely suffers from high attrition and low ROI. While it can take 4 or 5 years on average to identify viable candidate targets and small molecules in traditional drug R&D, new AI applications and platforms are proving their potential to improve and hasten the discovery and development process, particularly when it comes to translation. Here, NLP applications are being utilized to mine the literature for target selection. Recent advances in AI using deep learning and reinforcement learning in chemistry have proven faster for identifying small molecules, selecting candidates for synthesis and testing, and achieving multivariate optimal properties. These result in drug candidates with the best chance of success in subsequent preclinical validation and clinical trials.

Figure 2: Sourced from “AI in small-molecule drug discovery: a coming wave?” Nature Reviews Drug Discovery, Volume 21, March 2022[8]

As an example, Antontsev et al. (Antontsev, 2021)[9] developed and used an AI-integrated computational platform to improve translation and predict early-stage failures for small molecules. Most current mechanistic models follow a “one compound, one model” approach, which greatly limits the accuracy of simulated results across animal species and forces extensive customization to re-fit preclinical and clinical physiologic parameters for different species and different drugs. The result is continuous custom model development and/or redesign, and reduced predictive accuracy, given that these models “stand alone” without the benefit of reinforced learning. (Khotimchenko, 2022)[10]

In contrast, Antontsev et al. employed a hybrid approach that integrates ML with mechanistic modeling to simulate pharmacokinetic (PK) results and optimize dosing and route of administration for various compounds. Hundreds of simulations across 21 small molecule compounds were run in parallel on a single computational framework composed of statistical mechanistic models and AI/ML algorithms, significantly reducing the time it took to generate outputs and thereby shortening drug development cycle time. Ultimately, the authors curated a database, in addition to datasets generated by the output of the combinatorial models, to serve as the foundation for more sophisticated AI/ML models that could be used to predict drug disposition and prioritize clinical candidates at earlier stages in development. (Mahara, 2019)[11]

AI models are also being applied for drug repurposing efforts to identify new indications for existing drugs. Combining genomics, electronic medical records, and scientific literature, AI can identify relationships between drugs, targets and diseases. As a result, AI and ML techniques are giving a second chance to failed drugs, speeding up access to new treatments, and increasing the likelihood that clinical development trials succeed. (Paul D, 2021)[12] Three notable, recent examples:

  1. AI and ML were utilized in the global search for repurposed antiviral drugs capable of inhibiting SARS-CoV-2, with extensive meta-analysis of repurposed drugs to discover potentially effective treatments (Kartikay Prasad, 2021)[13]
  2. AI-integrated mechanistic modeling utilized known preclinical in-vitro and in-vivo datasets to accurately simulate systemic therapy disposition and site-of-action penetration of the CCBs and ACEi compounds to tissues implicated in COVID-19 pathogenesis (Chakravarty K, 2021)[14]
  3. Baricitinib, an existing rheumatoid arthritis drug from Eli Lilly, became the first approved immunomodulatory treatment for COVID-19; (R, 2022)[15] it was identified by applying AI link prediction methods to a knowledge graph mined from scientific literature (Richardson, 2020)[16]
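To make the idea of link prediction on a knowledge graph more concrete, here is a minimal sketch in Python. It is not the method used in the cited baricitinib work; it swaps in a simple common-neighbor heuristic, and every node name and edge in the toy graph is hypothetical. A drug and a disease that share many targets but are not yet linked score highest as repurposing candidates.

```python
# Toy knowledge graph: every node and edge below is hypothetical.
edges = [
    ("drug_A", "target_1"),
    ("drug_A", "target_2"),
    ("disease_X", "target_1"),
    ("disease_X", "target_2"),
    ("drug_B", "target_3"),
    ("disease_X", "target_3"),
    ("disease_Y", "target_3"),
]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbors."""
    adjacency = {}
    for left, right in edge_list:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def common_neighbor_score(adjacency, node_a, node_b):
    """Count the nodes (here, targets) linked to both node_a and node_b."""
    return len(adjacency.get(node_a, set()) & adjacency.get(node_b, set()))


def rank_drug_disease_links(edge_list):
    """Score every drug-disease pair that is not already connected."""
    adjacency = build_adjacency(edge_list)
    drugs = sorted(n for n in adjacency if n.startswith("drug_"))
    diseases = sorted(n for n in adjacency if n.startswith("disease_"))
    scored = []
    for drug in drugs:
        for disease in diseases:
            if disease in adjacency[drug]:
                continue  # an existing link is not a prediction
            score = common_neighbor_score(adjacency, drug, disease)
            scored.append((score, drug, disease))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored


ranking = rank_drug_disease_links(edges)
for score, drug, disease in ranking:
    print(f"{drug} -> {disease}: {score} shared targets")
```

Production systems typically replace the common-neighbor count with learned graph embeddings or graph neural networks, but the ranking step works the same way: score unlinked pairs, then review the top candidates experimentally.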

AI in Trial Operations

Clinical trials produce vast quantities of operational data, oftentimes leaving companies to struggle with forming a comprehensive, global view of their findings due to data silos and disparate systems. (Ma C, 2021) (Gootjes-Dreesbach L, 2020)[17],[18] Predictive AI models, however, can be used to interpret clinical trial data and accelerate key aspects of trial operations – from identifying suitable patients, to increasing retention rates, to streamlining analytics and reporting.

Patient selection and recruitment is traditionally a time- and resource-intensive process, creating significant cost burdens for those running clinical trials and often leading to delays. (Hargreaves, 2016)[19] AI-enabled applications are helping speed and optimize this process by automating many parts of recruitment through mining, analysis, and the interpretation of multiple data sources. Examples of these applications include:

- parsing hospital medical records to find the most beneficial patient subpopulations

- analyzing real world data like social media content to find condition-specific cohorts

- identifying target locations

Conversational AI technology is also being used to reduce drop-off rates and improve overall trial retention and operations through real-time patient feedback, sentiment analysis, and other data-driven digital interventions and improvements. Chatbots can be used specifically to help reduce trial dropout by triggering an alert after an adverse event [20] (for example, scheduling a patient meeting with clinical research staff after a reaction). While human agents still play a key role in retention efforts (especially where clinical advice is needed), AI-powered chatbots can enhance engagement, reduce friction, and supplement monitoring and management of trial participants.

AI in Personalized Medicine

In personalized medicine, where genetic or other biomarkers are used to make patient-centric treatment decisions, AI applications are enabling a data-driven approach to patient and disease stratification by augmenting clinician decision-making, diagnosis and prognostication. (Peiffer-Smadja N, 2020)[21] In these instances, AI can be used to help uncover relationships between indications and biomarkers, potentially identifying lead compounds which could have a higher chance of success during clinical development. Information specific to an individual patient, such as clinical history, symptoms, lifestyle and environmental factors, is typically also considered.

In contrast to more conventional hypothesis-driven biomedical research that is limited to clinical trial data, AI-driven research leverages large-scale clinical data collection and real world data (RWD) sources to find patterns and make predictions about disease progression in individuals. Sophisticated computation and inference modeling helps predict disease and treatment outcomes for individual patients and can also result in finding more diverse candidates for drug trials. (University, n.d.)[22]

With its pattern-finding capability, AI has shown particular promise with high-dimensional data that is more challenging for statistical analyses. For example, AI can not only estimate a patient’s cardiovascular risk using just diabetic retinopathy images of the eye fundus, but can also accurately identify additional individual risk factors (such as sex, age, and systolic blood pressure) from the combined data. (Pieszko K, 2021)[23]

AI in Bioprocessing

Bioprocessing refers to the creation of beneficial end products through the use of another living thing, for example, cells, viruses, or an entire organism (e.g., penicillin created from mold). Because biological sequences and disease mechanisms can be converted into data, AI can be used quite effectively in bioprocessing to identify patterns in even the most complex datasets for real-time process monitoring to help understand critical process parameters such as cell growth rates and efficiency.[24]

Pattern recognition, machine learning/data mining, natural language processing (NLP), and other AI techniques are making huge strides when it comes to improving processes, products, and success rates in bioprocessing. (Scheper T, 2021) (Narayanan H, 2020)[25],[26] In addition, novel AI applications are helping to improve performance of cell cultures and optimize the protein purification process. (Raper, 2020)[27] These innovations in AI-driven bioprocessing are helping drug development in a number of ways – from driving down development times, to reducing costs, increasing manufacturing safety and enabling market readiness. (Muenz, 2021)[28]

Part 2: Challenges in AI Adoption

An increasing number of pharmaceutical companies have turned toward AI-based solutions over the last few years—a trend that’s likely to continue. The global AI in drug discovery market size is expected to have grown from $0.79 billion in 2021 to $1.04 billion this year alone.[29] In achieving such growth, successful AI/ML companies have overcome many previous limitations that slowed the technology’s initial adoption and prevented a full-scale, industry-wide AI-powered drug discovery revolution.  

Data and Knowledge Silos

AI requires high-quality data sets to be successful. With the deep learning methods that achieve higher accuracies, there is a need for larger volumes of data. To achieve this, datasets are coming from any number of disparate sources: from within an organization's archives, from clinical partners, or from external university or industry partners. To spur AI-driven research with analysis and re-use of data, datasets should be centralized – for example, on a shared, secure and compliant platform. However, most of the larger clinical and medical data sets used in pharma R&D have historically been decentralized and not normalized, since the goals were primarily analysis and archiving. The use cases for data have shifted to realize the value of AI/ML, and so must data access and preparation.

Siloed data remains a formidable challenge in drug R&D. When data is under the control of one department, for example, or it's being stored and organized differently within small, specialized teams, data access and opportunities for secondary uses are limited. Medical data is notoriously difficult to collect and access, often due to various privacy laws and security issues. Electronic Healthcare Record (EHR) systems may also be incompatible between government providers. (Goldfarb, 2022)[30] Recent technologies such as data fabrics, data meshes, and knowledge graphs are providing new ways to connect heterogeneous datasets (Morrison, 2022)[31] and apply AI/ML using recent graph neural networks.[32] As shared data trends increase through publication requirements, federated learning is opening opportunities for companies to maintain the privacy of their data while developing shared AI/ML models across datasets. (Flores, 2021)[33]
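The core idea of federated learning can be sketched in a few lines of Python. This is a simplified, single-round illustration under stated assumptions, not a production protocol: each site fits a one-parameter linear model on its own private data, and only the fitted slope and the sample count are shared with a coordinator, which computes a weighted average. The function names and the data are hypothetical.

```python
def fit_local_slope(xs, ys):
    """Least-squares slope through the origin, fitted on one site's data."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def federated_average(site_datasets):
    """Combine per-site slopes, weighting each by its sample count.

    Only (slope, sample_count) pairs leave each site; the raw records
    themselves are never pooled.
    """
    updates = [(fit_local_slope(xs, ys), len(xs)) for xs, ys in site_datasets]
    total_samples = sum(count for _, count in updates)
    return sum(slope * count for slope, count in updates) / total_samples


# Hypothetical private datasets held by two separate organizations.
site_a = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # local slope 2.0, 3 samples
site_b = ([1.0, 2.0], [3.0, 6.0])            # local slope 3.0, 2 samples

global_slope = federated_average([site_a, site_b])
print(f"shared model slope: {global_slope:.2f}")
```

Real federated learning systems repeat this exchange over many rounds, train far larger models, and usually add safeguards such as secure aggregation, but the division of labor is the same: data stays local and only model updates travel.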

Black Box Mistrust

Trust and acceptance have been hurdles to AI adoption, whether because of a lack of understanding around the science of AI, or as a result of generalized skepticism around AI methodologies and their conclusions. The historic trade-off between accuracy and explainability of AI has contributed to the perception of a black box problem within AI (the idea that although you can see the input and output of a machine-learning model, there’s not enough visibility into the processes at work in between). In many cases, the choice has been between simpler methods such as decision trees or regression that offer full insight into the inner workings, and more accurate but complicated methods ranging from ensembles to deep learning. The AI community continues to address these challenges with growing fields in AI ethics and AI explainability. AI audits are now common, if not required, to ensure prediction accuracies do not vary along racial, ethnic, gender or other lines. SHAP, LIME and other methods provide insight into how models function and can explain each prediction, creating more trust in AI, as discussed in a later section.
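A basic version of the audit described above can be sketched as follows. This is an illustrative example only: the group labels, predictions, and the choice of accuracy as the audited metric are all assumptions, and a real audit would use larger samples and additional fairness metrics.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical model outputs for two demographic groups.
records = [
    ("group_1", 1, 1), ("group_1", 0, 0), ("group_1", 1, 1), ("group_1", 0, 1),
    ("group_2", 1, 0), ("group_2", 0, 0), ("group_2", 1, 0), ("group_2", 0, 0),
]

per_group = accuracy_by_group(records)
gap = max_accuracy_gap(per_group)
print(per_group, f"gap={gap:.2f}")
```

In this toy data the model is correct 75% of the time for one group and 50% for the other, so the 25-point gap is the kind of finding an audit would flag for investigation.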

Figure 3: Black box illustration

Replacement Fear

Because AI and ML often result in the automation of particular tasks within drug development (impacting manufacturing, clinical trials, supply chain, logistics and more), an unemployment myth has taken hold in some corners, driven by the fear that AI will eliminate jobs for humans. (Paul D, Artificial intelligence in drug discovery and development, 2021)[34]

While many AI systems do operate autonomously, an equal portion are being used to augment human intelligence rather than replace it. Particularly in drug development, “narrow AI” learning algorithms are being used to operationalize huge amounts of data for a very specific, sometimes singular, task (for example, in image recognition).[35],[36] In these cases, human intervention still remains essential for implementation, development, and operation of the AI platform. Beyond this, the use of AI to automate simpler tasks frees human expertise to tackle more complex and pressing problems.

Though fear of human replacement by way of AI is real, there’s a wealth of research documenting the ways AI is actually contributing to job growth across sectors. A recent PwC study found that automation-related job loss would in fact be offset by jobs created as a result of new AI technologies. (Hawksworth, 2018)[37] To that point, the World Economic Forum’s “Future of Jobs Report 2020” estimated that although 85 million jobs would be displaced by AI by 2025, another 97 million would be created. (Zahidi, 2020)[38]

The AI Talent Wars

The promise of new AI solutions doesn’t mean the industry is ready with the kinds of skilled, specialized labor it needs to meet the increasing demand for AI-driven work. Recruiting, managing and retaining talent with AI expertise has become even more challenging as competition and demand for highly-skilled tech and data science employees intensifies across sectors. Additionally, data scientists and engineers often seek positions in technical industries rather than within traditionally medically-focused organizations like pharma or biopharma, with top AI talent more typically flowing into technology companies who pay top dollar for such skills. (Ziadeh, n.d.)[39] 

A challenge for pharma is that the skilled AI/ML talent invariably lacks deep knowledge of biology, chemistry, and the pharmaceutical industry. Some have argued that asking for such skills in addition to statistical/math and programming/computer science skills is equivalent to finding unicorns, an image popularized by Steve Geringer’s Data Science Venn Diagram.[40] However, this lack of subject matter expertise certainly limits the advance of solutions, and may be contributing to slower adoption of AI in drug development. One workaround is for companies to reskill staff already trained in the medical sciences (such as bench scientists and statisticians) in AI/ML. This is in line with predictions from the World Economic Forum that half of all workers will require either upskilling or reskilling to prepare for an AI-focused future within the next five years.[41] However, since AI/ML is not only a deep but also a very broad field requiring diverse hard and soft skills, it is not clear whether this strategy is ideal.

Part 3: How to Benchmark AI

Organizations are assessing their AI initiatives by their ability to support existing projects, lower costs, achieve faster outcomes, reduce waste, or otherwise add value. Applying such Key Performance Indicators (KPIs) can help organizations gain traction in AI to build trust, gain buy-in, find the right partners, and move closer to fully realizing the benefits of AI/ML in drug research and development.

Explainable AI

AI/ML innovation has sometimes been met with skepticism within many industries, including pharma. Although AI models can reveal new insights and improve upon established methods, closely regulated industries like pharma are slow to embrace AI/ML methods without evidence of repeatability and insight into how these methods reached their conclusions. Interpretable AI, also known as Explainable AI, or XAI, is consequently gaining importance in settings where understanding how and why AI-driven platforms or systems arrived at their results is essential to proving their legitimacy to scientists, medical professionals and regulators. (Savage, 2022)[42] Explainable AI:

- Describes how a single concrete prediction or classification was made

- Reduces or “debugs” questionable results or behaviors that occurred during modeling

- Regulates a model’s behavior to avoid discrimination or bias (Jose Maria Lopez, 2021)[43] 

Explainable AI builds trust, helping secure top-down collaboration and buy-in from leadership. In addition to aiding transparency, these kinds of XAI insights can help data scientists improve and refine their models.
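One simple, model-agnostic way to obtain this kind of insight is permutation importance: shuffle one input feature at a time and measure how much the model's accuracy drops. The sketch below is a plain-Python illustration of that idea, not an implementation of SHAP or LIME; the `predict` function is a hypothetical stand-in for a trained model, and the data is invented for the example.

```python
import random


def predict(row):
    """Hypothetical stand-in for a trained model: it only uses feature 0."""
    return 1 if row[0] > 0.5 else 0


def accuracy(rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    baseline = accuracy(rows, labels)
    importances = []
    for column in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[column] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [
                row[:column] + [value] + row[column + 1:]
                for row, value in zip(rows, shuffled)
            ]
            drops.append(baseline - accuracy(permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: the label depends on feature 0; feature 1 is noise.
rows = [[0.9, 0.2], [0.1, 0.7], [0.8, 0.5], [0.2, 0.1], [0.7, 0.9], [0.3, 0.4]]
labels = [1, 0, 1, 0, 1, 0]

importances = permutation_importance(rows, labels)
print(importances)
```

Because the stand-in model ignores feature 1, shuffling it never changes a prediction and its importance is exactly zero, while shuffling feature 0 degrades accuracy. That contrast is what lets a scientist or regulator see which inputs a model actually relies on.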

Figure 4: How Explainable AI contributes to adoption

Measuring ROI

AI implementation in drug discovery is complex and so is measuring success. Typically, Return on Investment (ROI) tracks the gains versus the losses generated by an investment based on the initial outlay or cost. Measuring ROI gives organizations a way to take stock of the profitability and efficiency of their investments and make informed decisions about future spending.
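As a worked illustration, the conventional ROI calculation is net gain divided by cost. The sketch below assumes that standard definition; the function name and the dollar figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def return_on_investment(total_gains, total_cost):
    """Return ROI as a fraction: (gains - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_gains - total_cost) / total_cost


# Hypothetical example: a $1.0M AI initiative that yields $1.5M in value.
roi = return_on_investment(total_gains=1_500_000, total_cost=1_000_000)
print(f"ROI: {roi:.0%}")
```

The formula is the easy part. The difficulty in practice is deciding what counts toward gains and costs, and over what time horizon.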

Time and cost are two important factors to consider when thinking about ROI, but they are not the only endpoints for evaluating AI success. While the value of streamlining and automating processes can be readily assessed, the greater insights and value of predictions from AI/ML are more difficult to assess. These improvements may impact time and cost, but may not be realized for some time. Furthermore, as in software engineering, AI/ML projects themselves build internal AI/ML capabilities that can be applied to similar projects and set up future successes.

For better AI benchmarking, organizations should focus on measuring ROI in six areas:

Costs – Beyond the overall capital investment, be sure to include costs for the licenses, software and compute platforms, data, team members, 3rd party partners, security, and more.

Savings – Include a range of cost-reducing activities here from early stage through late stage R&D and not only the immediate income savings.

Soft profits – Look at areas you might not typically try to quantify, for example, productivity, product quality and other areas which might realize profits down the line.

Goals & Objectives – Return to your AI roadmap and see if your investment is still strategically aligned with stated goals and objectives.

Key Performance Indicators (KPIs) – Convene the right, appropriately skilled internal teams to set benchmarks and evaluate progress of AI initiatives and integrations within R&D pipelines.

Future profits – Consider any new revenues that may be eventually created downstream as a result of an upstream AI investment. (Kosterski, 2021)[44]

With the range of problems in pharma R&D, there are many opportunities for AI/ML but fewer of the massive wins seen in other industries, such as the Uber platform or the Netflix Prize, because of the many project- and discipline-specific hurdles each drug program must overcome to succeed. It is important to note that the AI/ML technology landscape is also rapidly changing, so NLP problems that were unsolved 3 years ago may now be suddenly solvable. The roadmap and measurements of ROI success should factor in the current successes of the AI/ML teams as well as their ability to rapidly rise to solve new problems.

Roadmap to AI Success

To meaningfully integrate AI within drug discovery requires strategic investments in data, technology, and talent. Organizations can put themselves in an optimal position for success by developing an outcome-based AI roadmap to help identify foundational project requirements, priorities and goals. Roadmapping AI success hinges on defining and communicating your vision effectively, setting realistic expectations that are aligned with business objectives, building the right team for the job, and arming the team with the appropriate platforms so you can deliver value throughout the drug development journey.

Some key areas to consider for the AI roadmap:

Goals & Objectives – Identify the strategic business goals and KPIs you are hoping to achieve by integrating AI into your drug discovery process, such as lowering costs or speeding time to market.

Use Cases – Align the AI use cases with business objectives. Otherwise, AI will be seen as non-essential and buy-in from decision-makers will be harder to achieve. This is a challenging area since it requires a shared understanding of the business problem and the technical capabilities of AI that neither side generally grasps in isolation. Some example use cases for AI-enabled drug development are: mapping novel disease pathways, optimizing clinical candidates, predicting drug properties, and identifying new targets and leads from image analysis.

Project Requirements – Ask and answer questions to determine AI’s utility in any given scenario and whether or not you have the team (people, partnerships) and tech (processes, technology) in place to meet those requirements. For example:

- What AI approach will you use?

- Do you have the right data (or a data advantage)?

- Do you have the in-house development capabilities?

- Are the appropriate in-house frameworks and compute capabilities in place?

- Do you need to identify/engage a third-party or AI-native partner?

- Are there any ethical, legal, or regulatory issues surrounding the given use case?

Measurement – Think about how success will be objectively measured, including developing business metrics/KPIs and Return on Investment (ROI). This area is challenging because, unlike a software project where success can be met by delivery, an AI project depends not only on delivery but also on the nuances of the data, with few ways of predicting the likelihood of success in advance.


With an expansive range of possible applications, AI is uniquely poised to help solve some of drug discovery’s most fundamental, vexing problems – including reducing development costs, driving process efficiencies, and providing meaningful improvements in countless aspects of drug R&D. To fully realize the game-changing potential of AI-enabled drug development, the classical drug discovery process must evolve, including the ways in which we track and benchmark success and return on investment.

Through more open collaboration, clearer articulation of AI’s challenges and benefits, and setting more realistic performance measures, an AI-driven paradigm shift in drug discovery and development is not only possible but already underway in most pharma companies. Organizations must continue to invest today in the techniques and technologies that will be required to remain competitive in tomorrow’s AI-enabled drug discovery landscape.

Works Cited

Antontsev, V. J. (2021). A hybrid modeling approach for assessing mechanistic models of small molecule partitioning in vivo using a machine learning-integrated modeling platform. Sci Rep.

Chakravarty K, A. V. (2021). Accelerated Repurposing and Drug Development of Pulmonary Hypertension Therapies for COVID-19 Treatment Using an AI-Integrated Biosimulation Platform. Molecules, 26(7).

Debleena Paul, e. a. (2021). Artificial intelligence in drug discovery and development. Drug Discovery Today, 80-93.

Flores, M. (2021, September 15). Medical AI Needs Federated Learning, So Will Every Industry. Retrieved from Nvidia:

Goldfarb, A. T. (2022, March 9). Why is AI adoption in health care lagging? Retrieved from Brookings:

Gootjes-Dreesbach L, S. M.-A. (2020). Variational Autoencoder Modular Bayesian Networks for Simulation of Heterogeneous Clinical Study Data. Front Big Data.

Hargreaves, B. (2016, Mar 11). Clinical trials and their patients: The rising costs and how to stem the loss. Retrieved from Pharmafile:

Hawksworth, J. B. (2018). Will robots really steal our jobs? Retrieved from PWC:

Jose Maria Lopez, M. L. (2021, January 1). Ever heard of the AI black box problem? Retrieved from Worldline:

Kartikay Prasad, V. K. (2021). Artificial intelligence-driven drug repurposing and structural biology for SARS-CoV-2. Current Research in Pharmacology and Drug Discovery, Volume 2.

Khotimchenko, M. B. (2022). In Silico Development of Combinatorial Therapeutic Approaches Targeting Key Signaling Pathways in Metabolic Syndrome. Pharm Res 39, 2937–2950.

Kosterski, P. (2021, May 7). The ROI of AI. How to Understand the Commercial Dimension of Artificial Intelligence Projects? Retrieved from Nexocode:

Ma C, S. M.-S. (2021). Building a Harmonized Datamart by Integrating Cross-Institutional Systems of Clinical, Outcome, and Genomic Data: The Pediatric Patient Informatics Platform (PPIP). JCO Clin Cancer Inform.

Mahara, N. A. (2019). Entering the era of computationally driven drug development. Drug Metabolism Reviews 52 2, 283-298.

Morrison, A. (2022, June 5). Comparing data fabrics, data meshes and knowledge graphs. Retrieved from Data Science Central:

Muenz, R. (2021, May 28). A Brief Overview of Bioprocessing. Retrieved from Lab Manager:

Mullard, A. (2016). Parsing clinical success rates. Nat Rev Drug Discov 15, 447.

Narayanan H, L. M. (2020). Bioprocessing in the Digital Age: The Role of Process Models. Biotechnol J.

Paul D, S. G. (2021). Artificial intelligence in drug discovery and development. Drug Discov Today. 2021 Jan;26(1), 80-93.

Peiffer-Smadja N, R. T. (2020). Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clin Microbiol Infect, 584-595.

Pieszko K, H. J. (2021). Clinical applications of artificial intelligence in cardiology on the verge of the decade. Cardiol J, 460-472.

R, R. (2022). Baricitinib Is First Approved COVID-19 Immunomodulatory Treatment. JAMA.

Raper, V. (2020, August 3). Bioprocessing Warms to Artificial Intelligence. Retrieved from Genetic Engineering & Biotechnology News:

Richardson, P. G. (2020). Baricitinib as potential treatment for 2019-nCoV acute respiratory disease. The Lancet.

Savage, N. (2022, March 29). Breaking into the black box of artificial intelligence. Retrieved from Nature:

Scheper T, B. S. (2021). Digitalization and Bioprocessing: Promises and Challenges. Adv Biochem Eng Biotechnol, 57-69.

University, C. (n.d.). Advancing precision medicine using AI and big data. Retrieved from Nature:

Zahidi, S. R. (2020, October). The Future of Jobs Report 2020. Retrieved from World Economic Forum:

Ziadeh, A. (n.d.). Top 15 Companies in the Race to Hire AI Talent. Retrieved from GovCIO:
