Patient stratification

How AI helps with patient stratification in the context of drug development and discovery.


The goal of patient stratification is to determine whether a new drug or therapy is safe and effective for an entire patient population or, like a precision medicine, works only for a select subgroup. It is a critically important layer in the drug development process.

Patient stratification is a process by which patients are organized into different strata or blocks according to some established criteria (e.g. gender, ethnicity, medical history, biomarker combinations, socio-economic condition, employment or any other factor deemed relevant). These strata become representative of the subgroups of an actual patient population.

Patient stratification plays a central role in the clinical trial phase of drug development. It gives researchers the opportunity to maximize responsiveness, reduce bias, and otherwise manage the treatment process, so that every patient subgroup is allocated or exposed to the experimental treatments.
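As an illustration of the idea described above, the following minimal Python sketch groups patients into strata and then allocates them to trial arms within each stratum. It is not VeriSIM Life's method; the field names (`sex`, `biomarker`), the arm labels, and the helper functions `stratify` and `allocate` are hypothetical, and the patient records are made-up toy data.

```python
import random
from collections import defaultdict

def stratify(patients, keys):
    """Group patient records into strata keyed by the chosen criteria."""
    strata = defaultdict(list)
    for patient in patients:
        strata[tuple(patient[k] for k in keys)].append(patient)
    return dict(strata)

def allocate(strata, arms=("treatment", "control"), seed=0):
    """Shuffle each stratum, then deal its patients to the arms in rotation.

    A stratum with at least len(arms) patients ends up represented in
    every arm.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    allocation = {}
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, patient in enumerate(shuffled):
            allocation[patient["id"]] = arms[i % len(arms)]
    return allocation

# Toy records (hypothetical): two patients per stratum.
patients = [
    {"id": 1, "sex": "F", "biomarker": "high"},
    {"id": 2, "sex": "F", "biomarker": "high"},
    {"id": 3, "sex": "F", "biomarker": "low"},
    {"id": 4, "sex": "F", "biomarker": "low"},
    {"id": 5, "sex": "M", "biomarker": "high"},
    {"id": 6, "sex": "M", "biomarker": "high"},
    {"id": 7, "sex": "M", "biomarker": "low"},
    {"id": 8, "sex": "M", "biomarker": "low"},
]

strata = stratify(patients, keys=("sex", "biomarker"))
allocation = allocate(strata)
```

Randomizing within each stratum, rather than across the whole population, is what keeps a small subgroup from landing entirely in one arm by chance.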


As an essential part of clinical trial design, patient stratification can help improve patient outcomes, reduce trial failure rates, lower costs and accelerate the development of novel targets. 

The VeriSIM Life advantage

VeriSIM Life’s groundbreaking drug decision engine, BIOiSIM®, uses AI-driven diagnostics and screening techniques to greatly enhance traditional patient stratification methods while addressing some of drug development’s most pressing pain points.

BIOiSIM® allows access to, analysis of, and predictive use of its growing, proprietary, structure-related data lake, which covers more than 3 million compounds, more than 5,000 unique animal and human validation datasets, and the physiological parameters of more than 196 different subject populations.

✓ Data Privacy 

BIOiSIM® can create patient population representations based on virtual cohorts generated by the AI’s predictive capability. This mitigates issues of both data privacy and data integrity, as researchers no longer require access to large patient databases. The responses of these virtual cohorts to different therapies are then measured and registered as independent data points, with a representative -omic profile generated for each cohort.

✓ Data Integrity

The BIOiSIM® framework also combines thousands of validation data sets, multi-compartmental models, and its integrated AI/ML engine to help ensure the validity, quality and integrity of the data curated by VeriSIM Life and stored in its main modeling and simulation database. One of the tools created to aid the curation effort is the database consistency check report (DBCR), which consists of a range of numbered checks, each reporting on a different, specific aspect of data validity and quality. If a check finds an issue, it may invalidate an entire source, preventing any data from that source from contributing to model building, validation, or simulations.
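The general shape of a numbered consistency-check report can be sketched as follows. This is an illustrative assumption only: the actual DBCR checks are not described in this article, so the three checks, the record fields (`value`, `unit`), the source names, and the functions `consistency_report` and `valid_sources` are all hypothetical.

```python
def check_missing_values(records):
    """Hypothetical check 1: every record carries a value."""
    return all(r.get("value") is not None for r in records)

def check_non_negative(records):
    """Hypothetical check 2: reported values are not negative."""
    return all(r.get("value") is None or r.get("value") >= 0 for r in records)

def check_single_unit(records):
    """Hypothetical check 3: a source uses one unit throughout."""
    return len({r.get("unit") for r in records}) == 1

# Numbered checks, each covering one aspect of data validity.
CHECKS = {1: check_missing_values, 2: check_non_negative, 3: check_single_unit}

def consistency_report(sources):
    """Map each source name to the list of check numbers it failed."""
    return {
        name: [num for num, check in sorted(CHECKS.items()) if not check(records)]
        for name, records in sources.items()
    }

def valid_sources(sources, report):
    """Keep only sources with no failed checks; a failure drops the whole source."""
    return {name: recs for name, recs in sources.items() if not report[name]}

# Toy data (made up for illustration).
sources = {
    "study_a": [{"value": 1.2, "unit": "mg/L"}, {"value": 0.8, "unit": "mg/L"}],
    "study_b": [{"value": -3.0, "unit": "mg/L"}, {"value": None, "unit": "mg/L"}],
}

report = consistency_report(sources)
usable = valid_sources(sources, report)
```

In this sketch a single failed check removes the whole source from `usable`, mirroring the source-level invalidation described above.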

✓ Biomarker Complexity

VeriSIM Life’s groundbreaking Translational Index™️ technology can be integrated across multiple biomarker types and used to evaluate and predict the efficacy and side effects of novel drug candidates in specific patient subpopulations with differing genetic, biomarker and demographic profiles. This allows for the development of highly differentiated patient stratification strategies, and even of companion diagnostics that track essential biomarker data during the development of new drugs.

BIOiSIM® and its groundbreaking Translational Index™️ technology

By advancing only the most promising drug candidates through R&D to investigational new drug (IND) application, BIOiSIM® and its Translational Index™️ technology offer actionable insights of unprecedented value to the drug development industry.

Combining thousands of validation data sets, multi-compartmental models, and its integrated AI/ML engine, BIOiSIM® achieves superior physiological and biological relevance within three classes of therapeutics: small molecules, large molecules, and re-engineered viruses.

The BIOiSIM® platform features a robust data lake foundation, integrating:

A search space of 1 trillion potential compounds for de novo synthesis and structural screening

Physiological data from 7 different animal species, plus humans

Support for genomics data integration

More than 3,000,000 real compounds including proprietary data from multiple partnerships

Proprietary experimental data from scientific literature and other sources

Validation by real-world observed data

Proof of Value

Integrating AI/ML Models for Patient Stratification Leveraging Omics Dataset and Clinical Biomarkers from COVID-19 Patients


One of the greatest challenges during the COVID-19 pandemic, especially in resource-strained settings, was the early identification of individual patients at higher risk of adverse outcomes. Doing so requires intelligent risk-assessment tools that can predict a patient’s disease progression and recovery and suggest best-fit therapeutics for markedly reducing disease severity.


VeriSIM Life used AI- and ML-based patient stratification modeling that linked omics and clinical biomarker datasets from COVID-19 patients. The ML model not only demonstrated that clinical features were a sufficient indicator of COVID-19 severity and survival, but also inferred which clinical features were most impactful, creating a useful guide for clinicians to prioritize best-fit therapeutics for a given cohort of patients.


  • Clinical Data Acquisition - The goal was to summarize all available sources providing clinical and omics data for individual patients infected with SARS-CoV-2 and admitted to healthcare institutions.
  • Data Curation - The clinical dataset consisted of patient conditions, lab test results, and clinician reports, including a Sequential Organ Failure Assessment (SOFA) score, which is used to predict ICU mortality based on lab results and clinical data.
  • Bioinformatics Methodology - Gene network analysis was conducted on gene expression data to identify significant correlations between co-expressed gene modules and patients’ disease severity, comorbidity, and clinical biomarkers.
  • Descriptor Analysis and Selection - For the initial dataset used in model training, VeriSIM Life collected various types of data such as patient condition, biomarkers, comorbidities, and therapy information. Using VeriSIM Life’s ML infrastructure pipeline, all features were then normalized and evaluated to discern their relative importance to the respective target features (survival outcome and disease severity).
  • Model Training and Evaluation - Two different types of ML models were deployed for this project. Both classification models used the biomarker, comorbidity, and therapy information to predict either COVID-19 case severity or survival outcome.
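The normalization and feature-evaluation step in the list above can be sketched in a few lines of Python. The article does not say how feature importance was measured, so this sketch swaps in a simple stand-in: min-max normalization followed by the absolute Pearson correlation of each feature with a binary target. The feature names, the values, and the helper functions `min_max`, `pearson` and `rank_features` are hypothetical toy examples, not data or code from the study.

```python
from math import sqrt

def min_max(column):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no signal
    return [(v - lo) / (hi - lo) for v in column]

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def rank_features(features, target):
    """Normalize each feature, score it against the target, sort descending."""
    scores = {
        name: abs(pearson(min_max(column), target))
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Synthetic toy values (not from the study); 1 = severe case, 0 = non-severe.
features = {
    "sofa_score": [2, 3, 8, 9, 4, 10],
    "age": [30, 71, 64, 58, 45, 80],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
severe = [0, 0, 1, 1, 0, 1]

ranking = rank_features(features, severe)
```

A ranking like this only reflects linear association with the target; a production pipeline would more likely derive importance from the trained classifiers themselves.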


A robust AI/ML-based model was created to stratify COVID-19 patients using omics and clinical biomarker datasets, enabling accurate prediction of disease severity and outcomes. The two models achieved accuracies of 98.1% and 99.9%, respectively. Read the full paper.

VeriSIM Life demonstrated that patient stratification models, driven by AI/ML modeling, could be used to precisely identify the manifestation of clinical biomarkers, resulting in more accurate diagnoses and treatment options in the context of personalized medicine.


Additional VeriSIM Life Case Studies & Content


Predicting Patient-Specific Drug Bioavailability with AI


How to Evolve from Traditional Model-Informed Drug Discovery & Development to an AI-Informed Approach

Evolve your pipeline

Bring better drugs to market, faster, with BIOiSIM®

Now you can accelerate the discovery of new therapies based on existing compounds with VeriSIM Life’s BIOiSIM® computational platform, purpose-built to decode chemistry and biology at scale. With the industry’s most generalizable AI platform, your innovation is no longer limited by experimental constraints.

Contact us today to schedule a demonstration of BIOiSIM®