Machine learning identifies biomarkers that predict a patient’s response to immunotherapy

“Every day, new stories about cancer cures using ‘immunotherapy’ drugs appear online and in print. These treatments can cure even the most lethal cancers but only in a small fraction of patients,” explains Dr. Taran Gujral, an Associate Professor in the Human Biology Division. Currently, “we lack approaches that could help identify patients who will respond to immunotherapy,” adds Sid Vijay, a junior computer science major at Columbia University. Vijay, who has been working with the Gujral Lab since he was in high school, applied machine learning to “develop an approach to accurately identify patients who will benefit from immunotherapy.” This study, co-authored by Vijay and Yuqi Kang of the Gujral Lab, was recently published in iScience.

When the immune system attacks foreign invaders or potential threats, such as viruses and cancers, it relies on a system of checkpoints to keep it from attacking normal, healthy cells. Many cancers hide from the immune system by expressing proteins that exploit these immune checkpoints. Drugs that inhibit immune checkpoints have recently been developed, allowing the immune system to go after cancers that were previously hidden. While a handful of these drugs have been approved to treat specific cancers, it is difficult to identify which individual patients with those cancers will benefit from immune checkpoint-targeted therapy. To address this challenge, Kang and Vijay set out to develop a computational program that could analyze patient gene expression data and identify the genes that most accurately predict tumor susceptibility to immunotherapy. Their program, DeepGeneX, uses deep neural networks that take “molecular data from patients’ tumors to predict their drug response and clinical outcomes such as disease onset, severity, survival [or] disease reoccurrence,” Vijay states. These computational networks are “similar to neuron networks in our brain” in that they “can capture complex, non-linear relationships in underlying data to estimate a predicted outcome. This neural network is the core component behind DeepGeneX,” he adds. Unlike many deep neural networks, which operate as black boxes, DeepGeneX has a feature that sets it apart: an explainable algorithm. This algorithm is used to “extract the importance of genes in contributing to a clinical measurement,” enabling “both the accuracy in prediction and the biological insights of how the model is working.”
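The article does not spell out how DeepGeneX's explainable algorithm computes gene importance. As a rough illustration of the general idea of scoring a gene by its contribution to a prediction, the sketch below uses permutation importance, a generic model-agnostic technique swapped in here and not necessarily the authors' method: shuffle one gene's values across patients and measure how much the model's accuracy drops. The toy model, the gene names (G1, G2, N1), and the data are all hypothetical.

```python
import random

def accuracy(model, X, y):
    """Fraction of patients whose predicted label matches the observed label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, genes, n_repeats=20, seed=0):
    """Score each gene by the mean drop in accuracy when its values are
    shuffled across patients (generic permutation importance, not DeepGeneX)."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    scores = {}
    for g in genes:
        drops = []
        for _ in range(n_repeats):
            column = [row[g] for row in X]
            rng.shuffle(column)
            # Copy each patient's profile, replacing only gene g.
            X_perm = [{**row, g: v} for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        scores[g] = sum(drops) / n_repeats
    return scores

# Hypothetical toy data: 1 = responder, 0 = non-responder.
X = [
    {"G1": 5.0, "G2": 1.0, "N1": 2.0},
    {"G1": 4.5, "G2": 1.5, "N1": 3.0},
    {"G1": 5.5, "G2": 0.5, "N1": 2.5},
    {"G1": 1.0, "G2": 5.0, "N1": 2.0},
    {"G1": 1.5, "G2": 4.5, "N1": 3.0},
    {"G1": 0.5, "G2": 5.5, "N1": 2.5},
]
y = [1, 1, 1, 0, 0, 0]

def toy_model(row):
    """Hypothetical stand-in for a trained model; it only looks at G1 and G2."""
    return int(row["G1"] > row["G2"])

scores = permutation_importance(toy_model, X, y, ["G1", "G2", "N1"])
print(scores)
```

Because the toy model ignores N1, shuffling it never changes the predictions, so N1 receives a score of 0.0, while shuffling G1 or G2 tends to cause misclassifications and a positive score.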

To build and test their immunotherapy prediction program, the Gujral team used data from a recently published study of melanoma patients who received immunotherapy. This study included single-cell gene expression data from 19 melanoma patients with varying responses to immunotherapy, classified by their responsiveness to immune checkpoint inhibitor treatment. The researchers trained their model on data from 18 patients and predicted the response of the remaining patient, repeating this so that each patient was held out in turn. This produced an optimized model that could identify which genes indicate whether a patient’s cancer will respond to immunotherapy. Each gene was assigned a “gene importance score” estimating how much it contributed to whether a patient responded to immunotherapy. From the resulting ranked gene list, the authors took the top 1,000 genes and iteratively removed those predicted to be unimportant, narrowing the list to just six genes (CCR7, SELL, GZMB, WARS, GZMH, and LGALS1). These six genes were used to build the final deep neural network model, DeepGeneX. All six genes were differentially expressed between immunotherapy “responders” and “non-responders”: two genes, SELL and CCR7, were expressed at significantly higher levels in responders, while the other four were expressed at lower levels. Additionally, LGALS1 and WARS, two of the genes with low expression in responders, were highly expressed in a macrophage population from non-responders, suggesting that these macrophages could be a potential target for improving response to immune checkpoint therapies.
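The hold-one-patient-out training and iterative gene elimination described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the DeepGeneX code: a simple nearest-centroid classifier stands in for the deep neural network, and the absolute difference in mean expression between responders and non-responders stands in for the gene importance score. All function names, parameters (such as `drop_fraction`), and toy data are hypothetical.

```python
from statistics import mean

def class_centroids(rows, labels):
    """Mean expression profile (one value per gene) for each class label."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, x):
    """Hypothetical stand-in classifier: assign x to the nearest class centroid."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))

def leave_one_patient_out_accuracy(X, y, genes):
    """Train on all patients but one, predict the held-out patient,
    and repeat so that every patient is held out exactly once."""
    correct = 0
    for i in range(len(X)):
        train = [[row[g] for g in genes] for j, row in enumerate(X) if j != i]
        labels = [lab for j, lab in enumerate(y) if j != i]
        held_out = [X[i][g] for g in genes]
        correct += predict(class_centroids(train, labels), held_out) == y[i]
    return correct / len(X)

def importance_scores(X, y, genes):
    """Hypothetical importance proxy: |mean in responders - mean in non-responders|."""
    scores = {}
    for g in genes:
        responders = [r[g] for r, lab in zip(X, y) if lab == 1]
        non_responders = [r[g] for r, lab in zip(X, y) if lab == 0]
        scores[g] = abs(mean(responders) - mean(non_responders))
    return scores

def eliminate_genes(X, y, genes, target=6, drop_fraction=0.5):
    """Iteratively re-score the remaining genes and drop the least
    important ones until only `target` genes are left."""
    genes = list(genes)
    while len(genes) > target:
        scores = importance_scores(X, y, genes)
        ranked = sorted(genes, key=lambda g: scores[g], reverse=True)
        # Always drop at least one gene, but never go below the target.
        keep = max(target, min(len(ranked) - 1, int(len(ranked) * (1 - drop_fraction))))
        genes = ranked[:keep]
    return genes

# Hypothetical toy data: 1 = responder, 0 = non-responder.
X = [
    {"G1": 5.0, "G2": 1.0, "N1": 2.0},
    {"G1": 4.5, "G2": 1.5, "N1": 3.0},
    {"G1": 5.5, "G2": 0.5, "N1": 2.5},
    {"G1": 1.0, "G2": 5.0, "N1": 2.0},
    {"G1": 1.5, "G2": 4.5, "N1": 3.0},
    {"G1": 0.5, "G2": 5.5, "N1": 2.5},
]
y = [1, 1, 1, 0, 0, 0]

selected = eliminate_genes(X, y, ["G1", "G2", "N1"], target=2)
print(selected, leave_one_patient_out_accuracy(X, y, selected))
```

With this toy data, the uninformative gene N1 is dropped and the two informative genes are kept. In a rigorous analysis, gene selection would be repeated inside each training fold so that the held-out patient never influences which genes are chosen.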

Overview of DeepGeneX, a neural network that uses molecular data and patient response outcomes to identify biomarkers predictive of how a patient will respond to immune checkpoint therapy. Image taken from original article.

To determine whether the DeepGeneX-identified marker genes could predict a patient’s response to immunotherapy in other cancers, the authors compared their expression between responders and non-responders using single-cell gene expression data from patients with basal cell carcinoma. They found that the differential expression of four of the six genes (CCR7 and SELL, higher in responders; WARS and LGALS1, higher in non-responders) corresponded with the predicted treatment outcome, demonstrating the feasibility of applying their model to other datasets. The researchers then extended their analysis beyond skin cancer, assessing whether the expression pattern of the DeepGeneX biomarkers could predict patients’ overall survival across seventeen cancer types using bulk gene expression and clinical data from The Cancer Genome Atlas. Patients with favorable expression of the marker genes had significantly better survival than those with unfavorable expression, supporting the applicability of the DeepGeneX-identified marker genes across a range of cancer types and to both single-cell and bulk gene expression data.
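Survival comparisons like the one above are commonly visualized with Kaplan-Meier curves. The article does not state which survival method the authors used, so the following is only a generic sketch of the Kaplan-Meier estimator that could be applied separately to patients with favorable and unfavorable marker expression; the cohorts and follow-up times are hypothetical. Judging whether two curves differ significantly would typically require an additional test, such as the log-rank test, which is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical toy cohorts (follow-up in months).
favorable = kaplan_meier([12, 20, 30, 40, 50], [1, 0, 1, 0, 0])
unfavorable = kaplan_meier([5, 8, 10, 15, 25], [1, 1, 1, 0, 1])
print(favorable)
print(unfavorable)
```

Censored patients (event 0) do not lower the survival estimate themselves, but they shrink the number at risk, so later deaths count for more.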

Vijay explains that “there is a growing appreciation for the value of machine learning approaches in biomarker discovery, and yet predicting molecular determinants of clinical response remains difficult. DeepGeneX addresses a major bottleneck in the biomarker discovery process. We developed a non-linear, multilayer neural network to more accurately model patient tumor sequencing data to predict biomarkers of response to immunotherapy.” Furthermore, “DeepGeneX technology provides a broadly applicable approach to predicting clinical response to therapy and to identify novel biomarkers/gene signatures that underlie the response. With Next gen-sequencing data being collected often in hospitals today and in clinical trials, the DeepGeneX platform could allow oncologists to perform an in silico biomarker test by uploading patients’ molecular data and receiving clinical predictions on their patients,” the authors explained. Additionally, using biomarkers to select appropriate patients for a clinical trial may improve the chances that the tested treatment succeeds and gains FDA approval, an important factor for insurance coverage of the treatment. However, only a small fraction of clinical trials actually use biomarkers to select participants, likely because few biomarkers have so far been identified and validated. Vijay states that “identifying biomarkers represents a tremendous clinical utility, but the lack of technology to identify biomarkers is apparent: this is where DeepGeneX comes in.”

Beyond this study, the Gujral team is working to apply DeepGeneX to other disease contexts, including Alzheimer’s disease (AD) and meningioma. For Alzheimer’s, “we are looking to create an early diagnosis test driven by DeepGeneX that can predict AD using RNA-sequencing data – like a digital biomarker test.” Excitingly, preliminary results from profiling molecular data from over 2,000 patients show that DeepGeneX “could predict the onset of AD with high accuracy.” In addition, the Gujral researchers are applying DeepGeneX to one of the largest clinical meningioma datasets, which includes over 1,000 patients. Meningiomas are “the most common primary intracranial tumor, representing 39% of all primary brain tumors,” and have a high rate of recurrence after radiation and surgery. The Gujral team explains that a “major roadblock in meningioma treatment is identifying patients who will recur or progress post-treatment and identifying experimental therapies for these patients. We hope that DeepGeneX can fill this gap and help in identifying gene signatures/biomarkers for tumor recurrence and aid in identifying potential therapies. Overall, DeepGeneX is a broad platform that can be applied to build predictive models from any large-scale biological dataset.”

This work was supported by the National Science Foundation and the American Cancer Society.

Fred Hutch/University of Washington/Seattle Children's Cancer Consortium member Taran Gujral contributed to this research.

Kang Y, Vijay S, Gujral TS. 2022. Deep neural network modeling identifies biomarkers of response to immune-checkpoint therapy. iScience. 25(5):104228.