Machine learning predicts HIV pseudovirus antibody neutralization

From the Gilbert group, Vaccine and Infectious Disease Division

The enormous genetic diversity among individual HIV-1 viruses presents a difficult hurdle for vaccine design, as vaccine-elicited antibodies must neutralize a wide range of HIV envelope (Env) sequences. Promisingly, broadly neutralizing antibodies (bnAbs) isolated from HIV-infected patients bind to conserved regions of HIV Env, effectively neutralizing many different HIV viruses. In particular, the VRC01 bnAb neutralizes a broad range of HIV viruses in vitro and is currently the focus of the Antibody Mediated Prevention (AMP) trials, phase-2 clinical trials run by the HIV Vaccine Trials Network and HIV Prevention Trials Network. The primary goal of AMP is to test the protection from HIV infection afforded by intravenous passive transfer of VRC01 antibodies; a secondary objective is to determine how VRC01 protection efficacy varies with HIV envelope amino acid (AA) sequence variation. This latter question will be addressed after the conclusion of the AMP trials, using phenotypic and genetic sieve analyses, in which the characteristics of the viruses that escape VRC01-mediated neutralization will be compared to those of the viruses that infected patients who were not given the VRC01 antibody. However, a vast number of AA combinations exist, and individually testing each position in the HIV envelope yields low statistical power, given the high-dimensional multiplicity adjustment that would need to be performed. This necessitates a method of selecting a subset of AA positions to be considered in the primary sieve analysis. Craig Magaret and colleagues of the Gilbert group in the Vaccine and Infectious Disease Division recently performed this initial analysis and published their results in PLoS Computational Biology.
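
To make the cost of a multiplicity adjustment concrete, here is a minimal Python sketch of a Bonferroni-style correction. This is one common adjustment and not necessarily the one planned for the AMP sieve analysis; the function name and the position counts are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical numbers of AA positions tested; not taken from the paper.
    for n_positions in (1, 20, 800):
        threshold = bonferroni_threshold(0.05, n_positions)
        print(f"{n_positions} positions tested: per-test threshold {threshold:.6f}")
```

The more positions are tested, the smaller the per-test threshold becomes, which is why restricting the primary analysis to a pre-specified subset of positions preserves statistical power.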

The first objective of this analysis was to build a model that predicts the in vitro sensitivity of a given HIV-1 pseudovirus to VRC01. Using existing data from experiments that measured in vitro neutralization of various HIV pseudoviruses by VRC01, the authors evaluated different machine learning models for predicting whether each pseudovirus is sensitive or resistant to VRC01. Due to its consistent accuracy and advantageous theoretical properties, they chose Super Learner, a machine learning method that combines several component models, as the best candidate for HIV Env neutralization predictions.
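
The idea of combining component models can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: it blends only two hypothetical component models, assumes their predictions were already made on held-out folds, and chooses the blending weight by a simple grid search over squared error. All names and numbers are invented for the example.

```python
def blend_two_models(preds_a, preds_b, labels, steps=100):
    """Find the weight w in [0, 1] that minimises the mean squared error of
    w * preds_a + (1 - w) * preds_b against the labels.

    preds_a and preds_b are assumed to be out-of-fold (cross-validated)
    predictions, so the chosen weight is not fitted to training-set fit.
    """
    if not (len(preds_a) == len(preds_b) == len(labels)) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    best_weight, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        loss = sum(
            (w * a + (1 - w) * b - y) ** 2
            for a, b, y in zip(preds_a, preds_b, labels)
        ) / len(labels)
        if loss < best_loss:
            best_weight, best_loss = w, loss
    return best_weight, best_loss


if __name__ == "__main__":
    # Hypothetical out-of-fold predictions (probability of resistance).
    labels = [1, 0, 1, 0, 1, 0]
    model_a = [0.9, 0.4, 0.6, 0.2, 0.7, 0.5]
    model_b = [0.6, 0.1, 0.9, 0.4, 0.8, 0.3]
    weight, loss = blend_two_models(model_a, model_b, labels)
    print(f"weight on model A: {weight:.2f}, cross-validated MSE: {loss:.4f}")
```

Because the weight is chosen from a range that includes 0 and 1, the blend's cross-validated loss is never worse than that of either component on the same predictions, which is the intuition behind ensembling several candidate models.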

The authors compared the predictive performance of Super Learner across 13 pre-selected groups of Env AA input variables to determine which set of features would be most predictive of VRC01 neutralization. These site sets include the VRC01 binding footprint, CD4 binding sites, sites with exposed surface area, and sites important for glycosylation. The neutralization data were randomly split into two data sets. Each half of the sequences was used to train the predictive models, with ten-fold cross-validation to assess model performance. A final model was then created for each data set, and each final model was evaluated on the other half of the sequences as a "hold out set." The authors found that they were able to consistently predict VRC01 neutralization with an area under the receiver operating characteristic (ROC) curve of 0.85 and above.
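
The evaluation scheme described above can be sketched with the Python standard library alone. The helpers below are hypothetical and only illustrate the mechanics: a random split into two halves, assignment of one half to ten folds, and the area under the ROC curve computed as the fraction of (resistant, sensitive) pairs that a score ranks correctly. The example labels and scores are made up, not data from the study.

```python
import random


def split_halves(n_items, seed=0):
    """Randomly split the indices 0..n_items-1 into two halves."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    mid = n_items // 2
    return indices[:mid], indices[mid:]


def kfold_indices(indices, k=10):
    """Assign indices to k folds in round-robin order."""
    return [indices[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    half_a, half_b = split_halves(100)
    folds = kfold_indices(half_a, k=10)
    print(len(half_a), len(half_b), [len(f) for f in folds])
    # Made-up labels (1 = resistant) and predicted scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.85 and above.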

A) The performance of predicting VRC01 neutralization sensitivity on two different sets of viral neutralization data from the CATNAP database. Using only HIV-1 viral envelope amino acid sequence features, VRC01 neutralization can be reproducibly predicted with a cross-validated and hold-out validated AUC exceeding 0.85. B and C) Amino acid frequencies at key positions in the HIV-1 envelope that are responsible for discriminating sensitivity (panel B) vs. resistance (panel C) to the broadly neutralizing antibody VRC01. These figures demonstrate that minority residues tend to be entirely or strongly associated with resistance, and that no strong discriminating signal exists at the individual position level. This implies that a multivariate predictor is necessary to effectively predict VRC01 neutralization. Figure provided by Craig Magaret.

In addition to predicting VRC01 neutralization, the authors sought to determine the predictive importance of HIV envelope features, so that the highly predictive features could be analyzed for "sieve effects" during the AMP sieve analysis. Super Learner identified the most important features associated with sensitivity or resistance to VRC01 neutralization, most of which were the presence of specific residues within the CD4 binding footprint. Additionally, features such as a longer gp120 glycoprotein or the presence of more cysteines in Env predicted resistance. This analysis yielded a ranking of individual Env AA features by the magnitude of their contribution to predicting VRC01 neutralization. The top features in this list will specify the primary features to be studied in the AMP sieve analysis, which increases the statistical power of the sieve analysis compared to examining the entire HIV envelope.
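
One generic way to rank features by their contribution to a prediction is permutation importance: shuffle one feature at a time and measure how much the AUC drops. The article does not say which importance measure the authors used, so the Python sketch below is a stand-in technique, and the function names, toy model, and data are all hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def permutation_importance(predict, rows, labels, seed=0):
    """Rank features by the drop in AUC after shuffling each feature column.

    predict maps one row (a list of feature values) to a score.
    Returns (ranking, drops): feature indices from most to least important,
    and the AUC drop for each feature.
    """
    rng = random.Random(seed)
    baseline = roc_auc(labels, [predict(row) for row in rows])
    drops = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        drops.append(baseline - roc_auc(labels, [predict(row) for row in shuffled]))
    ranking = sorted(range(len(drops)), key=lambda j: drops[j], reverse=True)
    return ranking, drops


if __name__ == "__main__":
    # Toy data: feature 0 determines the label, feature 1 is uninformative.
    labels = [0, 1] * 10
    rows = [[float(y), 0.5] for y in labels]
    ranking, drops = permutation_importance(lambda row: row[0], rows, labels)
    print(ranking, drops)
```

Features whose shuffling barely changes the AUC contribute little to the prediction, while large drops flag the features worth prioritizing in a downstream analysis.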

Magaret puts this work into context, explaining that “[b]eing able to predict VRC01 neutralization sensitivity for an arbitrary HIV-1 viral envelope gives us a new tool to evaluate how well a monoclonal antibody can prevent HIV-1 infection.  This will enable us to answer novel questions from this new generation of large-scale, antibody-based HIV-1 prevention trials.” Additionally, this work is relevant beyond VRC01, as “this method can be easily applied to other monoclonal antibodies, and we are currently extending this method to work in the context of multi-antibody cocktails.”

This work was supported by the National Institute of Allergy and Infectious Diseases and a U.S. Public Health Service grant.

UW/Fred Hutch Cancer Consortium members Noah Simon, Paul Edlefsen, and Peter Gilbert contributed to this work.

Magaret CA, Benkeser DC, Williamson BD, Borate BR, Carpp LN, Georgiev IS, Setliff I, Dingens AS, Simon N, Carone M, Simpkins C, Montefiori D, Galit A, Yu WH, Juraska M, Edlefsen PT, Karuna S, Mgodi NM, Edugupanti S, Gilbert PB. 2019. Prediction of VRC01 neutralization sensitivity by HIV-1 gp160 sequence features. PLoS Computational Biology. e1006952. doi: 10.1371/journal.pcbi.1006952. eCollection 2019 Apr.