Title: Improved prediction of MHC-peptide binding using protein language models
Major histocompatibility complex Class I (MHC-I) molecules bind to peptides derived from intracellular antigens and present them on the surface of cells, allowing the immune system (T cells) to detect them. Elucidating the process of this presentation is essential for regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in this process and has motivated many computational approaches. NetMHCpan, a pan-specific model for predicting binding of peptides to any MHC molecule, is one of the most widely used methods; it addresses this binary classification problem using shallow neural networks. The recent success of Deep Learning (DL) methods, especially pretrained models based on Natural Language Processing (NLP), in various applications, including protein structure determination, motivated us to explore their use in this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.
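The abstract frames binding prediction as binary classification and compares models on "standard performance metrics" without naming them here. As an illustration only, the sketch below assumes area under the ROC curve (AUC) is one such metric and computes it from binder/non-binder labels and model scores. The function name `roc_auc` and the example values are hypothetical and do not come from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (binder) or 0 (non-binder).
    scores: iterable of predicted binding scores, higher = more likely binder.
    AUC equals the probability that a random binder is scored above a
    random non-binder, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder labels and scores, invented purely for illustration.
    example_labels = [1, 1, 0, 0]
    example_scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(example_labels, example_scores))
```

The pairwise form is quadratic in the number of examples, so it suits small checks; a rank-based implementation would be the usual choice for full benchmark sets.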
Award ID(s):
2200052 1914792 1664644 2054251
PAR ID:
10487736
Publisher / Repository:
Frontiers
Date Published:
Journal Name:
Frontiers in Bioinformatics
Volume:
3
ISSN:
2673-7647
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: The ability to accurately identify peptide ligands for a given major histocompatibility complex class I (MHC-I) molecule has immense value for targeted anticancer therapeutics. However, the highly polymorphic nature of the MHC-I protein makes universal prediction of peptide ligands challenging due to the lack of experimental data describing most MHC-I variants. To address this challenge, we have developed a deep convolutional neural network, HLA-Inception, capable of predicting MHC-I peptide binding motifs using electrostatic properties of the MHC-I binding pocket. By approaching this immunological issue using molecular biophysics, we measure the impact of sidechain arrangement and topology on peptide binding, features not captured by sequence-based MHC-I prediction methods. Through a combination of molecular modeling and simulation, 5821 MHC-I alleles were modeled, providing extensive coverage across human populations. Predicted peptide binding motifs fell into distinct clusters, each defined with different degrees of submotif heterogeneity. Peptide binding scores generated by HLA-Inception are strongly correlated with quantitative MHC-I binding data, indicating predicted peptides can be ranked, both within and between alleles. HLA-Inception also showed high precision when predicting naturally presented peptides and can be used for rapid proteome-scale MHC-I peptide binding predictions. Finally, we show that the binding pocket diversity measured by HLA-Inception predicts response to checkpoint blockade. Citation Format: Eric A. Wilson, John Kevin Cava, Diego Chowell, Abhishek Singharoy, Karen S. Anderson. Protein structure-based modeling to improve MHC class I epitope predictions. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5376.
  2. Abstract. Motivation: MHC Class I protein plays an important role in immunotherapy by presenting immunogenic peptides to anti-tumor immune cells. The repertoires of peptides for various MHC Class I proteins are distinct, which can be reflected by their diverse binding motifs. To characterize binding motifs for MHC Class I proteins, in vitro experiments have been conducted to screen peptides with high binding affinities to hundreds of given MHC Class I proteins. However, considering tens of thousands of known MHC Class I proteins, conducting in vitro experiments for extensive MHC proteins is infeasible, and thus a more efficient and scalable way to characterize binding motifs is needed. Results: We presented a de novo generation framework, coined PepPPO, to characterize binding motifs for any given MHC Class I protein via generating repertoires of peptides presented by it. PepPPO leverages a reinforcement learning agent with a mutation policy to mutate random input peptides into positive presented ones. Using PepPPO, we characterized binding motifs for around 10 000 known human MHC Class I proteins with and without experimental data. These computed motifs demonstrated high similarities with those derived from experimental data. In addition, we found that the motifs could be used for the rapid screening of neoantigens at a much lower time cost than previous deep-learning methods. Availability and implementation: The software can be found at https://github.com/minrq/pMHC. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. George Bebis, Terry Gaasterland (Ed.)
    Major Histocompatibility Complex (MHC) Class I molecules provide a pathway for cells to present endogenous peptides to the immune system, allowing it to distinguish healthy cells from those infected by pathogens. Software tools based on neural networks such as NetMHC and NetMHCpan predict whether peptides will bind to variants of MHC molecules. These tools are trained with experimental data, consisting of the amino acid sequence of peptides and their observed binding strength. Such tools generally do not explicitly consider hydrophobicity, a significant biochemical factor relevant to peptide binding. It was observed that these tools predict that some highly hydrophobic peptides will be strong binders, which biochemical factors suggest is incorrect. This paper investigates the correlation of the hydrophobicity of 9-mer peptides with their predicted binding strength to the MHC variant HLA-A*0201 for these software tools. Two studies were performed, one using the data that the neural networks were trained on and the other using a sample of the human proteome. A significant bias within NetMHC-4.0 towards predicting highly hydrophobic peptides as strong binders was observed in both studies. This suggests that hydrophobicity should be included in the training data of the neural networks. Retraining the neural networks with such biochemical annotations of hydrophobicity could increase the accuracy of their predictions, increasing their impact in applications such as vaccine design and neoantigen identification.
  4. The Class I Major Histocompatibility Complex (MHC) is a central protein in immunology as it binds to intracellular peptides and displays them at the cell surface for recognition by T-cells. The structural analysis of bound peptide-MHC complexes (pMHCs) holds the promise of interpretable and general binding prediction (i.e., testing whether a given peptide binds to a given MHC). However, structural analysis is limited in part by the difficulty in modelling pMHCs given the size and flexibility of the peptides that can be presented by MHCs. This article describes APE-Gen (Anchored Peptide-MHC Ensemble Generator), a fast method for generating ensembles of bound pMHC conformations. APE-Gen generates an ensemble of bound conformations by iterated rounds of (i) anchoring the ends of a given peptide near known pockets in the binding site of the MHC, (ii) sampling peptide backbone conformations with loop modelling, and then (iii) performing energy minimization to fix steric clashes, accumulating conformations at each round. APE-Gen takes only minutes on a standard desktop to generate tens of bound conformations, and we show the ability of APE-Gen to sample conformations found in X-ray crystallography even when only sequence information is used as input. APE-Gen has the potential to be useful for its scalability (i.e., modelling thousands of pMHCs or even non-canonical longer peptides) and for its use as a flexible search tool. We demonstrate an example for studying cross-reactivity. 
  5. Abstract: Despite promising developments in computational tools, peptide‐class II MHC (MHCII) binding predictors continue to lag behind their peptide‐class I MHC counterparts. Consequently, peptide–MHCII binding is often evaluated experimentally using competitive binding assays, which tend to sacrifice throughput for quantitative binding detail. Here, we developed a high‐throughput semiquantitative peptide–MHCII screening strategy termed microsphere‐assisted peptide screening (MAPS) that aims to balance the accuracy of competitive binding assays with the throughput of computational tools. Using MAPS, we screened a peptide library from Zika virus envelope (E) protein for binding to four common MHCII alleles (DR1, DR4, DR7, DR15). Interestingly, MAPS revealed a significant overlap between peptides that promiscuously bind multiple MHCII alleles and antibody neutralization sites. This overlap was also observed for rotavirus outer capsid glycoprotein VP7, suggesting a deeper relationship between B cell and CD4+ T cell specificity, which can facilitate the design of broadly protective vaccines against Zika and other viruses.
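Item 3 in the list above studies how the hydrophobicity of 9-mer peptides correlates with their predicted binding strength. The sketch below shows one way such an analysis could be set up. It rests on assumptions that are not taken from that paper: hydrophobicity is scored as the mean Kyte-Doolittle hydropathy of the peptide, the correlation is Pearson's r, and the peptides and scores in the usage section are invented placeholders rather than real predictor output.

```python
import math

# Kyte-Doolittle hydropathy index per amino acid (assumed scale; the
# source abstract does not say which hydrophobicity scale was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over all residues of a peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical 9-mers and made-up predicted scores, for illustration only.
    peptides = ["LLLLLLLLL", "AAAAAAAAA", "KKKKKKKKK", "GILGFVFTL"]
    predicted_scores = [0.8, 0.5, 0.1, 0.7]  # placeholders, not tool output
    hydropathies = [mean_hydropathy(p) for p in peptides]
    print(pearson_r(hydropathies, predicted_scores))
```

In a real analysis the `predicted_scores` list would be replaced by the output of the prediction tool under study for each peptide, and a strongly positive coefficient would point to the kind of hydrophobicity bias that abstract describes.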