Title: Binding peptide generation for MHC Class I proteins with deep reinforcement learning
Abstract Motivation

MHC Class I proteins play an important role in immunotherapy by presenting immunogenic peptides to anti-tumor immune cells. The peptide repertoires of different MHC Class I proteins are distinct, as reflected by their diverse binding motifs. To characterize these binding motifs, in vitro experiments have been conducted to screen peptides with high binding affinities to hundreds of MHC Class I proteins. However, with tens of thousands of known MHC Class I proteins, conducting in vitro experiments for all of them is infeasible, and a more efficient and scalable way to characterize binding motifs is needed.

Results

We present a de novo generation framework, named PepPPO, that characterizes the binding motif of any given MHC Class I protein by generating the repertoire of peptides it presents. PepPPO uses a reinforcement learning agent with a mutation policy to mutate random input peptides into positively presented ones. Using PepPPO, we characterized binding motifs for around 10 000 known human MHC Class I proteins, both with and without experimental data. The computed motifs were highly similar to those derived from experimental data. In addition, we found that the motifs could be used for rapid screening of neoantigens at a much lower time cost than previous deep-learning methods.
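A binding motif of the kind described above is commonly summarized as a table of per-position amino-acid frequencies computed from a set of peptides. The following is a minimal sketch of that idea, not PepPPO's implementation: the function names `position_frequency_matrix` and `motif_score`, the pseudocount, the uniform background, and the toy 9-mers are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def position_frequency_matrix(peptides, pseudocount=1.0):
    """Return one {amino acid: frequency} dict per peptide position.

    Hypothetical helper: assumes all peptides have the same length and
    smooths counts with a pseudocount so no frequency is exactly zero.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        matrix.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return matrix


def motif_score(peptide, matrix):
    """Log-odds of a peptide under the motif versus a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    return sum(
        math.log(matrix[i][aa] / background) for i, aa in enumerate(peptide)
    )


# Toy 9-mers chosen only for illustration; they are not experimental data.
generated = ["ALAKAAAAV", "YLDGSTRAV", "KLNEPSLLL", "GLWDGMEHL", "SLFMWITQV"]
pfm = position_frequency_matrix(generated)

# Most frequent residue at peptide position 2 (index 1) in the toy set.
print(max(pfm[1], key=pfm[1].get))

# A candidate matching the toy motif scores higher than one that does not.
print(motif_score("ALAAAAAAV", pfm) > motif_score("AGAAAAAAG", pfm))
```

Scoring a candidate against such a table takes one lookup per position, which is one plausible reason motif-based screening can be much cheaper than running a deep model on every peptide.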

Availability and implementation

The software is available at https://github.com/minrq/pMHC.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
Award ID(s):
2133650
NSF-PAR ID:
10396072
Publisher / Repository:
Oxford University Press
Journal Name:
Bioinformatics
Volume:
39
Issue:
2
ISSN:
1367-4811
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Human leukocyte antigen class I (HLA-I) molecules bind and present peptides at the cell surface to facilitate the induction of appropriate CD8+ T cell-mediated immune responses to pathogen- and self-derived proteins. The HLA-I peptide-binding cleft contains dominant anchor sites in the B and F pockets that interact primarily with amino acids at peptide position 2 and the C-terminus, respectively. Nonpocket peptide–HLA interactions also contribute to peptide binding and stability, but these secondary interactions are thought to be unique to individual HLA allotypes or to specific peptide antigens. Here, we show that two positively charged residues located near the top of the peptide-binding cleft facilitate interactions with negatively charged residues at position 4 of presented peptides, which occur at elevated frequencies across most HLA-I allotypes. Loss of these interactions was shown to impair HLA-I/peptide binding and complex stability, as demonstrated by both in vitro and in silico experiments. Furthermore, mutation of the arginine-65 (R65) and/or lysine-66 (K66) residues in HLA-A*02:01 and A*24:02 significantly reduced HLA-I cell surface expression while also reducing the diversity of the presented peptide repertoire by up to 5-fold. The impact of the R65 mutation demonstrates that nonpocket HLA-I/peptide interactions can constitute anchor motifs that exert an unexpectedly broad influence on HLA-I-mediated antigen presentation. These findings provide fundamental insights into peptide antigen binding that could broadly inform epitope discovery in the context of viral vaccine development and cancer immunotherapy.

     
  2. Abstract Background

    Cells and organisms typically cannot survive in the absence of water. However, some animals including nematodes, tardigrades, rotifers, and some arthropods are able to survive near-complete desiccation. One class of proteins known to play a role in desiccation tolerance is the late embryogenesis abundant (LEA) proteins. These largely disordered proteins protect plants and animals from desiccation. A multitude of studies have characterized stress-protective capabilities of LEA proteins in vitro and in heterologous systems. However, the extent to which LEA proteins exhibit such functions in vivo, in their native contexts in animals, is unclear. Furthermore, little is known about the distribution of LEA proteins in multicellular organisms or tissue-specific requirements in conferring stress protection. Here, we used the nematode C. elegans as a model to study the endogenous function of an LEA protein in an animal.

    Results

    We created a null mutant of C. elegans LEA-1, as well as endogenous fluorescent reporters of the protein. LEA-1 mutant animals formed defective dauer larvae at high temperature. We confirmed that C. elegans lacking LEA-1 are sensitive to desiccation. LEA-1 mutants were also sensitive to heat and osmotic stress and were prone to protein aggregation. During desiccation, LEA-1 expression increased and became more widespread throughout the body. LEA-1 was required at high levels in body wall muscle for animals to survive desiccation and osmotic stress, but expression in body wall muscle alone was not sufficient for stress resistance, indicating a likely requirement in multiple tissues. We identified minimal motifs within C. elegans LEA-1 that were sufficient to increase desiccation survival of E. coli. To test whether such motifs are central to LEA-1’s in vivo functions, we then replaced the sequence of lea-1 with these minimal motifs and found that C. elegans dauer larvae formed normally and survived osmotic stress and mild desiccation at the same levels as worms with the full-length protein.

    Conclusions

    Our results provide insights into the endogenous functions and expression dynamics of an LEA protein in a multicellular animal. The results show that LEA-1 buffers animals from a broad range of stresses. Our identification of LEA motifs that can function in both bacteria and in a multicellular organism in vivo suggests the possibility of engineering LEA-1-derived peptides for optimized desiccation protection.

     
  3. The Class I Major Histocompatibility Complex (MHC) is a central protein in immunology as it binds to intracellular peptides and displays them at the cell surface for recognition by T-cells. The structural analysis of bound peptide-MHC complexes (pMHCs) holds the promise of interpretable and general binding prediction (i.e., testing whether a given peptide binds to a given MHC). However, structural analysis is limited in part by the difficulty in modelling pMHCs given the size and flexibility of the peptides that can be presented by MHCs. This article describes APE-Gen (Anchored Peptide-MHC Ensemble Generator), a fast method for generating ensembles of bound pMHC conformations. APE-Gen generates an ensemble of bound conformations by iterated rounds of (i) anchoring the ends of a given peptide near known pockets in the binding site of the MHC, (ii) sampling peptide backbone conformations with loop modelling, and then (iii) performing energy minimization to fix steric clashes, accumulating conformations at each round. APE-Gen takes only minutes on a standard desktop to generate tens of bound conformations, and we show the ability of APE-Gen to sample conformations found in X-ray crystallography even when only sequence information is used as input. APE-Gen has the potential to be useful for its scalability (i.e., modelling thousands of pMHCs or even non-canonical longer peptides) and for its use as a flexible search tool. We demonstrate an example for studying cross-reactivity. 
  4. Abstract Motivation

    Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability.

    Results

    We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and thus may be useful for annotating the metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediment and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred’s utility in analyzing microbiome metal requirements.

    Availability and implementation

    mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Abstract Motivation

    Anti-cancer peptides (ACPs) have recently emerged as promising therapeutic agents for cancer treatment. Due to the avalanche of protein sequence data in the post-genomic era, there is an urgent need to develop automated computational methods to enable fast and accurate identification of novel ACPs within the vast number of candidate proteins and peptides.

    Results

    To address this, we propose a novel predictor named Anti-Cancer peptide Predictor with Feature representation Learning (ACPred-FL) for accurate prediction of ACPs based on sequence information. More specifically, we develop an effective feature representation learning model, with which we can extract and learn a set of informative features from a pool of support vector machine-based models trained using sequence-based feature descriptors. By doing so, the class label information of data samples is fully utilized. To improve the feature representation, we further employ a two-step feature selection technique, resulting in the most informative five-dimensional feature vector for the final peptide representation. Experimental results show that these five features provide more discriminative power for identifying ACPs than currently available feature descriptors, highlighting the effectiveness of the proposed feature representation learning approach. The developed ACPred-FL method significantly outperforms state-of-the-art methods.

    Availability and implementation

    The web-server of ACPred-FL is available at http://server.malab.cn/ACPred-FL.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     