

Title: Patterns of amino acid conservation in human and animal immunodeficiency viruses
Abstract

Motivation

Due to their high genomic variability, RNA viruses and retroviruses present a unique opportunity for detailed study of molecular evolution. Lentiviruses, with HIV being a notable example, are one of the best studied viral groups: hundreds of thousands of sequences are available together with experimentally resolved three-dimensional structures for most viral proteins. In this work, we use these data to study specific patterns of evolution of the viral proteins, and their relationship to protein interactions and immunogenicity.

Results

We propose a method for identifying two types of surface residue clusters with abnormal conservation: extremely conserved and extremely variable clusters. We identify them on the surface of proteins from HIV and other animal immunodeficiency viruses. Both types of clusters are overrepresented on the interaction interfaces of viral proteins with other proteins, nucleic acids or low molecular-weight ligands, both in the viral particle and between the virus and its host. In the immunodeficiency viruses, the interaction interfaces are not more conserved than the corresponding proteins on average, and we show that extremely conserved clusters coincide with protein–protein interaction hotspots, predicted as the residues with the largest energetic contribution to the interaction. Extremely variable clusters have been identified here for the first time. In the HIV-1 envelope protein gp120, they overlap with known antigenic sites. These antigenic sites also contain many residues from extremely conserved clusters, hence representing a unique interacting interface enriched both in extremely conserved and in extremely variable clusters of residues. This observation may have important implications for antiretroviral vaccine development.
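The abstract does not state how conservation is scored, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Shannon entropy per alignment column as the conservation measure and flags the lowest- and highest-entropy positions as candidate "extremely conserved" and "extremely variable" residues. The function names, the toy alignment, and the `fraction` cutoff are all hypothetical; restriction to surface residues and spatial clustering on the 3D structure are omitted.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


def extreme_positions(entropies, fraction=0.2):
    """Return (most_conserved, most_variable) position indices.

    Assumption: 'extreme' simply means the lowest or highest `fraction`
    of positions ranked by entropy; ties keep their original order.
    """
    k = max(1, int(len(entropies) * fraction))
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    return sorted(order[:k]), sorted(order[-k:])


# Toy alignment (hypothetical data, not taken from the paper).
alignment = ["MKVLA", "MKILG", "MRVLS", "MKVLT"]
profile = conservation_profile(alignment)
conserved, variable = extreme_positions(profile, fraction=0.2)
```

In a real analysis the flagged positions would then be mapped onto a resolved structure and grouped by spatial proximity before comparing them with interaction interfaces.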

Availability and Implementation

A Python package is available at https://bioinf.mpi-inf.mpg.de/publications/viral-ppi-pred/

Contact

voitenko@mpi-inf.mpg.de or kalinina@mpi-inf.mpg.de

Supplementary information

Supplementary data are available at Bioinformatics online.

 
NSF-PAR ID:
10394850
Publisher / Repository:
Oxford University Press
Journal Name:
Bioinformatics
Volume:
32
Issue:
17
ISSN:
1367-4803
Page Range / eLocation ID:
p. i685-i692
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    Human immunodeficiency virus type 1 (HIV-1) genome integration is closely related to clinical latency and viral rebound. In addition to human DNA sequences that directly interact with the integration machinery, the selection of HIV integration sites has also been shown to depend on the heterogeneous genomic context around a large region, which greatly hinders the prediction and mechanistic studies of HIV integration.

    Results

    We have developed an attention-based deep learning framework, named DeepHINT, to simultaneously provide accurate prediction of HIV integration sites and mechanistic explanations of the detected sites. Extensive tests on a high-density HIV integration site dataset showed that DeepHINT can outperform conventional modeling strategies by automatically learning the genomic context of HIV integration from primary DNA sequence alone or together with epigenetic information. Systematic analyses on diverse known factors of HIV integration further validated the biological relevance of the prediction results. More importantly, in-depth analyses of the attention values output by DeepHINT revealed intriguing mechanistic implications in the selection of HIV integration sites, including potential roles of several DNA-binding proteins. These results established DeepHINT as an effective and explainable deep learning framework for the prediction and mechanistic study of HIV integration.

    Availability and implementation

    DeepHINT is available as an open-source software and can be downloaded from https://github.com/nonnerdling/DeepHINT.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. The bovine immune system is known for its unusual traits relating to immunoglobulin and antiviral responses. Peptidylarginine deiminases (PADs) are phylogenetically conserved enzymes that cause post-translational deimination, contributing to protein moonlighting in health and disease. PADs also regulate extracellular vesicle (EV) release, forming a critical part of cellular communication. As PAD-mediated mechanisms in bovine immunology and physiology remain to be investigated, this study profiled deimination signatures in serum and serum-EVs in Bos taurus. Bos EVs were poly-dispersed in a 70–500 nm size range and showed differences in deiminated protein cargo, compared with whole sera. Key immune, metabolic and gene regulatory proteins were identified to be post-translationally deiminated with some overlapping hits in sera and EVs (e.g., immunoglobulins), while some were unique to either serum or serum-EVs (e.g., histones). Protein–protein interaction network analysis of deiminated proteins revealed KEGG pathways common for serum and serum-EVs, including complement and coagulation cascades, viral infection (enveloped viruses), viral myocarditis, bacterial and parasitic infections, autoimmune disease, immunodeficiency intestinal IgA production, B-cell receptor signalling, natural killer cell mediated cytotoxicity, platelet activation and hematopoiesis, alongside metabolic pathways including ferroptosis, vitamin digestion and absorption, cholesterol metabolism and mineral absorption. KEGG pathways specific to EVs were related to HIF-1 signalling, oestrogen signalling and biosynthesis of amino acids. KEGG pathways specific to serum only were related to Epstein–Barr virus infection, transcription mis-regulation in cancer, bladder cancer, Rap1 signalling pathway, calcium signalling pathway and ECM-receptor interaction. This indicates differences in physiological and pathological pathways for deiminated proteins in serum-EVs, compared with serum. Our findings may shed light on mechanisms underlying a number of pathological and anti-pathogenic (viral, bacterial, parasitic) pathways, with putative translatable value to human pathologies, zoonotic diseases and development of therapies for infections, including anti-viral therapies.
  3. Abstract

    Understanding the molecular evolution of the SARS‐CoV‐2 virus as it continues to spread in communities around the globe is important for mitigation and future pandemic preparedness. Three‐dimensional structures of SARS‐CoV‐2 proteins and those of other coronaviruses archived in the Protein Data Bank were used to analyze viral proteome evolution during the first 6 months of the COVID‐19 pandemic. Analyses of spatial locations, chemical properties, and structural and energetic impacts of the observed amino acid changes in >48 000 viral isolates revealed how each of the 29 viral proteins has undergone amino acid changes. Catalytic residues in active sites and binding residues in protein–protein interfaces showed modest, but significant, numbers of substitutions, highlighting the mutational robustness of the viral proteome. Energetics calculations showed that the impact of substitutions on the thermodynamic stability of the proteome follows a universal bi‐Gaussian distribution. Detailed results are presented for potential drug discovery targets and the four structural proteins that comprise the virion, highlighting substitutions with the potential to impact protein structure, enzyme activity, and protein–protein and protein–nucleic acid interfaces. Characterizing the evolution of the virus in three dimensions provides testable insights into viral protein function and should aid in structure‐based drug discovery efforts as well as the prospective identification of amino acid substitutions with potential for drug resistance.

     
  4. Recombination has been shown to contribute to human immunodeficiency virus-1 (HIV-1) evolution in vivo, but the underlying dynamics are extremely complex, depending on the nature of the fitness landscapes and of epistatic interactions. A less well-studied determinant of recombinant evolution is the mode of virus transmission in the cell population. HIV-1 can spread by free virus transmission, resulting largely in singly infected cells, and also by direct cell-to-cell transmission, resulting in the simultaneous infection of cells with multiple viruses. We investigate the contribution of these two transmission pathways to recombinant evolution by applying mathematical models to in vitro experimental data on the growth of fluorescent reporter viruses under static conditions (where both transmission pathways operate) and under gentle shaking conditions (where cell-to-cell transmission is largely inhibited). The parameterized mathematical models are then used to extrapolate the viral evolutionary dynamics beyond the experimental settings. Assuming a fixed basic reproductive ratio of the virus (independent of transmission pathway), we find that recombinant evolution is fastest if virus spread is driven only by cell-to-cell transmission and slows down if both transmission pathways operate. Recombinant evolution is slowest if all virus spread occurs through free virus transmission. This is due to cell-to-cell transmission (1) increasing infection multiplicity; (2) promoting the co-transmission of different virus strains from cell to cell; and (3) increasing the rate at which point mutations are generated as a result of more reverse transcription events. This study further resulted in the estimation of various parameters that characterize these evolutionary processes. For example, we estimate that during cell-to-cell transmission, an average of three viruses successfully integrated into the target cell, which can significantly raise the infection multiplicity compared to free virus transmission. In general, our study points towards the importance of infection multiplicity and cell-to-cell transmission for HIV evolution.
  5. Three protein targets from SARS-CoV-2, the viral pathogen that causes COVID-19, are studied: the main protease, the 2′-O-RNA methyltransferase, and the nucleocapsid (N) protein. For the main protease, the nucleophilicity of the catalytic cysteine C145 is enabled by coupling to three histidine residues: H163, H164, and the catalytic dyad partner H41. These electrostatic couplings enable significant population of the deprotonated state of C145. For the RNA methyltransferase, the catalytic lysine K6968 that serves as a Brønsted base has significant population of its deprotonated state via strong coupling with K6844 and Y6845. For the main protease, Partial Order Optimum Likelihood (POOL) predicts two clusters of biochemically active residues; one includes the catalytic H41 and C145 and neighboring residues. The other surrounds a second pocket adjacent to the catalytic site and includes S1 residues F140, L141, H163, E166, and H172 and also S2 residue D187. This secondary recognition site could serve as an alternative target for the design of molecular probes. From in silico screening of library compounds, ligands with predicted affinity for the secondary site are reported. For the NSP16-NSP10 complex that comprises the RNA methyltransferase, three different sites are predicted. One is the catalytic core at the conserved K-D-K-E motif that includes catalytic residues D6928, K6968, and E7001 plus K6844. The second site surrounds the catalytic core and consists of Y6845, C6849, I6866, H6867, F6868, V6894, D6895, D6897, I6926, S6927, Y6930, and K6935. The third is located at the heterodimer interface. Ligands predicted to have high affinity for the first or second sites are reported. Three sites are also predicted for the nucleocapsid protein. This work uncovers key interactions that contribute to the function of the three viral proteins and also suggests alternative sites for ligand design.