
Title: Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2
Abstract Background

Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction.


We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents.
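As a rough illustration of what provenance tracing can look like, the sketch below uses random walk with restart, one common form of network propagation. This is an assumption: the abstract does not specify the exact propagation algorithm, and the function name `propagate_with_provenance`, the parameter `alpha`, and the toy four-node graph are hypothetical. Because the closed-form solution is linear in the input scores, each prediction's score splits exactly into one term per experimentally observed source.

```python
import numpy as np

def propagate_with_provenance(adjacency, source_scores, alpha=0.5):
    """Closed-form random walk with restart plus per-source contributions.

    adjacency: symmetric (n, n) array of non-negative edge weights.
    source_scores: length-n array, non-zero only at experimentally observed nodes.
    alpha: hypothetical propagation weight in (0, 1); not taken from the paper.
    """
    A = np.asarray(adjacency, dtype=float)
    s = np.asarray(source_scores, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    W = A / col_sums                       # column-normalised transition matrix
    n = A.shape[0]
    # Solves p = alpha * W p + (1 - alpha) * s, i.e. p = M s.
    M = (1.0 - alpha) * np.linalg.inv(np.eye(n) - alpha * W)
    contributions = M * s                  # contributions[i, j] = M[i, j] * s[j]
    scores = contributions.sum(axis=1)     # identical to M @ s
    return scores, contributions

# Toy path graph 0 - 1 - 2 - 3, with nodes 0 and 3 as observed sources.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
source_scores = np.array([1.0, 0.0, 0.0, 1.0])
scores, contributions = propagate_with_provenance(adjacency, source_scores)

# For each unobserved node, report the source that contributes most to its score.
for node in (1, 2):
    top_source = int(np.argmax(contributions[node]))
    print(f"node {node}: score={scores[node]:.3f}, top source={top_source}")
```

In this toy graph each unobserved node draws most of its score from the adjacent source, which matches in spirit the reported finding that top-ranking predictions are dominated by a direct neighbor. The matrix inverse is only practical for small networks; a large interaction network would more likely call for an iterative or sparse solver.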


We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.

Award ID(s):
1759858 1817736 2029543
Publication Date:
Journal Name:
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The rampant spread of COVID-19, an infectious disease caused by SARS-CoV-2, all over the world has led to millions of deaths and devastated social, financial and political entities around the world. Without an existing effective medical therapy, vaccines are urgently needed to avoid the spread of this disease. In this study, we propose an in silico deep learning approach for prediction and design of a multi-epitope vaccine (DeepVacPred). By combining the in silico immunoinformatics and deep neural network strategies, the DeepVacPred computational framework directly predicts 26 potential vaccine subunits from the available SARS-CoV-2 spike protein sequence. We further use in silico methods to investigate the linear B-cell epitopes, Cytotoxic T Lymphocytes (CTL) epitopes, Helper T Lymphocytes (HTL) epitopes in the 26 subunit candidates and identify the best 11 of them to construct a multi-epitope vaccine for SARS-CoV-2 virus. The human population coverage, antigenicity, allergenicity, toxicity, physicochemical properties and secondary structure of the designed vaccine are evaluated via state-of-the-art bioinformatic approaches, showing good quality of the designed vaccine. The 3D structure of the designed vaccine is predicted, refined and validated by in silico tools. Finally, we optimize and insert the codon sequence into a plasmid to ensure the cloning and expression efficiency. In conclusion, this proposed artificial intelligence (AI) based vaccine discovery framework accelerates the vaccine design process and constructs a 694aa multi-epitope vaccine containing 16 B-cell epitopes, 82 CTL epitopes and 89 HTL epitopes, which is promising to fight the SARS-CoV-2 viral infection and can be further evaluated in clinical studies. Moreover, we trace the RNA mutations of the SARS-CoV-2 and ensure that the designed vaccine can tackle the recent RNA mutations of the virus.

  2. Abstract Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein–protein interactions among SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence among these variants. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and are tasked with predicting the binding affinity of the in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and test protein sequences are sufficiently similar to the training sequence. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM embedding and CNN-based predictive model show superior performance.
  3. Abstract Background

    SARS-CoV-2 is an RNA virus responsible for the coronavirus disease 2019 (COVID-19) pandemic. Viruses exist in complex microbial environments, and recent studies have revealed both synergistic and antagonistic effects of specific bacterial taxa on viral prevalence and infectivity. We set out to test whether specific bacterial communities predict SARS-CoV-2 occurrence in a hospital setting.


    We collected 972 samples from hospitalized patients with COVID-19, their health care providers, and hospital surfaces before, during, and after admission. We screened for SARS-CoV-2 using RT-qPCR, characterized microbial communities using 16S rRNA gene amplicon sequencing, and used these bacterial profiles to classify SARS-CoV-2 RNA detection with a random forest model.


    Sixteen percent of surfaces from COVID-19 patient rooms had detectable SARS-CoV-2 RNA, although infectivity was not assessed. The highest prevalence was in floor samples next to patient beds (39%) and directly outside their rooms (29%). Although bed rail samples more closely resembled the patient microbiome compared to floor samples, SARS-CoV-2 RNA was detected less often in bed rail samples (11%). SARS-CoV-2 positive samples had higher bacterial phylogenetic diversity in both human and surface samples and higher biomass in floor samples. 16S microbial community profiles enabled high classifier accuracy for SARS-CoV-2 status in not only nares, but also forehead, stool, and floor samples. Across these distinct microbial profiles, a single amplicon sequence variant from the genus Rothia strongly predicted SARS-CoV-2 presence across sample types, with greater prevalence in positive surface and human samples, even when compared to samples from patients in other intensive care units prior to the COVID-19 pandemic.


    These results contextualize the vast diversity of microbial niches where SARS-CoV-2 RNA is detected and identify specific bacterial taxa that associate with the viral RNA prevalence both in the host and hospital environment.

  4. Abstract Background

    Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Delta variant has caused a dramatic resurgence in infections in the United States, raising questions regarding potential transmissibility among vaccinated individuals.


    Between October 2020 and July 2021, we sequenced 4439 SARS-CoV-2 full genomes, 23% of all known infections in Alachua County, Florida, including 109 vaccine breakthrough cases. Univariate and multivariate regression analyses were conducted to evaluate associations between viral RNA burden and patient characteristics. Contact tracing and phylogenetic analysis were used to investigate direct transmissions involving vaccinated individuals.


    The majority of breakthrough sequences with lineage assignment were classified as Delta variants (74.6%) and occurred, on average, about 3 months (104 ± 57.5 days) after full vaccination, at the same time (June-July 2021) of Delta variant exponential spread within the county. Six Delta variant transmission pairs between fully vaccinated individuals were identified through contact tracing, 3 of which were confirmed by phylogenetic analysis. Delta breakthroughs exhibited broad viral RNA copy number values during acute infection (interquartile range, 1.2-8.64 Log copies/mL), on average 38% lower than matched unvaccinated patients (3.29-10.81 Log copies/mL, P < .00001). Nevertheless, 49% to 50% of all breakthroughs, and 56% to 60% of Delta-infected breakthroughs exhibited viral RNA levels above the transmissibility threshold (4 Log copies/mL) irrespective of time after vaccination.


    Delta infection transmissibility and general viral RNA quantification patterns in vaccinated individuals suggest limited levels of sterilizing immunity that need to be considered by public health policies. In particular, ongoing evaluation of vaccine boosters should specifically address whether extra vaccine doses curb breakthrough contribution to epidemic spread.

  5. Abstract

    SARS-CoV-2 is an enveloped RNA virus responsible for the COVID-19 pandemic, which has resulted in 6 million deaths worldwide so far. SARS-CoV-2 particles are mainly composed of the 4 main structural proteins M, N, E and S to form 100 nm diameter viral particles. Based on productive assays, we propose an optimal transfected plasmid ratio mimicking the viral RNA ratio in infected cells. This allows SARS-CoV-2 Virus-Like Particle (VLP) formation composed of the viral structural proteins M, N, E and mature S. Furthermore, fluorescent or photoconvertible VLPs were generated by adding a fluorescent protein tag on N or M mixing with unlabeled viral proteins and characterized by western blots, atomic force microscopy coupled to fluorescence and immuno-spotting. Thanks to live fluorescence and super-resolution microscopies, we quantified VLPs size and concentration. SARS-CoV-2 VLPs present a diameter of 110 and 140 nm respectively for MNE-VLPs and MNES-VLPs with a concentration of 10e12 VLP/ml. In this condition, we were able to establish the incorporation of the Spike in the fluorescent VLPs. Finally, the Spike functionality was assessed by monitoring fluorescent MNES-VLPs docking and internalization in human pulmonary cells expressing or not the receptor hACE2. Results show a preferential maturation of S on N(GFP) labeled VLPs and an hACE2-dependent VLP internalization and a potential fusion in host cells. This work provides new insights on the use of non-fluorescent and fluorescent VLPs to study and visualize the SARS-CoV-2 viral life cycle in a safe environment (BSL-2 instead of BSL-3). Moreover, optimized SARS-CoV-2 VLP production can be further adapted to vaccine design strategies.
