Abstract Establishing the host range for novel viruses remains a challenge. Here, we address the challenge of identifying non-human animal coronaviruses that may infect humans by creating an artificial neural network model that learns from spike protein sequences of alpha and beta coronaviruses and their binding annotation to their host receptor. The proposed method produces a human-Binding Potential (h-BiP) score that distinguishes, with high accuracy, the binding potential among coronaviruses. Three viruses, previously unknown to bind human receptors, were identified: Bat coronavirus BtCoV/133/2005 and Pipistrellus abramus bat coronavirus HKU5-related (both MERS-related viruses), and Rhinolophus affinis coronavirus isolate LYRa3 (a SARS-related virus). We further analyze the binding properties of BtCoV/133/2005 and LYRa3 using molecular dynamics. To test whether this model can be used for surveillance of novel coronaviruses, we re-trained the model on a set that excludes SARS-CoV-2 and all viral sequences released after SARS-CoV-2 was published. The results predict the binding of SARS-CoV-2 with a human receptor, indicating that machine learning methods are an excellent tool for the prediction of host expansion events.
Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2
Abstract Background: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. Results: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. Conclusions: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
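The idea of provenance tracing described in the abstract can be illustrated with a minimal sketch. The abstract does not name the propagation algorithm, so this example assumes random walk with restart, a common linear propagation scheme. Because that update is linear in the seed vector, the score of any node splits exactly into one contribution per seed. The four-node path graph, the node names, and the functions `propagate` and `contributions` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: random walk with restart (an assumed choice of
# propagation method) on a toy graph, with per-seed provenance of the scores.

def propagate(adj, seeds, restart=0.5, iters=200):
    """Iterate p <- restart * seeds + (1 - restart) * W p until (near) convergence.

    adj maps each node to a list of its neighbors (undirected graph).
    seeds maps source nodes to their initial weights.
    """
    nodes = sorted(adj)
    score = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: restart * seeds.get(n, 0.0) for n in nodes}
        for u in nodes:
            if not adj[u]:
                continue
            # Each node spreads its current score evenly over its neighbors.
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        score = nxt
    return score


def contributions(adj, seeds, restart=0.5):
    """Propagate each seed on its own; by linearity these sum to the full score."""
    return {s: propagate(adj, {s: w}, restart) for s, w in seeds.items()}


# Hypothetical toy network: a path A - B - C - D, with A and D as the
# experimentally validated sources (seeds).
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
seeds = {"A": 1.0, "D": 1.0}

total = propagate(adj, seeds)
contrib = contributions(adj, seeds)

for node in sorted(adj):
    parts = {s: round(contrib[s][node], 4) for s in seeds}
    print(node, round(total[node], 4), parts)
```

In this toy case the largest contribution to the score of B comes from its direct neighbor A rather than from the more distant seed D, which mirrors the kind of statement the abstract makes about top-ranking predictions.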
- PAR ID: 10361118
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: GigaScience
- Volume: 10
- Issue: 12
- ISSN: 2047-217X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein–protein interactions among SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence among these variants. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and are tasked with predicting the binding affinity of the in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and test protein sequences are sufficiently similar to the training sequence. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM embedding and CNN-based predictive models show superior performance.
Abstract SARS-CoV-2 is an enveloped RNA virus responsible for the COVID-19 pandemic, which has caused 6 million deaths worldwide so far. SARS-CoV-2 particles are mainly composed of the 4 main structural proteins M, N, E and S to form 100 nm diameter viral particles. Based on productive assays, we propose an optimal transfected plasmid ratio mimicking the viral RNA ratio in infected cells. This allows the formation of SARS-CoV-2 Virus-Like Particles (VLPs) composed of the viral structural proteins M, N, E and mature S. Furthermore, fluorescent or photoconvertible VLPs were generated by adding a fluorescent protein tag on N or M mixed with unlabeled viral proteins, and were characterized by western blots, atomic force microscopy coupled to fluorescence and immuno-spotting. Thanks to live fluorescence and super-resolution microscopies, we quantified VLP size and concentration. SARS-CoV-2 VLPs present a diameter of 110 and 140 nm respectively for MNE-VLPs and MNES-VLPs with a concentration of 10e12 VLP/ml. In this condition, we were able to establish the incorporation of the Spike in the fluorescent VLPs. Finally, the Spike functionality was assessed by monitoring fluorescent MNES-VLPs docking and internalization in human pulmonary cells expressing or not the receptor hACE2. Results show a preferential maturation of S on N(GFP) labeled VLPs and an hACE2-dependent VLP internalization and a potential fusion in host cells. This work provides new insights into the use of non-fluorescent and fluorescent VLPs to study and visualize the SARS-CoV-2 viral life cycle in a safe environment (BSL-2 instead of BSL-3). Moreover, optimized SARS-CoV-2 VLP production can be further adapted to vaccine design strategies.
Abstract Background: Protein–protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein–protein interactions and produce high-quality multimeric structural models. Results: Application of our method to the Human and Yeast genomes yields protein–protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV2’s non-structural protein 3. We also produced models of SARS-CoV2’s spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. Conclusions: The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.
Abstract Structure-based drug design targeting the SARS-CoV-2 virus has been greatly facilitated by available virus-related protein structures. However, there is an urgent need for effective, safe small-molecule drugs to control the spread of the virus and variants. While many efforts are devoted to searching for compounds that selectively target individual proteins, we investigated the potential interactions between eight proteins related to SARS-CoV-2 and more than 600 compounds from a traditional Chinese medicine which has proven effective at treating the viral infection. Our original ensemble docking and cooperative docking approaches, followed by a total of over 16 microseconds of molecular simulations, have identified at least 9 compounds that may generally bind to key SARS-CoV-2 proteins. Further, we found evidence that some of these compounds can simultaneously bind to the same target, potentially leading to cooperative inhibition of SARS-CoV-2 proteins like the Spike protein and the RNA-dependent RNA polymerase. These results not only present a useful computational methodology to systematically assess the anti-viral potential of small molecules, but also point out a new avenue to seek cooperative compounds toward cocktail therapeutics to target more SARS-CoV-2-related proteins.