Abstract Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein–protein interactions among the SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence among them. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model, and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and tasked with predicting the binding affinity of in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and the test protein sequences are sufficiently similar to the training sequences. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM-embedding and CNN-based predictive models show superior performance.
                            Single-Particle Characterization of SARS-CoV-2 Isoelectric Point and Comparison to Variants of Interest
                        
                    
    
            SARS-CoV-2, the cause of COVID-19, is a new, highly pathogenic coronavirus. It is the third coronavirus to emerge in the past two decades and the first to cause a global pandemic. The virus has proven to be extremely transmissible and deadly. Recent data suggest that a targeted approach is key to mitigating infectivity. Owing to the proliferation of cataloged protein and nucleic acid sequences in databases, the function of nucleic acids and genetically encoded proteins is often predicted simply by aligning sequences and exploring their homology. The underlying assumption is that similar amino acid sequences in a protein usually confer similar biochemical function, even across distant or unrelated organisms. To understand viral transmission and adhesion, it is key to elucidate the structural, surface, and functional properties of each viral protein. This is typically first modeled in highly pathogenic species by exploring folding, hydrophobicity, and isoelectric point (IEP). Recent evidence from viral RNA sequence modeling and protein crystals has been inadequate, which prevents a full understanding of the IEP and other viral properties of SARS-CoV-2. We have thus experimentally determined the IEP of SARS-CoV-2. Our findings suggest that for enveloped viruses, such as SARS-CoV-2, estimates of IEP from the amino acid sequence alone may be unreliable. We compared the experimental IEP of SARS-CoV-2 to variants of interest (VOIs) using their amino acid sequences, thus providing a qualitative comparison of the IEP of VOIs.
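To make concrete what an "estimate of IEP by the amino acid sequence alone" usually means, the following is a minimal sketch of the standard Henderson-Hasselbalch approach: sum the fractional charges of the ionizable groups and search for the pH at which the net charge is zero. It is not the method used in this work, which determined the IEP experimentally. The pKa values, the function names `net_charge` and `estimate_iep`, and the example peptide are illustrative assumptions; published pKa scales differ by several tenths of a pH unit, and the calculation ignores folding, glycosylation, and the lipid envelope.

```python
# Assumed pKa values (one textbook-style set; other published scales differ).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}             # basic side chains
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}    # acidic side chains
N_TERM_PKA = 9.0   # free amino terminus (assumed)
C_TERM_PKA = 2.0   # free carboxyl terminus (assumed)


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a linear peptide at a given pH (Henderson-Hasselbalch)."""
    # Terminal groups: N-terminus contributes positive charge, C-terminus negative.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def estimate_iep(sequence: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge crosses zero."""
    lo, hi = 0.0, 14.0  # net charge is positive at pH 0 and negative at pH 14
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid   # still positively charged: the IEP lies at higher pH
        else:
            hi = mid   # negatively charged (or neutral): the IEP lies at lower pH
    return (lo + hi) / 2.0


if __name__ == "__main__":
    peptide = "SFIEDLLFNK"  # short illustrative peptide, not a whole viral protein
    print(f"sequence-only IEP estimate: {estimate_iep(peptide):.2f}")
```

Because the estimate depends only on residue counts and an assumed pKa table, it cannot reflect which groups are actually exposed on an intact virion, which is consistent with the abstract's caution about sequence-only estimates for enveloped viruses.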
- Award ID(s): 1451959
- PAR ID: 10329679
- Date Published:
- Journal Name: Microorganisms
- Volume: 9
- Issue: 8
- ISSN: 2076-2607
- Page Range / eLocation ID: 1606
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract The coronavirus disease 2019 (COVID-19) is a highly contagious and fatal disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In general, the diagnostic tests for COVID-19 are based on the detection of nucleic acid, antibodies, and protein. Among these analytes, the gold standard for COVID-19 testing is viral nucleic acid detection by quantitative reverse transcription polymerase chain reaction (qRT-PCR). However, the gold standard test is time-consuming and requires expensive instrumentation as well as trained personnel. Herein, we report an ultrasensitive electrochemical biosensor based on a zinc sulfide/graphene (ZnS/graphene) nanocomposite for rapid and direct nucleic acid detection of SARS-CoV-2. We demonstrated a simple one-step route for manufacturing ZnS/graphene by employing an ultrafast (90 s) microwave-based non-equilibrium heating approach. The biosensor assay involves the hybridization of target DNA or RNA samples with probes that are immersed in a redox-active electrolyte; the hybridization is detectable by electrochemical measurements. In this study, we performed the tests on synthetic DNA samples and SARS-CoV-2 standard samples. Experimental results revealed that the proposed biosensor could detect low concentrations of all the different SARS-CoV-2 samples, using gene sequences such as S, ORF 1a, and ORF 1b as targets. This microwave-synthesized ZnS/graphene-based biosensor could be reliably used as an on-site, real-time, and rapid diagnostic test for COVID-19.
- 
            Protein sequence models for prediction and comparative analysis of the SARS-CoV-2–human interactome
            Viruses such as the novel coronavirus, SARS-CoV-2, that is wreaking havoc on the world, depend on interactions of their own proteins with those of the human host cells. Relatively small changes in sequence, such as those between SARS-CoV and SARS-CoV-2, can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV overlap in function. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 virus proteins to human proteins and compare it to that of the other viruses mentioned above. We build supervised machine learning models, using Generalized Additive Models, to predict interactions based on sequence features and find that our models perform well, with an AUC-PR of 0.65 at a class skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino acid sequence features in predicting binding and summarize which combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together, we find some functionally similar proteins from different viruses. For example, vif protein from HIV-1, vp24 from Ebola and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses; for example, orf3a’s interactions are more diverged than orf7b interactions when comparing SARS-CoV and SARS-CoV-2.
- 
            Interactions of SARS-CoV-2 and MERS-CoV fusion peptides measured using single-molecule force methods
            We address the challenge of understanding how hydrophobic interactions are encoded by fusion peptide sequences within coronavirus (CoV) spike proteins. Within the fusion peptides of SARS-CoV-2 and MERS-CoV, a largely conserved peptide sequence called FP1 (SFIEDLLFNK and SAIEDLLFDK in SARS-2 and MERS, respectively) has been proposed to play a key role in encoding hydrophobic interactions that drive viral-host cell membrane fusion. While a non-polar triad (LLF) is common to both FP1 sequences, and thought to dominate the encoding of hydrophobic interactions, FP1 from SARS-2 and MERS differ in two residues (Phe 2 versus Ala 2 and Asn 9 versus Asp 9, respectively). Here we explore whether single-molecule force measurements can quantify hydrophobic interactions encoded by FP1 sequences, and then ask whether sequence variations between FP1 from SARS-2 and MERS lead to significant differences in hydrophobic interactions. We find that both SARS-2 and MERS wild-type FP1 generate measurable hydrophobic interactions at the single-molecule level, but that SARS-2 FP1 encodes a substantially stronger hydrophobic interaction than its MERS counterpart (1.91 ± 0.03 nN versus 0.68 ± 0.03 nN, respectively). By performing force measurements on FP1 sequences with single amino acid substitutions, we determine that a single residue mutation (Phe 2 versus Ala 2) causes the almost threefold difference in the hydrophobic interaction strength generated by the FP1 of SARS-2 versus MERS, despite the presence of LLF in both sequences. Infrared spectroscopy and circular dichroism measurements support the proposal that the outsized influence of Phe 2 versus Ala 2 on the hydrophobic interaction arises from variation in the secondary structure adopted by FP1.
Overall, these insights reveal how single residue diversity in viral fusion peptides, including FP1 of SARS-CoV-2 and MERS-CoV, can lead to substantial changes in intermolecular interactions proposed to play a key role in viral fusion, and hint at strategies for regulating hydrophobic interactions of peptides in a range of contexts.
- 
            Abstract Understanding the molecular evolution of the SARS‐CoV‐2 virus as it continues to spread in communities around the globe is important for mitigation and future pandemic preparedness. Three‐dimensional structures of SARS‐CoV‐2 proteins and those of other coronaviruses archived in the Protein Data Bank were used to analyze viral proteome evolution during the first 6 months of the COVID‐19 pandemic. Analyses of spatial locations, chemical properties, and structural and energetic impacts of the observed amino acid changes in >48 000 viral isolates revealed how each of the 29 viral proteins has undergone amino acid changes. Catalytic residues in active sites and binding residues in protein–protein interfaces showed modest, but significant, numbers of substitutions, highlighting the mutational robustness of the viral proteome. Energetics calculations showed that the impact of substitutions on the thermodynamic stability of the proteome follows a universal bi‐Gaussian distribution. Detailed results are presented for potential drug discovery targets and the four structural proteins that comprise the virion, highlighting substitutions with the potential to impact protein structure, enzyme activity, and protein–protein and protein–nucleic acid interfaces. Characterizing the evolution of the virus in three dimensions provides testable insights into viral protein function and should aid in structure‐based drug discovery efforts as well as the prospective identification of amino acid substitutions with potential for drug resistance.
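The FP1 comparison in the fusion-peptide abstract listed above can be checked directly from the two sequences and force values it quotes. The short sketch below is only an illustration of that arithmetic: the helper name `differing_positions` is hypothetical, and the sequences and mean forces are copied from the abstract with the stated uncertainties left out.

```python
# FP1 sequences and mean interaction forces as quoted in the abstract.
SARS2_FP1 = "SFIEDLLFNK"
MERS_FP1 = "SAIEDLLFDK"
SARS2_FORCE_NN = 1.91
MERS_FORCE_NN = 0.68


def differing_positions(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for a positional comparison")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


if __name__ == "__main__":
    # Expect two mismatches: Phe/Ala at position 2 and Asn/Asp at position 9.
    for position, sars2_res, mers_res in differing_positions(SARS2_FP1, MERS_FP1):
        print(f"position {position}: SARS-2 {sars2_res} vs MERS {mers_res}")

    # The non-polar LLF triad is present in both peptides.
    print("LLF shared:", "LLF" in SARS2_FP1 and "LLF" in MERS_FP1)

    # Ratio of mean forces, the "almost threefold" difference.
    print(f"force ratio: {SARS2_FORCE_NN / MERS_FORCE_NN:.2f}")
```

The positional comparison assumes the two peptides are already aligned residue for residue, which holds here because both are ten residues long and differ only by substitutions.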
 An official website of the United States government