

Search for: All records

Award ID contains: 2136095



  1. Within cells, cytoskeletal filaments are often arranged into loosely aligned bundles. These fibrous bundles are dense enough to exhibit a certain regularity and mean direction; however, their packing is not sufficient to impose a symmetry between, or a specific shape on, individual filaments. This intermediate regularity is computationally difficult to handle because individual filaments have a certain directional freedom, yet the filament densities are not well segmented from each other (especially in the presence of noise, such as in cryo-electron tomography). In this paper, we develop a dynamic programming-based framework, Spaghetti Tracer, to characterize the structural arrangement of filaments in the challenging 3D maps of subcellular components. Assuming that the tomogram can be rotated such that the filaments are oriented in a mean direction, the proposed framework first identifies local seed points for candidate filament segments, which are then grown from the seeds using a dynamic programming algorithm. We validate several algorithmic variations of our framework on simulated tomograms that closely mimic the noise and appearance of experimental maps. Because the ground truth is known for the simulated tomograms, a statistical analysis based on precision, recall, and F1 scores allows us to optimize the performance of this new approach. We find that a bipyramidal accumulation scheme for path density is superior to straight-line accumulation. In addition, the multiplication of forward and backward path densities provides an efficient filter that lifts the filament density above the noise level. Our tests yield a robust method that can be expected to perform well (F1 scores 0.86–0.95) under experimental noise conditions.
  2. G protein-coupled receptors (GPCRs) are the largest class of cell-surface receptor proteins, with important functions in signal transduction, and often serve as therapeutic drug targets. With the rapidly growing public data on three-dimensional (3D) structures of GPCRs and GPCR-ligand interactions, computational prediction of GPCR ligand binding becomes a convincing alternative to high-throughput screening and other experimental approaches during the early phases of ligand discovery. In this work, we set out to computationally uncover and understand the binding of a single ligand to GPCRs from several different families. Three-dimensional structural comparisons of the GPCRs that bind to the same ligand revealed local 3D structural similarities, and these regions often overlap with the locations of binding pockets. These pockets were found to be similar (based on backbone geometry and side-chain orientation using APoc), and they correlate positively with electrostatic properties of the pockets. Moreover, the more similar the pockets, the more likely a ligand binding to the pockets will interact with similar residues, have similar conformations, and produce similar binding affinities across the pockets. These findings can be exploited to improve protein function inference, drug repurposing, and drug toxicity prediction, and to accelerate the development of new drugs.
  3. Calculation of protein–ligand binding affinity is a cornerstone of drug discovery. Classic implicit solvent models, which have been widely used to accomplish this task, lack accuracy compared to experimental references. Emerging data-driven models, on the other hand, are often accurate yet not fully interpretable, and they are also prone to overfitting. In this research, we explore the application of Theory-Guided Data Science in studying protein–ligand binding. A hybrid model is introduced by integrating a Graph Convolutional Network (data-driven model) with the GBNSR6 implicit solvent (physics-based model). The proposed physics-data model is tested on a dataset of 368 complexes from the PDBbind refined set and 72 host–guest systems. Results demonstrate that the proposed Physics-Guided Neural Network can successfully improve the “accuracy” of the pure data-driven model. In addition, the “interpretability” and “transferability” of our model have improved compared to the purely data-driven model. Further analyses include evaluating model robustness and understanding relationships between the physical features.
  4. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a high mutation rate, and many variants have emerged in the last 2 years, including Alpha, Beta, Delta, Gamma and Omicron. Studies showed that the host-genome similarity (HGS) of SARS-CoV-2 is higher than that of SARS-CoV and that the HGS of open reading frames (ORFs) in the coronavirus genome is closely related to suppression of innate immunity. Many works have shown that ORF 6 and ORF 8 of SARS-CoV-2 play an important role in suppressing the IFN-β signaling pathway in vivo. However, the relation between HGS and the adaptation of SARS-CoV-2 variants is still not clear. This work investigates the HGS of SARS-CoV-2 variants based on a dataset containing more than 40,000 viral genomes. The relation between the HGS of viral ORFs and the suppression of the antiviral response is studied. The results show that ORF 7b, ORF 6 and ORF 8 are the top 3 genes with the highest HGS. In the past 2 years, the HGS values of ORF 8 and ORF 7b of SARS-CoV-2 have increased greatly. A remarkable correlation is discovered between HGS and inhibition of the antiviral response of the immune system, which suggests that the similarity between the coronavirus and host genomes may be an indicator of the suppression of innate immunity. Among the five variants (Alpha, Beta, Delta, Gamma and Omicron), Delta has the highest HGS and Omicron has the lowest HGS. This finding implies that the high HGS in the Delta variant may indicate further suppression of host innate immunity. However, the relatively low HGS of Omicron is still a puzzle. By comparing the mutations in genomes of the Alpha, Delta and Omicron variants, a commonly shared mutation ACT > ATT is identified in high-HGS strain populations. The high-HGS mutations among the three variants are quite different. This finding strongly suggests that mutations in high-HGS strains differ between variants. Only a few common mutations survive, which may play an important role in improving the adaptability of SARS-CoV-2. However, the mechanism by which the mutations help SARS-CoV-2 escape immunity is still unclear. HGS analysis is a new method to study virus–host interaction and may provide a way to understand the rapid mutation and adaptation of SARS-CoV-2.
  5. Cryo-electron microscopy (cryo-EM) is a structural technique that has played a significant role in protein structure determination in recent years. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM is capable of producing images of much larger protein complexes. However, cryo-EM reconstructions are limited to medium resolution (~4–10 Å) in some cases. In this resolution range, a cryo-EM density map can hardly be used to directly determine the structure of proteins at atomic-level resolution, or even at the level of the amino acid residue backbone. At such a resolution, only the position and orientation of secondary structure elements (SSEs) such as α-helices and β-sheets are observable. Consequently, finding the mapping of the secondary structures of the modeled structure (SSEs-A) to the cryo-EM map (SSEs-C) is one of the primary concerns in cryo-EM modeling. To address this issue, this study proposes a novel automatic computational method to identify SSE correspondence in three-dimensional (3D) space. Initially, by modeling the target sequence and extracting highly reliable features from the generated 3D model and the map, the SSE matching problem is formulated as a 3D vector matching problem. Afterward, the 3D vector matching problem is transformed into a 3D graph matching problem. Finally, a similarity-based voting algorithm combined with the principle of least conflict (PLC) concept is developed to obtain the SSE correspondence. To evaluate the accuracy of the method, a testing set of 25 experimental and simulated maps with a maximum of 65 SSEs is selected. Comparative studies are also conducted to demonstrate the superiority of the proposed method over some state-of-the-art techniques. The results demonstrate that the method is efficient, robust, and works well in the presence of errors in the predicted secondary structures of the cryo-EM images.
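
The first abstract (Spaghetti Tracer) mentions multiplying forward and backward path densities to lift filaments above the noise. The sketch below illustrates that general idea only; it is not the published implementation. It assumes a 2D map whose filaments run roughly along axis 0, and a simple recurrence in which each step advances one row and shifts at most one column. The function names `accumulate` and `forward_backward_filter` are hypothetical, and the bipyramidal accumulation scheme from the abstract is not reproduced here.

```python
import numpy as np


def accumulate(density: np.ndarray) -> np.ndarray:
    """Best cumulative density of any path that starts in row 0 and reaches
    each pixel, advancing one row per step and shifting at most one column.
    (Assumed recurrence for illustration, not the paper's exact scheme.)"""
    rows, _ = density.shape
    acc = np.empty_like(density, dtype=float)
    acc[0] = density[0]
    for i in range(1, rows):
        # Pad with -inf so out-of-range neighbours never win the maximum.
        padded = np.pad(acc[i - 1], 1, constant_values=-np.inf)
        best_prev = np.maximum(np.maximum(padded[:-2], padded[1:-1]), padded[2:])
        acc[i] = density[i] + best_prev
    return acc


def forward_backward_filter(density: np.ndarray) -> np.ndarray:
    """Product of forward (top-down) and backward (bottom-up) accumulations."""
    forward = accumulate(density)
    backward = accumulate(density[::-1])[::-1]
    return forward * backward


# Toy map: a single straight filament of unit density in column 2.
toy = np.zeros((5, 5))
toy[:, 2] = 1.0
filtered = forward_backward_filter(toy)
```

A pixel scores highly in the product only if a dense path reaches it from both ends of the map, which is why isolated noise peaks would tend to be suppressed relative to pixels lying on a continuous filament.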