Search for: All records

Award ID contains: 1661391

  1. Abstract

    Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered for amino acid similarities. At present, sequence matching compares sequences based only upon the similarities of single amino acids, ignoring the fact that in densely packed proteins there are additional conservative substitutions representing exchanges between two interacting amino acids, such as a small‐large pair changing to a large‐small pair, substitutions that are not individually so conservative. Here we show that including information for such pairs of substitutions yields improved sequence matches, and that these yield significant gains in the agreement between sequence alignments and structure matches of the same protein pair. The results show sequence segments matched where structure segments are aligned. There are gains for all 2002 collected cases where the sequence alignments were not previously congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for “twilight zone” protein sequences. The amino acid substitution metrics derived have many other potential applications, for annotations, protein design, mutagenesis design, and empirical potential derivation.

     
  2. Abstract

    Proteins are the active players in performing essential molecular activities throughout biology, and their dynamics have been broadly demonstrated to relate to their mechanisms. The intrinsic fluctuations have often been used to represent their dynamics and then compared to the experimental B‐factors. However, proteins do not move in a vacuum, and their motions are modulated by solvent that can impose forces on the structure. In this paper, we introduce a new structural concept, called the structural compliance, for the evaluation of the global and local deformability of the protein structure in response to intramolecular and solvent forces. Based on the application of pairwise pulling forces to a protein elastic network, this structural quantity has been computed and is sometimes even found to yield an improved correlation with the experimental B‐factors, meaning that it may serve as a better metric for protein flexibility. The inverse of structural compliance, namely the structural stiffness, has also been defined, and it shows a clear anticorrelation with the experimental data. Although the present applications are made to proteins, this approach can also be applied to other biomolecular structures such as RNA. The present study considers only elastic network models, but the approach could be applied further to conventional atomic molecular dynamics. Compliance is found to have a slightly better agreement with the experimental B‐factors, perhaps reflecting its bias toward the effects of local perturbations, in contrast to mean square fluctuations. The code for calculating protein compliance and stiffness is freely accessible at https://jerniganlab.github.io/Software/PACKMAN/Tutorials/compliance.

     
  3. Abstract

    Binding sites in proteins can be either specifically functional binding sites (active sites) that bind specific substrates with high affinity or regulatory binding sites (allosteric sites) that modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR‐Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and Matthews correlation coefficient (MCC) of 0.68, whereas the less well‐defined allosteric sites are predicted at a lower level with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR‐Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source.
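
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AUC and MCC are computed for a binary per-residue predictor. It is not taken from AR‐Pred; the function names, the 0.5 decision threshold, and the toy scores are assumptions made purely for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of residues,
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = binding-site residue) and predicted scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
# Assumed decision threshold of 0.5 for turning scores into 0/1 calls.
calls = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
mcc = matthews_corrcoef(*confusion_counts(labels, calls))
```

AUC is threshold-free and measures ranking quality, whereas MCC depends on the chosen threshold and is informative on imbalanced residue sets, which is presumably why both are reported.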

     
  4. Abstract

    Proteins are tiny players involved in the activation and deactivation of multiple signaling cascades through interactions in cells. TNFR1 and MADD interact with each other and mediate downstream protein signaling pathways that cause neuronal cell death and Alzheimer’s disease. In the current study, a molecular docking approach was employed to explore the interactive behavior of the TNFR1 and MADD proteins and their role in the activation of downstream signaling pathways. The computational sequence and structural conformational results revealed that Asp400, Arg58, and Arg59 were common residues of TNFR1 and MADD that are involved in the activation of downstream signaling pathways. Aspartic acid, a negatively charged residue, is involved in protein biosynthesis, whereas arginine is a positively charged residue with the potential to interact with oppositely charged amino acids. Furthermore, our molecular dynamics simulation results also confirmed the stability of the backbone of the TNFR1 and MADD death domains (DDs) in binding interactions. This DD interaction mediates some conformational changes in TNFR1, which lead to the activation of mediator proteins in the cellular signaling pathways. Taken together, a better understanding of the TNFR1 and MADD receptors and their activated signaling cascade may help treat Alzheimer’s disease. The death domains of TNFR1 and MADD could be used as a novel pharmacological target for the treatment of Alzheimer’s disease by inhibiting the MAPK pathway.
  5. Two new computational approaches are described to aid in the design of new peptide-based drugs by evaluating ensembles of protein structures from their dynamics and by assessing structures with empirical contact potentials. These approaches build on the concept that conformational variability can aid in the binding process and, for disordered proteins, can even facilitate the binding of more diverse ligands. This latter consideration indicates that such a design process should be less restrictive so that multiple inhibitors might be effective. The example chosen here focuses on proteins/peptides that bind to hemagglutinin (HA) to block the large-scale conformational change for activation. Variability in the conformations is considered from sets of experimental structures or, as an alternative, from their simple computed dynamics; the large set of peptides/small proteins designed by the David Baker lab to bind to hemagglutinin is considered and assessed with the new empirical contact potentials.
  6. Abstract

    We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
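
The database sizes quoted above are mutually consistent, which a few lines of arithmetic make explicit. This is only a sanity check on the rounded figures in the abstract; the variable names are illustrative, and the derived mean protein length is an estimate from those rounded figures, not a number reported by the authors.

```python
# Rounded figures quoted in the abstract.
amino_acids = 600_000_000   # "close to 600 million" residues
proteins = 1_400_000        # 1.4 million proteins
descriptors = 13            # descriptors predicted per residue

# One value per descriptor per residue reproduces the quoted total.
predictions = amino_acids * descriptors

# Derived estimate (not stated in the abstract): average residues per protein.
mean_length = amino_acids / proteins
```

Thirteen descriptors for roughly 600 million residues gives the stated 7.8 billion predictions, and the implied average protein length is a little over 400 residues.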
  7. We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive–regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set.
  8. Accurate prediction of protein stability changes resulting from amino acid substitutions is of utmost importance in medicine to better understand which mutations are deleterious, leading to diseases, and which are neutral. Because wet lab experiments on protein mutations are costly and time consuming, and because of the huge number of possible mutations, computational methods that can accurately predict the effects of amino acid mutations are greatly needed. In this research, we present a robust methodology to predict the energy changes of a protein upon mutation. The proposed prediction scheme is based on a two-step algorithm: a Holdout Random Sampler followed by a neural network model for regression. The Holdout Random Sampler is used to analyze the energy change and the corresponding uncertainty, and to obtain a set of admissible energy changes, expressed as a cumulative distribution function. These values are further used to train a simple neural network model that can predict the energy changes. Results were blindly tested (validated) against experimental energy changes, giving Pearson correlation coefficients of 0.66 for Single Point Mutations and 0.77 for Multiple Point Mutations. These results confirm the success of our method, since it outperforms the majority of previous studies in this field.
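
The Pearson correlation coefficient used above to compare predicted and experimental energy changes can be sketched as follows. This is a generic illustration, not the authors' code; the function name and the toy energy values (imagined as kcal/mol) are assumptions and do not come from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical experimental vs. predicted energy changes for five mutations.
experimental = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(experimental, predicted)
```

Because the coefficient is invariant to linear rescaling, it rewards predictions that track the trend of the experimental values even when their absolute scale is off, so it is often reported alongside an error measure.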
  9. Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs either interact by means of almost identical binding sites (herein called “similar”), or their binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.
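
The three binding profiles named above can be illustrated with a small set-based sketch. The abstract does not state how "almost identical" was quantified, so the Jaccard overlap measure, the 0.8 cutoff, the function name, and the residue numbers below are all hypothetical choices made for illustration only.

```python
def classify_binding_profile(site_a, site_b, similar_cutoff=0.8):
    """Label two binding sites on the same globular partner.

    site_a, site_b: iterables of partner residue indices contacted by each DBR.
    similar_cutoff: hypothetical Jaccard threshold for "almost identical" sites.
    """
    a, b = set(site_a), set(site_b)
    if not a or not b:
        raise ValueError("each binding site needs at least one residue")
    shared = a & b
    if not shared:
        # No common interaction residues on the partner.
        return "independent"
    jaccard = len(shared) / len(a | b)
    # Overlapping profiles split into "similar" and "intersecting".
    return "similar" if jaccard >= similar_cutoff else "intersecting"


# Hypothetical contact residues on one globular partner.
profile_far = classify_binding_profile({10, 11, 12}, {40, 41})
profile_same = classify_binding_profile({10, 11, 12, 13, 14}, {10, 11, 12, 13, 14})
profile_partial = classify_binding_profile({10, 11, 12, 13}, {12, 13, 14, 15})
```

A different overlap measure or cutoff would move the boundary between "similar" and "intersecting" without changing the three-way scheme itself.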