NSF PAR Search | NSF Public Access Repository

SSIPe: accurately estimating protein–protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function

https://doi.org/10.1093/bioinformatics/btz926

Huang, Xiaoqiang; Zheng, Wei; Pearce, Robin; Zhang, Yang (December 2019, Bioinformatics)
Valencia, Alfonso (Ed.)

Abstract
Motivation: Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses.
Results: We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities.
Availability and implementation: Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe.
Supplementary information: Supplementary data are available at Bioinformatics online.
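The two accuracy figures quoted in this abstract are a Pearson correlation coefficient and a root-mean-square error between estimated and experimental ΔΔGbind values. The following is a minimal sketch of how those two statistics are computed in general. It is not SSIPe code: the function names `pearson_r` and `rmse` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the paper.
    estimated = [0.5, 1.2, -0.3, 2.0]
    experimental = [0.7, 1.0, 0.1, 2.6]
    print(pearson_r(estimated, experimental), rmse(estimated, experimental))
```

A high correlation with a large RMSE would indicate that the ranking of mutations is captured while the absolute scale is off, which is why both numbers are commonly reported together.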
EvoEF2: accurate and fast energy function for computational protein design

https://doi.org/10.1093/bioinformatics/btz740

Huang, Xiaoqiang; Pearce, Robin; Zhang, Yang (October 2019, Bioinformatics)
Elofsson, Arne (Ed.)

Abstract
Motivation: The accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs.
Results: We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues for 148 test monomers, and was generally applicable to protein–protein interaction design, as it recapitulated 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues for 88 test dimers, significantly outperforming EvoEF on the native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences, where all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, as demonstrated by the fact that 87.8% of the predicted structures shared a root-mean-square-deviation less than 2 Å to their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization processes, and EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.
Availability and implementation: The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF.
Supplementary information: Supplementary data are available at Bioinformatics online.
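The recovery percentages in this abstract measure native sequence recapitulation: the fraction of positions at which a designed sequence carries the same residue as the native one, reported overall and per structural class. Below is a minimal sketch of that bookkeeping. It is not taken from EvoEF2: the function names, the one-letter class labels ("C" for core, "S" for surface) and the toy sequences are assumptions made for illustration, and how residues are assigned to classes is not specified here.

```python
from collections import defaultdict


def recovery_rate(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(native, designed)) / len(native)


def recovery_by_class(native, designed, labels):
    """Recovery rate per structural class; labels[i] is the class of position i."""
    if not (len(native) == len(designed) == len(labels)):
        raise ValueError("native, designed and labels must have equal length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for a, b, lab in zip(native, designed, labels):
        totals[lab] += 1
        hits[lab] += int(a == b)
    return {lab: hits[lab] / totals[lab] for lab in totals}


if __name__ == "__main__":
    # Toy example: hypothetical sequences and class labels.
    native = "ACDEFG"
    designed = "ACDQFG"
    labels = "CCSSCC"  # assumed labelling: C = core, S = surface
    print(recovery_rate(native, designed))
    print(recovery_by_class(native, designed, labels))
```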
pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model

Pratyush, Pawel; Pokharel, Suresh; Saigo, Hiroto; KC, Dukka B. (February 2023, BMC Bioinformatics)

Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein functions and cell signaling across all main classes of proteins. It is involved in several biological processes, including immune response, protein stability, transcription regulation, post-translational regulation, DNA damage repair, and redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO.
Full Text Available
DeepNGlyPred: A Deep neural network-based approach for human N-linked glycosylation site prediction

Pakhrin, Subash; Aoki-Kinoshita, Kiyoko; Caragea, Doina; KC, Dukka (December 2021, Molecules)

Abstract
Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most of the existing predictors for N-linked glycosylation use the fact that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated; the sequon is therefore a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces sensitivity (SN), specificity (SP), Matthews correlation coefficient (MCC), and accuracy (ACC) of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon. DeepNGlyPred will be a useful resource for the glycobiology community.
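The N-X-[S/T] sequon rule and the four reported metrics can both be written down compactly. The sketch below is an illustration only and is not DeepNGlyPred code: `find_sequons` and `binary_metrics` are hypothetical names, the example sequence is made up, and the regular expression treats any character other than P as a valid X.

```python
import math
import re

# N, followed by any residue except proline, followed by S or T.
# A lookahead is used so that overlapping sequons (e.g. "NNTS") are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines that start an N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, MCC and accuracy from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "MCC": mcc, "ACC": acc}


if __name__ == "__main__":
    # Made-up sequence: N2 (NGS), N9 (NNT) and N10 (NTS) match; N6 (NPT) does not.
    print(find_sequons("MNGSANPTNNTS"))
    print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
```

Scanning for sequons only yields candidate sites. As the abstract notes, a predictor is still needed to decide which candidates are actually glycosylated.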
Full Text Available
ADDRESS: A Database of Disease-associated Human Variants Incorporating Protein Structure and Folding Stabilities

https://doi.org/10.1016/j.jmb.2021.166840

Woodard, Jaie; Zhang, Chengxin; Zhang, Yang (May 2021, Journal of Molecular Biology)
Full Text Available
Effects of SARS‐CoV‐2 mutations on protein structures and intraviral protein–protein interactions

https://doi.org/10.1002/jmv.26597

Wu, Siqi; Tian, Chang; Liu, Panpan; Guo, Dongjie; Zheng, Wei; Huang, Xiaoqiang; Zhang, Yang; Liu, Lijun (April 2021, Journal of Medical Virology)
Full Text Available
Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

https://doi.org/10.1371/journal.pcbi.1008865

Li, Yang; Zhang, Chengxin; Bell, Eric W.; Zheng, Wei; Zhou, Xiaogen; Yu, Dong-Jun; Zhang, Yang (March 2021, PLOS Computational Biology)
Kolodny, Rachel (Ed.)
The topology of protein folds can be specified by the inter-residue contact-maps, and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.
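The top-L and top-L/5 long-range precision figures above follow a common evaluation recipe: rank predicted residue pairs by confidence, keep only long-range pairs, take the top L/k of them (L being the sequence length), and report the fraction that are true contacts. The sketch below illustrates this under stated assumptions and is not TripletRes code: the function name is hypothetical, and the sequence-separation cutoff of 24 residues for "long-range" is the usual CASP-style convention, which this abstract does not itself state.

```python
def top_l_precision(scores, true_contacts, length, divisor=1, min_separation=24):
    """Precision of the top (length // divisor) long-range contact predictions.

    scores: dict mapping residue-index pairs (i, j) to predicted confidence.
    true_contacts: set of (i, j) pairs, with i < j, that are contacts in the
        native structure.
    min_separation: assumed long-range cutoff on |i - j| (not from the source).
    """
    # Normalise pair order and keep only long-range candidates.
    candidates = []
    for (i, j), score in scores.items():
        lo, hi = min(i, j), max(i, j)
        if hi - lo >= min_separation:
            candidates.append((score, (lo, hi)))
    candidates.sort(key=lambda item: item[0], reverse=True)

    k = max(1, length // divisor)
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    # Divides by the number of pairs actually selected; some evaluations
    # divide by k even when fewer than k long-range predictions exist.
    return hits / len(top)


if __name__ == "__main__":
    # Toy data for a hypothetical 50-residue protein.
    scores = {(0, 30): 0.9, (1, 40): 0.8, (2, 5): 0.99, (3, 45): 0.1}
    native = {(0, 30), (3, 45)}
    print(top_l_precision(scores, native, length=50, divisor=25))
```

In the toy data the highest-scoring pair (2, 5) is ignored because it is short-range, which is the point of restricting the metric to long-range pairs.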
Full Text Available
Functions of Essential Genes and a Scale-Free Protein Interaction Network Revealed by Structure-Based Function and Interaction Prediction for a Minimal Genome

https://doi.org/10.1021/acs.jproteome.0c00359

Zhang, Chengxin; Zheng, Wei; Cheng, Micah; Omenn, Gilbert S.; Freddolino, Peter L.; Zhang, Yang (February 2021, Journal of Proteome Research)
Full Text Available
Computational design of SARS-CoV-2 spike glycoproteins to increase immunogenicity by T cell epitope engineering

https://doi.org/10.1016/j.csbj.2020.12.039

Ong, Edison; Huang, Xiaoqiang; Pearce, Robin; Zhang, Yang; He, Yongqun (January 2021, Computational and Structural Biotechnology Journal)
Full Text Available
Identifying the Zoonotic Origin of SARS-CoV-2 by Modeling the Binding Affinity between the Spike Receptor-Binding Domain and Host ACE2

https://doi.org/10.1021/acs.jproteome.0c00717

Huang, Xiaoqiang; Zhang, Chengxin; Pearce, Robin; Omenn, Gilbert S.; Zhang, Yang (December 2020, Journal of Proteome Research)
Full Text Available
