NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Integrating end-to-end learning with deep geometrical potentials for ab initio RNA structure prediction

https://doi.org/10.1038/s41467-023-41303-9

Li, Yang; Zhang, Chengxin; Feng, Chenjie; Pearce, Robin; Freddolino, P. Lydia; Zhang, Yang (December 2023, Nature Communications)

Abstract
RNAs are fundamental in living cells and perform critical functions determined by their tertiary architectures. However, accurate modeling of 3D RNA structure remains a challenging problem. We present a novel method, DRfold, that predicts RNA tertiary structures by simultaneously learning local frame rotations and geometric restraints from experimentally solved RNA structures; the learned knowledge is converted into a hybrid energy potential that guides RNA structure assembly. The method significantly outperforms previous approaches, by more than 73.3% in TM-score, on a sequence-nonredundant dataset containing recently released structures. Detailed analyses showed that the improvements arise mainly from the deep end-to-end learning supervised with atom coordinates and from the composite energy function, which integrates complementary information from the geometry restraints and the end-to-end learning models. The open-source DRfold program, with its fast training protocol, allows large-scale application of high-resolution RNA structure modeling and can be further improved as RNA structure databases expand.
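The abstract reports accuracy as TM-score. As a rough illustration of that metric only (not DRfold code), the sketch below evaluates the TM-score sum for a model that has already been superposed onto the native structure with a one-to-one atom correspondence. The full TM-score takes the maximum of this value over all rigid-body superpositions, which is omitted, and the length-dependent scale d0 differs between proteins and RNAs, so it is left as a caller-supplied parameter. The function name `tm_score_fixed` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def tm_score_fixed(model: Sequence[Coord], native: Sequence[Coord], d0: float) -> float:
    """Hypothetical helper: TM-score sum for ONE fixed superposition.

    Assumes model[i] and native[i] are equivalent atoms (e.g. one atom per
    nucleotide) already placed in a common frame. The real TM-score maximizes
    this value over all rigid-body superpositions; that search is omitted.
    """
    if not native or len(model) != len(native):
        raise ValueError("model and native must be non-empty and of equal length")
    if d0 <= 0:
        raise ValueError("d0 must be positive")
    total = 0.0
    for m, n in zip(model, native):
        d = math.dist(m, n)  # Euclidean distance between equivalent atoms
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / len(native)  # normalized by the native (target) length


native = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(tm_score_fixed(native, native, d0=2.0))  # 1.0 for identical coordinates
```

Each aligned pair contributes between 0 and 1, so a pair deviating by exactly d0 contributes 0.5; this is why the choice of d0 matters when comparing scores across molecule types.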
Computational design of SARS-CoV-2 spike glycoproteins to increase immunogenicity by T cell epitope engineering

https://doi.org/10.1016/j.csbj.2020.12.039

Ong, Edison; Huang, Xiaoqiang; Pearce, Robin; Zhang, Yang; He, Yongqun (January 2021, Computational and Structural Biotechnology Journal)
De novo design of protein peptides to block association of the SARS-CoV-2 spike protein with human ACE2

https://doi.org/10.18632/aging.103416

Huang, Xiaoqiang; Pearce, Robin; Zhang, Yang (June 2020, Aging)
Identifying the Zoonotic Origin of SARS-CoV-2 by Modeling the Binding Affinity between the Spike Receptor-Binding Domain and Host ACE2

https://doi.org/10.1021/acs.jproteome.0c00717

Huang, Xiaoqiang; Zhang, Chengxin; Pearce, Robin; Omenn, Gilbert S.; Zhang, Yang (December 2020, Journal of Proteome Research)
FASPR: an open-source tool for fast and accurate protein side-chain packing

https://doi.org/10.1093/bioinformatics/btaa234

Huang, Xiaoqiang; Pearce, Robin; Zhang, Yang; Elofsson, Arne (Ed.) (April 2020, Bioinformatics)

Abstract
Motivation: Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory.
Results: We present FASPR for fast and accurate PSCP, which uses an optimized scoring function in combination with a deterministic search algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. On native backbones, FASPR correctly predicted 69.1% of all side-chain dihedral angles using a stringent tolerance criterion of 20°, comparing favorably with SCWRL4, CISRR, RASP and SCATD, which predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR was the fastest method, packing the 379 test protein structures in only 34.3 s, significantly faster than the control methods. On non-native backbones, FASPR showed equivalent or better performance on I-TASSER-predicted backbones and on backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of dead-end elimination and tree decomposition with a well-optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies.
Availability and implementation: The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR.
Supplementary information: Supplementary data are available at Bioinformatics online.
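The abstract credits part of FASPR's speed to dead-end elimination (DEE). As a generic illustration, assuming a pairwise-additive energy built from per-rotamer self terms and rotamer-pair terms, the sketch below applies Goldstein's DEE criterion: rotamer r at residue i is discarded if some alternative t satisfies E(i_r) - E(i_t) + Σ_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0, because r can then never be part of the global minimum. This is not FASPR's code: it omits the tree-decomposition step and FASPR's scoring function, and the names `goldstein_dee` and `pair_energy` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

SelfE = List[List[float]]                         # self_e[i][r]: energy of rotamer r at residue i
PairE = Dict[Tuple[int, int], List[List[float]]]  # pair_e[(i, j)][r][s], stored only for i < j


def pair_energy(pair_e: PairE, i: int, r: int, j: int, s: int) -> float:
    """Look up E(i_r, j_s); residue pairs absent from pair_e are treated as non-interacting."""
    if i < j:
        table = pair_e.get((i, j))
        return table[r][s] if table is not None else 0.0
    table = pair_e.get((j, i))
    return table[s][r] if table is not None else 0.0


def goldstein_dee(self_e: SelfE, pair_e: PairE) -> List[Set[int]]:
    """Apply Goldstein's criterion repeatedly until no more rotamers can be pruned."""
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):  # iterate over a snapshot of the survivors
                for t in alive[i]:
                    if t == r:
                        continue
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(
                                pair_energy(pair_e, i, r, j, s) - pair_energy(pair_e, i, t, j, s)
                                for s in alive[j]
                            )
                    if gap > 0:  # r is always worse than t, so it cannot be in the optimum
                        alive[i].discard(r)
                        changed = True
                        break  # stop iterating alive[i] immediately after mutating it
    return alive


self_e = [[0.0, 10.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 0.0], [0.0, 0.0]]}
print(goldstein_dee(self_e, pair_e))  # [{0}, {0}]
```

A residue always keeps at least one rotamer, since elimination requires a surviving alternative t. Exact solvers typically run a pass like this first and hand the much smaller remaining problem to a search method, which is the role tree decomposition plays in FASPR according to the abstract.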
Toward the Accuracy and Speed of Protein Side-Chain Packing: A Systematic Study on Rotamer Libraries

https://doi.org/10.1021/acs.jcim.9b00812

Huang, Xiaoqiang; Pearce, Robin; Zhang, Yang (December 2019, Journal of Chemical Information and Modeling)

FUpred: detecting protein domains through deep-learning-based contact map prediction

https://doi.org/10.1093/bioinformatics/btaa217

Zheng, Wei; Zhou, Xiaogen; Wuyun, Qiqige; Pearce, Robin; Li, Yang; Zhang, Yang; Elofsson, Arne (Ed.) (March 2020, Bioinformatics)

Abstract
Motivation: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence.
Results: We developed a new algorithm, FUpred, which predicts protein domain boundaries using contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to locate domain boundaries by maximizing the number of intra-domain contacts while minimizing the number of inter-domain contacts in the contact maps. FUpred was tested on a large-scale dataset of 2549 proteins and classified them as single- or multi-domain with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than that of the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than those of the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins.
Availability and implementation: https://zhanglab.ccmb.med.umich.edu/FUpred.
Supplementary information: Supplementary data are available at Bioinformatics online.
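The core idea described above, placing a boundary so that contacts stay within domains rather than between them, can be illustrated with a deliberately simplified sketch. It assumes a symmetric 0/1 contact map, considers only a single cut into two continuous domains, and scores each cut by the fraction of contacts that cross it, with a minimum domain length to rule out trivial cuts at the chain ends. Both the objective and `min_len` are hypothetical choices; FUpred's actual scoring function and its handling of discontinuous domains are not reproduced here, and `best_single_cut` is an illustrative name.

```python
from typing import List, Optional, Tuple

ContactMap = List[List[int]]  # symmetric; contacts[i][j] == 1 if residues i and j are in contact


def best_single_cut(contacts: ContactMap, min_len: int = 2) -> Optional[Tuple[int, float]]:
    """Return (k, score) for the cut [0, k) | [k, n) with the lowest inter-domain contact fraction.

    Hypothetical objective: score = inter / (intra + inter); lower is better.
    Returns None if no admissible cut sees any contacts.
    """
    n = len(contacts)
    best: Optional[Tuple[int, float]] = None
    for k in range(min_len, n - min_len + 1):  # both domains keep at least min_len residues
        intra = inter = 0
        for i in range(n):
            for j in range(i + 1, n):
                if contacts[i][j]:
                    if (i < k) == (j < k):  # both residues on the same side of the cut
                        intra += 1
                    else:
                        inter += 1
        total = intra + inter
        if total == 0:
            continue
        score = inter / total
        if best is None or score < best[1]:
            best = (k, score)
    return best


# Two 3-residue blocks that contact only internally: the cut falls between them.
size = 6
cm = [[1 if i != j and (i < 3) == (j < 3) else 0 for j in range(size)] for i in range(size)]
print(best_single_cut(cm))  # (3, 0.0)
```

Without a length constraint, the fraction would trivially favor cuts that leave one domain nearly empty; a real predictor needs a more careful normalization and a way to assign non-contiguous segments to the same domain.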
EvoEF2: accurate and fast energy function for computational protein design

https://doi.org/10.1093/bioinformatics/btz740

Huang, Xiaoqiang; Pearce, Robin; Zhang, Yang; Elofsson, Arne (Ed.) (October 2019, Bioinformatics)

Abstract
Motivation: The accuracy and success rate of de novo protein design remain limited, mainly due to parameter over-fitting in current energy functions and their inability to discriminate incorrect designs from correct ones.
Results: We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of the native residues at all, core and surface positions, respectively, for 148 test monomers. It was also generally applicable to protein–protein interaction design, recapitulating 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues for 88 test dimers, and significantly outperformed EvoEF in native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences; all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, with 87.8% of the predicted structures having a root-mean-square deviation (RMSD) of less than 2 Å from their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization process: EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.
Availability and implementation: The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF.
Supplementary information: Supplementary data are available at Bioinformatics online.
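The sequence-recovery figures in the abstract (all, core and surface residues) are the fraction of designed positions whose amino acid matches the native one. A minimal sketch of that bookkeeping follows. It assumes the native sequence, the designed sequence and a per-position class label are already aligned position by position; how positions are labeled core, surface or interface (for example by solvent accessibility) is left to the caller, and the function name `recovery_by_class` is hypothetical.

```python
from typing import Dict, Sequence


def recovery_by_class(native: str, designed: str, classes: Sequence[str]) -> Dict[str, float]:
    """Fraction of positions where designed == native, overall ("all") and per class label."""
    if not (len(native) == len(designed) == len(classes)):
        raise ValueError("native, designed and classes must have the same length")
    if "all" in classes:
        raise ValueError("'all' is reserved for the overall recovery rate")
    matched: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for nat, des, cls in zip(native, designed, classes):
        for key in ("all", cls):  # every position counts toward "all" and toward its own class
            total[key] = total.get(key, 0) + 1
            matched[key] = matched.get(key, 0) + int(nat == des)
    return {key: matched[key] / total[key] for key in total}


print(recovery_by_class("ACDE", "ACDF", ["core", "core", "surface", "surface"]))
# {'all': 0.75, 'core': 1.0, 'surface': 0.5}
```

For dimers, an "interface" label can be passed in the same way, which yields the four-way breakdown (all, core, interface, surface) reported above.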
SSIPe: accurately estimating protein–protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function

https://doi.org/10.1093/bioinformatics/btz926

Huang, Xiaoqiang; Zheng, Wei; Pearce, Robin; Zhang, Yang; Valencia, Alfonso (Ed.) (December 2019, Bioinformatics)

Abstract
Motivation: Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and affect their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnosis.
Results: We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where the training and test datasets were stringently separated, with sequence identity below 30% between proteins from the two datasets. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61, with a root-mean-square error of 1.93 kcal/mol, significantly better than the other methods tested. Detailed analyses revealed that the major advantage of SSIPe over traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed an earlier profile-based method (BindProfX) owing to the newly introduced sequence profiles and an optimized pseudocount technique that accounts for amino acid-specific prior mutation probabilities.
Availability and implementation: The web server, standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe.
Supplementary information: Supplementary data are available at Bioinformatics online.
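The evaluation in the abstract reports a Pearson correlation coefficient and a root-mean-square error between estimated and experimental ΔΔGbind values. The sketch below computes both from two equal-length lists of values in kcal/mol. It is a generic illustration of these two metrics, not part of SSIPe, and the function name `pearson_and_rmse` and the sample numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_and_rmse(pred: Sequence[float], expt: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation and RMSE between predicted and experimental ddG values."""
    n = len(pred)
    if n < 2 or n != len(expt):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_e = sum(expt) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pred, expt))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_e = sum((e - mean_e) ** 2 for e in expt)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / math.sqrt(var_p * var_e)                                   # scale-free agreement
    rmse = math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / n)  # error in kcal/mol
    return r, rmse


r, rmse = pearson_and_rmse([0.5, 1.0, 2.0, 3.5], [0.4, 1.3, 1.8, 3.0])
print(f"r = {r:.2f}, RMSE = {rmse:.2f} kcal/mol")
```

The two numbers are complementary: r measures how well predictions rank and track the experimental trend regardless of scale, while RMSE measures absolute error, so a predictor can have a high r yet a large RMSE if its values are systematically shifted or stretched.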
