skip to main content


Search for: All records

Creators/Authors contains: "Pearce, Robin"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    RNAs are fundamental in living cells and perform critical functions determined by their tertiary architectures. However, accurate modeling of 3D RNA structure remains a challenging problem. We present a novel method, DRfold, to predict RNA tertiary structures by simultaneous learning of local frame rotations and geometric restraints from experimentally solved RNA structures, where the learned knowledge is converted into a hybrid energy potential to guide RNA structure assembly. The method significantly outperforms previous approaches by >73.3% in TM-score on a sequence-nonredundant dataset containing recently released structures. Detailed analyses showed that the major contribution to the improvements arise from the deep end-to-end learning supervised with the atom coordinates and the composite energy function integrating complementary information from geometry restraints and end-to-end learning models. The open-source DRfold program with fast training protocol allows large-scale application of high-resolution RNA structure modeling and can be further improved with future expansion of RNA structure databases.

     
    more » « less
    Free, publicly-accessible full text available December 1, 2024
  2. Abstract

    Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.

     
    more » « less
  3. null (Ed.)
  4. null (Ed.)
  5. Abstract Motivation Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. Results We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, compared favorably with SCWRL4, CISRR, RASP and SCATD which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed for packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. Availability and implementation The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  6. Abstract Motivation Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. Results We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthew’s correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. Availability and implementation https://zhanglab.ccmb.med.umich.edu/FUpred. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  7. Abstract Motivation

    The accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs.

    Results

    We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues for 148 test monomers, and was generally applicable to protein–protein interaction design, as it recapitulated 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues for 88 test dimers, significantly outperforming EvoEF on the native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences, where all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, as demonstrated by the fact that 87.8% of the predicted structures shared a root-mean-square-deviation less than 2 Å to their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization processes, and EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.

    Availability and implementation

    The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  8. Abstract Motivation

    Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses.

    Results

    We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities.

    Availability and implementation

    Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less