Creators/Authors contains: "Mortuza, S. M."

  1. Abstract

    Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.

  2. Abstract

    Motivation: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information for modeling the structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved.

    Results: We developed DeepMSA, a new open-source method for sensitive MSA construction, which builds homologous sequences and alignments from multiple sources of whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was used to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which increased the accuracy of long-range contacts by up to 24.4% compared with the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in the Q3 accuracy. All of these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library.

    Availability and implementation

    Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Abstract

    We report the results of two fully automated structure prediction pipelines, “Zhang‐Server” and “QUARK”, in CASP13. The pipelines were built upon the C‐I‐TASSER and C‐QUARK programs, which in turn are based on I‐TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence‐profiles for contact prediction; (b) an improved meta‐method, NeBcon, which combines multiple contact predictors, including ResPRE, which predicts contact‐maps by coupling precision‐matrices with deep residual convolutional neural‐networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM‐scores of the first models produced by C‐I‐TASSER and C‐QUARK were 28% and 56% higher than those constructed by I‐TASSER and QUARK, respectively. For the first time, contact‐map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM‐scores of C‐I‐TASSER models were significantly higher than those of I‐TASSER models, with a P‐value < .05. Detailed data analyses showed that the success of C‐I‐TASSER and C‐QUARK was mainly due to the increased accuracy of deep‐learning‐based contact‐maps, as well as the careful balance between sequence‐based contact restraints, threading templates, and generic knowledge‐based potentials. Nevertheless, challenges remain for predicting quaternary structure of multi‐domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact‐based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.

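The abstracts above report results as TM-scores, with TM-score ≥0.5 taken as a correctly folded model. As a rough illustration of what that number measures, the sketch below evaluates the standard TM-score sum for one fixed superposition. It is not code from C-QUARK, DeepMSA, or the official TM-score program: the function names are hypothetical, the per-residue distances are assumed to be already computed after superposing model and native structure, and the search over superpositions that the real TM-score performs is omitted. The 0.5 Å floor on d0 for short chains is likewise an assumption chosen to keep the scale positive.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in angstroms.

    Uses the standard form d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    The 0.5 floor for short chains is an assumption of this sketch.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(distances, target_length):
    """TM-score term for one given superposition (hypothetical helper).

    distances: C-alpha distances (angstroms) between aligned residue
               pairs of the model and the native structure.
    target_length: number of residues in the native (target) structure,
                   used for normalization even if fewer pairs are aligned.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Toy input: 100-residue target, every aligned pair 2 angstroms apart.
    score = tm_score_fixed_superposition([2.0] * 100, 100)
    print(f"TM-score for this superposition: {score:.3f}")
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, a model that covers only half of the target cannot score above 0.5 under this formula, which is one reason the 0.5 threshold is read as evidence of a globally correct fold.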