NSF PAR Search | NSF Public Access Repository

Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data

https://doi.org/10.1038/s41592-023-02130-4

Zheng, Wei; Wuyun, Qiqige; Li, Yang; Zhang, Chengxin; Freddolino, Lydia; Zhang, Yang (January 2024, Nature Methods)

Abstract Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.
LOMETS3: integrating deep learning and profile alignment for advanced protein template recognition and function annotation

https://doi.org/10.1093/nar/gkac248

Zheng, Wei; Wuyun, Qiqige; Zhou, Xiaogen; Li, Yang; Freddolino, Lydia; Zhang, Yang (April 2022, Nucleic Acids Research)

Abstract Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimizations. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps: domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where the overall template recognition and function prediction accuracy was significantly higher than that of its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. With these improvements, LOMETS3 should help significantly advance the capability of the broader biomedical community for template-based protein structure and function modelling.
An Application of Random Walk Resampling to Phylogenetic HMM Inference and Learning

https://doi.org/10.1109/TNB.2020.2991302

Wang, Wei; Wuyun, Qiqige; Liu, Kevin J. (July 2020, IEEE Transactions on NanoBioscience)
An Application of Random Walk Resampling to Phylogenetic HMM Inference and Learning

https://doi.org/10.1109/BIBM47256.2019.8983223

Wang, Wei; Wuyun, Qiqige; Liu, Kevin J. (November 2019, 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Full Text Available
FUpred: detecting protein domains through deep-learning-based contact map prediction

https://doi.org/10.1093/bioinformatics/btaa217

Zheng, Wei; Zhou, Xiaogen; Wuyun, Qiqige; Pearce, Robin; Li, Yang; Zhang, Yang; Elofsson, Arne (March 2020, Bioinformatics)

Abstract
Motivation: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence.
Results: We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins.
Availability and implementation: https://zhanglab.ccmb.med.umich.edu/FUpred.
Supplementary information: Supplementary data are available at Bioinformatics online.
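The core idea stated in this abstract (place the boundary where inter-domain contacts are fewest and intra-domain contacts are most numerous) can be illustrated with a toy sketch. This is not FUpred's actual scoring function or code: the function name `best_split`, the `min_len` parameter, and the hand-built binary contact map are all hypothetical, and the sketch handles only a single continuous two-domain split.

```python
def best_split(contacts, min_len=2):
    """Return (k, inter) for the split point k with the fewest inter-domain contacts.

    contacts: symmetric 0/1 matrix (list of lists); residues 0..k-1 form one
    domain and residues k..n-1 the other. min_len (hypothetical parameter)
    is the smallest domain length allowed.
    """
    n = len(contacts)
    best_k, best_inter = None, None
    for k in range(min_len, n - min_len + 1):
        # Count contacts that cross the candidate boundary.
        inter = sum(contacts[i][j] for i in range(k) for j in range(k, n))
        if best_inter is None or inter < best_inter:
            best_k, best_inter = k, inter
    return best_k, best_inter


# Toy contact map: residues 0-3 and 4-6 are each fully connected,
# with one contact (3, 4) crossing between the two groups.
n = 7
toy = [[0] * n for _ in range(n)]
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
pairs += [(i, j) for i in range(4, 7) for j in range(i + 1, 7)]
pairs.append((3, 4))
for i, j in pairs:
    toy[i][j] = toy[j][i] = 1

print(best_split(toy))
```

Because the total number of contacts in the map is fixed, minimizing the contacts that cross the boundary is equivalent, in this simplified setting, to maximizing the contacts that stay within the two domains. Discontinuous domains, which the abstract highlights, would need a search over more than one cut point.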
Full Text Available
Scalable Statistical Introgression Mapping Using Approximate Coalescent-Based Inference

https://doi.org/10.1145/3307339.3343352

Wuyun, Qiqige; VanKuren, Nicholas W.; Kronforst, Marcus; Mullen, Sean P.; Liu, Kevin J. (September 2019, Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)

Full Text Available
Scalable Statistical Introgression Mapping Using Approximate Coalescent-Based Inference

https://doi.org/10.1145/3307339.3342165

Wuyun, Qiqige; VanKuren, Nicholas W.; Kronforst, Marcus; Mullen, Sean P.; Liu, Kevin J. (September 2019, Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)

Full Text Available
Integrating deep learning, threading alignments, and a multi‐MSA strategy for high‐quality protein monomer and complex structure prediction in CASP15

https://doi.org/10.1002/prot.26585

Zheng, Wei; Wuyun, Qiqige; Freddolino, Lydia; Zhang, Yang (August 2023, Proteins: Structure, Function, and Bioinformatics)

Abstract We report the results of the “UM‐TBM” and “Zheng” groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D‐I‐TASSER and DMFold‐Multimer algorithms, respectively. For monomer structure prediction, D‐I‐TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi‐source MSA searching and a structural modeling‐based MSA ranker; (ii) attention‐network based spatial restraints; (iii) a multi‐domain module containing domain partition and arrangement for domain‐level templates and spatial restraints; (iv) an optimized I‐TASSER‐based folding simulation system for full‐length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge‐based potentials. For 47 free modeling targets in CASP15, the final models predicted by D‐I‐TASSER showed average TM‐score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo‐based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end‐to‐end deep learning methods alone. For protein complex structure prediction, DMFold‐Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end‐to‐end modeling module from AlphaFold2‐Multimer. For the 38 complex targets, DMFold‐Multimer generated models with an average TM‐score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
Disentangling Population History and Character Evolution among Hybridizing Lineages

https://doi.org/10.1093/molbev/msaa004

Mullen, Sean P; VanKuren, Nicholas W; Zhang, Wei; Nallu, Sumitha; Kristiansen, Evan B; Wuyun, Qiqige; Liu, Kevin; Hill, Ryan I; Briscoe, Adriana D; Kronforst, Marcus R; et al. (January 2020, Molecular Biology and Evolution)

Abstract Understanding the origin and maintenance of adaptive phenotypic novelty is a central goal of evolutionary biology. However, both hybridization and incomplete lineage sorting can lead to genealogical discordance between the regions of the genome underlying adaptive traits and the remainder of the genome, decoupling inferences about character evolution from population history. Here, to disentangle these effects, we investigated the evolutionary origins and maintenance of Batesian mimicry between North American admiral butterflies (Limenitis arthemis) and their chemically defended model (Battus philenor) using a combination of de novo genome sequencing, whole-genome resequencing, and statistical introgression mapping. Our results suggest that balancing selection, arising from geographic variation in the presence or absence of the unpalatable model, has maintained two deeply divergent color patterning haplotypes that have been repeatedly sieved among distinct mimetic and nonmimetic lineages of Limenitis via introgressive hybridization.
Full Text Available