Search for: All records

Creators/Authors contains: "Zhang, Chengxin"


Total Resources

23



Integrating end-to-end learning with deep geometrical potentials for ab initio RNA structure prediction

https://doi.org/10.1038/s41467-023-41303-9

Li, Yang ; Zhang, Chengxin ; Feng, Chenjie ; Pearce, Robin ; Freddolino, P. Lydia ; Zhang, Yang ( December 2023 , Nature Communications)

Abstract
RNAs are fundamental in living cells and perform critical functions determined by their tertiary architectures. However, accurate modeling of 3D RNA structure remains a challenging problem. We present a novel method, DRfold, to predict RNA tertiary structures by simultaneous learning of local frame rotations and geometric restraints from experimentally solved RNA structures, where the learned knowledge is converted into a hybrid energy potential to guide RNA structure assembly. The method significantly outperforms previous approaches by >73.3% in TM-score on a sequence-nonredundant dataset containing recently released structures. Detailed analyses showed that the major contributions to the improvement arise from the deep end-to-end learning supervised with the atom coordinates and from the composite energy function integrating complementary information from geometry restraints and end-to-end learning models. The open-source DRfold program, with its fast training protocol, allows large-scale application of high-resolution RNA structure modeling and can be further improved with future expansion of RNA structure databases.

Free, publicly-accessible full text available December 1, 2024
Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data

https://doi.org/10.1038/s41592-023-02130-4

Zheng, Wei ; Wuyun, Qiqige ; Li, Yang ; Zhang, Chengxin ; Freddolino, P. Lydia ; Zhang, Yang ( January 2024 , Nature Methods)

Abstract
Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.

BioLiP2: an updated structure database for biologically relevant ligand–protein interactions

https://doi.org/10.1093/nar/gkad630

Zhang, Chengxin ; Zhang, Xi ; Freddolino, Peter L. ; Zhang, Yang ( July 2023 , Nucleic Acids Research)

Abstract
With the progress of structural biology, the Protein Data Bank (PDB) has witnessed rapid accumulation of experimentally solved protein structures. Since many structures are determined with purification and crystallization additives that are unrelated to a protein's in vivo function, it is nontrivial to identify the subset of protein–ligand interactions that are biologically relevant. We developed the BioLiP2 database (https://zhanggroup.org/BioLiP) to extract biologically relevant protein–ligand interactions from the PDB database. BioLiP2 assesses the functional relevance of the ligands by geometric rules and experimental literature validations. The ligand binding information is further enriched with other function annotations, including Enzyme Commission numbers, Gene Ontology terms, catalytic sites, and binding affinities collected from other databases and a manual literature survey. Compared to its predecessor BioLiP, BioLiP2 offers significantly greater coverage of nucleic acid-protein interactions, and interactions involving large complexes that are unavailable in PDB format. BioLiP2 also integrates cutting-edge structural alignment algorithms with state-of-the-art structure prediction techniques, which for the first time enables composite protein structure and sequence-based searching and significantly enhances the usefulness of the database in structure-based function annotations. With these new developments, BioLiP2 will continue to be an important and comprehensive database for docking, virtual screening, and structure-based protein function analyses.

TripletGO: Integrating Transcript Expression Profiles with Protein Homology Inferences for Gene Function Prediction

https://doi.org/10.1016/j.gpb.2022.03.001

Zhu, Yi-Heng ; Zhang, Chengxin ; Liu, Yan ; Omenn, Gilbert S. ; Freddolino, Peter L. ; Yu, Dong-Jun ; Zhang, Yang ( October 2022 , Genomics, Proteomics & Bioinformatics)

Full Text Available
GPU-I-TASSER: a GPU accelerated I-TASSER protein structure prediction tool

https://doi.org/10.1093/bioinformatics/btab871

MacCarthy, Elijah A. ; Zhang, Chengxin ; Zhang, Yang ; KC, Dukka B. ( January 2022 , Bioinformatics)
Cowen, Lenore (Ed.)

Abstract
Motivation
Accurate and efficient predictions of protein structures play an important role in understanding their functions. Iterative Threading Assembly Refinement (I-TASSER) is one of the most successful and widely used protein structure prediction methods in the recent community-wide CASP experiments. Yet, the computational efficiency of I-TASSER is one of the limiting factors that prevent its application for large-scale structure modeling.
Results
We present I-TASSER for Graphics Processing Units (GPU-I-TASSER), a GPU-accelerated I-TASSER protein structure prediction tool for fast and accurate protein structure prediction. Our implementation is based on OpenACC parallelization of the replica-exchange Monte Carlo simulations to enhance the speed of I-TASSER by extending its capabilities to the GPU architecture. On a benchmark dataset of 71 protein structures, GPU-I-TASSER achieves an average 10× speedup, with structure prediction accuracy comparable to that of the CPU version of I-TASSER.
Availability and implementation
The complete source code for GPU-I-TASSER can be downloaded and used without restriction from https://zhanggroup.org/GPU-I-TASSER/.
Supplementary information
Supplementary data are available at Bioinformatics online.

ADDRESS: A Database of Disease-associated Human Variants Incorporating Protein Structure and Folding Stabilities

https://doi.org/10.1016/j.jmb.2021.166840

Woodard, Jaie ; Zhang, Chengxin ; Zhang, Yang ( May 2021 , Journal of Molecular Biology)
Full Text Available
Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions

https://doi.org/10.1038/s41467-021-25316-w

Mortuza, S. M. ; Zheng, Wei ; Zhang, Chengxin ; Li, Yang ; Pearce, Robin ; Zhang, Yang ( August 2021 , Nature Communications)

Abstract
Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

https://doi.org/10.1371/journal.pcbi.1008865

Li, Yang ; Zhang, Chengxin ; Bell, Eric W. ; Zheng, Wei ; Zhou, Xiaogen ; Yu, Dong-Jun ; Zhang, Yang ( March 2021 , PLOS Computational Biology)
Kolodny, Rachel (Ed.)
Abstract
The topology of protein folds can be specified by the inter-residue contact-maps, and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural networks. Compared to previous approaches, the major advantage of TripletRes is its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement, with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel, efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing the 3D structures of proteins that lack homologous templates in the PDB library.
Full Text Available
Functions of Essential Genes and a Scale-Free Protein Interaction Network Revealed by Structure-Based Function and Interaction Prediction for a Minimal Genome

https://doi.org/10.1021/acs.jproteome.0c00359

Zhang, Chengxin ; Zheng, Wei ; Cheng, Micah ; Omenn, Gilbert S. ; Freddolino, Peter L. ; Zhang, Yang ( February 2021 , Journal of Proteome Research)
Full Text Available
Identifying the Zoonotic Origin of SARS-CoV-2 by Modeling the Binding Affinity between the Spike Receptor-Binding Domain and Host ACE2

https://doi.org/10.1021/acs.jproteome.0c00717

Huang, Xiaoqiang ; Zhang, Chengxin ; Pearce, Robin ; Omenn, Gilbert S. ; Zhang, Yang ( December 2020 , Journal of Proteome Research)
Full Text Available
