Abstract Substantial progress in protein structure prediction has been made by utilizing deep‐learning and residue‐residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template‐free modeling performs better than template‐based modeling not only on hard targets but also on targets that have homologous templates. The performance of template‐free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
qFit 3: Protein and ligand multiconformer modeling for X‐ray crystallographic and single‐particle cryo‐EM density maps
Abstract New X‐ray crystallography and cryo‐electron microscopy (cryo‐EM) approaches yield vast amounts of structural data from dynamic proteins and their complexes. Modeling the full conformational ensemble can provide important biological insights, but identifying and modeling an internally consistent set of alternate conformations remains a formidable challenge. qFit efficiently automates this process by generating a parsimonious multiconformer model. We refactored qFit from a distributed application into software that runs efficiently on a small server, desktop, or laptop. We describe the new qFit 3 software and provide some examples. qFit 3 is open‐source under the MIT license, and is available at https://github.com/ExcitedStates/qfit-3.0.
- Award ID(s): 1231306
- PAR ID: 10454329
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Protein Science
- Volume: 30
- Issue: 1
- ISSN: 0961-8368
- Page Range / eLocation ID: p. 270-285
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract In recent years, significant advancements have been made in deep learning‐based computational modeling of proteins, with DeepMind's AlphaFold2 standing out as a landmark achievement. These computationally modeled protein structures not only provide atomic coordinates but also include self‐confidence metrics to assess the relative quality of the modeling, either for individual residues or the entire protein. However, these self‐confidence scores are not always reliable; for instance, poorly modeled regions of a protein may sometimes be assigned high confidence. To address this limitation, we introduce Equivariant Quality Assessment Folding (EQAFold), an enhanced framework that refines the Local Distance Difference Test prediction head of AlphaFold to generate more accurate self‐confidence scores. Our results demonstrate that EQAFold outperforms the standard AlphaFold architecture and recent model quality assessment protocols in providing more reliable confidence metrics. Source code for EQAFold is available at https://github.com/kiharalab/EQAFold_public.
- dadi-cli: Automated and distributed population genetic model inference from allele frequency spectra
  Abstract Summary: dadi is a popular software package for inferring models of demographic history and natural selection from population genomic data. But using dadi requires Python scripting and manual parallelization of optimization jobs. We developed dadi-cli to simplify dadi usage and also enable straightforward distributed computing. Availability and Implementation: dadi-cli is implemented in Python and released under the Apache License 2.0. The source code is available at https://github.com/xin-huang/dadi-cli. dadi-cli can be installed via PyPI and conda, and is also available through Cacao on Jetstream2 at https://cacao.jetstream-cloud.org/.
- Abstract We present a new method and software tool called that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect from large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to a single linear reference. can infer accurate genotypes in less time and memory compared to existing graph-based methods. The method is implemented in the open source software tool available at https://github.com/alshai/rowbowt.
- Abstract We present a critical analysis of physics-informed neural operators (PINOs) to solve partial differential equations (PDEs) that are ubiquitous in the study and modeling of physics phenomena using carefully curated datasets. Further, we provide a benchmarking suite which can be used to evaluate PINOs in solving such problems. We first demonstrate that our methods reproduce the accuracy and performance of other neural operators published elsewhere in the literature to learn the 1D wave equation and the 1D Burgers equation. Thereafter, we apply our PINOs to learn new types of equations, including the 2D Burgers equation in the scalar, inviscid and vector types. Finally, we show that our approach is also applicable to learn the physics of the 2D linear and nonlinear shallow water equations, which involve three coupled PDEs. We release our artificial intelligence surrogates and scientific software to produce initial data and boundary conditions to study a broad range of physically motivated scenarios. We provide the source code, an interactive website to visualize the predictions of our PINOs, and a tutorial for their use at the Data and Learning Hub for Science.
