Abstract Substantial progresses in protein structure prediction have been made by utilizing deep‐learning and residue‐residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that the template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template‐free modeling performs better than the template‐based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template‐free modeling largely depends on the accuracy of distance prediction closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available athttps://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3andhttps://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
more »
« less
qFit 3: Protein and ligand multiconformer modeling for X‐ray crystallographic and single‐particle cryo‐EM density maps
Abstract New X‐ray crystallography and cryo‐electron microscopy (cryo‐EM) approaches yield vast amounts of structural data from dynamic proteins and their complexes. Modeling the full conformational ensemble can provide important biological insights, but identifying and modeling an internally consistent set of alternate conformations remains a formidable challenge. qFit efficiently automates this process by generating a parsimonious multiconformer model. We refactored qFit from a distributed application into software that runs efficiently on a small server, desktop, or laptop. We describe the new qFit 3 software and provide some examples. qFit 3 is open‐source under the MIT license, and is available athttps://github.com/ExcitedStates/qfit-3.0.
more »
« less
- Award ID(s):
- 1231306
- PAR ID:
- 10454329
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Protein Science
- Volume:
- 30
- Issue:
- 1
- ISSN:
- 0961-8368
- Page Range / eLocation ID:
- p. 270-285
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
dadi-cli: Automated and distributed population genetic model inference from allele frequency spectraAbstract Summarydadi is a popular software package for inferring models of demographic history and natural selection from population genomic data. But using dadi requires Python scripting and manual parallelization of optimization jobs. We developed dadi-cli to simplify dadi usage and also enable straighforward distributed computing. Availability and Implementationdadi-cli is implemented in Python and released under the Apache License 2.0. The source code is available athttps://github.com/xin-huang/dadi-cli. dadi-cli can be installed via PyPI and conda, and is also available through Cacao on Jetstream2https://cacao.jetstream-cloud.org/.more » « less
-
Abstract We present a new method and software tool called that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect from large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to a single linear reference. can infer accurate genotypes in less time and memory compared to existing graph-based methods. The method is implemented in the open source software tool available athttps://github.com/alshai/rowbowt.more » « less
-
Abstract We present a critical analysis of physics-informed neural operators (PINOs) to solve partial differential equations (PDEs) that are ubiquitous in the study and modeling of physics phenomena using carefully curated datasets. Further, we provide a benchmarking suite which can be used to evaluate PINOs in solving such problems. We first demonstrate that our methods reproduce the accuracy and performance of other neural operators published elsewhere in the literature to learn the 1D wave equation and the 1D Burgers equation. Thereafter, we apply our PINOs to learn new types of equations, including the 2D Burgers equation in the scalar, inviscid and vector types. Finally, we show that our approach is also applicable to learn the physics of the 2D linear and nonlinear shallow water equations, which involve three coupled PDEs. We release our artificial intelligence surrogates and scientific software to produce initial data and boundary conditions to study a broad range of physically motivated scenarios. We provide thesource code, an interactivewebsiteto visualize the predictions of our PINOs, and a tutorial for their use at theData and Learning Hub for Science.more » « less
-
Mathematical models based on systems of ordinary differential equations (ODEs) are frequently applied in various scientific fields to assess hypotheses, estimate key model parameters, and generate predictions about the system's state. To support their application, we present a comprehensive, easy‐to‐use, and flexible MATLAB toolbox,QuantDiffForecast, and associated tutorial to estimate parameters and generate short‐term forecasts with quantified uncertainty from dynamical models based on systems of ODEs. We provide software (https://github.com/gchowell/paramEstimation_forecasting_ODEmodels/) and detailed guidance on estimating parameters and forecasting time‐series trajectories that are characterized using ODEs with quantified uncertainty through a parametric bootstrapping approach. It includes functions that allow the user to infer model parameters and assess forecasting performance for different ODE models specified by the user, using different estimation methods and error structures in the data. The tutorial is intended for a diverse audience, including students training in dynamic systems, and will be broadly applicable to estimate parameters and generate forecasts from models based on ODEs. The functions included in the toolbox are illustrated using epidemic models with varying levels of complexity applied to data from the 1918 influenza pandemic in San Francisco. A tutorial video that demonstrates the functionality of the toolbox is included.more » « less
An official website of the United States government
