

Title: OSPREY 3.0: Open‐source protein redesign for you, with powerful new features

We present OSPREY 3.0, a new and greatly improved release of the OSPREY protein design software. OSPREY 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of OSPREY when running the same algorithms on the same hardware. Moreover, OSPREY 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of OSPREY, OSPREY 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that OSPREY 3.0 accurately predicts the effect of mutations on protein–protein binding. OSPREY 3.0 is available at http://www.cs.duke.edu/donaldlab/osprey.php as free and open‐source software. © 2018 Wiley Periodicals, Inc.

 
NSF-PAR ID:
10078269
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Journal of Computational Chemistry
Volume:
39
Issue:
30
ISSN:
0192-8651
Page Range / eLocation ID:
p. 2494-2507
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We consider the problem of covering multiple submodular constraints. Given a finite ground set $N$, a weight function $w: N \rightarrow \mathbb{R}_+$, $r$ monotone submodular functions $f_1, f_2, \ldots, f_r$ over $N$, and requirements $k_1, k_2, \ldots, k_r$, the goal is to find a minimum-weight subset $S \subseteq N$ such that $f_i(S) \ge k_i$ for $1 \le i \le r$. We refer to this problem as Multi-Submod-Cover; it was recently considered by Har-Peled and Jones (Few cuts meet many point sets. CoRR, arXiv:1808.03260, 2018), who were motivated by an application in geometry. Even with $r=1$, Multi-Submod-Cover generalizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced to Submod-SC. A simple greedy algorithm gives an $O(\log(kr))$ approximation, where $k = \sum_i k_i$, and this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm for Multi-Submod-Cover that covers each constraint to within a factor of $(1 - 1/e - \varepsilon)$ while incurring an approximation of $O(\frac{1}{\varepsilon}\log r)$ in the cost. Second, we consider the special case when each $f_i$ is obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs), and multiple vertex cover constraints (Bera et al., Theoret Comput Sci 555:2–8, 2014). Both these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas to several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.
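    The greedy baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes one standard reduction to a single constraint: cover the truncated sum $g(S) = \sum_i \min(f_i(S), k_i)$ up to $\sum_i k_i$, repeatedly picking the element with the best marginal gain per unit weight. This is not the paper's relaxation-based algorithms; the function names `coverage_function` and `greedy_multi_submod_cover` are hypothetical, and weights are assumed strictly positive.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Set

SetFunction = Callable[[FrozenSet[Hashable]], float]


def coverage_function(cover: Dict[Hashable, Set[int]], universe: Set[int]) -> SetFunction:
    """Return f(S) = number of points of `universe` covered by the elements of S."""
    def f(chosen: FrozenSet[Hashable]) -> float:
        covered: Set[int] = set()
        for element in chosen:
            covered |= cover[element]
        return float(len(covered & universe))
    return f


def greedy_multi_submod_cover(
    ground: Sequence[Hashable],
    weight: Dict[Hashable, float],      # assumed strictly positive
    functions: Sequence[SetFunction],   # assumed monotone submodular
    requirements: Sequence[float],
) -> Set[Hashable]:
    """Weighted greedy on g(S) = sum_i min(f_i(S), k_i) until g(S) = sum_i k_i."""
    def truncated_total(chosen: FrozenSet[Hashable]) -> float:
        return sum(min(f(chosen), k) for f, k in zip(functions, requirements))

    target = sum(requirements)
    chosen: Set[Hashable] = set()
    remaining: List[Hashable] = list(ground)
    current = truncated_total(frozenset(chosen))

    while current < target:
        best, best_ratio, best_value = None, 0.0, current
        for element in remaining:
            value = truncated_total(frozenset(chosen | {element}))
            gain = value - current
            if gain <= 0:
                continue  # element does not help any unmet constraint
            ratio = gain / weight[element]
            if ratio > best_ratio:
                best, best_ratio, best_value = element, ratio, value
        if best is None:
            raise ValueError("requirements cannot be met by the ground set")
        chosen.add(best)
        remaining.remove(best)
        current = best_value
    return chosen


# Toy instance (illustrative data only): two coverage constraints.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1}}
weights = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 5.0}
f1 = coverage_function(cover, {1, 2, 3})  # requirement k_1 = 3
f2 = coverage_function(cover, {4})        # requirement k_2 = 1
solution = greedy_multi_submod_cover(["a", "b", "c", "d"], weights, [f1, f2], [3, 1])
```

    On this toy instance the greedy rule never selects the expensive element `d`, since everything it covers is already covered more cheaply by `a`.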

     
  2. Abstract

    CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.

     
  3. Abstract

    Structural information of protein–protein interactions is essential for characterization of life processes at the molecular level. While a small fraction of known protein interactions has experimentally determined structures, computational modeling of protein complexes (protein docking) has to fill the gap. The Dockground resource (http://dockground.compbio.ku.edu) provides a collection of datasets for the development and testing of protein docking techniques. Currently, Dockground contains datasets for the bound and the unbound (experimentally determined and simulated) protein structures, model–model complexes, docking decoys of experimentally determined and modeled proteins, and templates for comparative docking. The Dockground bound proteins dataset is a core set, from which other Dockground datasets are generated. It is devised as a relational PostgreSQL database containing information on experimentally determined protein–protein complexes. This report on the Dockground resource describes the current status of the datasets, new automated update procedures, and further development of the core datasets. We also present a new Dockground interactive web interface, which allows search by various parameters, such as release date, multimeric state, complex type, structure resolution, and so on, visualization of the search results with a number of customizable parameters, as well as downloadable datasets with predefined levels of sequence and structure redundancy.

     
  4. Abstract

    We present Symphony, a compilation of 262 cosmological, cold-dark-matter-only zoom-in simulations spanning four decades of host halo mass, from $10^{11}$ to $10^{15}\,M_\odot$. This compilation includes three existing simulation suites at the cluster and Milky Way–mass scales, and two new suites: 39 Large Magellanic Cloud-mass ($10^{11}\,M_\odot$) and 49 strong-lens-analog ($10^{13}\,M_\odot$) group-mass hosts. Across the entire host halo mass range, the highest-resolution regions in these simulations are resolved with a dark matter particle mass of $\approx 3 \times 10^{-7}$ times the host virial mass and a Plummer-equivalent gravitational softening length of $\approx 9 \times 10^{-4}$ times the host virial radius, on average. We measure correlations between subhalo abundance and host concentration, formation time, and maximum subhalo mass, all of which peak at the Milky Way host halo mass scale. Subhalo abundances are ≈50% higher in clusters than in lower-mass hosts at fixed sub-to-host halo mass ratios. Subhalo radial distributions are approximately self-similar as a function of host mass and are less concentrated than hosts’ underlying dark matter distributions. We compare our results to the semianalytic model Galacticus, which predicts subhalo mass functions with a higher normalization at the low-mass end and radial distributions that are slightly more concentrated than Symphony. We use UniverseMachine to model halo and subhalo star formation histories in Symphony, and we demonstrate that these predictions resolve the formation histories of the halos that host nearly all currently observable satellite galaxies in the universe. To promote open use of Symphony, data products are publicly available at http://web.stanford.edu/group/gfc/symphony.
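    The average resolution ratios quoted in this abstract translate into absolute scales by simple multiplication. The sketch below is only a worked arithmetic illustration: the function name `resolution_scales` and the example host (a $10^{12}\,M_\odot$ halo with an assumed 250 kpc virial radius) are hypothetical and are not taken from the Symphony data products.

```python
# Average ratios as quoted in the abstract (approximate values).
PARTICLE_MASS_FRACTION = 3e-7   # particle mass / host virial mass
SOFTENING_FRACTION = 9e-4       # softening length / host virial radius


def resolution_scales(host_virial_mass_msun: float, host_virial_radius_kpc: float):
    """Return (particle mass in Msun, softening length in kpc) implied by the average ratios."""
    particle_mass = PARTICLE_MASS_FRACTION * host_virial_mass_msun
    softening = SOFTENING_FRACTION * host_virial_radius_kpc
    return particle_mass, softening


# Hypothetical Milky Way-mass host: 1e12 Msun, 250 kpc (assumed values).
particle_mass_msun, softening_kpc = resolution_scales(1e12, 250.0)
```

    For this assumed host the implied particle mass is about $3 \times 10^{5}\,M_\odot$ and the softening length is a few tenths of a kiloparsec.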

     
  5. Abstract

    Characterization of life processes at the molecular level requires structural details of protein interactions. The number of experimentally determined structures of protein–protein complexes accounts only for a fraction of known protein interactions. This gap in structural description of the interactome has to be bridged by modeling. An essential part of the development of structural modeling/docking techniques for protein interactions is databases of protein–protein complexes. They are necessary for studying protein interfaces, providing a knowledge base for docking algorithms, and developing intermolecular potentials, search procedures, and scoring functions. Development of protein–protein docking techniques requires thorough benchmarking of different parts of the docking protocols on carefully curated sets of protein–protein complexes. We present a comprehensive description of the Dockground resource (http://dockground.compbio.ku.edu) for structural modeling of protein interactions, including previously unpublished unbound docking benchmark set 4, and the X‐ray docking decoy set 2. The resource offers a variety of interconnected datasets of protein–protein complexes and other data for the development and testing of different aspects of protein docking methodologies. Based on protein–protein complexes extracted from the PDB biounit files, Dockground offers sets of X‐ray unbound, simulated unbound, model, and docking decoy structures. All datasets are freely available for download, as a whole or selecting specific structures, through a user‐friendly interface on one integrated website.

     