Title: A deep learning solution for crystallographic structure determination
The general de novo solution of the crystallographic phase problem is difficult and only possible under certain conditions. This paper develops an initial pathway to a deep learning neural network approach for the phase problem in protein crystallography, based on a synthetic dataset of small fragments derived from a large, well-curated subset of solved structures in the Protein Data Bank (PDB). In particular, electron-density estimates of simple artificial systems are produced directly from corresponding Patterson maps using a convolutional neural network architecture as a proof of concept.
Award ID(s):
2019745, 1231306
PAR ID:
10512516
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Moffat, K
Publisher / Repository:
CC
Date Published:
Journal Name:
IUCrJ
Volume:
10
Issue:
4
ISSN:
2052-2525
Page Range / eLocation ID:
487 to 496
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
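To make the approach concrete, below is a minimal sketch of the kind of 3D convolutional mapping the abstract describes: a network taking a gridded Patterson map to an electron-density estimate on the same grid. The layer widths, grid size, and MSE objective are illustrative assumptions, not the paper's published configuration.

```python
# Minimal sketch: a 3D CNN that maps a Patterson map (the autocorrelation of
# the electron density) to an electron-density estimate on the same grid.
# Grid size, channel widths, and the MSE loss are illustrative assumptions.
import torch
import torch.nn as nn

class PattersonToDensity(nn.Module):
    def __init__(self, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(width, 1, kernel_size=3, padding=1),
        )

    def forward(self, patterson):      # (batch, 1, D, H, W)
        return self.net(patterson)     # density estimate, same shape

# One training step on synthetic (Patterson, density) pairs, e.g. computed
# from small PDB-derived fragments as in the paper; random tensors stand in
# for real maps here.
model = PattersonToDensity()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
patterson = torch.randn(8, 1, 32, 32, 32)
density = torch.randn(8, 1, 32, 32, 32)
loss = nn.functional.mse_loss(model(patterson), density)
opt.zero_grad(); loss.backward(); opt.step()
```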
More Like this
  1. To verify safety and robustness of neural networks, researchers have successfully applied abstract interpretation, primarily using the interval abstract domain. In this paper, we study the theoretical power and limits of the interval domain for neural-network verification. First, we introduce the interval universal approximation (IUA) theorem. IUA shows that neural networks not only can approximate any continuous function f (universal approximation), as we have known for decades, but also that we can find a neural network, using any well-behaved activation function, whose interval bounds are an arbitrarily close approximation of the set semantics of f (the result of applying f to a set of inputs). We call this notion of approximation interval approximation. Our theorem generalizes the recent result of Baader et al. from ReLUs to a rich class of activation functions that we call squashable functions. Additionally, the IUA theorem implies that we can always construct provably robust neural networks under the ℓ∞-norm using almost any practical activation function. Second, we study the computational complexity of constructing neural networks that are amenable to precise interval analysis. This is a crucial question, as our constructive proof of IUA is exponential in the size of the approximation domain. We boil this question down to the problem of approximating the range of a neural network with squashable activation functions. We show that the range approximation problem (RA) is a Δ2-intermediate problem, which is strictly harder than NP-complete problems, assuming coNP ⊄ NP. As a result, IUA is an inherently hard problem: no matter what abstract domain or computational tools we consider to achieve interval approximation, there is no efficient construction of such a universal approximator. This implies that it is hard to construct a provably robust network, even if we have a robust network to start with.
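The interval analysis the abstract refers to can be illustrated with interval bound propagation through a small feed-forward network: an affine layer is bounded via the midpoint/radius form, and a monotone ("squashable") activation such as the sigmoid maps an interval endpoint-wise. This is a minimal sketch of the interval domain itself, not the paper's IUA construction; the weights below are random stand-ins.

```python
# Minimal sketch of the interval abstract domain on a feed-forward network:
# propagate an input box [lo, hi] through affine layers and a monotone
# activation, which maps intervals endpoint-wise.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interval_affine(lo, hi, W, b):
    mid, rad = (lo + hi) / 2, (hi - lo) / 2
    center = W @ mid + b
    radius = np.abs(W) @ rad          # worst-case spread of the input box
    return center - radius, center + radius

def interval_forward(lo, hi, layers):
    for W, b in layers:
        lo, hi = interval_affine(lo, hi, W, b)
        lo, hi = sigmoid(lo), sigmoid(hi)   # monotone, so endpoints suffice
    return lo, hi

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 2)), rng.standard_normal(4)),
          (rng.standard_normal((1, 4)), rng.standard_normal(1))]
lo, hi = interval_forward(np.array([-0.1, -0.1]), np.array([0.1, 0.1]), layers)
print(lo, hi)   # sound bounds on the network output over the input box
```

The bounds are sound but generally loose; the paper's question is how hard it is to build networks for which such interval bounds are also arbitrarily precise.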
  2. Deep learning requires solving a nonconvex optimization problem of a large size to learn a deep neural network (DNN). The current deep learning model is single-grade: it trains a DNN end-to-end by solving a single nonconvex optimization problem. When the number of layers is large, it is computationally challenging to carry out such a task efficiently, because all weight matrices and bias vectors must be learned from one single large nonconvex optimization problem. Inspired by the human education process, which arranges learning in grades, we propose a multi-grade learning model: instead of solving one single large optimization problem, we successively solve a number of small optimization problems, organized in grades, to learn a shallow neural network (a network having a few hidden layers) for each grade. Specifically, the current grade learns the leftover from the previous grade. In each grade, we learn a shallow neural network stacked on top of the network learned in the previous grades, whose parameters remain unchanged during training of the current and future grades. By dividing the task of learning a DNN into learning several shallow neural networks, one can alleviate the severity of the nonconvexity of the original large optimization problem. When all grades of learning are completed, the final neural network is a stair-shaped neural network, the superposition of the networks learned in all grades. Such a model enables us to learn a DNN much more effectively and efficiently. Moreover, multi-grade learning naturally leads to adaptive learning. We prove, in the context of function approximation, that if the neural network generated by a new grade is nontrivial, the optimal error of the new grade is strictly smaller than that of the previous grade. Furthermore, we provide numerical examples confirming that the proposed multi-grade model significantly outperforms the standard single-grade model and is much more robust to noise. These include three proof-of-concept examples; classification on two benchmark data sets, MNIST and Fashion-MNIST, with two noise rates, which amounts to finding classifiers, functions of 784 dimensions; and numerical solutions of the one-dimensional Helmholtz equation.
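A simplified residual-fitting reading of the multi-grade idea can be sketched as follows: each grade fits a small shallow network to the leftover from the previous grades, then freezes it, and the final predictor is the superposition of all grades. The grade count, widths, optimizer settings, and the sum-of-networks simplification are illustrative assumptions, not the paper's exact construction.

```python
# Simplified sketch of multi-grade learning for 1D regression: each grade
# fits a shallow network to the residual left by earlier grades, then is
# frozen; the final predictor is the superposition (sum) of all grades.
import torch
import torch.nn as nn

def shallow_net(width=32):
    return nn.Sequential(nn.Linear(1, width), nn.Tanh(), nn.Linear(width, 1))

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(4 * torch.pi * x)             # target function to learn

grades, prediction = [], torch.zeros_like(y)
for grade in range(3):                       # several small problems, not one big one
    net = shallow_net()
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    residual = y - prediction                # the "leftover" from earlier grades
    for _ in range(500):
        loss = nn.functional.mse_loss(net(x), residual)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        prediction = prediction + net(x)     # freeze this grade's contribution
    grades.append(net)

print(nn.functional.mse_loss(prediction, y).item())
```

Each optimization here is small and shallow, which is the mechanism the abstract credits for alleviating the nonconvexity of one large end-to-end problem.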
  3. In this paper, we propose a new deep learning method for nonlinear Poisson-Boltzmann problems with applications in computational biology. To tackle the discontinuity of the solution, e.g., across protein surfaces, we approximate the solution by a piecewise mesh-free neural network that can capture the dramatic change in the solution across the interface. The partial differential equation problem is first reformulated as a least-squares physics-informed neural network (PINN)-type problem and then discretized into an objective function using the mean squared error over sampled points. The solution is obtained by minimizing the designed objective function via standard training algorithms such as stochastic gradient descent. Finally, the effectiveness and efficiency of the neural network are validated on complex protein interfaces using manufactured functions of varying frequency.
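Here is a minimal single-domain sketch of the least-squares PINN formulation for a 1D nonlinear Poisson-Boltzmann-type equation, -u''(x) + k² sinh(u(x)) = f(x) on (0, 1) with u(0) = u(1) = 0. The piecewise networks and interface conditions the paper uses for discontinuous protein problems are omitted; k, f, and the loss weighting are illustrative assumptions.

```python
# Minimal least-squares PINN sketch for a 1D nonlinear Poisson-Boltzmann-type
# equation: -u'' + k^2 * sinh(u) = f on (0,1), with u(0) = u(1) = 0.
# The PDE residual and boundary terms are sampled and combined as an MSE
# objective, minimized by a standard gradient-based optimizer.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
k, f = 2.0, lambda x: torch.ones_like(x)     # stand-in coefficient and source

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)   # interior collocation points
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    pde_residual = -d2u + k**2 * torch.sinh(u) - f(x)
    xb = torch.tensor([[0.0], [1.0]])            # Dirichlet boundary points
    loss = (pde_residual**2).mean() + (net(xb)**2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```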
  4. Predicting protein side-chains is important for both protein structure prediction and protein design. Side-chain modeling approaches such as SCWRL4 have become some of the most widely used tools of their type due to their fast and highly accurate predictions. Motivated by the recent success of AlphaFold2 in CASP14, our group adapted a 3D equivariant neural network architecture to predict protein side-chain conformations, specifically within a protein-protein interface, a problem that has not been fully addressed by AlphaFold2.
  5. Protein stability is subject to environmental perturbations such as pressure and crowding, as well as sticking to other macromolecules and quinary structure. Thus, the environment inside and outside the cell plays a key role in how proteins fold, interact, and function on scales from a few molecules to macroscopic ensembles. This review discusses three aspects of protein phase diagrams: first, the relevance of phase diagrams to protein folding and function in vitro and in cells; next, how the evolution of protein surfaces impacts interaction phase diagrams; and finally, how phase separation plays a role on much larger length scales than individual proteins or oligomers, when liquid phase-separated regions form to assist protein function and cell homeostasis.