skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Machine learning in biological physics: From biomolecular prediction to design
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas coming from physical modeling neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches like Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically, through a shared energy representation. We summarize recent efforts to establish these connections and provide examples on how each of these formulations integrating physical modeling and machine learning have been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvement in computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins leading to improved evolutionary modeling and finally how machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, a connection to protein design is discussed where synthetic sequences are able to fold to naturally occurring motifs driven by a model rooted in physical principles. We show that this model is “learnable” and propose its future use in the generation of unique sequences that can fold into a target structure.  more » « less
Award ID(s):
1943442 2210291 2019745
PAR ID:
10554685
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
National Academy of Sciences
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
27
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods only sample a fraction of available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the ROSETTA software suite with machine learning and integer linear programming to overcome limitations in the ROSETTA sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking the performance on increasing predicted breadth of anti-HIV antibodies. We use this novel method to increase predicted breadth of naturally-occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in ROSETTA and show that we can outperform the existing method significantly. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. 
    more » « less
  2. In tribology, a considerable number of computational and experimental approaches to understand the interfacial characteristics of material surfaces in motion and tribological behaviors of materials have been considered to date. Despite being useful in providing important insights on the tribological properties of a system, at different length scales, a vast amount of data generated from these state-of-the-art techniques remains underutilized due to lack of analysis methods or limitations of existing analysis techniques. In principle, this data can be used to address intractable tribological problems including structure–property relationships in tribological systems and efficient lubricant design in a cost and time effective manner with the aid of machine learning. Specifically, data-driven machine learning methods have shown potential in unraveling complicated processes through the development of structure–property/functionality relationships based on the collected data. For example, neural networks are incredibly effective in modeling non-linear correlations and identifying primary hidden patterns associated with these phenomena. Here we present several exemplary studies that have demonstrated the proficiency of machine learning in understanding these critical factors. A successful implementation of neural networks, supervised, and stochastic learning approaches in identifying structure–property relationships have shed light on how machine learning may be used in certain tribological applications. Moreover, ranging from the design of lubricants, composites, and experimental processes to studying fretting wear and frictional mechanism, machine learning has been embraced either independently or integrated with optimization algorithms by scientists to study tribology. Accordingly, this review aims at providing a perspective on the recent advances in the applications of machine learning in tribology. The review on referenced simulation approaches and subsequent applications of machine learning in experimental and computational tribology shall motivate researchers to introduce the revolutionary approach of machine learning in efficiently studying tribology. 
    more » « less
  3. Valencia, Alfonso (Ed.)
    Abstract Motivation Protein structure prediction remains as one of the most important problems in computational biology and biophysics. In the past few years, protein residue–residue contact prediction has undergone substantial improvement, which has made it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has, therefore, become the forefront of protein structure prediction. Results We show a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GAN). ContactGAN was able to make a significant improvement over predictions made by recent contact prediction methods when tested on three datasets including protein structure modeling targets in CASP13 and CASP14. We show improvement of precision in contact prediction, which translated into improvement in the accuracy of protein tertiary structure models. On the other hand, observed improvement over trRosetta was relatively small, reasons for which are discussed. ContactGAN will be a valuable addition in the structure prediction pipeline to achieve an extra gain in contact prediction accuracy. Availability and implementation https://github.com/kiharalab/ContactGAN. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  4. Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose mutational effect transfer learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics. We fine-tune METL on experimental sequence–function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL’s ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering. 
    more » « less
  5. Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure–function maps could guide design of novel proteins with desired function. 
    more » « less