This content will become publicly available on November 6, 2025

Title: Physics-Based Machine Learning Trains Hamiltonians and Decodes the Sequence-Conformation Relation in the Disordered Proteome
Intrinsically disordered proteins and regions (IDPs) are involved in vital biological processes. To understand IDP function, which is often controlled by conformation, we need to find the link between sequence and conformation. We decode this link by integrating theory, simulation, and machine learning (ML): sequence-dependent electrostatics is modeled analytically, while the nonelectrostatic interaction is extracted from simulations of many sequences and then used to train an ML model. The resulting Hamiltonian, which combines physics-based electrostatics with machine-learned nonelectrostatics, accurately predicts sequence-specific global and local conformational measures beyond the original observable used from the simulations. This contrasts with traditional ML approaches, which train on and predict a specific observable rather than a Hamiltonian. Our formalism reproduces experimental measurements and predicts multiple conformational features directly from sequence with high throughput, which will give insights into IDP design and evolution. It also illustrates the broad utility of using physics-based ML to train the unknown parts of a Hamiltonian, rather than a specific observable, in combination with known physics.
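The abstract does not give the functional form of either part of the Hamiltonian, so the following is only a minimal sketch of the general "known physics plus learned remainder" idea, under assumptions that are not from the paper. The known part is taken to be a screened-Coulomb (Debye-Hückel) pair energy between residue beads with fixed integer charges; the unknown nonelectrostatic part is reduced to a single hypothetical per-sequence contact strength `w`, which is regressed on toy sequence features by ordinary least squares as a stand-in for whatever ML model the authors used. All names (`hybrid_hamiltonian`, `fit_w_model`, `w`, the charge table, the Bjerrum length and screening values) are illustrative.

```python
import numpy as np

# Assumed integer charges at neutral pH; histidine and termini are ignored for simplicity.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}
HYDROPHOBIC = set("AILMFVWY")  # hypothetical feature definition


def pair_distances(coords):
    """All pairwise Euclidean distances between residue beads (N x N)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def electrostatic_energy(seq, coords, bjerrum=0.7, kappa=1.0):
    """Known-physics part: screened-Coulomb energy in kT (lengths in nm, assumed values)."""
    q = np.array([CHARGE.get(aa, 0.0) for aa in seq])
    r = pair_distances(coords)
    i, j = np.triu_indices(len(seq), k=1)
    return float(np.sum(bjerrum * q[i] * q[j] * np.exp(-kappa * r[i, j]) / r[i, j]))


def nonelectrostatic_energy(coords, w, cutoff=1.5):
    """Learned part (hypothetical form): w times the number of non-bonded contacts."""
    r = pair_distances(coords)
    i, j = np.triu_indices(len(coords), k=2)  # k=2 skips chain neighbours
    return float(w * np.count_nonzero(r[i, j] < cutoff))


def sequence_features(seq):
    """Toy sequence descriptors: hydrophobic fraction and charged fraction."""
    n = len(seq)
    return np.array([sum(aa in HYDROPHOBIC for aa in seq) / n,
                     sum(aa in CHARGE for aa in seq) / n])


def fit_w_model(features, w_targets):
    """Least-squares fit of per-sequence w values (e.g. from simulations) to features."""
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, w_targets, rcond=None)
    return coef  # last entry is the intercept


def predict_w(coef, feature_row):
    return float(np.dot(coef[:-1], feature_row) + coef[-1])


def hybrid_hamiltonian(seq, coords, coef):
    """Analytic electrostatics plus the regression-predicted nonelectrostatic term."""
    w = predict_w(coef, sequence_features(seq))
    return electrostatic_energy(seq, coords) + nonelectrostatic_energy(coords, w)
```

In use, one would fit `w` sequence by sequence from simulation data, pass the resulting feature/`w` pairs to `fit_w_model`, and then evaluate `hybrid_hamiltonian` for new sequences and conformations. The design point the sketch tries to mirror is that the regression targets a parameter of the energy function rather than a conformational observable, so any observable can afterwards be computed from the same Hamiltonian.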
Award ID(s): 2213103
PAR ID: 10572189
Author(s) / Creator(s): ; ; ; ;
Publisher / Repository: American Chemical Society
Date Published:
Journal Name: Journal of Chemical Theory and Computation
Volume: 20
Issue: 22
ISSN: 1549-9618
Page Range / eLocation ID: 10266 to 10274
Subject(s) / Keyword(s): intrinsically disordered proteins; machine learning; polymer physics
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Clusters of hydrophobic residues are known to promote structured protein stability and drive protein aggregation. Recent work has shown that identifying contiguous hydrophobic residue clusters (termed “blobs”) is useful in both intrinsically disordered protein (IDP) simulations and human genome studies. However, no graphical interface for this analysis was available. Here, we present the blobulator: an interactive and intuitive web interface to detect intrinsic modularity in any protein sequence based on hydrophobicity. We demonstrate three use cases of the blobulator and show how identifying blobs with biologically relevant parameters provides useful information about a globular protein, two orthologous membrane proteins, and an IDP. Other potential applications are discussed, including predicting protein segments with critical roles in tertiary interactions, providing a definition of local order and disorder with clear edges, and aiding in predicting protein features from sequence. The blobulator GUI can be found at www.blobulator.branniganlab.org, and the source code, with a pip-installable command-line tool, can be found on GitHub at www.GitHub.com/BranniganLab/blobulator.
  2. Conformations and dynamics of an intrinsically disordered protein (IDP) depend on its composition of charged and uncharged amino acids and on their specific placement in the protein sequence. In general, the charge (positive or negative) on an amino acid residue in the protein is not a fixed quantity. Each ionizable group can exist in an equilibrated distribution between a fully ionized state (monopole) and an ion-pair (dipole) state formed between the ionizing group and its counterion from the background electrolyte solution. Dipole formation (counterion condensation) depends on the protein conformation, which in turn depends on the distribution of charges and dipoles on the molecule. Consequently, the effective charges of ionizable groups in the IDP backbone may differ from their chemical charges in isolation, a phenomenon termed charge regulation. Accounting for the inevitable dipolar interactions, which have so far been ignored, and using a self-consistent procedure, we present a theory of charge regulation as a function of sequence, temperature, and ionic strength. The theory quantitatively agrees with both charge-reduction and salt-dependent conformation data for Prothymosin-alpha and makes several testable predictions. We predict that charged groups are less ionized in sequences where opposite charges are well mixed than in sequences where they are strongly segregated. The emergence of dipolar interactions from charge regulation allows spontaneous coexistence of two phases with different conformations and charge states, depending sensitively on the charge patterning. These findings highlight sequence-dependent charge regulation and its potential exploitation by biological regulators, such as phosphorylation and mutations, in controlling protein conformation and function.
  3. First-principle simulations are at the heart of the high-energy physics research program. They link the vast data output of multi-purpose detectors with fundamental theory predictions and interpretation. This review illustrates a wide range of applications of modern machine learning to event generation and simulation-based inference, including conceptual developments driven by the specific requirements of particle physics. New ideas and tools developed at the interface of particle physics and machine learning will improve the speed and precision of forward simulations, handle the complexity of collision data, and enhance inference as an inverse simulation problem.
  5. Large discrepancies between well-mixed reaction rates and effective reaction rates estimated under fluid flow conditions have been a major issue in predicting reactive transport in porous media systems. In this study, we introduce a framework that accurately predicts effective reaction rates directly from pore structural features by combining 3D pore-scale numerical simulations with machine learning (ML). We first perform pore-scale reactive transport simulations with fluid–solid reactions in hundreds of porous media and calculate effective reaction rates from the pore-scale concentration fields. We then train a Random Forests model on 11 pore structural features and the effective reaction rates to quantify the importance of each structural feature in determining effective reaction rates. Based on this importance information, we train artificial neural networks with varying numbers of features and demonstrate that effective reaction rates can be accurately predicted with only three pore structural features: specific surface, pore sphericity, and coordination number. Finally, global sensitivity analyses using the ML model elucidate how these three structural features affect effective reaction rates. The proposed framework enables accurate predictions of effective reaction rates directly from a few measurable pore structural features and is readily applicable to a wide range of applications involving porous media flows.
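For the blobulator item in the list above, the core idea of splitting a sequence into contiguous hydrophobic "blobs" can be sketched in a few lines of Python. This is not the blobulator's source code. It assumes the Kyte-Doolittle scale rescaled to [0, 1], a centered three-residue moving average, a hydrophobicity cutoff of 0.4, and a minimum blob length of 4; these are illustrative choices, and the sketch returns only the hydrophobic segments rather than the tool's full classification. The function names `smoothed_hydropathy` and `find_blobs` are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def smoothed_hydropathy(seq, window=3):
    """Kyte-Doolittle values rescaled to [0, 1], then averaged over a centered window."""
    if len(seq) < window or window % 2 == 0:
        raise ValueError("need an odd window no longer than the sequence")
    h = np.array([(KD[aa] + 4.5) / 9.0 for aa in seq])
    kernel = np.ones(window)
    sums = np.convolve(h, kernel, mode="same")
    # Divide by the number of residues actually inside the window (fewer at the ends).
    counts = np.convolve(np.ones_like(h), kernel, mode="same")
    return sums / counts


def find_blobs(seq, cutoff=0.4, min_len=4, window=3):
    """Half-open (start, end) index ranges of hydrophobic runs of at least min_len residues."""
    hydrophobic = smoothed_hydropathy(seq, window) >= cutoff
    blobs, start = [], 0
    for i in range(1, len(seq) + 1):
        # Close the current run at the sequence end or where the classification flips.
        if i == len(seq) or hydrophobic[i] != hydrophobic[start]:
            if hydrophobic[start] and i - start >= min_len:
                blobs.append((start, i))
            start = i
    return blobs
```

For example, `find_blobs("KKKKKIIIIIIKKKKK")` reports one blob covering the isoleucine stretch, while a run of only two isoleucines is discarded by the minimum-length rule. Changing `cutoff` and `min_len` is the knob the abstract refers to as choosing "biologically relevant parameters".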
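The charge-regulation abstract in the list above relies on a self-consistent procedure. The generic numerical pattern can be shown with a deliberately minimal toy that is not the authors' theory: identical, like-charged groups share a mean ionized fraction α that must satisfy α = 1 / (1 + exp[-(Δ - Jα)]), where Δ (assumed, in kT) is the free-energy preference for the ionized state over the ion pair of an isolated group, and J (assumed, in kT) is a mean-field repulsion that makes further ionization harder as α grows. The toy omits dipolar interactions, chain conformation, and sequence patterning, so it cannot reproduce the predictions stated above; it only illustrates a damped fixed-point iteration. The names `ionized_fraction` and `self_consistent_ionization` are hypothetical.

```python
import math


def ionized_fraction(alpha, delta, coupling):
    """Two-state (ionized vs. ion-paired) occupancy given the current mean ionization.

    delta: assumed free-energy preference (kT) for the ionized state of an isolated group.
    coupling: assumed mean-field repulsion (kT) per unit ionization among like charges.
    """
    return 1.0 / (1.0 + math.exp(-(delta - coupling * alpha)))


def self_consistent_ionization(delta, coupling, mix=0.5, tol=1e-10, max_iter=10_000):
    """Damped fixed-point iteration for alpha = ionized_fraction(alpha)."""
    alpha = 1.0  # start from the full chemical charge
    for _ in range(max_iter):
        target = ionized_fraction(alpha, delta, coupling)
        if abs(target - alpha) < tol:
            return target
        # Linear mixing of old and new values damps oscillations between iterations.
        alpha = (1.0 - mix) * alpha + mix * target
    raise RuntimeError("self-consistent iteration did not converge")
```

With `coupling=0` the result is simply the isolated-group occupancy, and increasing `coupling` lowers the self-consistent ionization, which is the qualitative sense in which effective charges can differ from chemical charges. In a full theory the update step would also recompute the conformation from the current charges and dipoles before the next iteration.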
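The porous-media abstract in the list above ranks pore structural features by importance and then retrains on a reduced feature set. As a sketch of that workflow only, the Python code below swaps in permutation importance on an ordinary least-squares model in place of the Random Forests and neural networks named in the abstract, and runs it on synthetic data in which, by construction, only the first three features matter. The feature names `porosity` and `tortuosity` are hypothetical distractors, and none of the numbers come from the study.

```python
import numpy as np

FEATURES = ["specific_surface", "pore_sphericity", "coordination_number",
            "porosity", "tortuosity"]  # last two are hypothetical distractors


def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in for Random Forests)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return X @ coef[:-1] + coef[-1]


def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def permutation_importance(coef, X, y, rng, n_repeats=10):
    """Mean drop in R^2 when one feature column is randomly shuffled."""
    base = r2_score(y, predict(coef, X))
    drops = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, k] = rng.permutation(X_perm[:, k])
            drops[k] += base - r2_score(y, predict(coef, X_perm))
    return drops / n_repeats


def synthetic_data(rng, n=300):
    """Synthetic 'effective rates' that depend only on the first three features."""
    X = rng.normal(size=(n, len(FEATURES)))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + 0.1 * rng.normal(size=n)
    return X, y


rng = np.random.default_rng(0)
X, y = synthetic_data(rng)
coef = fit_linear(X, y)
importance = permutation_importance(coef, X, y, rng)
order = np.argsort(importance)[::-1]
ranked = [FEATURES[k] for k in order]
# Retrain on the top three features only, mirroring the reduced-feature step.
top = order[:3]
coef_small = fit_linear(X[:, top], y)
r2_small = r2_score(y, predict(coef_small, X[:, top]))
```

Permutation importance is used here because it is model-agnostic, so the same function would work unchanged if `fit_linear` and `predict` were replaced by a tree ensemble or a neural network.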