This content will become publicly available on July 8, 2026

Title: Predicting rare DNA conformations via dynamical graphical models: a case study of the B→A transition
Abstract: DNA exhibits local conformational preferences that affect its ability to adopt biologically relevant conformations, such as those required for binding proteins. Traditional methods, like Markov state models and molecular dynamics (MD) simulations, have advanced our understanding but often struggle to capture these rare conformational states due to high computational demands. Here, we introduce a novel AI framework based on dynamical graphical models (DGMs), a generative machine learning approach trained on equilibrium MD data, to predict DNA conformational transitions that are never seen in the MD ensembles. By leveraging local DNA interactions, DGMs generate a comprehensive transition matrix that captures both thermodynamic and kinetic properties of unsampled states, enabling accurate predictions of rare global conformations without the need for extensive sampling. Applying this model to the B→A transition, we demonstrate that DGMs can efficiently predict sequence-dependent A-DNA preferences, achieving results that align closely with replica exchange umbrella sampling simulations. DGMs provide new insights into DNA sequence–structure relationships, paving the way for applications in DNA sequence design and optimization.
Award ID(s):
2235785
PAR ID:
10646510
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
53
Issue:
13
ISSN:
0305-1048
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Electron paramagnetic resonance (EPR) has become a powerful probe of conformational heterogeneity and dynamics of biomolecules. In this Review, we discuss different computational modeling techniques that enrich the interpretation of EPR measurements of dynamics or distance restraints. A variety of spin labels are surveyed to provide a background for the discussion of modeling tools. Molecular dynamics (MD) simulations of models containing spin labels provide dynamical properties of biomolecules and their labels. These simulations can be used to predict EPR spectra, sample stable conformations and sample rotameric preferences of label sidechains. For molecular motions longer than milliseconds, enhanced sampling strategies and de novo prediction software incorporating or validated by EPR measurements are able to efficiently refine or predict protein conformations, respectively. To sample large‐amplitude conformational transition, a coarse‐grained or an atomistic weighted ensemble (WE) strategy can be guided with EPR insights. Looking forward, we anticipate an integrative strategy for efficient sampling of alternate conformations by de novo predictions, followed by validations by systematic EPR measurements and MD simulations. Continuous pathways between alternate states can be further sampled by WE‐MD including all intermediate states. 
  2. Abstract Structural, regulatory and enzymatic proteins interact with DNA to maintain a healthy and functional genome. Yet, our structural understanding of how proteins interact with DNA is limited. We present MELD-DNA, a novel computational approach to predict the structures of protein–DNA complexes. The method combines molecular dynamics simulations with general knowledge or experimental information through Bayesian inference. The physical model is sensitive to sequence-dependent properties and conformational changes required for binding, while information accelerates sampling of bound conformations. MELD-DNA can: (i) sample multiple binding modes; (ii) identify the preferred binding mode from the ensembles; and (iii) provide qualitative binding preferences between DNA sequences. We first assess performance on a dataset of 15 protein–DNA complexes and compare it with state-of-the-art methodologies. Furthermore, for three selected complexes, we show sequence dependence effects of binding in MELD predictions. We expect that the results presented herein, together with the freely available software, will impact structural biology (by complementing DNA structural databases) and molecular recognition (by bringing new insights into aspects governing protein–DNA interactions). 
  3. Abstract Deep learning approaches like AlphaFold 2 (AF2) have revolutionized structural biology by accurately predicting the ground state structures of proteins. Recently, clustering and subsampling techniques that manipulate multiple sequence alignment (MSA) inputs into AlphaFold to generate conformational ensembles of proteins have also been proposed. Although many of these techniques have been made open source, they often require integrating multiple packages and can be challenging for researchers who have a limited programming background to employ. This is especially true when researchers are interested in subsampling to produce predictions of protein conformational ensembles, which require multiple computational steps. This manuscript introduces FastConformation, a Python-based application that integrates MSA generation, structure prediction via AF2, and interactive analysis of protein conformations and their distributions, all in one place. FastConformation is accessible through a user-friendly GUI suitable for non-programmers, allowing users to iteratively refine subsampling parameters based on their analyses to achieve diverse conformational ensembles. Starting from an amino acid sequence, users can make protein conformation predictions and analyze results in just a few hours on their local machines, which is significantly faster than traditional molecular dynamics (MD) simulations. Uniquely, by leveraging the subsampling of MSAs, our tool enables the generation of alternative protein conformations. We demonstrate the utility of FastConformation on proteins including the Abl1 kinase, LAT1 transporter, and CCR5 receptor, showcasing its ability to predict and analyze the protein conformational ensembles and effects of mutations on a variety of proteins. This tool enables a wide range of high-throughput applications in protein biochemistry, drug discovery, and protein engineering. 
  4. Rapid computational exploration of the free energy landscape of biological molecules remains an active area of research due to the difficulty of sampling rare state transitions in molecular dynamics (MD) simulations. In recent years, an increasing number of studies have exploited machine learning (ML) models to enhance and analyze MD simulations. Notably, unsupervised models that extract kinetic information from a set of parallel trajectories have been proposed, including the variational approach for Markov processes (VAMP), VAMPNets, and time-lagged variational autoencoders (TVAE). In this work, we propose a combination of adaptive sampling with active learning of kinetic models to accelerate the discovery of the conformational landscape of biomolecules. In particular, we introduce and compare several techniques that combine kinetic models with two adaptive sampling regimes (least counts and multiagent reinforcement learning-based adaptive sampling) to enhance the exploration of conformational ensembles without introducing biasing forces. Moreover, inspired by the active learning approach of uncertainty-based sampling, we also present MaxEnt VAMPNet. This technique consists of restarting simulations from the microstates that maximize the Shannon entropy of a VAMPNet trained to perform the soft discretization of metastable states. By running simulations on two test systems, the WLALL pentapeptide and the villin headpiece subdomain, we empirically demonstrate that MaxEnt VAMPNet results in faster exploration of conformational landscapes compared with the baseline and other proposed methods.
  5. Abstract This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins’ ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution. 
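Item 4 above describes restarting simulations from the microstates that maximize the Shannon entropy of a model's soft state assignments. The selection step alone might be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `soft_assignments` array, and the choice of two restarts are all hypothetical, and the probabilities are made-up numbers standing in for the output of a trained network.

```python
import numpy as np

def shannon_entropy(probs, eps=1e-12):
    """Row-wise Shannon entropy (in nats) of soft state assignments."""
    # Clip to avoid log(0); each row is assumed to sum to 1.
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def select_restart_frames(probs, n_restarts):
    """Indices of the n_restarts frames with the highest assignment entropy."""
    entropy = shannon_entropy(probs)
    # argsort is ascending, so take the tail and reverse it.
    return np.argsort(entropy)[-n_restarts:][::-1]

# Hypothetical soft assignments of four frames to three metastable states.
soft_assignments = np.array([
    [0.98, 0.01, 0.01],  # confidently assigned: low entropy
    [0.34, 0.33, 0.33],  # nearly uniform: highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
])
restart_frames = select_restart_frames(soft_assignments, n_restarts=2)
```

Frames whose assignment is close to uniform are the ones the model is least certain about, so they are picked first as restart points under this criterion.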