This content will become publicly available on December 12, 2025

Title: Efficient High-Throughput DNA Breathing Features Generation Using Jax-EPBD
Abstract: DNA breathing dynamics (transient base-pair opening and closing due to thermal fluctuations) are vital for processes like transcription, replication, and repair. Traditional models, such as the Extended Peyrard-Bishop-Dauxois (EPBD), provide insights into these dynamics but are computationally limited for long sequences. We present JAX-EPBD, a high-throughput Langevin molecular dynamics framework leveraging JAX for GPU-accelerated simulations, achieving up to 30x speedup and superior scalability compared to the original C-based EPBD implementation. JAX-EPBD efficiently captures time-dependent behaviors, including bubble lifetimes and base flipping kinetics, enabling genome-scale analyses. Applying it to transcription factor (TF) binding affinity prediction using SELEX datasets, we observed consistent improvements in R² values when incorporating breathing features with sequence data. Validating on the 77-bp AAV P5 promoter, JAX-EPBD revealed sequence-specific differences in bubble dynamics correlating with transcriptional activity. These findings establish JAX-EPBD as a powerful and scalable tool for understanding DNA breathing dynamics and their role in gene regulation and transcription factor binding.
Award ID(s):
2310113
PAR ID:
10612835
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Simulating DNA breathing dynamics, for instance with the Extended Peyrard-Bishop-Dauxois (EPBD) model, across the entire human genome using traditional biophysical methods like pyDNA-EPBD is computationally prohibitive due to intensive techniques such as Markov Chain Monte Carlo (MCMC) and Langevin dynamics. To overcome this limitation, we propose a deep surrogate generative model utilizing a conditional Denoising Diffusion Probabilistic Model (DDPM) trained on DNA sequence-EPBD feature pairs. This surrogate model efficiently generates high-fidelity DNA breathing features conditioned on DNA sequences, reducing computational time from months to hours, a speedup of over 1000 times. By integrating these features into the EPBDxDNABERT-2 model, we enhance the accuracy of transcription factor (TF) binding site predictions. Experiments demonstrate that the surrogate-generated features perform comparably to those obtained from the original EPBD framework, validating the model’s efficacy and fidelity. This advancement enables real-time, genome-wide analyses, significantly accelerating genomic research and offering powerful tools for disease understanding and therapeutic development.
  2. Abstract: Widespread manganese-sensing transcriptional riboswitches effect the dependable gene regulation needed for bacterial manganese homeostasis in changing environments. Riboswitches, like most structured RNAs, are believed to fold co-transcriptionally, subject to both ligand binding and transcription events; yet how these processes are orchestrated for robust regulation is poorly understood. Through a combination of single-molecule and bulk approaches, we discover how a single Mn²⁺ ion and the transcribing RNA polymerase (RNAP), paused immediately downstream by a DNA template sequence, are coordinated by the bridging switch helix P1.1 in the representative Lactococcus lactis riboswitch. This coordination achieves a heretofore-overlooked semi-docked global conformation of the nascent RNA, P1.1 base pair stabilization, transcription factor NusA ejection, and RNAP pause extension, thereby enforcing transcription readthrough. Our work demonstrates how a central, adaptable RNA helix functions analogously to a molecular fulcrum of a first-class lever system to integrate disparate signals for finely balanced gene expression control.
  3. Abstract: Bacteria contain conserved mechanisms to control the intracellular levels of metal ions. Metalloregulatory transcription factors bind metal cations and play a central role in regulating gene expression of metal transporters. Often, these transcription factors regulate transcription by binding to a specific DNA sequence in the promoter region of target genes. Understanding the preferred DNA‐binding sequence for transcriptional regulators can help uncover novel gene targets and provide insight into the biological role of the transcription factor in the host organism. Here, we identify consensus DNA‐binding sequences and subsequent transcription regulatory networks for two metalloregulators from the ferric uptake regulator (FUR) and diphtheria toxin repressor (DtxR) superfamilies in Thermus thermophilus HB8. By homology search, we classify the DtxR homolog as a manganese‐specific, MntR (TtMntR), and the FUR homolog as a peroxide‐sensing, PerR (TtPerR). Both transcription factors repress separate ZIP transporter genes in vivo, and TtPerR acts as a bifunctional transcription regulator by activating the expression of ferric and hemin transport systems. We show TtPerR and TtMntR bind DNA in the presence of manganese in vitro and in vivo; however, TtPerR is unable to bind DNA in the presence of iron, likely due to iron‐mediated histidine oxidation. Unlike canonical PerR homologs, TtPerR does not appear to contribute to peroxide detoxification. Instead, the TtPerR regulon and DNA binding sequence are more reminiscent of Fur or Mur homologs. Collectively, these results highlight the similarities and differences between two metalloregulatory superfamilies and underscore the interplay of manganese and iron in transcription factor regulation.
  4. Abstract: Transcription factors carry long intrinsically disordered regions often containing multiple activation domains. Despite numerous recent high‐throughput identifications and characterizations of activation domains, the interplay between sequence motifs, activation domains, and regulator binding in intrinsically disordered transcription factor regions remains unresolved. Here, we map sequence motifs and activation domains in an Arabidopsis thaliana NAC transcription factor clade, revealing that although sequence motifs and activation domains often coincide, no systematic overlap exists. Biophysical analyses using NMR spectroscopy show that the long intrinsically disordered region of senescence‐associated transcription factor ANAC046 is devoid of residual structure. We identify two activation domain/sequence motif regions, one at each end that both bind a panel of six positive and negative regulator domains from biologically relevant regulators promiscuously. Binding affinities measured using isothermal titration calorimetry reveal a hierarchy for regulator binding of the two ANAC046 activation domain/sequence motif regions defining these as regulatory hotspots. Despite extensive dynamic intramolecular contacts along the disordered chain revealed using paramagnetic relaxation enhancement experiments and simulations, the regions remain uncoupled in binding. Together, the results imply rheostatic regulation by ANAC046 through concentration‐dependent regulator competition, a mechanism likely mirrored in other transcription factors with distantly located activation domains.
  5. Abstract: We provide a functional characterization of transcription factor NF-κB in protists and provide information about the evolution and diversification of this biologically important protein. We characterized NF-κB in two protists using phylogenetic, cellular, and biochemical techniques. NF-κB of the holozoan Capsaspora owczarzaki (Co) has an N-terminal DNA-binding domain and a C-terminal Ankyrin repeat (ANK) domain, and its DNA-binding specificity is more similar to metazoan NF-κB proteins than to Rel proteins. Removal of the ANK domain allows Co-NF-κB to enter the nucleus, bind DNA, and activate transcription. However, C-terminal processing of Co-NF-κB is not induced by IκB kinases in human cells. Overexpressed Co-NF-κB localizes to the cytoplasm in Co cells. Co-NF-κB mRNA and DNA-binding levels differ across three Capsaspora life stages. RNA-sequencing and GO analyses identify possible gene targets of Co-NF-κB. Three NF-κB-like proteins from the choanoflagellate Acanthoeca spectabilis (As) contain conserved Rel Homology domain sequences, but lack C-terminal ANK repeats. All three As-NF-κB proteins constitutively enter the nucleus of cells, but differ in their DNA-binding abilities, transcriptional activation activities, and dimerization properties. These results provide a basis for understanding the evolutionary origins of this key transcription factor and could have implications for the origins of regulated immunity in higher taxa.