Title: Efficient High-Throughput DNA Breathing Features Generation Using Jax-EPBD
Abstract DNA breathing dynamics (transient base-pair opening and closing due to thermal fluctuations) are vital for processes like transcription, replication, and repair. Traditional models, such as the Extended Peyrard-Bishop-Dauxois (EPBD), provide insights into these dynamics but are computationally limited for long sequences. We present JAX-EPBD, a high-throughput Langevin molecular dynamics framework leveraging JAX for GPU-accelerated simulations, achieving up to 30x speedup and superior scalability compared to the original C-based EPBD implementation. JAX-EPBD efficiently captures time-dependent behaviors, including bubble lifetimes and base flipping kinetics, enabling genome-scale analyses. Applying it to transcription factor (TF) binding affinity prediction using SELEX datasets, we observed consistent improvements in R² values when incorporating breathing features with sequence data. Validating on the 77-bp AAV P5 promoter, JAX-EPBD revealed sequence-specific differences in bubble dynamics correlating with transcriptional activity. These findings establish JAX-EPBD as a powerful and scalable tool for understanding DNA breathing dynamics and their role in gene regulation and transcription factor binding.
Award ID(s):
2310113
PAR ID:
10612835
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Widespread manganese-sensing transcriptional riboswitches effect the dependable gene regulation needed for bacterial manganese homeostasis in changing environments. Riboswitches, like most structured RNAs, are believed to fold co-transcriptionally, subject to both ligand binding and transcription events; yet how these processes are orchestrated for robust regulation is poorly understood. Through a combination of single-molecule and bulk approaches, we discover how a single Mn²⁺ ion and the transcribing RNA polymerase (RNAP), paused immediately downstream by a DNA template sequence, are coordinated by the bridging switch helix P1.1 in the representative Lactococcus lactis riboswitch. This coordination achieves a heretofore-overlooked semi-docked global conformation of the nascent RNA, P1.1 base pair stabilization, transcription factor NusA ejection, and RNAP pause extension, thereby enforcing transcription readthrough. Our work demonstrates how a central, adaptable RNA helix functions analogous to a molecular fulcrum of a first-class lever system to integrate disparate signals for finely balanced gene expression control.
  2. Abstract Simulating DNA breathing dynamics, for instance with the Extended Peyrard-Bishop-Dauxois (EPBD) model, across the entire human genome using traditional biophysical methods like pyDNA-EPBD is computationally prohibitive due to intensive techniques such as Markov Chain Monte Carlo (MCMC) and Langevin dynamics. To overcome this limitation, we propose a deep surrogate generative model utilizing a conditional Denoising Diffusion Probabilistic Model (DDPM) trained on DNA sequence-EPBD feature pairs. This surrogate model efficiently generates high-fidelity DNA breathing features conditioned on DNA sequences, reducing computational time from months to hours, a speedup of over 1000 times. By integrating these features into the EPBDxDNABERT-2 model, we enhance the accuracy of transcription factor (TF) binding site predictions. Experiments demonstrate that the surrogate-generated features perform comparably to those obtained from the original EPBD framework, validating the model’s efficacy and fidelity. This advancement enables real-time, genome-wide analyses, significantly accelerating genomic research and offering powerful tools for disease understanding and therapeutic development.
  3. Abstract Bacteria contain conserved mechanisms to control the intracellular levels of metal ions. Metalloregulatory transcription factors bind metal cations and play a central role in regulating gene expression of metal transporters. Often, these transcription factors regulate transcription by binding to a specific DNA sequence in the promoter region of target genes. Understanding the preferred DNA‐binding sequence for transcriptional regulators can help uncover novel gene targets and provide insight into the biological role of the transcription factor in the host organism. Here, we identify consensus DNA‐binding sequences and subsequent transcription regulatory networks for two metalloregulators from the ferric uptake regulator (FUR) and diphtheria toxin repressor (DtxR) superfamilies in Thermus thermophilus HB8. By homology search, we classify the DtxR homolog as a manganese‐specific, MntR (TtMntR), and the FUR homolog as a peroxide‐sensing, PerR (TtPerR). Both transcription factors repress separate ZIP transporter genes in vivo, and TtPerR acts as a bifunctional transcription regulator by activating the expression of ferric and hemin transport systems. We show TtPerR and TtMntR bind DNA in the presence of manganese in vitro and in vivo; however, TtPerR is unable to bind DNA in the presence of iron, likely due to iron‐mediated histidine oxidation. Unlike canonical PerR homologs, TtPerR does not appear to contribute to peroxide detoxification. Instead, the TtPerR regulon and DNA binding sequence are more reminiscent of Fur or Mur homologs. Collectively, these results highlight the similarities and differences between two metalloregulatory superfamilies and underscore the interplay of manganese and iron in transcription factor regulation.
  4. Abstract Transcription factors carry long intrinsically disordered regions often containing multiple activation domains. Despite numerous recent high‐throughput identifications and characterizations of activation domains, the interplay between sequence motifs, activation domains, and regulator binding in intrinsically disordered transcription factor regions remains unresolved. Here, we map sequence motifs and activation domains in an Arabidopsis thaliana NAC transcription factor clade, revealing that although sequence motifs and activation domains often coincide, no systematic overlap exists. Biophysical analyses using NMR spectroscopy show that the long intrinsically disordered region of senescence‐associated transcription factor ANAC046 is devoid of residual structure. We identify two activation domain/sequence motif regions, one at each end that both bind a panel of six positive and negative regulator domains from biologically relevant regulators promiscuously. Binding affinities measured using isothermal titration calorimetry reveal a hierarchy for regulator binding of the two ANAC046 activation domain/sequence motif regions defining these as regulatory hotspots. Despite extensive dynamic intramolecular contacts along the disordered chain revealed using paramagnetic relaxation enhancement experiments and simulations, the regions remain uncoupled in binding. Together, the results imply rheostatic regulation by ANAC046 through concentration‐dependent regulator competition, a mechanism likely mirrored in other transcription factors with distantly located activation domains.
  5. Svensson, Sarah L (Ed.)
    ABSTRACT In starving Bacillus subtilis bacteria, the initiation of two survival programs, biofilm formation and sporulation, is controlled by the same phosphorylated master regulator, Spo0A~P. Its gene, spo0A, is transcribed from two promoters, Pv and Ps, that are, respectively, regulated by RNA polymerase (RNAP) holoenzymes bearing σA and σH. Notably, transcription is directly autoregulated by Spo0A~P binding sites known as 0A1, 0A2, and 0A3 boxes, located in between the two promoters. It remains unclear whether, at the onset of starvation, these boxes activate or repress spo0A expression, and whether the Spo0A~P transcriptional feedback plays a role in the increase in spo0A expression. Based on the experimental data of the promoter activities under systematic perturbation of the promoter architecture, we developed a biophysical model of transcriptional regulation of spo0A by Spo0A~P binding to each of the 0A boxes. The model predicts that Spo0A~P binding to its boxes does not affect the RNAP recruitment to the promoters but instead affects the transcriptional initiation rate. Moreover, the effects of Spo0A~P binding to 0A boxes are mainly repressive and saturated early at the onset of starvation. Therefore, the increase in spo0A expression is mainly driven by the increase in RNAP holoenzyme levels. Additionally, we reveal that Spo0A~P affinity to 0A boxes is strongest at 0A3 and weakest at 0A2 and that there are attractive forces between the occupied 0A boxes. Our findings, in addition to clarifying how the sporulation master regulator is controlled, offer a framework to predict regulatory outcomes of complex gene-regulatory mechanisms.
    IMPORTANCE Cell differentiation is often critical for survival. In bacteria, differentiation decisions are controlled by transcriptional master regulators under transcriptional feedback control. Therefore, understanding how master regulators are transcriptionally regulated is required to understand differentiation. However, in many cases, the underlying regulation is complex, with multiple transcription factor binding sites and multiple promoters, making it challenging to dissect the exact mechanisms. Here, we address this problem for the Bacillus subtilis master regulator Spo0A. Using a biophysical model, we quantitatively characterize the effect of individual transcription factor binding sites on each spo0A promoter. Furthermore, the model allows us to identify the specific transcription step that is affected by transcription factor binding. Such a model is promising for the quantitative study of a wide range of master regulators involved in transcriptional feedback.