Title: Generating information-dense promoter sequences with optimal string packing
Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, which we term the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then leverage modern integer linear programming solvers. Our method optimally packs sets of 20–100 binding sites into dense nucleotide arrays of 50–300 base pairs in 0.05–10 seconds. Unlike approximation algorithms or meta-heuristics, our approach finds provably optimal solutions. We demonstrate how our method can generate large sets of diverse sequences suitable for library generation, where the frequency of binding site usage across the returned sequences can be controlled by modulating the objective function. As an example, we then show how adding constraints, such as the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The nucleotide string packing approach we present can accelerate the design of sequences with complex DNA-protein interactions. When used in combination with synthesis and high-throughput screening, this design strategy could help interrogate how complex binding site arrangements impact either gene expression or biomolecular mechanisms in varied cellular contexts.
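The packing idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' integer linear programming method: it is an exhaustive search over orderings, practical only for a handful of sites, and it considers a single strand with no reverse complements. The function names `overlap`, `merge`, and `pack` and the example sites are hypothetical. It treats each site as a node and uses the number of bases a site adds when appended after another as the integer "distance", which is one plausible reading of the orienteering-style view: include as many sites as possible within a length budget.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(order) -> str:
    """Join sites in the given order, sharing each pairwise overlap."""
    seq = order[0]
    for prev, cur in zip(order, order[1:]):
        # Appending `cur` after `prev` costs len(cur) - overlap(prev, cur) bases.
        seq += cur[overlap(prev, cur):]
    return seq


def pack(sites, budget: int):
    """Return (sequence, order) containing the most sites within `budget` bases.

    Brute force over all ordered subsets; among orderings that include the
    maximum number of sites, the shortest sequence is returned.
    """
    for r in range(len(sites), 0, -1):
        best = None
        for order in permutations(sites, r):
            seq = merge(order)
            if len(seq) <= budget and (best is None or len(seq) < len(best[0])):
                best = (seq, order)
        if best is not None:
            return best
    return ("", ())


# Hypothetical binding sites, chosen only to show overlapping placement.
sites = ["ATGCA", "GCATT", "TTGAC"]
sequence, order = pack(sites, budget=10)
print(sequence, order)
```

With a 10-base budget all three example sites fit because adjacent sites share bases; tightening the budget forces the search to drop a site. The factorial cost of this enumeration is why a solver-based formulation is needed at the scale described in the abstract.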
Award ID(s):
2324909 2143289
PAR ID:
10526795
Author(s) / Creator(s):
; ;
Editor(s):
Klumpp, Stefan
Publisher / Repository:
PLOS Computational Biology
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
20
Issue:
7
ISSN:
1553-7358
Page Range / eLocation ID:
e1012276
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. How homeodomain proteins gain sufficient specificity to control different cell fates has been a long-standing problem in developmental biology. The conserved Gsx homeodomain proteins regulate specific aspects of neural development in animals from flies to mammals, and yet they belong to a large transcription factor family whose members bind nearly identical DNA sequences in vitro. Here, we show that the mouse and fly Gsx factors unexpectedly gain DNA binding specificity by forming cooperative homodimers on precisely spaced and oriented DNA sites. High-resolution genomic binding assays revealed that Gsx2 binds both monomer and homodimer sites in the developing mouse ventral telencephalon. Importantly, reporter assays showed that Gsx2 mediates opposing outcomes in a DNA binding site-dependent manner: monomer Gsx2 binding represses transcription, whereas homodimer binding stimulates gene expression. In Drosophila, the Gsx homolog, Ind, similarly represses or stimulates transcription in a site-dependent manner via an autoregulatory enhancer containing a combination of monomer and homodimer sites. Integrating these findings, we test a model showing how the homodimer to monomer site ratio and the Gsx protein levels define gene up-regulation versus down-regulation. Altogether, these data serve as a new paradigm for how cooperative homeodomain transcription factor binding can increase target specificity and alter regulatory outcomes.
  2. DNA base damage arises frequently in living cells and needs to be removed by base excision repair (BER) to prevent mutagenesis and genome instability. Both the formation and repair of base damage occur in chromatin and are conceivably affected by DNA-binding proteins such as transcription factors (TFs). However, to what extent TF binding affects base damage distribution and BER in cells is unclear. Here, we used a genome-wide damage mapping method, N-methylpurine-sequencing (NMP-seq), and characterized alkylation damage distribution and BER at TF binding sites in yeast cells treated with the alkylating agent methyl methanesulfonate (MMS). Our data show that alkylation damage formation was mainly suppressed at the binding sites of yeast TFs ARS binding factor 1 (Abf1) and rDNA enhancer binding protein 1 (Reb1), but individual hotspots with elevated damage levels were also found. Additionally, Abf1 and Reb1 binding strongly inhibits BER in vivo and in vitro, causing slow repair both within the core motif and its adjacent DNA. Repair of ultraviolet (UV) damage by nucleotide excision repair (NER) was also inhibited by TF binding. Interestingly, TF binding inhibits a larger DNA region for NER relative to BER. The observed effects are caused by the TF–DNA interaction, because damage formation and BER can be restored by depletion of Abf1 or Reb1 protein from the nucleus. Thus, our data reveal that TF binding significantly modulates alkylation base damage formation and inhibits repair by the BER pathway. The interplay between base damage formation and BER may play an important role in affecting mutation frequency in gene regulatory regions.
  3. John Pham, Ph.D. Editor-in-Chief (Ed.)
    The target DNA specificity of the CRISPR-associated genome editor nuclease Cas9 is determined by complementarity to a 20-nucleotide segment in its guide RNA. However, Cas9 can bind and cleave partially complementary off-target sequences, which raises safety concerns for its use in clinical applications. Here we report crystallographic structures of Cas9 bound to bona fide off-target substrates, revealing that off-target binding is enabled by a range of non-canonical base-pairing interactions and preservation of base stacking within the guide–off-target heteroduplex. Off-target sites containing single-nucleotide deletions relative to the guide RNA are accommodated by base skipping or multiple non-canonical base pairs rather than RNA bulge formation. Additionally, PAM-distal mismatches result in duplex unpairing and induce a conformational change of the Cas9 REC lobe that perturbs its conformational activation. Together, these insights provide a structural rationale for the off-target activity of Cas9 and contribute to the improved rational design of guide RNAs and off-target prediction algorithms. 
  4. The transcriptional anti-silencing and DNA-binding protein VirB is essential for the virulence of Shigella species, and yet the sequences required for VirB-DNA binding are poorly understood. While a 7-8 bp VirB-binding site has been proposed, it was derived from studies at a single VirB-dependent promoter, icsB. Our previous in vivo studies at a different VirB-dependent promoter, icsP, found that the proposed VirB-binding site was insufficient for regulation. Instead, the required site was found to be organized as a near-perfect inverted repeat separated by a single nucleotide spacer. Thus, the proposed 7-8 bp VirB-binding site needed to be re-evaluated. Here, we engineer and validate a molecular tool to capture protein-DNA binding interactions in vivo. Our data show that a sequence organized as a near-perfect inverted repeat is required for VirB-DNA binding interactions in vivo at both the icsB and icsP promoters. Furthermore, the previously proposed VirB-binding site and multiple sites found as a result of its description (i.e., sites located at the virB, virF, spa15, and virA promoters) are not sufficient for VirB to bind in vivo using this tool. The implications of these findings are discussed.
  5. Volkert, Michael R. (Ed.)
    A protein roadblock forms when a protein binds DNA and hinders translocation of other DNA binding proteins. These roadblocks can have significant effects on gene expression and regulation as well as DNA binding. Experimental methods for studying the effects of such roadblocks often target endogenous sites or introduce non-variable specific sites into DNAs to create binding sites for artificially introduced protein roadblocks. In this work, we describe a method to create programmable roadblocks using dCas9, a cleavage deficient mutant of the CRISPR effector nuclease Cas9. The programmability allows us to custom design target sites in a synthetic gene intended for in vitro studies. These target sites can be coded with multivalency—in our case, internal restriction sites which can be used in validation studies to verify complete binding of the roadblock. We provide full protocols and sequences and demonstrate how to use the internal restriction sites to verify complete binding of the roadblock. We also provide example results of the effect of DNA roadblocks on the translocation of the restriction endonuclease NdeI, which searches for its cognate site using one dimensional diffusion along DNA. 