Title: A rapid and scalable approach to build synthetic repetitive hormone‐responsive promoters
Summary Advancement of DNA‐synthesis technologies has greatly facilitated the development of synthetic biology tools. However, high‐complexity DNA sequences containing tandems of short repeats are still notoriously difficult to produce synthetically, with commercial DNA synthesis companies usually rejecting orders that exceed specific sequence complexity thresholds. To overcome this limitation, we developed a simple, single‐tube reaction method that enables the generation of DNA sequences containing multiple repetitive elements. Our strategy involves commercial synthesis and PCR amplification of padded sequences that contain the repeats of interest, along with random intervening sequence stuffers that include type IIS restriction enzyme sites. GoldenBraid molecular cloning technology is then employed to remove the stuffers, rejoin the repeats together in a predefined order, and subclone the tandem(s) in a vector using a single‐tube digestion–ligation reaction. In our hands, this new approach is much simpler, more versatile and efficient than previously developed solutions to this problem. As a proof of concept, two different phytohormone‐responsive, synthetic, repetitive proximal promoters were generated and tested in planta in the context of transcriptional reporters. Analysis of transgenic lines carrying the synthetic ethylene‐responsive promoter 10x2EBS‐S10 fused to the GUS reporter gene uncovered several developmentally regulated ethylene response maxima, indicating the utility of this reporter for monitoring the involvement of ethylene in a variety of physiologically relevant processes. These encouraging results suggest that this reporter system can be leveraged to investigate the ethylene response to biotic and abiotic factors with high spatial and temporal resolution.
Award ID(s):
1750006 1444561 1940829 1650139
PAR ID:
10534889
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Plant Biotechnology Journal
Volume:
22
Issue:
7
ISSN:
1467-7644
Page Range / eLocation ID:
1942 to 1956
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment. This is particularly true for metagenomic data, where genome dynamics such as horizontal gene transfer, gene duplication, and gene loss/gain complicate accurate genome assembly from metagenomic communities. Detecting repeats is a crucial first step in overcoming these challenges. To address this issue, we propose GraSSRep, a novel approach that leverages the assembly graph's structure through graph neural networks (GNNs) within a self-supervised learning framework to classify DNA sequences into repetitive and non-repetitive categories. Specifically, we frame this problem as a node classification task within a metagenomic assembly graph. In a self-supervised fashion, we rely on a high-precision (but low-recall) heuristic to generate pseudo-labels for a small proportion of the nodes. We then use those pseudo-labels to train a GNN embedding and a random forest classifier to propagate the labels to the remaining nodes. In this way, GraSSRep combines sequencing features with predefined and learned graph features to achieve state-of-the-art performance in repeat detection. We evaluate our method using simulated and synthetic metagenomic datasets. The results on the simulated data highlight GraSSRep's robustness to repeat attributes, demonstrating its effectiveness in handling the complexity of repeated sequences. Additionally, our experiments with synthetic metagenomic datasets reveal that incorporating the graph structure and the GNN enhances detection performance. Finally, in comparative analyses, GraSSRep outperforms existing repeat detection tools with respect to precision and recall.
  2. Long oligodeoxynucleotides (ODNs) are segments of DNA having over one hundred nucleotides (nt). They are typically assembled using enzymatic methods such as PCR and ligation from shorter 20 to 60 nt ODNs produced by automated de novo chemical synthesis. While these methods have made many projects in areas such as synthetic biology and protein engineering possible, they have various drawbacks. For example, they cannot produce genes and genomes with long repeats and have difficulty producing sequences containing stable secondary structures. Here, we report a direct de novo chemical synthesis of 400 nt ODNs, and their isolation from the complex reaction mixture using the catching-by-polymerization (CBP) method. To determine the authenticity of the ODNs, 399 and 401 nt ODNs were synthesized and purified with CBP. The two were joined together using Gibson assembly to give the 800 nt green fluorescent protein (GFP) gene construct. The sequence of the construct was verified via Sanger sequencing. To demonstrate the potential use of the long ODN synthesis method, the GFP gene was expressed in E. coli. The long ODN synthesis and isolation method presented here provides a pathway to the production of genes and genomes containing long repeats or stable secondary structures that cannot be produced or are highly challenging to produce using existing technologies.
  3. ABSTRACT Although the ϕX174 H protein is monomeric during procapsid morphogenesis, 10 proteins oligomerize to form a DNA translocating conduit (H-tube) for penetration. However, the timing and location of H-tube formation are unknown. The H-tube's highly repetitive primary and quaternary structures made it amenable to a genetic analysis using in-frame insertions and deletions. Length-altered proteins were characterized for the ability to perform the protein's three known functions: participation in particle assembly, genome translocation, and stimulation of viral protein synthesis. Insertion mutants were viable. Theoretically, these proteins would produce an assembled tube exceeding the capsid's internal diameter, suggesting that virions do not contain a fully assembled tube. Lengthened proteins were also used to test the biological significance of the crystal structure. Particles containing H proteins of two different lengths were significantly less infectious than both parents, indicating an inability to pilot DNA. Shortened H proteins were not fully functional. Although they could still stimulate viral protein synthesis, they either were not incorporated into virions or, if incorporated, failed to pilot the genome. Mutant proteins that failed to incorporate contained deletions within an 85-amino-acid segment, suggesting the existence of an incorporation domain. The revertants of shortened H protein mutants fell into two classes. The first class duplicated sequences neighboring the deletion, restoring wild-type length but not wild-type sequence. The second class suppressed an incorporation defect, allowing the use of the shortened protein. IMPORTANCE The H-tube crystal structure represents the first high-resolution structure of a virally encoded DNA-translocating conduit. 
It has similarities with other viral proteins through which DNA must travel, such as the α-helical barrel domains of P22 portal proteins and T7 proteins that form tail tube extensions during infection. Thus, the H protein serves as a paradigm for the assembly and function of long α-helical supramolecular structures and nanotubes. Highly repetitive in primary and quaternary structure, they are amenable to structure-function analyses using in-frame insertions and deletions as presented herein. 
  4. INTRODUCTION To faithfully distribute genetic material to daughter cells during cell division, spindle fibers must couple to DNA by means of a structure called the kinetochore, which assembles at each chromosome’s centromere. Human centromeres are located within large arrays of tandemly repeated DNA sequences known as alpha satellite (αSat), which often span millions of base pairs on each chromosome. Arrays of αSat are frequently surrounded by other types of tandem satellite repeats, which have poorly understood functions, along with nonrepetitive sequences, including transcribed genes. Previous genome sequencing efforts have been unable to generate complete assemblies of satellite-rich regions because of their scale and repetitive nature, limiting the ability to study their organization, variation, and function. RATIONALE Pericentromeric and centromeric (peri/centromeric) satellite DNA sequences have remained almost entirely missing from the assembled human reference genome for the past 20 years. Using a complete, telomere-to-telomere (T2T) assembly of a human genome, we developed and deployed tailored computational approaches to reveal the organization and evolutionary patterns of these satellite arrays at both large and small length scales. We also performed experiments to map precisely which αSat repeats interact with kinetochore proteins. Last, we compared peri/centromeric regions among multiple individuals to understand how these sequences vary across diverse genetic backgrounds. RESULTS Satellite repeats constitute 6.2% of the T2T-CHM13 genome assembly, with αSat representing the single largest component (2.8% of the genome). 
By studying the sequence relationships of αSat repeats in detail across each centromere, we found genome-wide evidence that human centromeres evolve through “layered expansions.” Specifically, distinct repetitive variants arise within each centromeric region and expand through mechanisms that resemble successive tandem duplications, whereas older flanking sequences shrink and diverge over time. We also revealed that the most recently expanded repeats within each αSat array are more likely to interact with the inner kinetochore protein Centromere Protein A (CENP-A), which coincides with regions of reduced CpG methylation. This suggests a strong relationship between local satellite repeat expansion, kinetochore positioning, and DNA hypomethylation. Furthermore, we uncovered large and unexpected structural rearrangements that affect multiple satellite repeat types, including active centromeric αSat arrays. Last, by comparing sequence information from nearly 1600 individuals’ X chromosomes, we observed that individuals with recent African ancestry possess the greatest genetic diversity in the region surrounding the centromere, which sometimes contains a predominantly African αSat sequence variant. CONCLUSION The genetic and epigenetic properties of centromeres are closely interwoven through evolution. These findings raise important questions about the specific molecular mechanisms responsible for the relationship between inner kinetochore proteins, DNA hypomethylation, and layered αSat expansions. Even more questions remain about the function and evolution of non-αSat repeats. To begin answering these questions, we have produced a comprehensive encyclopedia of peri/centromeric sequences in a human genome, and we demonstrated how these regions can be studied with modern genomic tools. Our work also illuminates the rich genetic variation hidden within these formerly missing regions of the genome, which may contribute to health and disease. 
This unexplored variation underlines the need for more T2T human genome assemblies from genetically diverse individuals. Gapless assemblies illuminate centromere evolution. (Top) The organization of peri/centromeric satellite repeats. (Bottom left) A schematic portraying (i) evidence for centromere evolution through layered expansions and (ii) the localization of inner-kinetochore proteins in the youngest, most recently expanded repeats, which coincide with a region of DNA hypomethylation. (Bottom right) An illustration of the global distribution of chrX centromere haplotypes, showing increased diversity in populations with recent African ancestry.
  5. String edit distances have been used for decades in applications ranging from spelling correction and web search suggestions to DNA analysis. Most string edit distances are variations of the Levenshtein distance and consider only single-character edits. In forensic applications polymorphic genetic markers such as short tandem repeats (STRs) are used. At these repetitive motifs the DNA copying errors consist of more than just single base differences. More often the phenomenon of “stutter” is observed, where the number of repeated units differs (by whole units) from the template. To adapt the Levenshtein distance to be suitable for forensic applications where DNA sequence similarity is of interest, a generalized string edit distance is defined that accommodates the addition or deletion of whole motifs in addition to single-nucleotide edits. A dynamic programming implementation is developed for computing this distance between sequences. The novelty of this algorithm is in handling the complex interactions that arise between multiple- and single-character edits. Forensic examples illustrate the purpose and use of the Restricted Forensic Levenshtein (RFL) distance measure, but applications extend to sequence alignment and string similarity in other biological areas, as well as dynamic programming algorithms more broadly. 
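The idea in item 5, an edit distance that allows whole-motif insertions and deletions alongside single-nucleotide edits, can be illustrated with a minimal sketch. This is not the Restricted Forensic Levenshtein (RFL) algorithm from that abstract, which handles more complex interactions between multi- and single-character edits. It is a simplified dynamic program under stated assumptions: one known motif, and unit costs for both edit types. The function name `motif_edit_distance` and the parameters `motif_cost` and `char_cost` are hypothetical.

```python
def motif_edit_distance(a, b, motif, motif_cost=1, char_cost=1):
    """Levenshtein-style distance that also allows inserting or deleting
    one whole copy of `motif` as a single edit (simplified illustration,
    not the published RFL algorithm)."""
    m = len(motif)
    if m == 0:
        raise ValueError("motif must be non-empty")
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = cost of turning a[:i] into b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best = float("inf")
            if i > 0:  # delete one character of a
                best = min(best, d[i - 1][j] + char_cost)
            if j > 0:  # insert one character of b
                best = min(best, d[i][j - 1] + char_cost)
            if i > 0 and j > 0:  # match or substitute
                step = 0 if a[i - 1] == b[j - 1] else char_cost
                best = min(best, d[i - 1][j - 1] + step)
            if i >= m and a[i - m:i] == motif:  # delete a whole motif unit
                best = min(best, d[i - m][j] + motif_cost)
            if j >= m and b[j - m:j] == motif:  # insert a whole motif unit
                best = min(best, d[i][j - m] + motif_cost)
            d[i][j] = best
    return d[-1][-1]


# Three versus two copies of a hypothetical "GATA" repeat unit:
# one whole-unit deletion instead of four single-base deletions.
print(motif_edit_distance("GATAGATAGATA", "GATAGATA", "GATA"))
```

Setting `motif_cost` to the motif length times `char_cost` makes the whole-motif moves no cheaper than single-character edits, so the result reduces to the ordinary Levenshtein distance. That gives a quick way to compare the two measures on the same pair of sequences.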