Background: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and in generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools.

Methods: We have developed the tool RelocaTE2 for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects the target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single base pair precision.

Results and Discussion: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with a false positive rate below 0.1% using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing.
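The role of the TSD in pinpointing an insertion can be illustrated with a minimal sketch. This is not RelocaTE2's implementation; it only assumes the general principle that reads spanning the two TE/genome junctions have genomic portions that overlap on the reference by exactly the duplicated target site. The function name `infer_tsd`, its parameters, and the `max_tsd_len` cutoff are hypothetical and chosen for illustration.

```python
from typing import Optional, Tuple


def infer_tsd(
    upstream_flank_end: int,
    downstream_flank_start: int,
    max_tsd_len: int = 20,  # hypothetical cutoff, not taken from the source
) -> Optional[Tuple[int, int]]:
    """Infer a putative TSD from two junction-read clusters.

    upstream_flank_end: last reference base (1-based) covered by the genomic
        part of reads that run from the genome into the TE.
    downstream_flank_start: first reference base (1-based) covered by the
        genomic part of reads that run from the TE back into the genome.

    Because the target site is duplicated on both sides of the inserted TE,
    the two genomic parts overlap on the reference, and the overlap is the TSD.
    Returns (start, end), 1-based inclusive, or None if there is no plausible
    overlap.
    """
    tsd_len = upstream_flank_end - downstream_flank_start + 1
    if 1 <= tsd_len <= max_tsd_len:
        return (downstream_flank_start, upstream_flank_end)
    return None


# Example: upstream reads end at 1005 and downstream reads start at 1003,
# so the flanks overlap on reference positions 1003 to 1005 (a 3 bp TSD).
tsd = infer_tsd(1005, 1003)
no_tsd = infer_tsd(1000, 1003)  # flanks do not overlap
```

In this sketch the overlap interval is what gives single base pair precision: the insertion is placed at the boundary of the inferred TSD rather than somewhere within a window of supporting reads.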
TEfinder: A Bioinformatics Pipeline for Detecting New Transposable Element Insertion Events in Next-Generation Sequencing Data
Transposable elements (TEs) are mobile elements capable of introducing genetic changes rapidly. Their importance has been documented in many biological processes, such as introducing genetic instability, altering patterns of gene expression, and accelerating genome evolution. Increasing appreciation of TEs has resulted in a growing number of bioinformatics tools to identify insertion events. However, the application of existing tools is limited by a narrowly focused design, too many dependencies on other tools, or required input files based on prior knowledge that may not be readily available to all users. Here, we report a simple pipeline, TEfinder, developed for the detection of new TE insertions with minimal software and input file dependencies. The external software requirements are BEDTools, SAMtools, and Picard. Necessary input files include the reference genome sequence in FASTA format, an alignment file from paired-end reads, existing TEs in GTF format, and a text file of TE names. We tested TEfinder among several evolving populations of Fusarium oxysporum generated through a short-term adaptation study. Our results demonstrate that this easy-to-use tool can effectively detect new TE insertion events, making it accessible and practical for TE analysis.
- Award ID(s): 1652641
- PAR ID: 10214665
- Date Published:
- Journal Name: Genes
- Volume: 12
- Issue: 2
- ISSN: 2073-4425
- Page Range / eLocation ID: 224
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The current technologies to place new DNA into specific locations in plant genomes operate at low frequency and are error-prone, and this inefficiency hampers genome-editing approaches to develop improved crops. Often considered to be genome ‘parasites’, transposable elements (TEs) evolved to insert their DNA seamlessly into genomes. Eukaryotic TEs select their site of insertion based on preferences for chromatin contexts, which differ for each TE type. Here we developed a genome engineering tool that controls the TE insertion site and cargo delivered, taking advantage of the natural ability of the TE to precisely excise and insert into the genome. Inspired by CRISPR-associated transposases that target transposition in a programmable manner in bacteria, we fused the rice Pong transposase protein to the Cas9 or Cas12a programmable nucleases. We demonstrated sequence-specific targeted insertion (guided by the CRISPR gRNA) of enhancer elements, an open reading frame and a gene expression cassette into the genome of the model plant Arabidopsis. We then translated this system into soybean, a major global crop in need of targeted insertion technology. We have engineered a TE ‘parasite’ into a usable and accessible toolkit that enables the sequence-specific targeting of custom DNA into plant genomes.
- Andrews, B J (Ed.) Abstract: Intact transposable elements (TEs) account for 65% of the maize genome and can impact gene function and regulation. Although TEs comprise the majority of the maize genome and affect important phenotypes, genome-wide patterns of TE polymorphisms in maize have only been studied in a handful of maize genotypes, due to the challenging nature of assessing highly repetitive sequences. We implemented a method to use short-read sequencing data from 509 diverse inbred lines to classify the presence/absence of 445,418 nonredundant TEs that were previously annotated in four genome assemblies including B73, Mo17, PH207, and W22. Different orders of TEs (i.e., LTRs, Helitrons, and TIRs) had different frequency distributions within the population. LTRs with lower LTR similarity were generally more frequent in the population than LTRs with higher LTR similarity, though high-frequency insertions with very high LTR similarity were observed. LTR similarity and frequency estimates of nested elements and the outer elements in which they insert revealed that most nesting events occurred very near the timing of the outer element insertion. TEs within genes were at higher frequency than those outside of genes, and this was particularly true for those not inserted into introns. Many TE insertional polymorphisms observed in this population were tagged by SNP markers. However, 19.9% of the TE polymorphisms were not well tagged by SNPs (R² < 0.5) and potentially represent information that has not been well captured in previous SNP-based marker-trait association studies. This study provides a population-scale, genome-wide assessment of TE variation in maize and valuable insight into that variation and the factors that contribute to it.
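To make the notion of a TE polymorphism being "tagged" by a SNP concrete, here is a minimal sketch of the R² calculation, assuming inbred lines so that both the TE call and the SNP call can be coded as 0/1 per line. The function names `r_squared` and `is_well_tagged` are hypothetical, the 0.5 threshold is the one quoted in the abstract, and this is not the authors' pipeline.

```python
from typing import Sequence


def r_squared(te_calls: Sequence[int], snp_calls: Sequence[int]) -> float:
    """Squared Pearson correlation between two 0/1 genotype vectors."""
    if len(te_calls) != len(snp_calls) or len(te_calls) == 0:
        raise ValueError("genotype vectors must be non-empty and equal length")
    n = len(te_calls)
    mean_te = sum(te_calls) / n
    mean_snp = sum(snp_calls) / n
    cov = sum((t - mean_te) * (s - mean_snp) for t, s in zip(te_calls, snp_calls))
    var_te = sum((t - mean_te) ** 2 for t in te_calls)
    var_snp = sum((s - mean_snp) ** 2 for s in snp_calls)
    if var_te == 0 or var_snp == 0:
        return 0.0  # a monomorphic marker carries no tagging information
    return (cov * cov) / (var_te * var_snp)


def is_well_tagged(
    te_calls: Sequence[int],
    nearby_snps: Sequence[Sequence[int]],
    threshold: float = 0.5,  # threshold quoted in the abstract
) -> bool:
    """True if at least one nearby SNP reaches R^2 >= threshold with the TE."""
    return any(r_squared(te_calls, snp) >= threshold for snp in nearby_snps)


# Toy data: presence/absence of one TE across four inbred lines.
te = [1, 1, 0, 0]
snp_in_ld = [1, 1, 0, 0]      # perfectly correlated with the TE
snp_unlinked = [1, 0, 1, 0]   # uncorrelated with the TE
```

A TE for which `is_well_tagged` returns False under these assumptions corresponds to the class of polymorphisms that SNP-based association studies would be likely to miss.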
- Bomblies, K (Ed.) Abstract: Transposable elements (TEs) have the potential to create regulatory variation both through the disruption of existing DNA regulatory elements and through the creation of novel DNA regulatory elements. In a species with a large genome, such as maize, many TEs interspersed with genes create opportunities for significant allelic variation due to TE presence/absence polymorphisms among individuals. We used information on putative regulatory elements in combination with knowledge about TE polymorphisms in maize to identify TE insertions that interrupt existing accessible chromatin regions (ACRs) in B73 as well as examples of polymorphic TEs that contain ACRs among four inbred lines of maize including B73, Mo17, W22, and PH207. The TE insertions in three other assembled maize genomes (Mo17, W22, or PH207) that interrupt ACRs that are present in the B73 genome can trigger changes to the chromatin, suggesting the potential for both genetic and epigenetic influences of these insertions. Nearly 20% of the ACRs located over 2 kb from the nearest gene are located within an annotated TE. These are regions of unmethylated DNA that show evidence for functional importance similar to ACRs that are not present within TEs. Using a large panel of maize genotypes, we tested whether there is an association between the presence of TE insertions that interrupt, or carry, an ACR and the expression of nearby genes. While most TE polymorphisms are not associated with the expression of nearby genes, the TEs that carry ACRs are enriched for association with higher expression of nearby genes, suggesting that these TEs may contribute novel regulatory elements. These analyses highlight the potential for a subset of TEs to rewire transcriptional responses in eukaryotic genomes.
- Bosco, Giovanni (Ed.) Transposable elements (TEs) are selfish genetic elements that can cause harmful mutations. In Drosophila, it has been estimated that half of all spontaneous visible marker phenotypes are mutations caused by TE insertions. Several factors likely limit the accumulation of exponentially amplifying TEs within genomes. First, synergistic interactions between TEs that amplify their harm with increasing copy number are proposed to limit TE copy number. However, the nature of this synergy is poorly understood. Second, because of the harm posed by TEs, eukaryotes have evolved systems of small RNA-based genome defense to limit transposition. However, as in all immune systems, there is a cost of autoimmunity, and small RNA-based systems that silence TEs can inadvertently silence genes flanking TE insertions. In a screen for essential meiotic genes in Drosophila melanogaster, a truncated Doc retrotransposon within a neighboring gene was found to trigger the germline silencing of ald, the Drosophila Mps1 homolog, a gene essential for proper chromosome segregation in meiosis. A subsequent screen for suppressors of this silencing identified a new insertion of a Hobo DNA transposon in the same neighboring gene. Here we describe how the original Doc insertion triggers flanking piRNA biogenesis and local gene silencing. We show that this local gene silencing occurs in cis and is dependent on deadlock, a component of the Rhino-Deadlock-Cutoff (RDC) complex, to trigger dual-strand piRNA biogenesis at TE insertions. We further show how the additional Hobo insertion leads to de-silencing by reducing flanking piRNA biogenesis triggered by the original Doc insertion. These results support a model of TE-mediated gene silencing by piRNA biogenesis in cis that depends on local determinants of transcription. This may explain complex patterns of off-target gene silencing triggered by TEs within populations and in the laboratory. It also provides a mechanism of sign epistasis among TE insertions, illuminates the complex nature of their interactions and supports a model in which off-target gene silencing shapes the evolution of the RDC complex.