

Title: TEfinder: A Bioinformatics Pipeline for Detecting New Transposable Element Insertion Events in Next-Generation Sequencing Data
Transposable elements (TEs) are mobile elements capable of introducing genetic changes rapidly. Their importance has been documented in many biological processes, such as introducing genetic instability, altering patterns of gene expression, and accelerating genome evolution. Increasing appreciation of TEs has resulted in a growing number of bioinformatics tools for identifying insertion events. However, the application of existing tools is limited by narrowly focused design, too many dependencies on other tools, or required prior knowledge in the form of input files that may not be readily available to all users. Here, we report a simple pipeline, TEfinder, developed for the detection of new TE insertions with minimal software and input file dependencies. The external software requirements are BEDTools, SAMtools, and Picard. Necessary input files include the reference genome sequence in FASTA format, an alignment file from paired-end reads, existing TEs in GTF format, and a text file of TE names. We tested TEfinder on several evolving populations of Fusarium oxysporum generated through a short-term adaptation study. Our results demonstrate that this easy-to-use tool can effectively detect new TE insertion events, making it accessible and practical for TE analysis.
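To make the input description above concrete, the following is a minimal Python sketch of how a GTF file of existing TEs might be restricted to the entries named in a text file of TE names. It is not TEfinder's own code. The function names, the one-name-per-line layout of the names file, the assumption that the TE name appears as a quoted value in the GTF attribute column, and the placeholder names `TE_A` and `TE_C` are all illustrative assumptions.

```python
from typing import Iterable, Iterator, Set, Tuple


def load_te_names(lines: Iterable[str]) -> Set[str]:
    """Collect TE names, assuming one name per line; blank lines are skipped."""
    return {line.strip() for line in lines if line.strip()}


def filter_gtf_by_te(
    gtf_lines: Iterable[str], te_names: Set[str]
) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, te_name) for GTF records that mention a listed TE.

    Assumption: the TE name occurs as a double-quoted value somewhere in the
    attribute column (column 9) of the GTF record.
    """
    ordered_names = sorted(te_names)  # fixed order keeps the output deterministic
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, start, end, attributes = fields[0], int(fields[3]), int(fields[4]), fields[8]
        for name in ordered_names:
            if f'"{name}"' in attributes:
                yield chrom, start, end, name
                break


# Hypothetical two-record annotation with placeholder TE names.
example_gtf = [
    "#hypothetical TE annotation\n",
    'chr1\tsrc\texon\t100\t500\t.\t+\t.\tgene_id "TE_A"; transcript_id "TE_A_1";\n',
    'chr1\tsrc\texon\t900\t1200\t.\t-\t.\tgene_id "TE_C"; transcript_id "TE_C_1";\n',
]
names = load_te_names(["TE_A\n", "\n"])
hits = list(filter_gtf_by_te(example_gtf, names))
print(hits)
```

Matching on the quoted attribute value rather than a bare substring avoids counting `TE_A_1` as a second hit for `TE_A`. A real annotation may store the TE name under a different attribute, so the matching rule would need to be adapted to the file at hand.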
Award ID(s):
1652641
NSF-PAR ID:
10214665
Journal Name:
Genes
Volume:
12
Issue:
2
ISSN:
2073-4425
Page Range / eLocation ID:
224
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and in generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools. Methods We have developed the tool RelocaTE2 for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single base pair precision. Results and Discussion The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with a false positive rate of less than 0.1% using 10-fold genome coverage resequencing data. 
RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing. 
  2. Bosco, Giovanni (Ed.)
    Transposable elements (TEs) are selfish genetic elements that can cause harmful mutations. In Drosophila, it has been estimated that half of all spontaneous visible marker phenotypes are mutations caused by TE insertions. Several factors likely limit the accumulation of exponentially amplifying TEs within genomes. First, synergistic interactions between TEs that amplify their harm with increasing copy number are proposed to limit TE copy number. However, the nature of this synergy is poorly understood. Second, because of the harm posed by TEs, eukaryotes have evolved systems of small RNA-based genome defense to limit transposition. However, as in all immune systems, there is a cost of autoimmunity, and small RNA-based systems that silence TEs can inadvertently silence genes flanking TE insertions. In a screen for essential meiotic genes in Drosophila melanogaster, a truncated Doc retrotransposon within a neighboring gene was found to trigger the germline silencing of ald, the Drosophila Mps1 homolog, a gene essential for proper chromosome segregation in meiosis. A subsequent screen for suppressors of this silencing identified a new insertion of a Hobo DNA transposon in the same neighboring gene. Here we describe how the original Doc insertion triggers flanking piRNA biogenesis and local gene silencing. We show that this local gene silencing occurs in cis and is dependent on deadlock, a component of the Rhino-Deadlock-Cutoff (RDC) complex, to trigger dual-strand piRNA biogenesis at TE insertions. We further show how the additional Hobo insertion leads to de-silencing by reducing flanking piRNA biogenesis triggered by the original Doc insertion. These results support a model of TE-mediated gene silencing by piRNA biogenesis in cis that depends on local determinants of transcription. This may explain complex patterns of off-target gene silencing triggered by TEs within populations and in the laboratory. 
It also provides a mechanism of sign epistasis among TE insertions, illuminates the complex nature of their interactions and supports a model in which off-target gene silencing shapes the evolution of the RDC complex. 
  3. Andrews, B J (Ed.)
    Intact transposable elements (TEs) account for 65% of the maize genome and can impact gene function and regulation. Although TEs comprise the majority of the maize genome and affect important phenotypes, genome-wide patterns of TE polymorphisms in maize have only been studied in a handful of maize genotypes, due to the challenging nature of assessing highly repetitive sequences. We implemented a method to use short-read sequencing data from 509 diverse inbred lines to classify the presence/absence of 445,418 nonredundant TEs that were previously annotated in four genome assemblies including B73, Mo17, PH207, and W22. Different orders of TEs (i.e., LTRs, Helitrons, and TIRs) had different frequency distributions within the population. LTRs with lower LTR similarity were generally more frequent in the population than LTRs with higher LTR similarity, though high-frequency insertions with very high LTR similarity were observed. LTR similarity and frequency estimates of nested elements and the outer elements in which they insert revealed that most nesting events occurred very near the time of the outer element insertion. TEs within genes were at higher frequency than those outside of genes, and this was particularly true for those not inserted into introns. Many TE insertional polymorphisms observed in this population were tagged by SNP markers. However, 19.9% of the TE polymorphisms were not well tagged by SNPs (R² < 0.5) and potentially represent information that has not been well captured in previous SNP-based marker-trait association studies. This study provides a population-scale, genome-wide assessment of TE variation in maize and valuable insight into the factors that contribute to this variation. 
  4. Motivation

    Transposable elements (TEs) are ubiquitous in genomes and many remain active. TEs comprise an important fraction of the transcriptomes with potential effects on the host genome, either by generating deleterious mutations or promoting evolutionary novelties. However, their functional study is limited by the difficulty in their identification and quantification, particularly in non-model organisms.

    Results

    We developed a new pipeline [explore active transposable elements (ExplorATE)] implemented in R and bash that allows the quantification of active TEs in both model and non-model organisms. ExplorATE creates TE-specific indexes and uses Selective Alignment (SA) to filter out co-transcribed transposons within genes based on alignment scores. Moreover, our software incorporates Wicker-like criteria to refine a set of target TEs and avoid spurious mapping. Based on simulated and real data, we show that the SA strategy adopted by ExplorATE achieved better estimates of non-co-transcribed elements than other available alignment-based or mapping-based software. ExplorATE results showed high congruence with alignment-based tools with and without a reference genome, yet ExplorATE required less execution time. Likewise, ExplorATE expands and complements most previous TE analyses by incorporating the co-transcription and multi-mapping effects during quantification, and provides seamless integration with other downstream tools within the R environment.

    Availability and implementation

    Source code is available at https://github.com/FemeniasM/ExplorATEproject and https://github.com/FemeniasM/ExplorATE_shell_script. Data available on request.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Background Transposable elements (TEs) are selfish DNA sequences capable of moving and amplifying at the expense of host cells. Despite this, an increasing number of studies have revealed that TE proteins are important contributors to the emergence of novel host proteins through molecular domestication. We previously described seven transposase-derived domesticated genes from the PIF/Harbinger DNA family of TEs in Drosophila and a co-domestication. All PIF TEs known in plants and animals distinguish themselves from other DNA transposons by the presence of two genes. We hypothesize that there should often be co-domestications of the two genes from the same TE because the transposase (gene 1) has been described to be translocated to the nucleus by the MADF protein (gene 2). To provide support for this model of new gene origination, we investigated available insect species genomes for additional evidence of PIF TE domestication events and explored the co-domestication of the MADF protein from the same TE insertion. Results After extensively exploring insect genomes for hits to PIF transposases and analyzing their context and evolution, we present evidence of at least six independent domestication events of PIF transposable element proteins in insects: two co-domestications of both transposase and MADF proteins in Anopheles (Diptera), one transposase-only domestication event and one co-domestication in butterflies and moths (Lepidoptera), and two transposase-only domestication events in cockroaches (Blattodea). The predicted nuclear localization signals for many of those proteins and dicistronic transcription in some instances support the functional associations of co-domesticated transposase and MADF proteins. 
Conclusions Our results add to a co-domestication that we previously described in fruit fly genomes and support the view that new gene origination through domestication of a PIF transposase is frequently accompanied by the co-domestication of a cognate MADF protein in insects, potentially for regulatory functions. We propose a detailed model that predicts that PIF TE protein co-domestication should often occur from the same PIF TE insertion. 