Title: ExplorATE: a new pipeline to explore active transposable elements from RNA-seq data
Abstract

Motivation

Transposable elements (TEs) are ubiquitous in genomes and many remain active. TEs comprise an important fraction of the transcriptomes with potential effects on the host genome, either by generating deleterious mutations or promoting evolutionary novelties. However, their functional study is limited by the difficulty in their identification and quantification, particularly in non-model organisms.

Results

We developed a new pipeline [explore active transposable elements (ExplorATE)] implemented in R and bash that allows the quantification of active TEs in both model and non-model organisms. ExplorATE creates TE-specific indexes and uses Selective Alignment (SA) to filter out co-transcribed transposons within genes based on alignment scores. Moreover, our software incorporates Wicker-like criteria to refine a set of target TEs and avoid spurious mapping. Based on simulated and real data, we show that the SA strategy adopted by ExplorATE achieved better estimates of non-co-transcribed elements than other available alignment-based or mapping-based software. ExplorATE results showed high congruence with alignment-based tools with and without a reference genome, yet ExplorATE required less execution time. Likewise, ExplorATE expands and complements most previous TE analyses by incorporating the co-transcription and multi-mapping effects during quantification, and provides seamless integration with other downstream tools within the R environment.
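As a rough illustration of what a Wicker-like refinement step could look like, the Python sketch below selects target TE transcripts from repeat annotations. The abstract does not state ExplorATE's actual thresholds or data structures, so everything here is an assumption: the `TEHit` record, the `select_target_tes` function, and the 80% identity / 80 bp / 80% length-fraction defaults (borrowed from the general 80-80-80 rule of Wicker et al.) are hypothetical and are not taken from the ExplorATE code base.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TEHit:
    """One repeat annotation on a transcript (hypothetical record layout)."""
    transcript: str
    family: str
    identity: float  # percent identity of the hit to the TE consensus
    length: int      # annotated length in base pairs


def select_target_tes(
    hits: Iterable[TEHit],
    min_identity: float = 80.0,  # assumed threshold, not from the source
    min_length: int = 80,        # assumed threshold, not from the source
    min_fraction: float = 0.8,   # assumed threshold, not from the source
) -> Dict[str, str]:
    """Return {transcript: family} for transcripts dominated by one TE family."""
    total_te_length: Dict[str, int] = defaultdict(int)
    family_length: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    for hit in hits:
        # Every hit counts toward the total TE-annotated length.
        total_te_length[hit.transcript] += hit.length
        # Only sufficiently similar hits count toward a family.
        if hit.identity >= min_identity:
            family_length[hit.transcript][hit.family] += hit.length

    targets: Dict[str, str] = {}
    for transcript, families in family_length.items():
        # Pick the family covering the most base pairs on this transcript.
        family, length = max(families.items(), key=lambda kv: kv[1])
        fraction = length / total_te_length[transcript]
        if length >= min_length and fraction >= min_fraction:
            targets[transcript] = family
    return targets


example_hits = [
    TEHit("t1", "Gypsy", 92.0, 300),
    TEHit("t1", "L1", 85.0, 50),     # minor second family on t1
    TEHit("t2", "Copia", 70.0, 500), # identity too low
    TEHit("t3", "hAT", 90.0, 60),    # too short
    TEHit("t4", "Gypsy", 90.0, 100),
    TEHit("t4", "L1", 90.0, 100),    # no dominant family
]
targets = select_target_tes(example_hits)
print(targets)
```

With these illustrative inputs only `t1` is retained, because it is the only transcript where a single family passes all three assumed thresholds; ambiguous or weakly supported transcripts are dropped, which is the kind of behavior that helps avoid spurious mapping.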

Availability and implementation

Source code is available at https://github.com/FemeniasM/ExplorATEproject and https://github.com/FemeniasM/ExplorATE_shell_script. Data available on request.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
Award ID(s): 2016372
NSF-PAR ID: 10368208
Publisher / Repository: Oxford University Press
Journal Name: Bioinformatics
Volume: 38
Issue: 13
ISSN: 1367-4803
Page Range / eLocation ID: p. 3361-3366
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Short interspersed nuclear elements (SINEs) are a widespread type of small transposable element (TE). With increasing evidence for their impact on gene function and genome evolution in plants, accurate genome-scale SINE annotation becomes a fundamental step for studying the regulatory roles of SINEs and their relationship with other components in the genomes. Despite the overall promising progress made in TE annotation, SINE annotation remains a major challenge. Unlike some other TEs, SINEs are short and heterogeneous, and they usually lack well-conserved sequence or structural features. Thus, current SINE annotation tools have either low sensitivity or high false discovery rates. Given the demand and challenges, we aimed to provide a more accurate and efficient SINE annotation tool for plant genomes. The pipeline starts with maximizing the pool of SINE candidates via profile hidden Markov model-based homology search and de novo SINE search using structural features. Then, it excludes the false positives by integrating all known features of SINEs and the features of other types of TEs that can often be misannotated as SINEs. As a result, the pipeline substantially improves the tradeoff between sensitivity and accuracy, with both values close to or over 90%. We tested our tool in Arabidopsis thaliana and rice (Oryza sativa), and the results show that our tool competes favorably against existing SINE annotation tools. The simplicity and effectiveness of this tool would potentially be useful for generating more accurate SINE annotations for other plant species. The pipeline is freely available at https://github.com/yangli557/AnnoSINE.

     
  2. Transposable elements (TEs) are mobile elements capable of introducing genetic changes rapidly. Their importance has been documented in many biological processes, such as introducing genetic instability, altering patterns of gene expression, and accelerating genome evolution. Increasing appreciation of TEs has resulted in a growing number of bioinformatics tools for identifying insertion events. However, the application of existing tools is limited by the narrowly focused design of the package, too many dependencies on other tools, or prior knowledge required as input files that may not be readily available to all users. Here, we report a simple pipeline, TEfinder, developed for the detection of new TE insertions with minimal software and input file dependencies. The external software requirements are BEDTools, SAMtools, and Picard. Necessary input files include the reference genome sequence in FASTA format, an alignment file from paired-end reads, existing TEs in GTF format, and a text file of TE names. We tested TEfinder among several evolving populations of Fusarium oxysporum generated through a short-term adaptation study. Our results demonstrate that this easy-to-use tool can effectively detect new TE insertion events, making it accessible and practical for TE analysis.
  3. Summary Open Research Badges

    This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at https://github.com/SNAnderson/maizeTE_variation; https://mcstitzer.github.io/maize_TEs.
  4. Abstract Background

    Transposable elements (TEs) are powerful creators of genotypic and phenotypic diversity due to their inherent mutagenic capabilities, and in this way they serve as a deep reservoir of sequences for genomic variation. As agents of genetic disruption, a TE’s potential to impact phenotype is partially a factor of its location in the genome. Previous research has shown TEs’ ability to impact the expression of neighboring genes; however, our understanding of this trend is hampered by the exceptional amount of diversity in the TE world and a lack of publicly available computational methods that quantify the presence of TEs relative to genes.

    Results

    Here, we have developed a tool to more easily quantify TE presence relative to genes using only a gene annotation and a TE annotation, yielding a new metric we call TE Density, briefly defined as the proportion of TE-occupied base pairs relative to a window size of the genome. This new pipeline reports TE density for each gene in the genome, for each type descriptor of TE (order and superfamily), and for multiple positions and distances relative to the gene (upstream, intragenic, and downstream) over sliding, user-defined windows. In this way, we overcome previous limitations to the study of TE-gene relationships by focusing on all TE types present in the genome, utilizing flexible genomic distances for measurement, and reporting a TE presence metric for every gene in the genome.

    Conclusions

    Together, this new tool opens up new avenues for studying TE-gene relationships, genome architecture, comparative genomics, and the tremendous diversity present in the TE world. TE Density is open-source and freely available at: https://github.com/sjteresi/TE_Density .
  5. Abstract Background

    Advances in microbiome science are driven in large part by our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.

    Results

    We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable software tool to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or single-cell genomes. Results are presented in the form of tables for metabolism and a variety of visualizations including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric “MW-score” (metabolic weight score), and metabolic Sankey diagrams. METABOLIC takes ~3 h with 40 CPU threads to process ~100 genomes and corresponding metagenomic reads, of which the most compute-demanding step, hmmsearch, takes ~45 min, while it takes ~5 h to complete hmmsearch for ~3600 genomes. Tests of accuracy, robustness, and consistency suggest METABOLIC provides better performance compared to other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.

    Conclusion

    METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at https://github.com/AnantharamanLab/METABOLIC.

     
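The TE Density metric described in item 4 of the list above (the proportion of TE-occupied base pairs relative to a window size) can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the TE_Density implementation: the function names `merge_intervals` and `te_density`, the half-open coordinate convention, and the example coordinates are all hypothetical. Overlapping TE annotations are merged first so that no base pair is counted twice.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates in base pairs


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def te_density(te_intervals: List[Interval], window: Interval) -> float:
    """Fraction of the window's base pairs that are covered by any TE."""
    w_start, w_end = window
    if w_end <= w_start:
        raise ValueError("window must have positive length")
    covered = 0
    for start, end in merge_intervals(te_intervals):
        overlap = min(end, w_end) - max(start, w_start)
        if overlap > 0:
            covered += overlap
    return covered / (w_end - w_start)


# Hypothetical gene at [1000, 2000) on the forward strand with nearby TEs.
tes = [(400, 600), (550, 700), (900, 1100)]
upstream = te_density(tes, (500, 1000))     # 500 bp window upstream of the gene
intragenic = te_density(tes, (1000, 2000))  # window spanning the gene body
print(f"upstream: {upstream:.2f}, intragenic: {intragenic:.2f}")
```

In this toy case the upstream window contains 300 TE-occupied base pairs out of 500 and the gene body contains 100 out of 1000. Repeating the calculation per TE order or superfamily, and over several window sizes, would give the kind of per-gene table the abstract describes.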