Title: Venomix: a simple bioinformatic pipeline for identifying and characterizing toxin gene candidates from transcriptomic data
The advent of next-generation sequencing has resulted in transcriptome-based approaches to investigate functionally significant biological components in a variety of non-model organisms. This has given rise to the area of “venomics”: a rapidly growing field using combined transcriptomic and proteomic datasets to characterize toxin diversity in a variety of venomous taxa. Ultimately, the transcriptomic portion of these analyses follows a very similar path after transcriptome assembly, often including candidate toxin identification using BLAST, expression level screening, protein sequence alignment, gene tree reconstruction, and characterization of potential toxin function. Here we describe the Python package Venomix, which streamlines these processes using common bioinformatic tools along with ToxProt, a publicly available annotated database composed of characterized venom proteins. In this study, we use the Venomix pipeline to characterize candidate venom diversity in four phylogenetically distinct organisms: a cone snail (Conidae; Conus sponsalis), a snake (Viperidae; Echis coloratus), an ant (Formicidae; Tetramorium bicarinatum), and a scorpion (Scorpionidae; Urodacus yaschenkoi). Data on these organisms were sampled from public databases, with each original analysis using different approaches for transcriptome assembly, toxin identification, or gene expression quantification. Venomix recovered numerically more candidate toxin transcripts for three of the four transcriptomes than the original analyses and identified new toxin candidates. In summary, we show that the Venomix package is a useful tool to identify and characterize the diversity of toxin-like transcripts derived from transcriptomic datasets. Venomix is available at: https://bitbucket.org/JasonMacrander/Venomix/.
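As a rough illustration of the candidate identification step described above, the sketch below filters BLAST tabular output (the standard 12-column `-outfmt 6` layout) by e-value and keeps the best-scoring hit per transcript. It is not taken from Venomix: the function names, the 1e-5 cutoff, and the example records are assumptions made for illustration only.

```python
from collections import defaultdict


def parse_blast_tabular(lines, max_evalue=1e-5):
    """Return the best hit per query from BLAST outfmt 6 lines.

    Columns assumed (standard outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The e-value cutoff is a hypothetical default, not a Venomix setting.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is too weak to count as a candidate
        # Keep only the highest-bitscore hit for each query transcript.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def group_by_subject(best_hits):
    """Group candidate transcripts by the database entry they matched."""
    groups = defaultdict(list)
    for query, (subject, _evalue, _bitscore) in best_hits.items():
        groups[subject].append(query)
    return {subject: sorted(queries) for subject, queries in groups.items()}


# Made-up records for demonstration; not real ToxProt or transcriptome data.
example_hits = [
    "contig1\ttoxA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "contig1\ttoxB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-10\t90",
    "contig2\ttoxA\t50.0\t40\t20\t0\t1\t40\t1\t40\t0.5\t20",
    "contig3\ttoxA\t95.0\t120\t6\t0\t1\t120\t1\t120\t1e-40\t250",
]

candidates = parse_blast_tabular(example_hits)
families = group_by_subject(candidates)
```

In a real analysis the grouped candidates would then feed the later stages named in the abstract (expression screening, alignment, and gene tree reconstruction), which this sketch does not attempt.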
Award ID(s):
1401014
NSF-PAR ID:
10363782
Journal Name:
PeerJ
Volume:
6
ISSN:
2167-8359
Page Range / eLocation ID:
e5361
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, having alternative-splicing events exacerbates such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes.

    Results

    In this study, we first provide a pipeline to generate a set of simulated benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when the reference is not available from the same genome. To improve transcriptome assembly performance by leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble.

    Conclusions

    Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any individual de novo assembler we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy, for both de novo and genome-guided assemblers, can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/.

     
  2.
    The venoms of small rear-fanged snakes (RFS) remain largely unexplored, despite increased recognition of their importance in understanding venom evolution more broadly. Sequencing the transcriptome of venom-producing glands has greatly increased the ability of researchers to examine and characterize the toxin repertoire of small taxa with low venom yields. Here, we use RNA-seq to characterize the Duvernoy’s gland transcriptome of the Plains Black-headed Snake, Tantilla nigriceps, a small, semi-fossorial colubrid that feeds on a variety of potentially dangerous arthropods including centipedes and spiders. We generated transcriptomes of six individuals from three localities in order both to characterize the toxin expression of this species for the first time and to look for initial evidence of venom variation in the species. Three toxin families—three-finger neurotoxins (3FTxs), cysteine-rich secretory proteins (CRISPs), and snake venom metalloproteinases (SVMPIIIs)—dominated the transcriptome of T. nigriceps; 3FTxs themselves were the dominant toxin family in most individuals, accounting for as much as 86.4% of an individual’s toxin expression. Variation in toxin expression between individuals was also noted, with two specimens exhibiting higher relative expression of C-type lectins than any other sample (8.7–11.9% compared to <1%), and another expressing CRISPs at a higher level than any other toxin. This study provides the first Duvernoy’s gland transcriptomes of any species of Tantilla, and one of the few transcriptomic studies of RFS not predicated on a single individual. This initial characterization demonstrates the need for further study of toxin expression variation in this species, as well as the need for further exploration of small RFS venoms.
  3. Abstract Motivation

    De novo transcriptome analysis using RNA-seq offers a promising means to study gene expression in non-model organisms. Yet, the difficulty of transcriptome assembly means that the contigs provided by the assembler often represent a fractured and incomplete view of the transcriptome, complicating downstream analysis. We introduce Grouper, a new method for clustering contigs from de novo assemblies that are likely to belong to the same transcripts and genes; these groups can subsequently be analyzed more robustly. When provided with access to the genome of a related organism, Grouper can transfer annotations to the de novo assembly, further improving the clustering.

    Results

    On de novo assemblies from four different species, we show that Grouper is able to accurately cluster a larger number of contigs than the existing state-of-the-art method. The Grouper pipeline is able to map greater than 10% more reads against the contigs, leading to accurate downstream differential expression analyses. The labeling module, in the presence of a closely related annotated genome, can efficiently transfer annotations to the contigs and use this information to further improve clustering. Overall, Grouper provides a complete and efficient pipeline for processing de novo transcriptomic assemblies.

    Availability and implementation

    The Grouper software is freely available at https://github.com/COMBINE-lab/grouper under the 2-clause BSD license.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  4. The same selective forces that give rise to rapid inter- and intraspecific divergence in snake venoms can also favor differences in venoms across life-history stages. Ontogenetic changes in venom composition are well known and widespread in snakes but have not been investigated to the level of unambiguously identifying the specific loci involved. The eastern diamondback rattlesnake was previously shown to undergo an ontogenetic shift in venom composition at sexual maturity, and this shift accounted for more venom variation than geography. To characterize the genetics underlying the ontogenetic venom compositional change in C. adamanteus, we sequenced adult/juvenile pairs of venom-gland transcriptomes from five populations previously shown to have different adult venom compositions. We identified a total of 59 putative toxin transcripts for C. adamanteus, and 12 of these were involved in the ontogenetic change. Three toxins were downregulated, and nine were upregulated in adults relative to juveniles. Adults and juveniles expressed similar total levels of snake-venom metalloproteinases but differed substantially in their featured paralogs, and adults expressed higher levels of bradykinin-potentiating and C-type natriuretic peptides, nerve growth factor, and specific paralogs of phospholipases A2 and snake venom serine proteinases. Juvenile venom was more toxic to mice, indicating that the expression differences resulted in a phenotypically, and therefore potentially ecologically, significant difference in venom function. We also showed that adult and juvenile venom-gland transcriptomes for a species with known ontogenetic venom variation were equally effective at individually providing a full characterization of the venom genes of a species but that any particular individual was likely to lack several toxins in its transcriptome. A full characterization of a species’ venom-gene complement therefore requires sequencing more than one individual, although the ages of the individuals are unimportant.

     
  5. Abstract Rapid development of transcriptome sequencing technologies has resulted in a data revolution and the emergence of new approaches to study transcriptomic regulation, such as alternative splicing, alternative polyadenylation, and CRISPR knockout screening, in addition to regular gene expression. A full characterization of the transcriptional landscape of different groups of cells or tissues holds enormous potential for both basic science and clinical applications. Although many methods have been developed for differential gene expression analysis, they are all geared towards a particular type of sequencing data and fail to perform well when applied to other types of transcriptomic data. To fill this gap, we offer a negative beta binomial t-test (NBBt-test). NBBt-test provides multiple functions to perform differential analyses of alternative splicing, polyadenylation, CRISPR knockout screening, and gene expression datasets. Both real and large-scale simulation data show that NBBt-test outperforms currently popular statistical methods, with higher efficiency and lower type I error rate and FDR, in identifying differential isoforms, differentially expressed genes, and differential CRISPR knockout screening genes across different sample sizes. An R package implementing NBBt-test is available for download from CRAN (https://CRAN.R-project.org/package=NBBttest).