
Title: ToxCodAn: a new toxin annotator and guide to venom gland transcriptomics
Abstract Motivation

Next-generation sequencing has become exceedingly common and has transformed our ability to explore nonmodel systems. In particular, transcriptomics has facilitated the study of venom and evolution of toxins in venomous lineages; however, many challenges remain. Primarily, annotation of toxins in the transcriptome is a laborious and time-consuming task. Current annotation software often fails to predict the correct coding sequence and overestimates the number of toxins present in the transcriptome. Here, we present ToxCodAn, a Python script designed to perform precise annotation of snake venom gland transcriptomes. We test ToxCodAn with a set of previously curated transcriptomes and compare the results to other annotators. In addition, we provide a guide for venom gland transcriptomics to facilitate future research and use Bothrops alternatus as a case study for ToxCodAn and our guide.


Our analysis reveals that ToxCodAn provides precise annotation of toxins present in the transcriptome of venom glands of snakes. Comparison with other annotators demonstrates that ToxCodAn performs better with regard to run time ($>20\times$ faster), coding sequence prediction ($>3\times$ more accurate) and the number of toxins predicted (generating $>4\times$ fewer false positives). In this sense, ToxCodAn is a valuable resource for toxin annotation. The ToxCodAn framework can be expanded in the future to work with other venomous lineages and detect novel toxins.

Award ID(s):
1638879 1822417 1638902
Publisher / Repository:
Oxford University Press
Journal Name:
Briefings in Bioinformatics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    The rapid development of sequencing technologies resulted in a wide expansion of genomics studies using venomous lineages. This facilitated research focusing on understanding the evolution of adaptive traits and the search for novel compounds that can be applied in agriculture and medicine. However, the toxin annotation of genomes is a laborious and time-consuming task, and no consensus pipeline is currently available. No computational tool currently exists to address the challenges specific to toxin annotation and to ensure the reproducibility of the process.


    Here, we present ToxCodAn-Genome, the first software designed to perform automated toxin annotation in genomes of venomous lineages. This pipeline was designed to retrieve the full-length coding sequences of toxins and to allow the detection of novel truncated paralogs and pseudogenes. We tested ToxCodAn-Genome using 12 genomes of venomous lineages and achieved high performance on recovering their current toxin annotations. This tool can be easily customized to allow improvements in the final toxin annotation set and can be expanded to virtually any venomous lineage. ToxCodAn-Genome is fast, allowing it to run on any personal computer, but it can also be executed in multicore mode, taking advantage of large high-performance servers. In addition, we provide a guide to direct future research in the venomics field to ensure a confident toxin annotation in the genome being studied. As a case study, we sequenced and annotated the toxin repertoire of Bothrops alternatus, which may facilitate future evolutionary and biomedical studies using vipers as models.


    ToxCodAn-Genome is suitable to perform toxin annotation in the genome of venomous species and may help to improve the reproducibility of further studies. ToxCodAn-Genome and the guide are freely available at

  2. Abstract

    Scorpions constitute a charismatic lineage of arthropods and comprise more than 2500 described species. Found throughout various tropical and temperate habitats, these predatory arachnids have a long evolutionary history, with a fossil record that began in the Silurian. While all scorpions are venomous, the asymmetrically diverse family Buthidae harbors nearly half the diversity of extant scorpions, and all but one of the 58 species that are medically significant to humans. However, the lack of a densely sampled scorpion phylogeny has hindered broader inferences of the diversification dynamics of scorpion toxins. To redress this gap, we assembled a phylogenomic data set of 100 scorpion venom gland transcriptomes and genomes, emphasizing the sampling of highly toxic buthid genera. To infer divergence times of venom gene families, we applied a phylogenomic node dating approach for the species tree in tandem with phylostratigraphic bracketing to estimate the minimum ages of mammal-specific toxins. Our analyses establish a robustly supported phylogeny of scorpions, particularly with regard to relationships between medically significant taxa. Analysis of venom gene families shows that mammal-active sodium channel toxins (NaTx) have independently evolved in five lineages within Buthidae. Temporal windows of mammal-targeting toxin origins are correlated with the basal diversification of major scorpion mammal predators such as shrews, bats, and rodents. These results suggest an evolutionary model of relatively recent diversification of buthid NaTx homologs in response to the diversification of scorpion predators. [Adaptation; arachnids; phylogenomic dating; phylostratigraphy; venom.]

  3. Abstract

    Background The explosive radiation and diversification of the advanced snakes (superfamily Colubroidea) was associated with changes in all aspects of the shared venom system. Morphological changes included the partitioning of the mixed ancestral glands into two discrete glands devoted to the production of venom or mucus, respectively, as well as changes in the location, size and structural elements of the venom-delivering teeth. Evidence also exists for homology among venom gland toxins expressed across the advanced snakes. However, despite the evolutionary novelty of snake venoms, in-depth toxin molecular evolutionary history reconstructions have been mostly limited to those types present in only two front-fanged snake families, Elapidae and Viperidae. To have a broader understanding of toxins shared among extant snakes, here we first sequenced the transcriptomes of eight taxonomically diverse rear-fanged species and four key viperid species and analysed major toxin types shared across the advanced snakes.

    Results Transcriptomes were constructed for the following families and species: Colubridae – Helicops leopardinus, Heterodon nasicus, Rhabdophis subminiatus; Homalopsidae – Homalopsis buccata; Lamprophiidae – Malpolon monspessulanus, Psammophis schokari, Psammophis subtaeniatus, Rhamphiophis oxyrhynchus; and Viperidae – Bitis atropos, Pseudocerastes urarachnoides, Tropidolaemus subannulatus, Vipera transcaucasiana. These sequences were combined with those from available databases of other species in order to facilitate a robust reconstruction of the molecular evolutionary history of the key toxin classes present in the venom of the last common ancestor of the advanced snakes, and thus present across the full diversity of colubroid snake venoms. In addition to differential rates of evolution in toxin classes between the snake lineages, these analyses revealed multiple previously unknown instances of structural and functional convergence. Structural convergences included: the evolution of new cysteines to form heteromeric complexes, such as within kunitz peptides (the beta-bungarotoxin trait evolving on at least two occasions) and within SVMP enzymes (the P-IIId trait evolving on at least three occasions); and the C-terminal tail evolving on two separate occasions within the C-type natriuretic peptides, to create structural and functional analogues of the ANP/BNP tailed condition. Also shown was that the de novo evolution of new post-translationally liberated toxin families within the natriuretic peptide gene propeptide region occurred on at least five occasions, with novel functions ranging from induction of hypotension to post-synaptic neurotoxicity. Functional convergences included the following: multiple occasions of SVMP neofunctionalised in procoagulant venoms into activators of the clotting factors prothrombin and Factor X; multiple instances in procoagulant venoms where kunitz peptides were neofunctionalised into inhibitors of the clot-destroying enzyme plasmin, thereby prolonging the half-life of the clots formed by the clotting-activating enzymatic toxins; and multiple occasions of kunitz peptides neofunctionalised into neurotoxins acting on presynaptic targets, including twice just within Bungarus venoms.

    Conclusions We found novel convergences in both structural and functional evolution of snake toxins. These results provide a detailed roadmap for future work to elucidate predator–prey evolutionary arms races, ascertain differential clinical pathologies, and document rich biodiscovery resources for lead compounds in the drug design and discovery pipeline.

  4. Abstract

    Changes in gene expression can rapidly influence adaptive traits in the early stages of lineage diversification. Venom is an adaptive trait composed of numerous toxins used for prey capture and defense. Snake venoms can vary widely between conspecific populations, but the influence of lineage diversification on such compositional differences is unknown. To explore venom differentiation in the early stages of lineage diversification, we used RNA-seq and mass spectrometry to characterize Sidewinder Rattlesnake (Crotalus cerastes) venom. We generated the first venom-gland transcriptomes and complementary venom proteomes for eight individuals collected across the United States and tested for expression differences across life history traits and between subspecific, mitochondrial, and phylotranscriptomic hypotheses. Sidewinder venom was composed primarily of hemorrhagic toxins, with few cases of differential expression attributable to life history or lineage hypotheses. However, phylotranscriptomic lineage comparisons more than doubled instances of significant expression differences compared to all other factors. Nevertheless, only 6.4% of toxins were differentially expressed overall, suggesting that shallow divergence has not led to major changes in Sidewinder venom composition. Our results demonstrate the need for consensus venom-gland transcriptomes based on multiple individuals and highlight the potential for discrepancies in differential expression between different phylogenetic hypotheses.

  5. The same selective forces that give rise to rapid inter- and intraspecific divergence in snake venoms can also favor differences in venoms across life-history stages. Ontogenetic changes in venom composition are well known and widespread in snakes but have not been investigated to the level of unambiguously identifying the specific loci involved. The eastern diamondback rattlesnake was previously shown to undergo an ontogenetic shift in venom composition at sexual maturity, and this shift accounted for more venom variation than geography. To characterize the genetics underlying the ontogenetic venom compositional change in C. adamanteus, we sequenced adult/juvenile pairs of venom-gland transcriptomes from five populations previously shown to have different adult venom compositions. We identified a total of 59 putative toxin transcripts for C. adamanteus, and 12 of these were involved in the ontogenetic change. Three toxins were downregulated, and nine were upregulated in adults relative to juveniles. Adults and juveniles expressed similar total levels of snake-venom metalloproteinases but differed substantially in their featured paralogs, and adults expressed higher levels of Bradykinin-potentiating and C-type natriuretic peptides, nerve growth factor, and specific paralogs of phospholipases A2 and snake venom serine proteinases. Juvenile venom was more toxic to mice, indicating that the expression differences resulted in a phenotypically, and therefore potentially ecologically, significant difference in venom function. We also showed that adult and juvenile venom-gland transcriptomes for a species with known ontogenetic venom variation were equally effective at individually providing a full characterization of the venom genes of a species but that any particular individual was likely to lack several toxins in their transcriptome. A full characterization of a species’ venom-gene complement therefore requires sequencing more than one individual, although the ages of the individuals are unimportant.
