
Title: ToxCodAn: a new toxin annotator and guide to venom gland transcriptomics
Abstract Motivation

Next-generation sequencing has become exceedingly common and has transformed our ability to explore nonmodel systems. In particular, transcriptomics has facilitated the study of venom and evolution of toxins in venomous lineages; however, many challenges remain. Primarily, annotation of toxins in the transcriptome is a laborious and time-consuming task. Current annotation software often fails to predict the correct coding sequence and overestimates the number of toxins present in the transcriptome. Here, we present ToxCodAn, a python script designed to perform precise annotation of snake venom gland transcriptomes. We test ToxCodAn with a set of previously curated transcriptomes and compare the results to other annotators. In addition, we provide a guide for venom gland transcriptomics to facilitate future research and use Bothrops alternatus as a case study for ToxCodAn and our guide.


Our analysis reveals that ToxCodAn provides precise annotation of toxins present in the transcriptome of venom glands of snakes. Comparison with other annotators demonstrates that ToxCodAn has better performance with regard to run time ($>20\times$ faster), coding sequence prediction ($>3\times$ more accurate) and the number of toxins predicted (generating $>4\times$ fewer false positives). In this sense, ToxCodAn is a valuable resource for toxin annotation. The ToxCodAn framework can be expanded in the future to work with other venomous lineages and detect novel toxins.

Award ID(s):
1638879 1822417 1638902
Publication Date:
Journal Name:
Briefings in Bioinformatics
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Scorpions constitute a charismatic lineage of arthropods and comprise more than 2500 described species. Found throughout various tropical and temperate habitats, these predatory arachnids have a long evolutionary history, with a fossil record that began in the Silurian. While all scorpions are venomous, the asymmetrically diverse family Buthidae harbors nearly half the diversity of extant scorpions, and all but one of the 58 species that are medically significant to humans. However, the lack of a densely sampled scorpion phylogeny has hindered broader inferences of the diversification dynamics of scorpion toxins. To redress this gap, we assembled a phylogenomic data set of 100 scorpion venom gland transcriptomes and genomes, emphasizing the sampling of highly toxic buthid genera. To infer divergence times of venom gene families, we applied a phylogenomic node dating approach for the species tree in tandem with phylostratigraphic bracketing to estimate the minimum ages of mammal-specific toxins. Our analyses establish a robustly supported phylogeny of scorpions, particularly with regard to relationships between medically significant taxa. Analysis of venom gene families shows that mammal-active sodium channel toxins (NaTx) have independently evolved in five lineages within Buthidae. Temporal windows of mammal-targeting toxin origins are correlated with the basal diversification of major scorpion mammal predators such as shrews, bats, and rodents. These results suggest an evolutionary model of relatively recent diversification of buthid NaTx homologs in response to the diversification of scorpion predators. [Adaptation; arachnids; phylogenomic dating; phylostratigraphy; venom.]

  2. The same selective forces that give rise to rapid inter- and intraspecific divergence in snake venoms can also favor differences in venoms across life-history stages. Ontogenetic changes in venom composition are well known and widespread in snakes but have not been investigated to the level of unambiguously identifying the specific loci involved. The eastern diamondback rattlesnake was previously shown to undergo an ontogenetic shift in venom composition at sexual maturity, and this shift accounted for more venom variation than geography. To characterize the genetics underlying the ontogenetic venom compositional change in C. adamanteus, we sequenced adult/juvenile pairs of venom-gland transcriptomes from five populations previously shown to have different adult venom compositions. We identified a total of 59 putative toxin transcripts for C. adamanteus, and 12 of these were involved in the ontogenetic change. Three toxins were downregulated, and nine were upregulated in adults relative to juveniles. Adults and juveniles expressed similar total levels of snake-venom metalloproteinases but differed substantially in their featured paralogs, and adults expressed higher levels of Bradykinin-potentiating and C-type natriuretic peptides, nerve growth factor, and specific paralogs of phospholipases A2 and snake venom serine proteinases. Juvenile venom was more toxic to mice, indicating that the expression differences resulted in a phenotypically, and therefore potentially ecologically, significant difference in venom function. We also showed that adult and juvenile venom-gland transcriptomes for a species with known ontogenetic venom variation were equally effective at individually providing a full characterization of the venom genes of a species but that any particular individual was likely to lack several toxins in their transcriptome. A full characterization of a species’ venom-gene complement therefore requires sequencing more than one individual, although the ages of the individuals are unimportant.

  3. The venoms of small rear-fanged snakes (RFS) remain largely unexplored, despite increased recognition of their importance in understanding venom evolution more broadly. Sequencing the transcriptome of venom-producing glands has greatly increased the ability of researchers to examine and characterize the toxin repertoire of small taxa with low venom yields. Here, we use RNA-seq to characterize the Duvernoy’s gland transcriptome of the Plains Black-headed Snake, Tantilla nigriceps, a small, semi-fossorial colubrid that feeds on a variety of potentially dangerous arthropods including centipedes and spiders. We generated transcriptomes of six individuals from three localities in order to both characterize the toxin expression of this species for the first time, and to look for initial evidence of venom variation in the species. Three toxin families—three-finger neurotoxins (3FTxs), cysteine-rich secretory proteins (CRISPs), and snake venom metalloproteinases (SVMPIIIs)—dominated the transcriptome of T. nigriceps; 3FTx themselves were the dominant toxin family in most individuals, accounting for as much as 86.4% of an individual’s toxin expression. Variation in toxin expression between individuals was also noted, with two specimens exhibiting higher relative expression of C-type lectins than any other sample (8.7–11.9% compared to <1%), and another expressed CRISPs higher than any other toxin. This study provides the first Duvernoy’s gland transcriptomes of any species of Tantilla, and one of the few transcriptomic studies of RFS not predicated on a single individual. This initial characterization demonstrates the need for further study of toxin expression variation in this species, as well as the need for further exploration of small RFS venoms.
  4. Abstract Background

    The barnacles are a group of >2,000 species that have fascinated biologists, including Darwin, for centuries. Their lifestyles are extremely diverse, from free-swimming larvae to sessile adults, and even root-like endoparasites. Barnacles also cause hundreds of millions of dollars of losses annually due to biofouling. However, genomic resources for crustaceans, and barnacles in particular, are lacking.


    Using 62× Pacific Biosciences coverage, 189× Illumina whole-genome sequencing coverage, 203× HiC coverage, and 69× CHi-C coverage, we produced a chromosome-level genome assembly of the gooseneck barnacle Pollicipes pollicipes. The P. pollicipes genome is 770 Mb long and its assembly is one of the most contiguous and complete crustacean genomes available, with a scaffold N50 of 47 Mb and 90.5% of the BUSCO Arthropoda gene set. Using the genome annotation produced here along with transcriptomes of 13 other barnacle species, we completed phylogenomic analyses on a nearly 2 million amino acid alignment. Contrary to previous studies, our phylogenies suggest that the Pollicipedomorpha is monophyletic and sister to the Balanomorpha, which alters our understanding of barnacle larval evolution and suggests homoplasy in a number of naupliar characters. We also compared transcriptomes of P. pollicipes nauplius larvae and adults and found that nearly one-half of the genes in the genome are differentially expressed, highlighting the vastly different transcriptomes of larvae and adult gooseneck barnacles. Annotation of the genes with KEGG and GO terms reveals that these stages exhibit many differences including cuticle binding, chitin binding, microtubule motor activity, and membrane adhesion.


    This study provides high-quality genomic resources for a key group of crustaceans. This is especially valuable given the roles P. pollicipes plays in European fisheries, as a sentinel species for coastal ecosystems, and as a model for studying barnacle adhesion as well as its key position in the barnacle tree of life. A combination of genomic, phylogenetic, and transcriptomic analyses here provides valuable insights into the evolution and development of barnacles.

  5. Abstract Venomous animals can deploy toxins for both predation and defense. These dual functions of toxins might be expected to promote the evolution of new venoms and alteration of their composition. Cnidarians are the most ancient venomous animals but our present understanding of their venom diversity is compromised by poor taxon sampling. New proteomic data were therefore generated to characterize toxins in venoms of a staurozoan, a hydrozoan, and an anthozoan. We then used a novel clustering approach to compare venom diversity in cnidarians to other venomous animals. Comparison of the presence or absence of 32 toxin protein families indicated venom composition did not vary widely among the 11 cnidarian species studied. Unsupervised clustering of toxin peptide sequences suggested that toxin composition of cnidarian venoms is just as complex as that in many venomous bilaterians, including marine snakes. The adaptive significance of maintaining a complex and relatively invariant venom remains unclear. Future studies of cnidarian venom diversity, and of venom variation among nematocyst types and body regions, are required to better understand venom evolution.