The infraorder Mygalomorphae is one of the three main lineages of spiders, comprising over 3000 nominal species. This ancient group has a worldwide distribution and includes among its ranks large and charismatic taxa such as tarantulas, trapdoor spiders, and highly venomous funnel-web spiders. Based on past molecular studies using Sanger-sequencing approaches, numerous mygalomorph families (e.g., Hexathelidae, Ctenizidae, Cyrtaucheniidae, Dipluridae, and Nemesiidae) have been identified as non-monophyletic. However, these data were unable to sufficiently resolve the higher-level (intra- and interfamilial) relationships such that the necessary changes in classification could be made with confidence. Here, we present a comprehensive phylogenomic treatment of the spider infraorder Mygalomorphae. We employ 472 loci obtained through anchored hybrid enrichment to reconstruct relationships among all the mygalomorph spider families and estimate the timeframe of their diversification. We sampled nearly all currently recognized families, which has allowed us to assess their status and, as a result, propose a new classification scheme. Our generic-level sampling has also provided an evolutionary framework for revisiting questions regarding silk use in mygalomorph spiders. The first such analysis for the group within a strict phylogenetic framework shows that a sheet web is likely the plesiomorphic condition for mygalomorphs, as well as providing …
Contamination of a genetic sample with DNA from one or more nontarget species is a continuing concern in molecular phylogenetic studies, both those based on Sanger sequencing and those based on next-generation sequencing. We developed an automated pipeline for identifying and excluding likely cross-contaminated loci based on the detection of bimodal distributions of patristic distances across gene trees. When contamination occurs between samples within a data set, a comparison between a contaminated sample and its contaminant taxon will yield a bimodal distribution with one peak close to zero patristic distance. This new method does not rely on a priori knowledge of taxon relatedness, nor does it determine the cause(s) of the contamination. Exclusion of putatively contaminated loci from a data set generated for the insect family Cicadidae showed that these sequences were affecting some topological patterns and branch supports, although the effects were sometimes subtle, with some contamination-influenced relationships exhibiting strong bootstrap support. Long tip branches and outlier values for one anchored phylogenomic pipeline statistic (AvgNHomologs) were correlated with the presence of contamination. While the anchored hybrid enrichment markers used here, which target hemipteroid taxa, proved effective in resolving deep and shallow level Cicadidae relationships in aggregate, individual markers contained inadequate phylogenetic signal, in …
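The detection idea in this abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that the patristic distance between one pair of samples has already been extracted from each gene tree, and it substitutes a simple largest-gap split for a formal bimodality test. The function name, the locus labels, and both thresholds (`near_zero`, `min_separation`) are hypothetical choices made for illustration only.

```python
from statistics import mean


def flag_contaminated_loci(distances, near_zero=0.01, min_separation=5.0):
    """Return loci in a near-zero peak for one pair of samples.

    distances: dict mapping locus name -> patristic distance between the
    two samples in that locus's gene tree (assumed precomputed).
    The thresholds are illustrative assumptions, not published values.
    """
    items = sorted(distances.items(), key=lambda kv: kv[1])
    if len(items) < 4:
        return set()  # too few loci to talk about a distribution

    values = [d for _, d in items]
    # Split the sorted distances at the largest gap between neighbours.
    gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    split = max(range(len(gaps)), key=gaps.__getitem__) + 1
    low, high = values[:split], values[split:]

    # Flag only if the lower group sits near zero AND the upper group is
    # clearly separated from it; a single near-zero peak (truly close
    # relatives) or a single distant peak (clean pair) is not flagged.
    low_mean, high_mean = mean(low), mean(high)
    if low_mean <= near_zero and high_mean >= min_separation * max(low_mean, near_zero):
        return {locus for locus, _ in items[:split]}
    return set()


# Hypothetical pair: two loci share near-identical sequences.
suspect = {"L1": 0.001, "L2": 0.002, "L3": 0.15, "L4": 0.18, "L5": 0.16, "L6": 0.17}
print(flag_contaminated_loci(suspect))
```

Under these assumptions, a pair of close relatives with uniformly small distances is left alone, which matches the abstract's point that the signal is the two-peaked shape rather than low distance by itself.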
- Award ID(s): 1655891
- Publication Date:
- NSF-PAR ID: 10373416
- Journal Name: Systematic Biology
- Volume: 71
- Issue: 6
- Page Range or eLocation-ID: p. 1504-1523
- ISSN: 1063-5157
- Publisher: Oxford University Press
- Sponsoring Org: National Science Foundation
More Like this
- Wiegmann, Brian (Ed.) Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets, apparently driven by the increase in locus length. Additionally, we conducted simulations and …
- Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic …
- A molecular phylogeny and a review of family-group classification are presented for 137 species (ca. 125 genera) of the insect family Cicadidae, the true cicadas, plus two species of hairy cicadas (Tettigarctidae) and two outgroup species from Cercopidae. Five genes, two of them mitochondrial, comprise the 4992 base-pair molecular dataset. Maximum-likelihood and Bayesian phylogenetic results are shown, including analyses to address potential base composition bias. Tettigarcta is confirmed as the sister-clade of the Cicadidae and support is found for three subfamilies identified in an earlier morphological cladistic analysis. A set of paraphyletic deep-level clades formed by African genera are together named as Tettigomyiinae n. stat. Taxonomic reassignments of genera and tribes are made where morphological examination confirms incorrect placements suggested by the molecular tree, and 11 new tribes are defined (Arenopsaltriini n. tribe, Durangonini n. tribe, Katoini n. tribe, Lacetasini n. tribe, Macrotristriini n. tribe, Malagasiini n. tribe, Nelcyndanini n. tribe, Pagiphorini n. tribe, Pictilini n. tribe, Psaltodini n. tribe, and Selymbriini n. tribe). Tribe Tacuini n. syn. is synonymized with Cryptotympanini, and Tryellina n. syn. is synonymized with an expanded Tribe Lamotialnini. Tribe Hyantiini n. syn. is synonymized with Fidicinini. Tribe Sinosenini is transferred to Cicadinae from Cicadettinae, Cicadatrini …
- Marker selection has emerged as an important component of phylogenomic study design due to rising concerns about the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate, among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable …