

Title: Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data
Abstract

The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, makes the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coalescent, sometimes delimit populations and overestimate species numbers. This issue is exacerbated in taxa with inherently high population structure due to low dispersal ability, and in cryptic species resulting from nonecological speciation. These taxa present a conundrum when delimiting species: analyses rely heavily, if not entirely, on genetic data which over split species, while other lines of evidence lump. We showcase this conundrum in the harvester Theromaster brunneus, a low dispersal taxon with a wide geographic distribution and high potential for cryptic species. Integrating morphology, mitochondrial, and sub-genomic (double-digest RADSeq and ultraconserved elements) data, we find high discordance across analyses and data types in the number of inferred species, with further evidence that multispecies coalescent approaches over split. We demonstrate the power of a supervised machine learning approach in effectively delimiting cryptic species by creating a “custom” training data set derived from a well-studied lineage with similar biological characteristics as Theromaster. This novel approach uses known taxa with particular biological characteristics to inform unknown taxa with similar characteristics, using modern computational tools ideally suited for species delimitation. The approach also considers the natural history of organisms to make more biologically informed species delimitation decisions, and in principle is broadly applicable for taxa across the tree of life.

 
NSF-PAR ID:
10363189
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Frontiers in Zoology
Volume:
19
Issue:
1
ISSN:
1742-9994
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Barraclough, Timothy G. (Ed.)
    The “multispecies” coalescent (MSC) model that underlies many genomic species-delimitation approaches is problematic because it does not distinguish between genetic structure associated with species versus that of populations within species. Consequently, as both the genomic and spatial resolution of data increases, a proliferation of artifactual species results as within-species population lineages, detected due to restrictions in gene flow, are identified as distinct species. The toll of this extends beyond systematic studies, getting magnified across the many disciplines that rely upon an accurate framework of identified species. Here we present the first of a new class of approaches that addresses this issue by incorporating an extended speciation process for species delimitation. We model the formation of population lineages and their subsequent development into independent species as separate processes and provide for a way to incorporate current understanding of the species boundaries in the system through specification of species identities of a subset of population lineages. As a result, species boundaries and within-species lineages boundaries can be discriminated across the entire system, and species identities can be assigned to the remaining lineages of unknown affinities with quantified probabilities. In addition to the identification of species units in nature, the primary goal of species delimitation, the incorporation of a speciation model also allows us insights into the links between population and species-level processes. By explicitly accounting for restrictions in gene flow not only between, but also within, species, we also address the limits of genetic data for delimiting species. Specifically, while genetic data alone is not sufficient for accurate delimitation, when considered in conjunction with other information we are able to not only learn about species boundaries, but also about the tempo of the speciation process itself. 
  2. Abstract

    Species delimitation is an imperative first step toward understanding Earth's biodiversity, yet what constitutes a species and the relative importance of the various processes by which new species arise continue to be debatable. Species delimitation in spiders has traditionally used morphological characters; however, certain mygalomorph spiders exhibit morphological homogeneity despite long periods of population‐level isolation, absence of gene flow, and consequent high degrees of molecular divergence. Studies have shown strong geographic structuring and significant genetic divergence among several species complexes within the trapdoor spider genus Aptostichus, most of which are restricted to the California Floristic Province (CAFP) biodiversity hotspot. Specifically, the Aptostichus icenoglei complex, which comprises the three sibling species, A. barackobamai, A. isabella, and A. icenoglei, exhibits evidence of cryptic mitochondrial DNA diversity throughout their ranges in Northern, Central, and Southern California. Our study aimed to explicitly test species hypotheses within this assemblage by implementing a cohesion species‐based approach. We used genomic‐scale data (ultraconserved elements, UCEs) to first evaluate genetic exchangeability and then assessed ecological interchangeability of genetic lineages. Biogeographical analysis was used to assess the likelihood of dispersal versus vicariance events that may have influenced speciation pattern and process across the CAFP's complex geologic and topographic landscape. Considering the lack of congruence across data types and analyses, we take a more conservative approach by retaining species boundaries within A. icenoglei.

     
  3. Abstract

    Most new cryptic species are described using conventional tree‐ and distance‐based species delimitation methods (SDMs), which rely on phylogenetic arrangements and measures of genetic divergence. However, although numerous factors such as population structure and gene flow are known to confound phylogenetic inference and species delimitation, the influence of these processes is not frequently evaluated. Using large numbers of exons, introns, and ultraconserved elements obtained using the FrogCap sequence‐capture protocol, we compared conventional SDMs with more robust genomic analyses that assess population structure and gene flow to characterize species boundaries in a Southeast Asian frog complex (Pulchrana picturata). Our results showed that gene flow and introgression can produce phylogenetic patterns and levels of divergence that resemble distinct species (up to 10% divergence in mitochondrial DNA). Hybrid populations were inferred as independent (singleton) clades that were highly divergent from adjacent populations (7%–10%) and unusually similar (<3%) to allopatric populations. Such anomalous patterns are not uncommon in Southeast Asian amphibians, which brings into question whether the high levels of cryptic diversity observed in other amphibian groups reflect distinct cryptic species—or, instead, highly admixed and structured metapopulation lineages. Our results also provide an alternative explanation to the conundrum of divergent (sometimes nonsister) sympatric lineages—a pattern that has been celebrated as indicative of true cryptic speciation. Based on these findings, we recommend that species delimitation of continuously distributed “cryptic” groups should not rely solely on conventional SDMs, but should necessarily examine population structure and gene flow to avoid taxonomic inflation.

     
  4. Abstract

    Amidst the rapid advancement in next‐generation sequencing (NGS) technology over the last few years, salamanders have been left behind. Salamanders have enormous genomes (up to 40 times the size of the human genome), and this poses challenges to generating NGS data sets of quality and quantity similar to those of other vertebrates. However, optimization of laboratory protocols is time‐consuming and often cost prohibitive, and continued omission of salamanders from novel phylogeographic research is detrimental to species facing decline. Here, we use a salamander endemic to the southeastern United States, Plethodon serratus, to test the utility of an established protocol for sequence capture of ultraconserved elements (UCEs) in resolving intraspecific phylogeographic relationships and delimiting cryptic species. Without modifying the standard laboratory protocol, we generated a data set consisting of over 600 million reads for 85 P. serratus samples. Species delimitation analyses support recognition of seven species within P. serratus sensu lato, and all phylogenetic relationships among the seven species are fully resolved under a coalescent model. Results also corroborate previous data suggesting nonmonophyly of the Ouachita and Louisiana regions. Our results demonstrate that established UCE protocols can successfully be used in phylogeographic studies of salamander species, providing a powerful tool for future research on evolutionary history of amphibians and other organisms with large genomes.

     
  5. Abstract

    Sympatric diversification is recognized to have played an important role in the evolution of biodiversity. However, an in situ sympatric origin for codistributed taxa is difficult to demonstrate because different evolutionary processes can lead to similar biogeographic outcomes, especially in ecosystems that can readily facilitate secondary contact due to a lack of hard barriers to dispersal. Here we use a genomic (ddRADseq), model‐based approach to delimit a species complex of tropical sea anemones that are codistributed on coral reefs throughout the Tropical Western Atlantic. We use coalescent simulations in fastsimcoal2 and ordinary differential equations in Moments to test competing diversification scenarios that span the allopatric‐sympatric continuum. Our results suggest that the corkscrew sea anemone Bartholomea annulata is a cryptic species complex whose members are codistributed throughout their range. Simulation and model selection analyses from both approaches suggest these lineages experienced historical and contemporary gene flow, supporting a sympatric origin, but an alternative secondary contact model receives appreciable model support in fastsimcoal2. Leveraging the genome of the closely related Exaiptasia diaphana, we identify five loci under divergent selection between cryptic B. annulata lineages that fall within mRNA transcripts or CDS regions. Our study provides a rare empirical, genomic example of sympatric speciation in a tropical anthozoan and the first range‐wide molecular study of a tropical sea anemone, underscoring that anemone diversity is under‐described in the tropics, and highlighting the need for additional systematic studies into these ecologically and economically important species.

     