Title: An empirical pipeline for choosing the optimal clustering threshold in RADseq studies
Abstract

Genomic data are increasingly used for high resolution population genetic studies including those at the forefront of biological conservation. A key methodological challenge is determining sequence similarity clustering thresholds for RADseq data when no reference genome is available. These thresholds define the maximum permitted divergence among allelic variants and the minimum divergence among putative paralogues and are central to downstream population genomic analyses. Here we develop a novel set of metrics to determine sequence similarity thresholds that maximize the correct separation of paralogous regions and minimize oversplitting naturally occurring allelic variation within loci. These metrics empirically identify the threshold value at which true alleles at opposite ends of several major axes of genetic variation begin to incorrectly separate into distinct clusters, allowing researchers to choose thresholds just below this value. We test our approach on a recently published data set for the protected foothill yellow‐legged frog (Rana boylii). The metrics recover a consistent pattern of roughly 96% similarity as a threshold above which genetic divergence and data missingness become increasingly correlated. We provide scripts for assessing different clustering thresholds and discuss how this approach can be applied across a wide range of empirical data sets.
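
The last metric mentioned in the abstract, the correlation between genetic divergence and data missingness, can be illustrated with a minimal sketch. This is not the authors' published script. The function names, the input layout (a dict mapping each sample to a list of genotype calls, with `None` for missing data), and the particular definitions of pairwise divergence and pairwise missingness are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation in one axis, so no detectable signal
    return cov / sqrt(var_x * var_y)


def divergence_missingness_correlation(calls):
    """Correlate pairwise divergence with pairwise missingness.

    calls: hypothetical layout, dict of sample name -> list of genotype
    calls at the same loci, with None marking a missing call.
    """
    divergences, missingness = [], []
    for a, b in combinations(sorted(calls), 2):
        pairs = list(zip(calls[a], calls[b]))
        # Loci genotyped in both samples of the pair.
        shared = [(x, y) for x, y in pairs if x is not None and y is not None]
        if not shared:
            continue  # no comparable loci for this pair
        # Assumed divergence: fraction of shared loci with differing calls.
        divergences.append(sum(x != y for x, y in shared) / len(shared))
        # Assumed missingness: fraction of loci missing in either sample.
        missingness.append(1 - len(shared) / len(pairs))
    return pearson(divergences, missingness)
```

Under these assumptions, one would compute this value for the genotype matrix produced at each candidate clustering threshold and look for the threshold above which the correlation starts to rise, which is the pattern the abstract reports at roughly 96% similarity for the test data set.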

 
PAR ID:
10455838
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Molecular Ecology Resources
Volume:
19
Issue:
5
ISSN:
1755-098X
Page Range / eLocation ID:
p. 1195-1204
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Genetic structure can be influenced by local adaptation to environmental heterogeneity and biogeographic barriers, resulting in discrete population clusters. Geographic distance among populations, however, can result in continuous clines of genetic divergence that appear as structured populations. Here, we evaluate the relevant importance of these three factors over a landscape characterized by environmental heterogeneity and the presence of a hypothesized biogeographic barrier in producing population genetic structure within 13 codistributed snake species using a genomic data set. We demonstrate that geographic distance and environmental heterogeneity across western North America contribute to population genomic divergence. Surprisingly, landscape features long thought to contribute to biogeographic barriers play little role in divergence community wide. Our results suggest that isolation by environment is the most important contributor to genomic divergence. Furthermore, we show that models of population clustering that incorporate spatial information consistently outperform nonspatial models, demonstrating the importance of considering geographic distances in population clustering. We argue that environmental and geographic distances as drivers of community‐wide divergence should be explored before assuming the role of biogeographic barriers.

     
  2. Abstract

    Multi‐locus sequence data are widely used in fungal systematic and taxonomic studies to delimit species and infer evolutionary relationships. We developed and assessed the efficacy of a multi‐locus pooled sequencing method using PacBio long‐read high‐throughput sequencing. Samples included fresh and dried voucher specimens, cultures and archival DNA extracts of Agaricomycetes with an emphasis on the order Cantharellales. Of the 283 specimens sequenced, 93.6% successfully amplified at one or more loci with a mean of 3.3 loci amplified. Our method recovered multiple sequence variants representing alleles of rDNA loci and single copy protein‐coding genes rpb1, rpb2 and tef1. Within‐sample genetic variation differed by locus and taxonomic group, with the greatest genetic divergence observed among sequence variants of rpb2 and tef1 from corticioid Cantharellales. Our method is a cost‐effective approach for generating accurate multi‐locus sequence data coupled with recovery of alleles from polymorphic samples and multi‐organism specimens. These results have important implications for understanding intra‐individual genomic variation among genetic loci commonly used in species delimitation of fungi.

     
  3. The ways in which genetic variation is distributed within and among populations is a key determinant of the evolutionary features of a species. However, most comprehensive studies of these features have been restricted to studies of subdivision in settings known to have been driven by local adaptation, leaving our understanding of the natural dispersion of allelic variation less than ideal. Here, we present a geographic population-genomic analysis of 10 populations of the freshwater microcrustacean Daphnia pulex, an emerging model system in evolutionary genomics. These populations exhibit a pattern of moderate isolation-by-distance, with an average migration rate of 0.6 individuals per generation, and average effective population sizes of ∼650,000 individuals. Most populations contain numerous private alleles, and genomic scans highlight the presence of islands of excessively high population subdivision for more common alleles. A large fraction of such islands of population divergence likely reflect historical neutral changes, including rare stochastic migration and hybridization events. The data do point to local adaptive divergence, although the precise nature of the relevant variation is diffuse and cannot be associated with particular loci, despite the very large sample sizes involved in this study. In contrast, an analysis of between-species divergence highlights positive selection operating on a large set of genes with functions nearly nonoverlapping with those involved in local adaptation, in particular ribosome structure, mitochondrial bioenergetics, light reception and response, detoxification, and gene regulation. These results set the stage for using D. pulex as a model for understanding the relationship between molecular and cellular evolution in the context of natural environments. 
  4. Abstract

    The successes of introduced populations in novel habitats often provide powerful examples of evolution and adaptation. In the 1950s, opossum shrimp (Mysis diluviana) individuals from Clearwater Lake in Minnesota, USA were transported and introduced to Twin Lakes in Colorado, USA by fisheries managers to supplement food sources for trout. Mysis were subsequently introduced from Twin Lakes into numerous lakes throughout Colorado. Because managers kept detailed records of the timing of the introductions, we had the opportunity to test for evolutionary divergence within a known time interval. Here, we used reduced representation genomic data to investigate patterns of genetic diversity, test for genetic divergence between populations, and for evidence of adaptive evolution within the introduced populations in Colorado. We found very low levels of genetic diversity across all populations, with evidence for some genetic divergence between the Minnesota source population and the introduced populations in Colorado. There was little differentiation among the Colorado populations, consistent with the known provenance of a single founding population, with the exception of the population from Gross Reservoir, Colorado. Demographic modeling suggests that at least one undocumented introduction from an unknown source population hybridized with the population in Gross Reservoir. Despite the overall low genetic diversity we observed, FST outlier and environmental association analyses identified multiple loci exhibiting signatures of selection and adaptive variation related to elevation and lake depth. The success of introduced species is thought to be limited by genetic variation, but our results imply that populations with limited genetic variation can become established in a wide range of novel environments. From an applied perspective, the observed patterns of divergence between populations suggest that genetic analysis can be a useful forensic tool to determine likely sources of invasive species.

     
  5. Premise

    Divergence depends on the strength of selection and frequency of gene flow between taxa, while reproductive isolation relies on mating barriers and geographic distance. Less is known about how these processes interact at early stages of speciation. Here, we compared population‐level differentiation in floral phenotype and genetic sequence variation among recently diverged Castilleja to explore patterns of diversification under different scenarios of reproductive isolation.

    Methods

    Using target enrichment enabled by the Angiosperms353 probe set, we assessed genetic distance among 50 populations of four Castilleja species. We investigated whether patterns of genetic divergence are explained by floral trait variation or geographic distance in two focal groups: the widespread C. sessiliflora and the more restricted C. purpurea species complex.

    Results

    We document that C. sessiliflora and the C. purpurea complex are characterized by high diversity in floral color across varying geographic scales. Despite phenotypic divergence, groups were not well supported in phylogenetic analyses, and little genetic differentiation was found across targeted Angiosperms353 loci. Nonetheless, a principal coordinate analysis of single nucleotide polymorphisms revealed differentiation within C. sessiliflora across floral morphs and geography and less differentiation among species of the C. purpurea complex.

    Conclusions

    Patterns of genetic distance in C. sessiliflora suggest species cohesion maintained over long distances despite variation in floral traits. In the C. purpurea complex, divergence in floral color across narrow geographic clines may be driven by recent selection on floral color. These contrasting patterns of floral and genetic differentiation reveal that divergence can arise via multiple eco‐evolutionary paths.

     