

Title: Geonomics: Forward-Time, Spatially Explicit, and Arbitrarily Complex Landscape Genomic Simulations
Abstract: Understanding the drivers of spatial patterns of genomic diversity has emerged as a major goal of evolutionary genetics. The flexibility of forward-time simulation makes it especially valuable for these efforts, allowing for the simulation of arbitrarily complex scenarios in a way that mimics how real populations evolve. Here, we present Geonomics, a Python package for performing complex, spatially explicit, landscape genomic simulations with full spatial pedigrees that dramatically reduces user workload yet remains customizable and extensible because it is embedded within a popular, general-purpose language. We show that Geonomics results are consistent with expectations for a variety of validation tests based on classic models in population genetics and then demonstrate its utility and flexibility with a trio of more complex simulation scenarios that feature polygenic selection, selection on multiple traits, simulation on complex landscapes, and nonstationary environmental change. We then discuss runtime, which is primarily sensitive to landscape raster size; memory usage, which is primarily sensitive to maximum population size and recombination rate; and other caveats related to the model’s methods for approximating recombination and movement. Taken together, our tests and demonstrations show that Geonomics provides an efficient and robust platform for population genomic simulations that capture complex spatial and evolutionary dynamics.
Award ID(s):
1845682
NSF-PAR ID:
10322846
Editor(s):
Wilson, Melissa
Journal Name:
Molecular Biology and Evolution
Volume:
38
Issue:
10
ISSN:
1537-1719
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Genetic connectivity lies at the heart of evolutionary theory, and landscape genetics has rapidly advanced to understand how gene flow can be impacted by the environment. Isolation by landscape resistance, often inferred through the use of circuit theory, is increasingly identified as being critical for predicting genetic connectivity across complex landscapes. Yet landscape impediments to migration can arise from fundamentally different processes, such as landscape gradients causing directional migration and mortality during migration, which can be challenging to address. Spatial absorbing Markov chains (SAMC) have been introduced to understand and predict these (and other) processes affecting connectivity in ecological settings, but the relationship of this framework to landscape genetics remains unclear. Here, we relate the SAMC to population genetics theory, provide simulations to interpret the extent to which the SAMC can predict genetic metrics and demonstrate how the SAMC can be applied to genomic data using an example with an endangered species, the Panama City crayfish Procambarus econfinae, where directional migration is hypothesized to occur. The use of the SAMC for landscape genetics can be justified based on similar grounds to using circuit theory, as we show how circuit theory is a special case of this framework. The SAMC can extend circuit‐theoretic connectivity modelling by quantifying both directional resistance to migration and acknowledging the difference between migration mortality and resistance to migration. Our empirical example highlights that the SAMC better predicts population structure than circuit theory and least‐cost analysis by acknowledging asymmetric environmental gradients (i.e. slope) and migration mortality in this species. These results provide a foundation for applying the SAMC to landscape genetics. This framework extends isolation‐by‐resistance modelling to account for some common processes that can impact gene flow, which can improve predicting genetic connectivity across complex landscapes.
  2. Understanding the timescales on which different geologic processes influence genetic divergence is crucial to defining and testing geogenomic hypotheses and characterizing Earth-life evolution. To see if we can recover a genetic signal produced by a hypothetical physical barrier to gene flow, we used a geographically explicit simulation approach. We used the CDMetaPop software to simulate heritable, nonadaptive genetic data for 20 geographically distinct populations distributed throughout the Baja California peninsula of Mexico, a landscape where a transpeninsular seaway barrier has been proposed to have isolated the southern peninsula and caused the observed latitudinal genetic divergence in over 80 terrestrial species. We simulated 10,000 generations of isolation by a barrier under two dispersal scenarios (maximum dispersal of 1 km and 100 km from the population of origin per generation) and three DNA substitution rates (10^-7, 10^-8, and 10^-9 nucleotide substitutions per site per generation). Our simulations indicate that a physical barrier can produce strong genetic divergence within 10,000 generations, comparable to the continuum of values observed in nature for different taxonomic groups and geological settings. We found that the generation time of the organism was by far the most important factor dictating the rate of divergence. Evaluating different generation times (0.02, 0.2, 2, and 20 years) showed that species with longer generation times require longer periods of isolation to accumulate genetic divergence over 10k generations (~1 My). Simulating 10,000 generations of gene flow following removal of the barrier showed that the divergence signal eroded quickly, in less than 1,000 generations in every scenario, a pattern supported by theory from population genetics. These results are particularly relevant to geogenomic studies because they show that ephemeral gene flow barriers produce different magnitudes of genetic signals depending on attributes of the organism, particularly generation time, and that if reproductive isolation is not achieved during isolation, then the evolutionary signal of an ephemeral barrier may not develop. This work helps guide the limits of detectability when integrating genomic data with geological and climatic processes.
  3. A central goal of population genetics is to understand how genetic drift, natural selection, and gene flow shape allele frequencies through time. However, the actual processes underlying these changes—variation in individual survival, reproductive success, and movement—are often difficult to quantify. Fully understanding these processes requires the population pedigree, the set of relationships among all individuals in the population through time. Here, we use extensive pedigree and genomic information from a long-studied natural population of Florida Scrub-Jays (Aphelocoma coerulescens) to directly characterize the relative roles of different evolutionary processes in shaping patterns of genetic variation through time. We performed gene dropping simulations to estimate individual genetic contributions to the population and model drift on the known pedigree. We found that observed allele frequency changes are generally well predicted by accounting for the different genetic contributions of founders. Our results show that the genetic contribution of recent immigrants is substantial, with some large allele frequency shifts that otherwise may have been attributed to selection actually due to gene flow. We identified a few SNPs under directional short-term selection after appropriately accounting for gene flow. Using models that account for changes in population size, we partitioned the proportion of variance in allele frequency change through time. Observed allele frequency changes are primarily due to variation in survival and reproductive success, with gene flow making a smaller contribution. This study provides one of the most complete descriptions of short-term evolutionary change in allele frequencies in a natural population to date.

     
  4. Satta, Yoko (Ed.)
    Abstract Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169–SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community. 
  5. Hancock, Angela (Ed.)
    Abstract Geographic barriers are frequently invoked to explain genetic structuring across the landscape. However, inferences on the spatial and temporal origins of population variation have been largely limited to evolutionarily neutral models, ignoring the potential role of natural selection and intrinsic genomic processes known as genomic architecture in producing heterogeneity in differentiation across the genome. To test how variation in genomic characteristics (e.g. recombination rate) impacts our ability to reconstruct general patterns of differentiation between species that co-occur across geographic barriers, we sequenced the whole genomes of multiple bird populations that are distributed across rivers in southeastern Amazonia. We found that phylogenetic relationships within species and demographic parameters varied across the genome in predictable ways. Genetic diversity was positively associated with recombination rate and negatively associated with species tree support. Gene flow was less pervasive in genomic regions of low recombination, making these windows more likely to retain patterns of population structuring that matched the species tree. We further found that approximately a third of the genome showed evidence of selective sweeps and linked selection, skewing genome-wide estimates of effective population sizes and gene flow between populations toward lower values. In sum, we showed that the effects of intrinsic genomic characteristics and selection can be disentangled from neutral processes to elucidate spatial patterns of population differentiation.

     