Title: learnPopGen: An R package for population genetic simulation and numerical analysis
Abstract

Here, I briefly present a new R package called learnPopGen that has been designed primarily for the purposes of teaching evolutionary biology, population genetics, and evolutionary theory. Functions of the package can be used to conduct simulations and numerical analyses of a wide range of evolutionary phenomena that would typically be covered in advanced undergraduate through graduate‐level curricula in population genetics or evolution. For instance, learnPopGen functions can be used to visualize gene frequency changes through time under multiple deterministic and stochastic processes, to compute and animate the changes in phenotypic trait values or distributions under natural selection, to numerically analyze and graph the outcome of simple game theory models, and to plot coalescence within a population experiencing genetic drift, along with a number of other things. Functions have been designed to be maximally didactic and frequently employ compelling animated visualizations. Furthermore, it is straightforward to export plots and animations from R in the form of flat or animated graphics, or as videos. For maximum flexibility, students working with the package can run functions directly in R; however, instructors may choose to guide students less adept in the R environment to one of various web interfaces that I have built for a number of the functions of the package and that are already available online.
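
To give a concrete sense of the kind of stochastic simulation the abstract describes, the following is a minimal sketch of genetic drift at one biallelic locus under a Wright-Fisher model. It is written in Python rather than R and does not use or reproduce learnPopGen; the function name simulate_drift and its parameters are hypothetical, chosen only for illustration.

```python
import random


def simulate_drift(p0=0.5, pop_size=50, generations=100, seed=None):
    """Return the allele-frequency trajectory of one biallelic locus.

    Assumes a diploid Wright-Fisher population of constant size with no
    selection, mutation, or migration (illustrative assumptions only).
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # a diploid population carries 2N gene copies
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn independently
        # from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    trajectory = simulate_drift(p0=0.5, pop_size=50, generations=100, seed=1)
    print("final allele frequency:", trajectory[-1])
```

Running the function several times with different seeds and plotting the trajectories together would show the spread of outcomes typical of drift, including occasional loss or fixation of the allele in small populations.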

 
Award ID(s):
1759940 1350474
NSF-PAR ID:
10461330
Author(s) / Creator(s):
 
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Ecology and Evolution
Volume:
9
Issue:
14
ISSN:
2045-7758
Page Range / eLocation ID:
p. 7896-7902
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Earth's biosphere is undergoing drastic reorganization due to the sixth mass extinction brought on by the Anthropocene. Impacts of local and regional extirpation of species have been demonstrated to propagate through the complex interaction networks they are part of, leading to secondary extinctions and exacerbating biodiversity loss. Contemporary ecological theory has developed several measures to analyse the structure and robustness of ecological networks under biodiversity loss. However, a toolbox for directly simulating and quantifying extinction cascades and creating novel interactions (i.e. rewiring) remains absent.

    Here, we present NetworkExtinction, a novel R package which we have developed to explore the propagation of species extinction sequences through ecological networks and quantify the effects of rewiring potential in response to primary species extinctions. With NetworkExtinction, we integrate ecological theory and computational simulations to develop functionality with which users may analyse and visualize the structure and robustness of ecological networks. The core functions introduced with NetworkExtinction focus on simulations of sequential primary extinctions and associated secondary extinctions, allowing user‐specified secondary extinction thresholds and realization of rewiring potential.

    With the package NetworkExtinction, users can estimate the robustness of ecological networks after performing species extinction routines based on several algorithms. Moreover, users can compare the number of simulated secondary extinctions against a null model of random extinctions. In‐built visualizations enable graphing topological indices calculated by the deletion sequence functions after each simulation step. Finally, the user can estimate the network's degree distribution by fitting different common distributions. Here, we illustrate the use of the package and its outputs by analysing a Chilean coastal marine food web.

    NetworkExtinction is a compact and easy‐to‐use R package with which users can quantify changes in ecological network structure in response to different patterns of species loss, thresholds and rewiring potential. Therefore, this package is particularly useful for evaluating ecosystem responses to anthropogenic and environmental perturbations that produce nonrandom, and sometimes targeted, species extinctions.

     
  2. Abstract

    “Evolve and resequence” (E&R) studies combine experimental evolution and whole‐genome sequencing to interrogate the genetics underlying adaptation. Due to ease of handling, E&R work with asexual organisms such as bacteria can employ optimized experimental design, with large experiments and many generations of selection. By contrast, E&R experiments with sexually reproducing organisms are more difficult to implement, and design parameters vary dramatically among studies. Thus, efforts have been made to assess how these differences, such as number of independent replicates, or size of experimental populations, impact inference. We add to this work by investigating the role of time sampling, that is, the number of discrete time points at which sequence data are collected from evolving populations. Using data from an E&R experiment with outcrossing Saccharomyces cerevisiae in which populations were sequenced 17 times over ~540 generations, we address the following questions: (a) Do more time points improve the ability to identify candidate regions underlying selection? And (b) does high‐resolution sampling provide unique insight into evolutionary processes driving adaptation? We find that while time sampling does not improve the ability to identify candidate regions, high‐resolution sampling does provide valuable opportunities to characterize evolutionary dynamics. Increased time sampling reveals three distinct trajectories for adaptive alleles: one consistent with classic population genetic theory (i.e., models assuming constant selection coefficients), and two where trajectories suggest more context‐dependent responses (i.e., models involving dynamic selection coefficients). We conclude that while time sampling has limited impact on candidate region identification, sampling eight or more time points has clear benefits for studying complex evolutionary dynamics.

     
  3. Abstract

    Stochastic models of character trait evolution have become a cornerstone of evolutionary biology in an array of contexts. While probabilistic models have been used extensively for statistical inference, they have largely been ignored for the purpose of measuring distances between phylogeny-aware models. Recent contributions to the problem of phylogenetic distance computation have highlighted the importance of explicitly considering evolutionary model parameters and their impacts on molecular sequence data when quantifying dissimilarity between trees. By comparing two phylogenies in terms of their induced probability distributions that are functions of many model parameters, these distances can be more informative than traditional approaches that rely strictly on differences in topology or branch lengths alone. Currently, however, these approaches are designed for comparing models of nucleotide substitution and gene tree distributions, and thus, are unable to address other classes of traits and associated models that may be of interest to evolutionary biologists. Here we expand the principles of probabilistic phylogenetic distances to compute tree distances under models of continuous trait evolution along a phylogeny. By explicitly considering both the degree of relatedness among species and the evolutionary processes that collectively give rise to character traits, these distances provide a foundation for comparing models and their predictions, and for quantifying the impacts of assuming one phylogenetic background over another while studying the evolution of a particular trait. We demonstrate the properties of these approaches using theory, simulations, and several empirical datasets that highlight potential uses of probabilistic distances in many scenarios. We also introduce an open-source R package named PRDATR for easy application by the scientific community for computing phylogenetic distances under models of character trait evolution.
  4. Birol, Inanc (Ed.)
    Abstract

    Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale.

    Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood are: (1) taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes.

    Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167.

    Supplementary information: Supplementary data are available at Bioinformatics online.
  5. Abstract

    We introduce a new R package “MrIML” (“Mister iml”; Multi‐response Interpretable Machine Learning). MrIML provides a powerful and interpretable framework that enables users to harness recent advances in machine learning to quantify multilocus genomic relationships, to identify loci of interest for future landscape genetics studies, and to gain new insights into adaptation across environmental gradients. Relationships between genetic variation and environment are often nonlinear and interactive; these characteristics have been challenging to address using traditional landscape genetic approaches. Our package helps capture this complexity and offers functions that fit and interpret a wide range of highly flexible models that are routinely used for single‐locus landscape genetics studies but are rarely extended to estimate response functions for multiple loci. To demonstrate the package's broad functionality, we test its ability to recover landscape relationships from simulated genomic data. We also apply the package to two empirical case studies. In the first, we model genetic variation of North American balsam poplar (Populus balsamifera, Salicaceae) populations across environmental gradients. In the second case study, we recover the landscape and host drivers of feline immunodeficiency virus genetic variation in bobcats (Lynx rufus). The ability to model thousands of loci collectively and compare models from linear regression to extreme gradient boosting, within the same analytical framework, has the potential to be transformative. The MrIML framework is also extendable and not limited to modelling genetic variation; for example, it can quantify the environmental drivers of microbiomes and coinfection dynamics.

     