Title: Neural networks enable efficient and accurate simulation-based inference of evolutionary parameters from adaptation dynamics
The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects are often difficult to quantify empirically. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright–Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright–Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the GAP1 locus in the yeast Saccharomyces cerevisiae to be 10^-4.7 to 10^-4 CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for GAP1 CNVs in glutamine-limited chemostats.
We experimentally validated our inference-based estimates using 2 distinct experimental methods (barcode lineage tracking and pairwise fitness assays), which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network-based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data, with possible applications ranging from tumor to viral evolution.
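The general idea of likelihood-free inference from adaptation dynamics can be illustrated with a minimal sketch. This is not the authors' code or model: it assumes a deterministic (infinite-population) two-type approximation in place of a full Wright–Fisher simulation with drift, and it uses plain rejection ABC, which is simpler than either ABC-SMC or NPE. The function names, prior ranges, and tolerance below are hypothetical choices made only for illustration.

```python
import math
import random


def cnv_trajectory(log10_delta, s, generations):
    """CNV frequency per generation under deterministic mutation and selection.

    Assumption: ancestors acquire a CNV at rate delta per generation and
    CNV carriers have relative fitness 1 + s. Drift is ignored.
    """
    delta = 10.0 ** log10_delta
    x = 0.0
    trajectory = []
    for _ in range(generations + 1):
        trajectory.append(x)
        x_mut = x + (1.0 - x) * delta                # new CNVs from mutation
        x = x_mut * (1.0 + s) / (1.0 + s * x_mut)    # selection step
    return trajectory


def distance(a, b):
    """Root-mean-square difference between two equal-length trajectories."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def rejection_abc(observed, n_draws, epsilon, rng):
    """Keep prior draws whose simulated trajectory lies within epsilon of the data."""
    generations = len(observed) - 1
    accepted = []
    for _ in range(n_draws):
        log10_delta = rng.uniform(-7.0, -3.0)   # hypothetical prior on log10 formation rate
        s = rng.uniform(0.0, 0.3)               # hypothetical prior on fitness effect
        simulated = cnv_trajectory(log10_delta, s, generations)
        if distance(simulated, observed) < epsilon:
            accepted.append((log10_delta, s))
    return accepted


rng = random.Random(0)
# Synthetic "observed" data generated from known parameters, not real measurements.
observed = cnv_trajectory(-4.5, 0.08, 120)
posterior = rejection_abc(observed, n_draws=5000, epsilon=0.05, rng=rng)
```

The accepted pairs in `posterior` approximate the joint posterior of the two parameters under these assumptions. A lower formation rate can be partly offset by a larger fitness effect, so the accepted values tend to lie along a ridge rather than in a tight cluster.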
de Visser, J. Arjan
Journal Name: PLOS Biology
Sponsoring Org: National Science Foundation
More Like this
  1. Copy number variants (CNVs) are regions of the genome that vary in integer copy number. CNVs, which comprise both amplifications and deletions of DNA sequence, have been identified across all domains of life, from bacteria and archaea to plants and animals. CNVs are an important source of genetic diversity, and can drive rapid adaptive evolution and progression of heritable and somatic human diseases, such as cancer. However, despite their evolutionary importance and clinical relevance, CNVs remain understudied compared to single-nucleotide variants (SNVs). This is a consequence of the inherent difficulties in detecting CNVs at low-to-intermediate frequencies in heterogeneous populations of cells. Here, we discuss molecular methods used to detect CNVs, the limitations associated with using these techniques, and the application of new and emerging technologies that present solutions to these challenges. The goal of this short review and perspective is to highlight aspects of CNV biology that are understudied and define avenues for further research that address specific gaps in our knowledge of these complex alleles. We describe our recently developed method for CNV detection in which a fluorescent gene functions as a single-cell CNV reporter and present key findings from our evolution experiments in Saccharomyces cerevisiae. Using a CNV reporter, we found that CNVs are generated at a high rate and undergo selection with predictable dynamics across independently evolving replicate populations. Many CNVs appear to be generated through DNA replication-based processes that are mediated by the presence of short, interrupted, inverted-repeat sequences. Our results have important implications for the role of CNVs in evolutionary processes and the molecular mechanisms that underlie CNV formation. We discuss the possible extension of our method to other applications, including tracking the dynamics of CNVs in models of human tumors.
  2. Mutations of small effect underlie most adaptation to new environments, but beneficial variants with large fitness effects are expected to contribute under certain conditions. Genes and genomic regions having large effects on phenotypic differences between populations are known from numerous taxa, but fitness effect sizes have rarely been estimated. We mapped fitness over a generation in an F2 intercross between a marine and a lake stickleback population introduced to a freshwater pond. A quantitative trait locus map of the number of surviving offspring per F2 female detected a single, large-effect locus near Ectodysplasin (Eda), a gene having an ancient freshwater allele causing reduced bony armor and other changes. F2 females homozygous for the freshwater allele had twice the number of surviving offspring as homozygotes for the marine allele, producing a large selection coefficient, s = 0.50 ± 0.09 SE. Correspondingly, the frequency of the freshwater allele increased from 0.50 in F2 mothers to 0.58 in surviving offspring. We compare these results to allele frequency changes at the Eda gene in an Alaskan lake population colonized by marine stickleback in the 1980s. The frequency of the freshwater Eda allele rose steadily over multiple generations and reached 95% within 20 y, yielding a similar estimate of selection, s = 0.49 ± 0.05, but a different degree of dominance. These findings are consistent with other studies suggesting strong selection on this gene (and/or linked genes) in fresh water. Selection on ancient genetic variants carried by colonizing ancestors is likely to increase the prevalence of large-effect fitness variants in adaptive evolution.
  3. Gaut, Brandon (Ed.)
    How microbes adapt to a novel environment is a central question in evolutionary biology. Although adaptive evolution must be fueled by beneficial mutations, whether higher mutation rates increase the rate of adaptive evolution remains unclear. To address this question, we cultured Escherichia coli hypermutating populations, in which a defective methyl-directed mismatch repair pathway causes a 140-fold increase in single-nucleotide mutation rates. In parallel with wild-type E. coli, populations were cultured in tubes containing Luria-Bertani broth, a complex medium known to promote the evolution of subpopulation structure. After 900 days of evolution, in three transfer schemes with different population-size bottlenecks, hypermutators always exhibited levels of improved fitness similar to those of controls. Fluctuation tests revealed that the mutation rates of hypermutator lines converged evolutionarily on those of wild-type populations, which may have contributed to the absence of fitness differences. Further genome-sequence analysis revealed that, although hypermutator populations have higher rates of genomic evolution, this largely reflects strong genetic linkage. Despite these linkage effects, the evolved populations exhibit parallelism in fixed mutations, including those potentially related to biofilm formation, transcription regulation, and mutation-rate evolution. Together, these results are generally inconsistent with a hypothesized positive relationship between the mutation rate and the adaptive speed of evolution, and provide insight into how clonal adaptation occurs in novel environments.
  4. Abstract Background

    Genetic barcoding provides a high-throughput way to simultaneously track the frequencies of large numbers of competing and evolving microbial lineages. However, making inferences about the nature of the evolution that is taking place remains a difficult task.


    Here we describe an algorithm for the inference of fitness effects and establishment times of beneficial mutations from barcode sequencing data, which builds upon a Bayesian inference method by enforcing self-consistency between the population mean fitness and the individual effects of mutations within lineages. By testing our inference method on a simulation of 40,000 barcoded lineages evolving in serial batch culture, we find that this new method outperforms its predecessor, identifying more adaptive mutations and more accurately inferring their mutational parameters.


    Our new algorithm is particularly suited to inference of mutational parameters when read depth is low. We have made Python code for our serial dilution evolution simulations, as well as both the old and new inference methods, available on GitHub, in the hope that it can find broader use by the microbial evolution community.

  5. Abstract

    Germline copy number variants (CNVs) and single-nucleotide polymorphisms (SNPs) form the basis of inter-individual genetic variation. Although the phenotypic effects of SNPs have been extensively investigated, the effects of CNVs are less well understood. To better characterize mechanisms by which CNVs affect cellular phenotype, we tested their association with variable CpG methylation in a genome-wide manner. Using paired CNV and methylation data from the 1000 Genomes and HapMap projects, we identified genome-wide associations by methylation quantitative trait locus (mQTL) analysis. We found individual CNVs associated with methylation of multiple CpGs and vice versa. CNV-associated methylation changes were correlated with gene expression. CNV-mQTLs were enriched for regulatory regions and transcription factor-binding sites (TFBSs), and were involved in long-range physical interactions with associated CpGs. Some CNV-mQTLs were associated with methylation of imprinted genes. Several CNV-mQTLs and/or associated genes were among those previously reported by genome-wide association studies (GWASs). We demonstrate that germline CNVs in the genome are associated with CpG methylation. Our findings suggest that structural variation together with methylation may affect cellular phenotype.