

Title: Robust detection of natural selection using a probabilistic model of tree imbalance
Abstract Neutrality tests such as Tajima’s D and Fay and Wu’s H are standard implements in the population genetics toolbox. One of their most common uses is to scan the genome for signals of natural selection. However, it is well understood that D and H are confounded by other evolutionary forces—in particular, population expansion—that may be unrelated to selection. Because they are not model-based, it is not clear how to deconfound these tests in a principled way. In this article, we derive new likelihood-based methods for detecting natural selection, which are robust to fluctuations in effective population size. At the core of our method is a novel probabilistic model of tree imbalance, which generalizes Kingman’s coalescent to allow certain aberrant tree topologies to arise more frequently than is expected under neutrality. We derive a frequency spectrum-based estimator that can be used in place of D, and also extend to the case where genealogies are first estimated. We benchmark our methods on real and simulated data, and provide an open source software implementation.
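For reference, the two classic statistics the abstract sets out to improve on can be computed from an unfolded site frequency spectrum (SFS). The sketch below is a minimal Python version of the standard Tajima's D and the unnormalized Fay and Wu's H. It is not the authors' likelihood-based estimator. The function names and the SFS layout (`sfs[i - 1]` = number of sites whose derived allele is carried by exactly i of the n sampled sequences) are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def harmonic_sums(n: int) -> Tuple[float, float]:
    """Return a1 = sum 1/i and a2 = sum 1/i^2 for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def theta_estimators(sfs: Sequence[float]) -> Dict[str, float]:
    """Classic theta estimators from an unfolded SFS.

    Assumed layout: sfs[i - 1] is the number of sites whose derived allele
    is carried by exactly i of the n sampled sequences, for i = 1..n-1.
    """
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2  # number of sequence pairs, C(n, 2)
    a1, _ = harmonic_sums(n)
    s = sum(sfs)  # number of segregating sites
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    theta_h = sum(i * i * x for i, x in enumerate(sfs, start=1)) / pairs
    return {"S": s, "theta_w": s / a1, "theta_pi": theta_pi, "theta_h": theta_h}


def tajimas_d(sfs: Sequence[float]) -> float:
    """Tajima's D: normalized difference between theta_pi and theta_W."""
    n = len(sfs) + 1
    if n < 4:
        # The variance terms below vanish for n < 4, so D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    t = theta_estimators(sfs)
    s = t["S"]
    var = e1 * s + e2 * s * (s - 1)  # Tajima's variance of theta_pi - theta_W
    if var <= 0:
        raise ValueError("no segregating sites")
    return (t["theta_pi"] - t["theta_w"]) / math.sqrt(var)


def fay_wu_h(sfs: Sequence[float]) -> float:
    """Fay and Wu's H (unnormalized): theta_pi minus theta_H."""
    t = theta_estimators(sfs)
    return t["theta_pi"] - t["theta_h"]
```

Plugging in the neutral expectation `sfs = [theta / i for i in range(1, n)]` makes both statistics zero (up to rounding). An excess of singletons drives D negative, and population expansion produces exactly such an excess, which is why D alone cannot separate growth from selection. An excess of high-frequency derived alleles, as expected near a sweep, drives H negative.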
Award ID(s):
2052653
PAR ID:
10363456
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Genetics
Volume:
220
Issue:
3
ISSN:
1943-2631
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Kim, Yuseob (Ed.)
    Abstract Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data. 
  2. Abstract The expression of genomically encoded information is not error-free. Transcript-error rates are dramatically higher than DNA-level mutation rates, and despite their transient nature, the steady-state load of such errors must impose some burden on cellular performance. However, a broad perspective on the degree to which transcript-error rates are constrained by natural selection and diverge among lineages remains to be developed. Here, we present a genome-wide analysis of transcript-error rates across the Tree of Life using a modified rolling-circle sequencing method, revealing that the range in error rates is remarkably narrow across diverse species. Transcript errors tend to be randomly distributed, with little evidence supporting local control of error rates associated with gene-expression levels. A majority of transcript errors result in missense errors if translated, and as with a fraction of nonsense transcript errors, these are underrepresented relative to random expectations, suggesting the existence of mechanisms for purging some such errors. To quantitatively understand how natural selection and random genetic drift might shape transcript-error rates across species, we present a model based on cell biology and population genetics, incorporating information on cell volume, proteome size, average degree of exposure of individual errors, and effective population size. However, while this model provides a framework for understanding the evolution of this highly conserved trait, as currently structured it explains only 20% of the variation in the data, suggesting a need for further theoretical work in this area.
  3. Buerkle, Alex (Ed.)
    Inferences about past processes of adaptation and speciation require a gene-scale and genome-wide understanding of the evolutionary history of diverging taxa. In this study, we use genome-wide capture of nuclear gene sequences, plus skimming of organellar sequences, to investigate the phylogenomics of monkeyflowers in Mimulus section Erythranthe (27 accessions from seven species). Taxa within Erythranthe, particularly the parapatric and putatively sister species M. lewisii (bee-pollinated) and M. cardinalis (hummingbird-pollinated), have been a model system for investigating the ecological genetics of speciation and adaptation for over five decades. Across >8000 nuclear loci, multiple methods resolve a predominant species tree in which M. cardinalis groups with other hummingbird-pollinated taxa (37% of gene trees), rather than being sister to M. lewisii (32% of gene trees). We independently corroborate a single evolution of hummingbird pollination syndrome in Erythranthe by demonstrating functional redundancy in genetic complementation tests of floral traits in hybrids; together, these analyses overturn a textbook case of pollination-syndrome convergence. Strong asymmetries in allele sharing (Patterson’s D-statistic and related tests) indicate that gene tree discordance reflects ancient and recent introgression rather than incomplete lineage sorting. Consistent with abundant introgression blurring the history of divergence, low-recombination and adaptation-associated regions support the new species tree, while high-recombination regions generate phylogenetic evidence for sister status for M. lewisii and M. cardinalis. Population-level sampling of core taxa also revealed two instances of chloroplast capture, with Sierran M. lewisii and Southern Californian M. parishii each carrying organelle genomes nested within respective sympatric M. cardinalis clades. A recent organellar transfer from M. cardinalis, an outcrosser where selfish cytonuclear dynamics are more likely, may account for the unexpected cytoplasmic male sterility effects of selfer M. parishii organelles in hybrids with M. lewisii. Overall, our phylogenomic results reveal extensive reticulation throughout the evolutionary history of a classic monkeyflower radiation, suggesting that natural selection (re-)assembles and maintains species-diagnostic traits and barriers in the face of gene flow. Our findings further underline the challenges, even in reproductively isolated species, in distinguishing re-use of adaptive alleles from true convergence and emphasize the value of a phylogenomic framework for reconstructing the evolutionary genetics of adaptation and speciation.
  4. Throughout the evolutionary tree, there are gains and losses of morphological features, physiological processes, and behavioral patterns. Losses are perhaps nowhere so prominent as in subterranean organisms, which typically show reductions or losses of eyes and pigment. These losses seem easy to explain without recourse to natural selection; the most modern form of this neutral explanation is the accumulation of selectively neutral, structurally reducing mutations. Selectionist explanations include direct selection, often involving metabolic efficiency in resource-poor subterranean environments, and pleiotropy, where genes affecting eyes and pigment have other effects, such as increasing extra-optic sensory structures. This dichotomy echoes the broader debate in evolutionary biology about the sufficiency of natural selection as an explanation of evolution, e.g., Kimura’s neutral mutation theory. Tests of the two hypotheses have largely been one-sided, with data supporting that one or the other process is occurring. While these tests have utilized a variety of subterranean organisms, the favored experimental organism is the Mexican cavefish, Astyanax mexicanus, which has extant, eyed, ancestral-like surface-fish conspecifics, is easily bred in the lab, and has a sequenced whole genome. However, with few exceptions, tests for selection versus neutral mutation contain limitations or flaws. Notably, these tests are often one-sided, testing for the presence of only one of the two processes. In fact, it is most likely that both processes occur and make significant contributions to the two most studied traits in cave evolution: eye and pigment reduction. Furthermore, a narrow focus on the neutral mutation hypothesis versus selection to explain cave-evolved traits often fails, at least in the simplest forms of these hypotheses, to account for factors that are likely essential for understanding cave evolution: migration and epigenetic effects.
Further, recent studies have demonstrated that epigenetic effects and phenotypic plasticity play an important role in cave evolution. Phenotypic plasticity does not by itself result in genetic change, of course, but plasticity can reveal cryptic genetic variation on which selection can then act. These processes may radically change our thinking about the evolution of subterranean life, especially the speed with which it may occur. Thus, perhaps it is better to ask what role the interaction of genes and environment plays, in addition to natural selection and neutral mutation.
  5. Abstract We derive precise asymptotic results that are directly usable for confidence intervals and Wald hypothesis tests for likelihood-based generalized linear mixed model analysis. The essence of our approach is to derive the exact leading term behaviour of the Fisher information matrix when both the number of groups and number of observations within each group diverge. This leads to asymptotic normality results with simple studentizable forms. Similar analyses result in tractable leading term forms for the determination of approximate locally D-optimal designs. 
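Item 3 above cites Patterson's D-statistic as evidence that gene tree discordance reflects introgression. For a rooted four-taxon tree (((P1, P2), P3), O), incomplete lineage sorting alone predicts equal numbers of ABBA sites (P2 shares the derived allele with P3) and BABA sites (P1 shares it with P3), so a significant imbalance points to gene flow. The sketch below is a minimal frequency-based version of that statistic. It is not the implementation used in the Mimulus study, which the abstract does not describe, and the function name and argument layout are assumptions.

```python
from typing import Optional, Sequence


def patterson_d(
    p1: Sequence[float],
    p2: Sequence[float],
    p3: Sequence[float],
    p4: Optional[Sequence[float]] = None,
) -> float:
    """Patterson's D (ABBA-BABA) from per-site derived-allele frequencies.

    Assumes a rooted tree (((P1, P2), P3), O). p4 holds the outgroup's
    derived-allele frequencies and defaults to 0 (outgroup fixed for the
    ancestral allele). With 0/1 values this reduces to counting patterns.
    """
    if p4 is None:
        p4 = [0.0] * len(p1)
    if not len(p1) == len(p2) == len(p3) == len(p4):
        raise ValueError("all inputs must have one entry per site")
    abba = baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, p4):
        abba += (1 - a) * b * c * (1 - o)  # P2 shares the derived allele with P3
        baba += a * (1 - b) * c * (1 - o)  # P1 shares the derived allele with P3
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / (abba + baba)
```

With three ABBA sites and one BABA site, D = (3 - 1) / (3 + 1) = 0.5, an excess of allele sharing between P2 and P3. In practice, significance is usually assessed with a block jackknife over genomic windows, which this sketch omits.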