
This content will become publicly available on August 19, 2023

Title: Recombination-aware phylogeographic inference using the structured coalescent with ancestral recombination
Movement of individuals between populations or demes is often restricted, especially between geographically isolated populations. The structured coalescent provides an elegant theoretical framework for describing how movement between populations shapes the genealogical history of sampled individuals and thereby structures genetic variation within and between populations. However, in the presence of recombination an individual may inherit different regions of their genome from different parents, resulting in a mosaic of genealogical histories across the genome, which can be represented by an Ancestral Recombination Graph (ARG). In this case, different genomic regions may have different ancestral histories and so different histories of movement between populations. Recombination therefore poses an additional challenge to phylogeographic methods that aim to reconstruct the movement of individuals from genealogies, although also a potential benefit in that different loci may contain additional information about movement. Here, we introduce the Structured Coalescent with Ancestral Recombination (SCAR) model, which builds on recent approximations to the structured coalescent by incorporating recombination into the ancestry of sampled individuals. The SCAR model allows us to infer how the migration history of sampled individuals varies across the genome from ARGs, and improves estimation of key population genetic parameters such as population sizes, recombination rates and migration rates. Using the SCAR model, we explore the potential and limitations of phylogeographic inference using full ARGs. We then apply the SCAR to lineages of the recombining fungus Aspergillus flavus sampled across the United States to explore patterns of recombination and migration across the genome.
Schiffels, Stephan
Journal Name:
PLOS Computational Biology
Sponsoring Org:
National Science Foundation
More Like this
  1. Lohmueller, Kirk (Ed.)
    Abstract The levels and distribution of standing genetic variation in a genome can provide a wealth of insights about the adaptive potential, demographic history, and genome structure of a population or species. As structural variants are increasingly associated with traits important for adaptation and speciation, investigating both sequence and structural variation is essential for wholly tapping this potential. Using a combination of shotgun sequencing, 10x Genomics linked reads and proximity-ligation data (Chicago and Hi-C), we produced and annotated a chromosome-level genome assembly for the Atlantic silverside (Menidia menidia)—an established ecological model for studying the phenotypic effects of natural and artificial selection—and examined patterns of genomic variation across two individuals sampled from different populations with divergent local adaptations. Levels of diversity varied substantially across each chromosome, consistently being highly elevated near the ends (presumably near telomeric regions) and dipping to near zero around putative centromeres. Overall, our estimate of the genome-wide average heterozygosity in the Atlantic silverside is among the highest reported for a fish, or any vertebrate (1.32–1.76% depending on inference method and sample). Furthermore, we also found extreme levels of structural variation, affecting ∼23% of the total genome sequence, including multiple large inversions (> 1 Mb and up to 12.6 Mb) associated with previously identified haploblocks showing strong differentiation between locally adapted populations. These extreme levels of standing genetic variation are likely associated with large effective population sizes and may help explain the remarkable adaptive divergence among populations of the Atlantic silverside.
  2. Smith, Stephen (Ed.)
    Abstract Understanding how gene flow affects population divergence and speciation remains challenging. Differentiating one evolutionary process from another can be difficult because multiple processes can produce similar patterns, and more than one process can occur simultaneously. Although simple population models produce predictable results, how these processes balance in taxa with patchy distributions and complicated natural histories is less certain. These types of populations might be highly connected through migration (gene flow), but can experience stronger effects of genetic drift and inbreeding, or localized selection. Although different signals can be difficult to separate, the application of high-throughput sequence data can provide the resolution necessary to distinguish many of these processes. We present whole-genome sequence data for an avian species group with an alpine and arctic tundra distribution to examine the role that different population genetic processes have played in their evolutionary history. Rosy-finches inhabit high elevation mountaintop sky islands and high-latitude island and continental tundra. They exhibit extensive plumage variation coupled with low levels of genetic variation. Additionally, the number of species within the complex is debated, making them excellent for studying the forces involved in the process of diversification, as well as an important species group in which to investigate species boundaries. Total genomic variation suggests a broadly continuous pattern of allele frequency changes across the mainland taxa of this group in North America. However, phylogenomic analyses recover multiple distinct, well supported, groups that coincide with previously described morphological variation and current species-level taxonomy. Tests of introgression using D-statistics and approximate Bayesian computation reveal significant levels of introgression between multiple North American taxa. These results provide insight into the balance between divergent and homogenizing population genetic processes and highlight remaining challenges in interpreting conflict between different types of analytical approaches with whole-genome sequence data. [ABBA-BABA; approximate Bayesian computation; gene flow; phylogenomics; speciation; whole-genome sequencing.]
  3. Abstract Accurate estimates of the rate of recombination are key to understanding a host of evolutionary processes as well as the evolution of the recombination rate itself. Model-based population genetic methods that infer recombination rates from patterns of linkage disequilibrium in the genome have become a popular method to estimate rates of recombination. However, these linkage disequilibrium-based methods make a variety of simplifying assumptions about the populations of interest that are often not met in natural populations. One such assumption is the absence of gene flow from other populations. Here, we use forward-time population genetic simulations of isolation-with-migration scenarios to explore how gene flow affects the accuracy of linkage disequilibrium-based estimators of recombination rate. We find that moderate levels of gene flow can result in either the overestimation or underestimation of recombination rates by up to 20–50% depending on the timing of divergence. We also find that these biases can affect the detection of interpopulation differences in recombination rate, causing both false positives and false negatives depending on the scenario. We discuss future possibilities for mitigating these biases and recommend that investigators exercise caution and confirm that their study populations meet assumptions before deploying these methods.

  4. Falush, Daniel (Ed.)
    Abstract Meiotic recombination is an important evolutionary force and an essential meiotic process. In many species, recombination events concentrate into hotspots defined by the site-specific binding of PRDM9. Rapid evolution of Prdm9's zinc finger DNA-binding array leads to remarkably abrupt shifts in the genomic distribution of hotspots between species, but the question of how Prdm9 allelic variation shapes the landscape of recombination between populations remains less well understood. Wild house mice (Mus musculus) harbor exceptional Prdm9 diversity, with >150 alleles identified to date, and pose a particularly powerful system for addressing this open question. We employed a coalescent-based approach to construct broad- and fine-scale sex-averaged recombination maps from contemporary patterns of linkage disequilibrium in nine geographically isolated wild house mouse populations, including multiple populations from each of three subspecies. Comparing maps between wild mouse populations and subspecies reveals several themes. First, we report weak fine- and broad-scale recombination map conservation across subspecies and populations, with genetic divergence offering no clear prediction for recombination map divergence. Second, most hotspots are unique to one population, an outcome consistent with minimal sharing of Prdm9 alleles between surveyed populations. Finally, by contrasting aggregate hotspot activity on the X versus autosomes, we uncover evidence for population-specific differences in the degree and direction of sex dimorphism for recombination. Overall, our findings illuminate the variability of both the broad- and fine-scale recombination landscape in M. musculus and underscore the functional impact of Prdm9 allelic variation in wild mouse populations.
  5. SARS-CoV-2 has caused symptomatic COVID-19 and widespread death across the globe. We sought to determine genetic variants contributing to COVID-19 susceptibility and hospitalization in a large biobank linked to a national United States health system. We identified 19,168 (3.7%) lab-confirmed COVID-19 cases among Million Veteran Program participants between March 1, 2020, and February 2, 2021, including 11,778 Whites, 4,893 Blacks, and 2,497 Hispanics. A multi-population genome-wide association study (GWAS) for COVID-19 outcomes identified four independent genetic variants (rs8176719, rs73062389, rs60870724, and rs73910904) contributing to COVID-19 positivity, including one novel locus found exclusively among Hispanics. We replicated eight of nine previously reported genetic associations at an alpha of 0.05 in at least one population-specific or the multi-population meta-analysis for one of the four MVP COVID-19 outcomes. We used rs8176719 and three additional variants to accurately infer ABO blood types. We found that A, AB, and B blood types were associated with testing positive for COVID-19 compared with O blood type with the highest risk for the A blood group. We did not observe any genome-wide significant associations for COVID-19 severity outcomes among those testing positive. Our study replicates prior GWAS findings associated with testing positive for COVID-19 among mostly White samples and extends findings at three loci to Black and Hispanic individuals. We also report a new locus among Hispanics requiring further investigation. These findings may aid in the identification of novel therapeutic agents to decrease the morbidity and mortality of COVID-19 across all major ancestral populations.