Abstract In a genetically admixed population, admixed individuals possess genealogical and genetic ancestry from multiple source groups. Under a mechanistic model of admixture, we study the number of distinct ancestors from the source populations that the admixture represents. Combining a mechanistic admixture model with a recombination model that describes the probability that a genealogical ancestor is a genetic ancestor, for a member of a genetically admixed population, we count genetic ancestors from the source populations—those genealogical ancestors from the source populations who contribute to the genome of the modern admixed individual. We compare patterns in the numbers of genealogical and genetic ancestors across the generations. To illustrate the enumeration of genetic ancestors from source populations in an admixed group, we apply the model to the African-American population, extending recent results on the numbers of African and European genealogical ancestors that contribute to the pedigree of an African-American chosen at random, so that we also evaluate the numbers of African and European genetic ancestors who contribute to random African-American genomes. The model suggests that the autosomal genome of a random African-American born in the interval 1960–1965 contains genetic contributions from a mean of 162 African (standard deviation 47, interquartile range 127–192) and 32 European ancestors (standard deviation 14, interquartile range 21–43). The enumeration of genetic ancestors can potentially be performed in other diploid species in which admixture and recombination models can be specified.
more »
« less
Recombination-aware phylogeographic inference using the structured coalescent with ancestral recombination
Movement of individuals between populations or demes is often restricted, especially between geographically isolated populations. The structured coalescent provides an elegant theoretical framework for describing how movement between populations shapes the genealogical history of sampled individuals and thereby structures genetic variation within and between populations. However, in the presence of recombination an individual may inherit different regions of their genome from different parents, resulting in a mosaic of genealogical histories across the genome, which can be represented by an Ancestral Recombination Graph (ARG). In this case, different genomic regions may have different ancestral histories and so different histories of movement between populations. Recombination therefore poses an additional challenge to phylogeographic methods that aim to reconstruct the movement of individuals from genealogies, although also a potential benefit in that different loci may contain additional information about movement. Here, we introduce the Structured Coalescent with Ancestral Recombination (SCAR) model, which builds on recent approximations to the structured coalescent by incorporating recombination into the ancestry of sampled individuals. The SCAR model allows us to infer how the migration history of sampled individuals varies across the genome from ARGs, and improves estimation of key population genetic parameters such as population sizes, recombination rates and migration rates. Using the SCAR model, we explore the potential and limitations of phylogeographic inference using full ARGs. We then apply the SCAR to lineages of the recombining fungus Aspergillus flavus sampled across the United States to explore patterns of recombination and migration across the genome.
more »
« less
- Award ID(s):
- 2031955
- PAR ID:
- 10385655
- Editor(s):
- Schiffels, Stephan
- Date Published:
- Journal Name:
- PLOS Computational Biology
- Volume:
- 18
- Issue:
- 8
- ISSN:
- 1553-7358
- Page Range / eLocation ID:
- e1010422
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Sil, Anita (Ed.)Aspergillus fumigatus is a deadly agent of human fungal disease where virulence heterogeneity is thought to be at least partially structured by genetic variation between strains. While population genomic analyses based on reference genome alignments offer valuable insights into how gene variants are distributed across populations, these approaches fail to capture intraspecific variation in genes absent from the reference genome. Pan-genomic analyses based on de novo assemblies offer a promising alternative to reference-based genomics with the potential to address the full genetic repertoire of a species. Here, we evaluate 260 genome sequences of A . fumigatus including 62 newly sequenced strains, using a combination of population genomics, phylogenomics, and pan-genomics. Our results offer a high-resolution assessment of population structure and recombination frequency, phylogenetically structured gene presence–absence variation, evidence for metabolic specificity, and the distribution of putative antifungal resistance genes. Although A . fumigatus disperses primarily via asexual conidia, we identified extraordinarily high levels of recombination with the lowest linkage disequilibrium decay value reported for any fungal species to date. We provide evidence for 3 primary populations of A . fumigatus , with recombination occurring only rarely between populations and often within them. These 3 populations are structured by both gene variation and distinct patterns of gene presence–absence with unique suites of accessory genes present exclusively in each clade. Accessory genes displayed functional enrichment for nitrogen and carbohydrate metabolism suggesting that populations may be stratified by environmental niche specialization. Similarly, the distribution of antifungal resistance genes and resistance alleles were often structured by phylogeny. Altogether, the pan-genome of A . fumigatus represents one of the largest fungal pan-genomes reported to date including many genes unrepresented in the Af293 reference genome. These results highlight the inadequacy of relying on a single-reference genome-based approach for evaluating intraspecific variation and the power of combined genomic approaches to elucidate population structure, genetic diversity, and putative ecological drivers of clinically relevant fungi.more » « less
-
Abstract Can knowledge about genome architecture inform biogeographic and phylogenetic inference? Selection, drift, recombination, and gene flow interact to produce a genomic landscape of divergence wherein patterns of differentiation and genealogy vary nonrandomly across the genomes of diverging populations. For instance, genealogical patterns that arise due to gene flow should be more likely to occur on smaller chromosomes, which experience high recombination, whereas those tracking histories of geographic isolation (reduced gene flow caused by a barrier) and divergence should be more likely to occur on larger and sex chromosomes. In Amazonia, populations of many bird species diverge and introgress across rivers, resulting in reticulated genomic signals. Herein, we used reduced representation genomic data to disentangle the evolutionary history of 4 populations of an Amazonian antbird, Thamnophilus aethiops, whose biogeographic history was associated with the dynamic evolution of the Madeira River Basin. Specifically, we evaluate whether a large river capture event ca. 200 Ka, gave rise to reticulated genealogies in the genome by making spatially explicit predictions about isolation and gene flow based on knowledge about genomic processes. We first estimated chromosome-level phylogenies and recovered 2 primary topologies across the genome. The first topology (T1) was most consistent with predictions about population divergence and was recovered for the Z-chromosome. The second (T2), was consistent with predictions about gene flow upon secondary contact. To evaluate support for these topologies, we trained a convolutional neural network to classify our data into alternative diversification models and estimate demographic parameters. The best-fit model was concordant with T1 and included gene flow between non-sister taxa. Finally, we modeled levels of divergence and introgression as functions of chromosome length and found that smaller chromosomes experienced higher gene flow. Given that (1) genetrees supporting T2 were more likely to occur on smaller chromosomes and (2) we found lower levels of introgression on larger chromosomes (and especially the Z-chromosome), we argue that T1 represents the history of population divergence across rivers and T2 the history of secondary contact due to barrier loss. Our results suggest that a significant portion of genomic heterogeneity arises due to extrinsic biogeographic processes such as river capture interacting with intrinsic processes associated with genome architecture. Future phylogeographic studies would benefit from accounting for genomic processes, as different parts of the genome reveal contrasting, albeit complementary histories, all of which are relevant for disentangling the intricate geogenomic mechanisms of biotic diversification. [Amazonia; biogeography; demographic modeling; gene flow; gene tree; genome architecture; geogenomics; introgression; linked selection; neural network; phylogenomic; phylogeography; reproductive isolation; speciation; species tree.]more » « less
-
Abstract Inferences from population genomic data provide valuable insights into the demographic history of a population. Likewise, in genomic epidemiology, pathogen genomic data provide key insights into epidemic dynamics and potential sources of transmission. Yet, predicting what information will be gained from genomic data about variables of interest and how different sampling strategies will impact the quality of downstream inferences remains challenging. As a result, population genomics and related fields such as phylodynamics and phylogeography largely lack theory to guide decisions on how best to sample individuals for genomic sequencing. By adopting a sequential decision making framework based on Markov decision processes, we model how sampling interacts with a population’s demographic history to shape the ancestral or genealogical relationships of sampled individuals. By probabilistically considering these ancestral relationships, we can use Markov decision processes to predict the expected value of sampling in terms of information gained about estimated variables. This in turn allows us to very efficiently explore and identify optimal sampling strategies even when the informational value of sampling depends on past or future sampling events. To illustrate our framework, we develop Markov decision processes for three common demographic and epidemiological inference problems: estimating population growth rates, minimizing the transmission distance between sampled individuals and estimating migration rates between subpopulations. In each case, the Markov decision process allows us to identify optimal sampling strategies that maximize the information gained from genomic data while minimizing the associated costs of sampling.more » « less
-
Abstract We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright–Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.more » « less
An official website of the United States government

