

Title: Recombination-aware phylogeographic inference using the structured coalescent with ancestral recombination
Movement of individuals between populations or demes is often restricted, especially between geographically isolated populations. The structured coalescent provides an elegant theoretical framework for describing how movement between populations shapes the genealogical history of sampled individuals and thereby structures genetic variation within and between populations. However, in the presence of recombination an individual may inherit different regions of their genome from different parents, resulting in a mosaic of genealogical histories across the genome, which can be represented by an Ancestral Recombination Graph (ARG). In this case, different genomic regions may have different ancestral histories and therefore different histories of movement between populations. Recombination thus poses an additional challenge to phylogeographic methods that aim to reconstruct the movement of individuals from genealogies, but it also offers a potential benefit, because different loci may contain additional information about movement. Here, we introduce the Structured Coalescent with Ancestral Recombination (SCAR) model, which builds on recent approximations to the structured coalescent by incorporating recombination into the ancestry of sampled individuals. The SCAR model allows us to infer from ARGs how the migration history of sampled individuals varies across the genome, and it improves estimation of key population genetic parameters such as population sizes, recombination rates and migration rates. Using the SCAR model, we explore the potential and limitations of phylogeographic inference using full ARGs. We then apply the SCAR model to lineages of the recombining fungus Aspergillus flavus sampled across the United States to explore patterns of recombination and migration across the genome.
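Looking backwards in time, lineages in a structured coalescent with recombination can coalesce within a deme, migrate between demes, or split at a recombination event. As a rough illustration only, the Python sketch below computes the total rate of each event type for one configuration of lineages under a simplified, textbook exact structured coalescent with recombination. It is not the approximation that the SCAR model builds on. The `Lineage` class, the `event_rates` function and the parameterisation are hypothetical choices made for this example: coalescence at rate 1/N_k per pair of lineages in deme k, backward migration at rate m_kl per lineage, and recombination at a rate proportional to the length of genome a lineage still carries.

```python
from dataclasses import dataclass


@dataclass
class Lineage:
    deme: int               # index of the deme the lineage currently occupies
    segment_length: float   # hypothetical: length of genome still ancestral to the sample


def event_rates(lineages, pop_sizes, mig_rates, rec_rate):
    """Total rate of each event type for the current lineage configuration.

    Assumed (illustrative) parameterisation:
      pop_sizes[k]    effective population size N_k of deme k
      mig_rates[k][l] backward-in-time migration rate from deme k to deme l
      rec_rate        recombination rate per unit genome length per lineage
    """
    n_demes = len(pop_sizes)

    # Count how many lineages currently sit in each deme.
    counts = [0] * n_demes
    for lin in lineages:
        counts[lin.deme] += 1

    # Coalescence: only lineages in the same deme can coalesce,
    # each of the n_k * (n_k - 1) / 2 pairs at rate 1 / N_k.
    coal = sum(counts[k] * (counts[k] - 1) / 2 / pop_sizes[k] for k in range(n_demes))

    # Migration: every lineage in deme k moves to each deme l != k at rate m_kl.
    mig = sum(
        counts[k] * mig_rates[k][l]
        for k in range(n_demes)
        for l in range(n_demes)
        if l != k
    )

    # Recombination (simplified): a lineage splits at a rate proportional
    # to the length of ancestral genome it carries.
    rec = sum(rec_rate * lin.segment_length for lin in lineages)

    return {"coalescence": coal, "migration": mig, "recombination": rec}


lineages = [Lineage(0, 1000.0), Lineage(0, 1000.0), Lineage(1, 500.0)]
rates = event_rates(
    lineages,
    pop_sizes=[100.0, 50.0],
    mig_rates=[[0.0, 0.01], [0.02, 0.0]],
    rec_rate=1e-4,
)
print(rates)
```

In a Gillespie-style simulation of this simplified process, the waiting time to the next event would be exponentially distributed with rate equal to the sum of these three totals, and the event type would be chosen in proportion to its rate. Recombination is what lets different genomic segments of one lineage end up with different migration histories.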
Award ID(s):
2031955
NSF-PAR ID:
10385655
Author(s) / Creator(s):
; ;
Editor(s):
Schiffels, Stephan
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
18
Issue:
8
ISSN:
1553-7358
Page Range / eLocation ID:
e1010422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Aim

    Natural selection typically results in the homogenization of reproductive traits, reducing natural variation within populations; thus, highly polymorphic species present unresolved questions regarding the mechanisms that shape and maintain gene flow given a diversity of phenotypes. We used an integrative framework to characterize phenotypic diversity and assess how evolutionary history and population genetics affect the highly polymorphic nature of a California endemic lily.

    Location

    California, United States.

    Taxon

    Butterfly mariposa lily, Calochortus venustus (Liliaceae).

    Methods

    We summarized phenotypic diversity at both metapopulation and subpopulation scales to explore spatial phenotypic distributions. We sampled 174 individuals across the species range representing multiple samples for each population and each phenotype. We used restriction‐site‐associated DNA sequencing (RAD‐Seq) to detect population clusters, gene flow between phenotypes and between populations, infer haplotype networks, and reconstruct ancestral range evolution to infer historical migration and range expansion.

    Results

    Polymorphic floral traits within the species such as petal pigmentation and distal spots are geographically structured, and inferred evolutionary history is consistent with a ring species pattern involving a complex of populations having experienced sequential change in genetic and phenotypic variation from the founding population. Populations remain interconnected yet have differentiated from each other along a bifurcating south‐to‐north range expansion, consequently indicating parallel evolution towards the white morphotype in the northern range. Thus, our phylogeographical analyses reveal morphological convergence with population genetic cohesion irrespective of phenotypic diversity.

    Main conclusions

    Phenotypic variation in the highly polymorphic Calochortus venustus is not due to genetic differentiation between phenotypes; rather there is genetic cohesion within six geographically defined populations, some of which maintain a high level of within‐population phenotypic diversity. Our results demonstrate that analyses of polymorphic taxa greatly benefit from disentangling phenotype from genotype at various spatial scales. We discuss results in light of ring species concepts and the need to determine the adaptive significance of the patterns we report.

     
  2. Sil, Anita (Ed.)
    Aspergillus fumigatus is a deadly agent of human fungal disease where virulence heterogeneity is thought to be at least partially structured by genetic variation between strains. While population genomic analyses based on reference genome alignments offer valuable insights into how gene variants are distributed across populations, these approaches fail to capture intraspecific variation in genes absent from the reference genome. Pan-genomic analyses based on de novo assemblies offer a promising alternative to reference-based genomics with the potential to address the full genetic repertoire of a species. Here, we evaluate 260 genome sequences of A. fumigatus, including 62 newly sequenced strains, using a combination of population genomics, phylogenomics, and pan-genomics. Our results offer a high-resolution assessment of population structure and recombination frequency, phylogenetically structured gene presence–absence variation, evidence for metabolic specificity, and the distribution of putative antifungal resistance genes. Although A. fumigatus disperses primarily via asexual conidia, we identified extraordinarily high levels of recombination with the lowest linkage disequilibrium decay value reported for any fungal species to date. We provide evidence for 3 primary populations of A. fumigatus, with recombination occurring only rarely between populations and often within them. These 3 populations are structured by both gene variation and distinct patterns of gene presence–absence, with unique suites of accessory genes present exclusively in each clade. Accessory genes displayed functional enrichment for nitrogen and carbohydrate metabolism, suggesting that populations may be stratified by environmental niche specialization. Similarly, the distribution of antifungal resistance genes and resistance alleles was often structured by phylogeny. Altogether, the pan-genome of A. fumigatus represents one of the largest fungal pan-genomes reported to date, including many genes unrepresented in the Af293 reference genome. These results highlight the inadequacy of relying on a single-reference genome-based approach for evaluating intraspecific variation and the power of combined genomic approaches to elucidate population structure, genetic diversity, and putative ecological drivers of clinically relevant fungi.
  3. Abstract Aim

    We used genome‐scale sampling to assess the phylogeography of a group of topminnows in the Fundulus notatus species complex. Two of the species have undergone extensive range expansions resulting in broadly overlapping distributions, and sympatry within drainages has provided opportunities for hybridization and introgression. We assessed the timing and pattern of range expansion in the context of late Pleistocene–Holocene drainage events and evaluated the evidence for introgressive hybridization between species.

    Location

    Central and southern United States including drainages of the Gulf of Mexico Coastal Plain and portions of the Mississippi River drainage in and around the Central Highlands.

    Taxon

    Topminnows, genus Fundulus, subgenus Zygonectes: Fundulus notatus, Fundulus olivaceus, Fundulus euryzonus.

    Methods

    We sampled members of the F. notatus species complex throughout their respective ranges, including numerous drainage systems where species co‐occur. We collected genome‐wide single nucleotide polymorphisms (SNPs) using the genotype‐by‐sequencing (GBS) method and subjected data to population genetic analyses to infer the population histories of both species, including explicit tests for admixture and introgression. The methods employed included STRUCTURE, principal coordinates analysis, TreeMix and approximate Bayesian computation.

    Results

    Genetic data are presented for 749 individuals sampled from 14 F. notatus, 20 F. olivaceus and 2 F. euryzonus populations. Members of the species complex differed in phylogeographic structure, with F. notatus exhibiting geographic clusters corresponding to Pleistocene coastal drainages and F. olivaceus comparatively lacking in phylogeographic structure. Evidence for interspecific introgression varied by drainage.

    Main conclusions

    Populations of F. notatus and F. olivaceus exhibited contrasting patterns of lineage diversity among coastal drainages, indicating interspecific differences in their Pleistocene southern refugia. Phylogeographic patterns in both species indicated that range expansions into the northern limits of contemporary distributions coincided with and continued subsequent to the Last Glacial Maximum. There was evidence of introgression between species in some, but not all, drainages where the species co‐occur, in a pattern that is correlated with previous estimates of hybridization rates.

     
  4. Abstract Aim

    To investigate the structure and rate of gene flow among populations of habitat‐specialized species to understand the ecological and evolutionary processes underpinning their population dynamics and historical demography, including speciation and extinction.

    Location

    Peruvian and Argentine Andes.

    Taxon

    Two subspecies of torrent duck (Merganetta armata).

    Methods

    We sampled 156 individuals in Peru (M. a. leucogenis; Chillón River, n = 57 and Pachachaca River, n = 49) and Argentina (M. a. armata; Arroyo Grande River, n = 33 and Malargüe River, n = 17), and sequenced the mitochondrial DNA (mtDNA) control region to conduct coarse and fine‐scale demographic analyses of population structure. Additionally, to test for differences between subspecies, and across genetic markers with distinct inheritance patterns, a subset of individuals (Peru, n = 10 and Argentina, n = 9) was subjected to partial genome resequencing, obtaining 4,027 autosomal and 189 Z‐linked double‐digest restriction‐associated DNA sequences.

    Results

    Haplotype and nucleotide diversities were higher in Peru than in Argentina across all markers. Peruvian and Argentine subspecies showed concordant species‐level differences (ΦST (mtDNA) = 0.82; ΦST (autosomal) = 0.30; ΦST (Z chromosome) = 0.45), including no shared mtDNA haplotypes. Demographic parameters estimated for mtDNA using IM and IMa2 analyses, and for autosomal markers using ∂a∂i (isolation‐with‐migration model), supported an old divergence between the two subspecies (mtDNA: 600,000 years before present (ybp), 95% HPD range = 1.2 Mya to 200,000 ybp; autosomal ∂a∂i: 782,490 ybp), characteristic of deeply diverged lineages. The populations were well‐differentiated in Argentina but moderately differentiated in Peru, with low unidirectional gene flow in each country.

    Main conclusions

    We suggest that the South American Arid Diagonal was preexisting and remains a current phylogeographic barrier between the ranges of the two torrent duck subspecies, and that adult territoriality and breeding site fidelity to rivers define their population structure.

     
  5. Abstract

    In a genetically admixed population, admixed individuals possess genealogical and genetic ancestry from multiple source groups. Under a mechanistic model of admixture, we study the number of distinct ancestors from the source populations that the admixture represents. Combining a mechanistic admixture model with a recombination model that describes the probability that a genealogical ancestor is a genetic ancestor, for a member of a genetically admixed population, we count genetic ancestors from the source populations—those genealogical ancestors from the source populations who contribute to the genome of the modern admixed individual. We compare patterns in the numbers of genealogical and genetic ancestors across the generations. To illustrate the enumeration of genetic ancestors from source populations in an admixed group, we apply the model to the African-American population, extending recent results on the numbers of African and European genealogical ancestors that contribute to the pedigree of an African-American chosen at random, so that we also evaluate the numbers of African and European genetic ancestors who contribute to random African-American genomes. The model suggests that the autosomal genome of a random African-American born in the interval 1960–1965 contains genetic contributions from a mean of 162 African (standard deviation 47, interquartile range 127–192) and 32 European ancestors (standard deviation 14, interquartile range 21–43). The enumeration of genetic ancestors can potentially be performed in other diploid species in which admixture and recombination models can be specified.

     