Title: Full Bayesian Comparative Phylogeography from Genomic Data
A challenge to understanding biological diversification is accounting for community-scale processes that cause multiple, co-distributed lineages to co-speciate. Such processes predict non-independent, temporally clustered divergences across taxa. Approximate-likelihood Bayesian computation (ABC) approaches to inferring such patterns from comparative genetic data are very sensitive to prior assumptions and often biased toward estimating shared divergences. We introduce a full-likelihood Bayesian approach, ecoevolity, which takes full advantage of information in genomic data. By analytically integrating over gene trees, we are able to directly calculate the likelihood of the population history from genomic data, and efficiently sample the model-averaged posterior via Markov chain Monte Carlo algorithms. Using simulations, we find that the new method is much more accurate and precise at estimating the number and timing of divergence events across pairs of populations than existing approximate-likelihood methods. Our full Bayesian approach also requires several orders of magnitude less computational time than existing ABC approaches. We find that despite assuming unlinked characters (e.g., unlinked single-nucleotide polymorphisms), the new method performs better if this assumption is violated in order to retain the constant characters of whole linked loci. In fact, retaining constant characters allows the new method to robustly estimate the correct number of divergence events with high posterior probability in the face of character-acquisition biases, which commonly plague loci assembled from reduced-representation genomic libraries. We apply our method to genomic data from four pairs of insular populations of Gekko lizards from the Philippines that are not expected to have co-diverged. Despite all four pairs diverging very recently, our method strongly supports that they diverged independently, and these results are robust to very disparate prior assumptions.  
Award ID(s):
1656004
PAR ID:
10092832
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Systematic Biology
Volume:
68
Issue:
3
ISSN:
1063-5157
Page Range / eLocation ID:
371 to 395
Subject(s) / Keyword(s):
Bayesian model choice biogeography Dirichlet process prior phylogeography
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Approximate Bayesian computation (ABC) methods are applicable to statistical models specified by generative processes with analytically intractable likelihoods. These methods try to approximate the posterior density of a model parameter by comparing the observed data with additional process‐generated simulated data sets. For computational benefit, only the values of certain well‐chosen summary statistics are usually compared, instead of the whole data set. Most ABC procedures are computationally expensive, justified only heuristically, and have poor asymptotic properties. In this article, we introduce a new empirical likelihood‐based approach to the ABC paradigm called ABCel. The proposed procedure is computationally tractable and approximates the target log posterior of the parameter as a sum of two functions of the data—namely, the mean of the optimal log‐empirical likelihood weights and the estimated differential entropy of the summary functions. We rigorously justify the procedure via direct and reverse information projections onto appropriate classes of probability densities. Past applications of empirical likelihood in ABC demanded constraints based on analytically tractable estimating functions that involve both the data and the parameter; although by the nature of the ABC problem such functions may not be available in general. In contrast, we use constraints that are functions of the summary statistics only. Equally importantly, we show that our construction directly connects to the reverse information projection and estimate the relevant differential entropy by a k‐NN estimator. We show that ABCel is posterior consistent and has highly favorable asymptotic properties. Its construction justifies the use of simple summary statistics like moments, quantiles, and so forth, which in practice produce accurate approximation of the posterior density. We illustrate the performance of the proposed procedure in a range of applications. 
  2. Many processes of biological diversification can simultaneously affect multiple evolutionary lineages. Examples include multiple members of a gene family diverging when a region of a chromosome is duplicated, multiple viral strains diverging at a “super-spreading” event, and a geological event fragmenting whole communities of species. It is difficult to test for patterns of shared divergences predicted by such processes because all phylogenetic methods assume that lineages diverge independently. We introduce a Bayesian phylogenetic approach to relax the assumption of independent, bifurcating divergences by expanding the space of topologies to include trees with shared and multifurcating divergences. This allows us to jointly infer phylogenetic relationships, divergence times, and patterns of divergences predicted by processes of diversification that affect multiple evolutionary lineages simultaneously or lead to more than two descendant lineages. Using simulations, we find that the method accurately infers shared and multifurcating divergence events when they occur and performs as well as current phylogenetic methods when divergences are independent and bifurcating. We apply our approach to genomic data from two genera of geckos from across the Philippines to test if past changes to the islands’ landscape caused bursts of speciation. Unlike previous analyses restricted to only pairs of gecko populations, we find evidence for patterns of shared divergences. By generalizing the space of phylogenetic trees in a way that is independent from the likelihood model, our approach opens many avenues for future research into processes of diversification across the life sciences. 
  3. The objective of this work is to provide a Bayesian re-interpretation to model falsification. We show that model falsification can be viewed as an approximate Bayesian computation (ABC) approach when hypotheses (models) are sampled from a prior. To achieve this, we recast model falsifiers as discrepancy metrics and density kernels such that they may be adopted within ABC and generalized ABC (GABC) methods. We call the resulting frameworks model falsified ABC and GABC, respectively. Moreover, as a result of our reinterpretation, the set of unfalsified models can be shown to be realizations of an approximate posterior. We consider both error and likelihood domain model falsification in our exposition. Model falsified (G)ABC is used to tackle two practical inverse problems albeit with synthetic measurements. The first type of problem concerns parameter estimation and includes applications of ABC to the inference of a statistical model where the likelihood can be difficult to compute, and the identification of a cubic-quintic dynamical system. The second type of example involves model selection for the base isolation system of a four degree-of-freedom base isolated structure. The performance of model falsified ABC and GABC are compared with Bayesian inference. The results show that model falsified (G)ABC can be used to solve inverse problems in a computationally efficient manner. The results are also used to compare the various falsifiers in their capability of approximating the posterior and some of its important statistics. Further, we show that model falsifier based density kernels can be used in kernel regression to infer unknown model parameters and compute structural responses under epistemic uncertainty. 
  4. ABSTRACT Aim: We tested whether co‐distributed phrynosomatid lizards in the Baja California Peninsula (BCP) share synchronous phylogeographic discontinuities, as predicted by the “peninsular archipelago” hypothesis, and examined the diversification of Callisaurus draconoides throughout its range. Location: The BCP and the Great Basin, Mojave and Sonoran Deserts of southwestern North America. Taxa: Five co‐distributed species complexes representing four genera within Phrynosomatidae: Callisaurus, Petrosaurus, Urosaurus and Sceloporus. Methods: Double‐digest restriction‐associated‐DNA (ddRAD) sequencing was used to collect genome‐wide sequence data for 309 lizards. We used phylogenetic analyses of concatenated loci and population admixture analysis of unlinked SNPs to identify lineages. To infer a species tree, we collected target sequence capture (TSC) data. Migration between adjacent peninsular lineages was estimated using the multispecies coalescent with migration (MSC‐M) in BPP. A full‐likelihood Bayesian comparative phylogeographic approach (ecoevolity) was used to test the simultaneous divergence hypothesis for the Isthmus of La Paz and Vizcaíno Desert. Results: We identified 24 potential lineages within the five co‐distributed complexes. Contact zones between lineages were observed at the Isthmus of La Paz in four of the five complexes, and in all five within the Vizcaíno Desert. The time‐calibrated species tree indicates that within each complex, divergences at the Isthmus of La Paz predate those across the Vizcaíno Desert. We found strong support for at least three independent divergence events at the Isthmus of La Paz and the Vizcaíno Desert, thereby rejecting the simultaneous divergence hypothesis. Inferred migration rates between adjacent peninsular populations were generally low (M << 1) to absent. Zebra‐tailed lizards (Callisaurus), in which the earliest diverging lineages are endemic to the southern BCP, exhibit a clear pattern of Pleistocene range expansion from the BCP into the deserts of the western United States and mainland Mexico. The most deeply nested populations in Callisaurus occur at the northern, eastern and southeastern range limits in temperate, subtropical and tropical biomes, respectively. Main Conclusions: These results support the BCP's tectonic isolation as a driver of peninsular endemism and a contributing factor to lineage diversification more broadly in the region. Taxonomic adjustments, including resurrecting Urosaurus microscutatus, are proposed to better reflect evolutionary history in taxonomy.
  5. Abstract Genomic data continue to advance our understanding of species limits and biogeographic patterns. However, there is still no consensus regarding appropriate methods of phylogenomic analysis that make the best use of these heterogeneous data sets. In this study, we used thousands of ultraconserved element (UCE) loci from alligator lizards in the genus Gerrhonotus to compare and contrast species trees inferred using multiple contemporary methods and provide a time frame for biological diversification across the Mexican Transition Zone (MTZ). Concatenated maximum likelihood (ML) and Bayesian analyses provided highly congruent results, with differences limited to poorly supported nodes. Similar topologies were inferred from coalescent analyses in Bayesian Phylogenetics and Phylogeography and SVDquartets, albeit with lower support for some nodes. All divergence times fell within the Miocene, linking speciation to local Neogene vicariance and/or global cooling trends following the mid-Miocene Climatic Optimum. We detected a high level of genomic divergence for a morphologically distinct species restricted to the arid mountains of north-eastern Mexico, and erected a new genus to better reflect evolutionary history. In summary, our results further advocate leveraging the strengths and weaknesses of concatenation and coalescent methods, provide evidence for old divergences for alligator lizards, and indicate that the MTZ continues to harbour substantial unrecognized diversity. 