skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Practical guidelines for Bayesian phylogenetic inference using Markov Chain Monte Carlo (MCMC)
Phylogenetic estimation is, and has always been, a complex endeavor. Estimating a phylogenetic tree involves evaluating many possible solutions and possible evolutionary histories that could explain a set of observed data, typically by using a model of evolution. Modern statistical methods involve not just the estimation of a tree, but also solutions to more complex models involving fossil record information and other data sources. Markov Chain Monte Carlo (MCMC) is a leading method for approximating the posterior distribution of parameters in a mathematical model. It is deployed in all Bayesian phylogenetic tree estimation software. While many researchers use MCMC in phylogenetic analyses, interpreting results and diagnosing problems with MCMC remain vexing issues to many biologists. In this manuscript, we will offer an overview of how MCMC is used in Bayesian phylogenetic inference, with a particular emphasis on complex hierarchical models, such as the fossilized birth-death (FBD) model. We will discuss strategies to diagnose common MCMC problems and troubleshoot difficult analyses, in particular convergence issues. We will show how the study design, the choice of models and priors, but also technical features of the inference tools themselves can all be adjusted to obtain the best results. Finally, we will also discuss the unique challenges created by the incorporation of fossil information in phylogenetic inference, and present tips to address them.  more » « less
Award ID(s):
2045842
PAR ID:
10522089
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Open Research Europe
Date Published:
Journal Name:
Open research Europe
Edition / Version:
2
Volume:
3
ISSN:
2732-5121
Page Range / eLocation ID:
204
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Phylogenetic estimation is, and has always been, a complex endeavor. Estimating a phylogenetic tree involves evaluating many possible solutions and possible evolutionary histories that could explain a set of observed data, typically by using a model of evolution. Values for all model parameters need to be evaluated as well. Modern statistical methods involve not just the estimation of a tree, but also solutions to more complex models involving fossil record information and other data sources. Markov chain Monte Carlo (MCMC) is a leading method for approximating the posterior distribution of parameters in a mathematical model. It is deployed in all Bayesian phylogenetic tree estimation software. While many researchers use MCMC in phylogenetic analyses, interpreting results and diagnosing problems with MCMC remain vexing issues to many biologists. In this manuscript, we will offer an overview of how MCMC is used in Bayesian phylogenetic inference, with a particular emphasis on complex hierarchical models, such as the fossilized birth-death (FBD) model. We will discuss strategies to diagnose common MCMC problems and troubleshoot difficult analyses, in particular convergence issues. We will show how the study design, the choice of models and priors, but also technical features of the inference tools themselves can all be adjusted to obtain the best results. Finally, we will also discuss the unique challenges created by the incorporation of fossil information in phylogenetic inference, and present tips to address them. 
    more » « less
  2. One of the most common types of models that helps us to understand neuron behavior is based on the Hodgkin–Huxley ion channel formulation (HH model). A major challenge with inferring parameters in HH models is non-uniqueness: many different sets of ion channel parameter values produce similar outputs for the same input stimulus. Such phenomena result in an objective function that exhibits multiple modes (i.e., multiple local minima). This non-uniqueness of local optimality poses challenges for parameter estimation with many algorithmic optimization techniques. HH models additionally have severe non-linearities resulting in further challenges for inferring parameters in an algorithmic fashion. To address these challenges with a tractable method in high-dimensional parameter spaces, we propose using a particular Markov chain Monte Carlo (MCMC) algorithm, which has the advantage of inferring parameters in a Bayesian framework. The Bayesian approach is designed to be suitable for multimodal solutions to inverse problems. We introduce and demonstrate the method using a three-channel HH model. We then focus on the inference of nine parameters in an eight-channel HH model, which we analyze in detail. We explore how the MCMC algorithm can uncover complex relationships between inferred parameters using five injected current levels. The MCMC method provides as a result a nine-dimensional posterior distribution, which we analyze visually with solution maps or landscapes of the possible parameter sets. The visualized solution maps show new complex structures of the multimodal posteriors, and they allow for selection of locally and globally optimal value sets, and they visually expose parameter sensitivities and regions of higher model robustness. We envision these solution maps as enabling experimentalists to improve the design of future experiments, increase scientific productivity and improve on model structure and ideation when the MCMC algorithm is applied to experimental data. 
    more » « less
  3. Pupko, Tal (Ed.)
    Abstract Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the usage of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the usage of model-averaged substitution models can both offer improved convergence in some cases, but neither are a panacea. 
    more » « less
  4. Probabilistic programming languages (PPLs) provide language support for expressing flexible probabilistic models and solving Bayesian inference problems. PPLs withprogrammable inferencemake it possible for users to obtain improved results by customizing inference engines usingguideprograms that are tailored to a correspondingmodelprogram. However, errors in guide programs can compromise the statistical soundness of the inference. This article introduces a novel coroutine-based framework for verifying the correctness of user-written guide programs for a broad class of Markov chain Monte Carlo (MCMC) inference algorithms. Our approach rests on a novel type system for describing communication protocols between a model program and a sequence of guides that each update only a subset of random variables. We prove that, by translating guide types to context-free processes with finite norms, it is possible to check structural type equality between models and guides in polynomial time. This connection gives rise to an efficienttype-inference algorithmfor probabilistic programs with flexible constructs such as general recursion and branching. We also contribute acoverage-checking algorithmthat verifies the support of sequentially composed guide programs agrees with that of the model program, which is a key soundness condition for MCMC inference with multiple guides. Evaluations on diverse benchmarks show that our type-inference and coverage-checking algorithms efficiently infer types and detect sound and unsound guides for programs that existing static analyses cannot handle. 
    more » « less
  5. Pleurodonta is an ancient, diverse clade of iguanian lizard distributed primarily in the Western Hemisphere. Although the clade is a frequent subject of systematic research, phylogenetic resolution among the major pleurodontan clades is elusive. That uncertainty has complicated the interpretations of many fossil pleurodontans. I describe a fossil skull of a pleurodontan lizard from the Palaeogene of Wyoming that was previously allocated to the puzzling taxonAciprion formosum, and provide an updated morphological matrix for iguanian lizards. Phylogenetic analyses using Bayesian inference demonstrate that the fossil skull is the oldest and first definitive stem member of Crotaphytidae (collared and leopard lizards), establishing the presence of that clade in North America during the Palaeogene. I also discuss new or revised hypotheses for the relationships of several early pleurodontans. In particular, I examine potential evidence for crown-Pleurodonta in the Cretaceous of Mongolia (Polrussia), stem Pleurodonta in the Cretaceous of North America (Magnuviator) and a stem anole in the Eocene of North America (Afairiguana). I suggest that the placement of the fossil crotaphytid is stable to the uncertain phylogeny of Pleurodonta, but recognize the dynamic nature of fossil diagnosis and the potential for updated systematic hypotheses for the other fossils analysed here. 
    more » « less