Abstract Bayesian data analysis is increasingly used in ecology, but prior specification remains focused on choosing non‐informative priors (e.g., flat or vague priors). One barrier to choosing more informative priors is that priors must be specified on model parameters (e.g., intercepts, slopes, and sigmas), but prior knowledge often exists on the level of the response variable. This is particularly true for common models in ecology, like generalized linear mixed models that have a link function and potentially dozens of parameters, each of which needs a prior distribution. We suggest that this difficulty can be overcome by simulating from the prior predictive distribution and visualizing the results on the scale of the response variable. In doing so, some common choices for non‐informative priors on parameters can easily be seen to produce biologically impossible values of response variables. Such implications of prior choices are difficult to foresee without visualization. We demonstrate a workflow for prior selection using simulation and visualization with two ecological examples (predator–prey body sizes and spider responses to food competition). This approach is not new, but its adoption by ecologists will help to better incorporate prior information in ecological models, thereby maximizing one of the benefits of Bayesian data analysis.
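The prior predictive simulation described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical log-link Poisson regression with one standardized predictor; the model, the prior scales, and the function name `prior_predictive_counts` are illustrative assumptions and are not taken from the paper. The code draws parameters from their priors, simulates responses, and summarizes them on the scale of the response variable.

```python
import numpy as np

def prior_predictive_counts(intercept_sd, slope_sd, x, n_draws=1000, seed=0):
    """Simulate counts from the prior predictive distribution of a
    hypothetical Poisson regression with a log link:
        log(mu) = intercept + slope * x
    with Normal(0, sd) priors on the intercept and the slope."""
    rng = np.random.default_rng(seed)
    intercept = rng.normal(0.0, intercept_sd, size=n_draws)
    slope = rng.normal(0.0, slope_sd, size=n_draws)
    # Linear predictor on the log scale, shape (n_draws, len(x)).
    log_mu = intercept[:, None] + slope[:, None] * x[None, :]
    # Assumption for this sketch only: clip the log-mean so that exp()
    # and the Poisson sampler stay within numeric limits.
    log_mu = np.clip(log_mu, -30.0, 30.0)
    # One simulated data set per prior draw.
    return rng.poisson(np.exp(log_mu))

# Standardized predictor values (hypothetical).
x = np.linspace(-1.0, 1.0, 20)

# A "vague" prior versus a tighter prior on the same parameters.
vague = prior_predictive_counts(intercept_sd=10.0, slope_sd=10.0, x=x)
informed = prior_predictive_counts(intercept_sd=1.0, slope_sd=0.5, x=x)

# Compare the implied responses on the count scale.
for name, sims in [("vague", vague), ("informed", informed)]:
    q50, q99 = np.quantile(sims, [0.5, 0.99])
    print(f"{name}: median={q50:.0f}, 99th percentile={q99:.3g}, max={sims.max():.3g}")
```

Comparing the two summaries, or plotting histograms of the simulated counts, shows how a prior that looks harmless on the parameter scale can imply implausibly large values on the response scale once it passes through the link function.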
Visualization in Bayesian Workflow
Abstract Bayesian data analysis is about more than just computing a posterior distribution, and Bayesian visualization is about more than trace plots of Markov chains. Practical Bayesian data analysis, like all data analysis, is an iterative process of model building, inference, model checking and evaluation, and model expansion. Visualization is helpful in each of these stages of the Bayesian workflow and it is indispensable when drawing inferences from the types of modern, high dimensional models that are used by applied researchers.
- Award ID(s): 1730414
- PAR ID: 10400672
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Journal of the Royal Statistical Society Series A: Statistics in Society
- Volume: 182
- Issue: 2
- ISSN: 0964-1998
- Format(s): Medium: X Size: p. 389-402
- Size(s): p. 389-402
- Sponsoring Org: National Science Foundation
More Like this
- Abstract In the current literature on latent variable models, much effort has been put into the development of dichotomous and polytomous cognitive diagnostic models (CDMs) for assessments. Recently, the possibility of using continuous responses in CDMs has been brought into discussion, but no Bayesian approach has yet been developed for the analysis of CDMs when responses are continuous. Our work is the first Bayesian framework for the continuous deterministic inputs, noisy "and" gate (DINA) model. We also propose new interpretations for the item parameters in this DINA model, which make the analysis more interpretable than before. In addition, we have conducted several simulations to evaluate the performance of the continuous DINA model under our Bayesian approach. We then applied the proposed DINA model to a real data example of individuals' risk perceptions over a range of health‐related activities. The application results illustrate the potential of the proposed continuous DINA model for classifying individuals in the study.
- Abstract This paper demonstrates the advantages of sharing information about unknown features of covariates across multiple model components in various nonparametric regression problems including multivariate, heteroscedastic, and semicontinuous responses. In this paper, we present a methodology which allows for information to be shared nonparametrically across various model components using Bayesian sum‐of‐tree models. Our simulation results demonstrate that sharing of information across related model components is often very beneficial, particularly in sparse high‐dimensional problems in which variable selection must be conducted. We illustrate our methodology by analyzing medical expenditure data from the Medical Expenditure Panel Survey (MEPS). To facilitate the Bayesian nonparametric regression analysis, we develop two novel models for analyzing the MEPS data using Bayesian additive regression trees—a heteroskedastic log‐normal hurdle model with a “shrink‐toward‐homoskedasticity” prior and a gamma hurdle model.
- Abstract Understanding animal movement often relies upon telemetry and biologging devices. These data are frequently used to estimate latent behavioural states to help understand why animals move across the landscape. While there are a variety of methods that make behavioural inferences from biotelemetry data, some features of these methods (e.g. analysis of a single data stream, use of parametric distributions) may limit their generality to reliably discriminate among behavioural states. To address some of the limitations of existing behavioural state estimation models, we introduce a nonparametric Bayesian framework called the mixed‐membership method for movement (M4), which is available within the open‐source bayesmove R package. This framework can analyse multiple data streams (e.g. step length, turning angle, acceleration) without relying on parametric distributions, which may capture complex behaviours more successfully than current methods. We tested our Bayesian framework using simulated trajectories and compared model performance against two segmentation methods (behavioural change point analysis (BCPA) and segclust2d), one machine learning method [expectation‐maximization binary clustering (EMbC)] and one type of state‐space model [hidden Markov model (HMM)]. We also illustrated this Bayesian framework using movements of juvenile snail kites Rostrhamus sociabilis in Florida, USA. The Bayesian framework estimated breakpoints more accurately than the other segmentation methods for tracks of different lengths. Likewise, the Bayesian framework provided more accurate estimates of behaviour than the other state estimation methods when simulations were generated from less frequently considered distributions (e.g. truncated normal, beta, uniform). Three behavioural states were estimated from snail kite movements, which were labelled as ‘encamped’, ‘area‐restricted search’ and ‘transit’. Changes in these behaviours over time were associated with known dispersal events from the nest site, as well as movements to and from possible breeding locations. Our nonparametric Bayesian framework estimated behavioural states with comparable or superior accuracy compared to the other methods when step lengths and turning angles of simulations were generated from less frequently considered distributions. Since the most appropriate parametric distributions may not be obvious a priori, methods (such as M4) that are agnostic to the underlying distributions can provide powerful alternatives to address questions in movement ecology.
- Abstract To improve the predictability of complex computational models in the experimentally-unknown domains, we propose a Bayesian statistical machine learning framework utilizing the Dirichlet distribution that combines results of several imperfect models. This framework can be viewed as an extension of Bayesian stacking. To illustrate the method, we study the ability of Bayesian model averaging and mixing techniques to mine nuclear masses. We show that the global and local mixtures of models reach excellent performance on both prediction accuracy and uncertainty quantification and are preferable to classical Bayesian model averaging. Additionally, our statistical analysis indicates that improving model predictions through mixing rather than mixing of corrected models leads to more robust extrapolations.
