Ecologists classify individuals into categories to understand the composition of populations and communities. These categories might be defined by demographics, functional traits, or species. Assignment to categories is often imperfect, yet classifications are frequently treated as observations without error. When individuals are observed but not classified, the model for these “partial” observations must include the missing data mechanism to avoid spurious inference. We developed two hierarchical Bayesian models to overcome the assumption of perfect assignment to mutually exclusive categories in the multinomial distribution of categorical counts when classifications are missing. These models incorporate auxiliary information to adjust the posterior distributions of the proportions of membership in categories. In one model, we use an empirical Bayes approach, where a subset of data from one year serves as a prior for the missing data in the next. In the other, we use a small random sample of data within a year to inform the distribution of the missing data. We performed a simulation to show the bias that occurs when partial observations are ignored and demonstrated the altered inference for the estimation of demographic ratios. We applied our models to demographic classifications of elk (Cervus elaphus nelsoni) to demonstrate improved inference for the proportions… We developed multiple modeling approaches using a generalizable nested multinomial structure to account for partially observed data that were missing not at random for classification counts. Accounting for classification uncertainty is important to accurately understand the composition of populations and communities in ecological studies.
- Publication Date:
- NSF-PAR ID: 10206411
- Journal Name: 2020 IEEE International Conference on Robotics and Automation (ICRA)
- Page Range or eLocation-ID: 2924 to 2931
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract High-dimensional categorical data are routinely collected in biomedical and social sciences. It is of great importance to build interpretable parsimonious models that perform dimension reduction and uncover meaningful latent structures from such discrete data. Identifiability is a fundamental requirement for valid modeling and inference in such scenarios, yet is challenging to address when there are complex latent structures. In this article, we propose a class of identifiable multilayer (potentially deep) discrete latent structure models for discrete data, termed Bayesian Pyramids. We establish the identifiability of Bayesian Pyramids by developing novel transparent conditions on the pyramid-shaped deep latent directed graph. The proposed identifiability conditions can ensure Bayesian posterior consistency under suitable priors. As an illustration, we consider the two-latent-layer model and propose a Bayesian shrinkage estimation approach. Simulation results for this model corroborate the identifiability and estimatability of model parameters. Applications of the methodology to DNA nucleotide sequence data uncover useful discrete latent features that are highly predictive of sequence types. The proposed framework provides a recipe for interpretable unsupervised learning of discrete data and can be a useful alternative to popular machine learning methods.
-
Abstract An Arctic sea ice‐ocean model is run with three uniform horizontal resolutions (6, 4, and 2 km) and identical sea ice and ocean model parameterizations, including an isotropic viscous‐plastic sea ice rheology, a mechanical ice strength parameterization, and an ice ridging parameterization. Driven by the same atmospheric forcing, the three model versions all produce similar spatial patterns and temporal variations of ice thickness and motion fields, resulting in almost identical magnitude and seasonal evolution of total ice volume and mean ice concentration, ice speed, and fractions of ice of various thickness categories over the Arctic Ocean. Increasing model resolution from 6 to 2 km does not significantly improve model performance when compared to NASA IceBridge ice thickness observations. This suggests that the large‐scale sea ice properties of the model are insensitive to varying high resolutions within the multifloe scale (2–10 km), and it may be unnecessary to adjust model parameters constantly with increasingly high resolutions. This is also true with models within the aggregate scale (10–75 km), indicating that model parameters used at coarse resolution may be used at high or multiscale resolution. However, even though the three versions all yield similar mean state of sea ice, they differ in representing anisotropic properties…
-
Abstract Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for each variable and associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and rendering better predictive inference at arbitrary locations for the spatial process. High‐dimensional multivariate spatial data, which are the theme of this article, refer to data sets where the number of spatial locations and the number of spatially dependent variables are very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but such methods for multivariate spatial processes, especially when the number of outcomes is moderately large, are limited in comparison. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the matrix‐normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models that deliver inference over a high‐dimensional parameter space including the latent spatial…
-
Abstract Many modern sea ice models used in global climate models represent the subgrid‐scale heterogeneity in sea ice thickness with an ice thickness distribution (ITD), which improves model realism by representing the significant impact of the high spatial heterogeneity of sea ice thickness on thermodynamic and dynamic processes. Most models default to five thickness categories. However, little has been done to explore the effects of the resolution of this distribution (number of categories) on sea‐ice feedbacks in a coupled model framework and resulting representation of the sea ice mean state. Here, we explore this using sensitivity experiments in CESM2 with the standard 5 ice thickness categories and 15 ice thickness categories. Increasing the resolution of the ITD in a run with preindustrial climate forcing results in substantially thicker Arctic sea ice year‐round. Analyses show that this is a result of the ITD influence on ice strength. With 15 ITD categories, weaker ice occurs for the same average thickness, resulting in a higher fraction of ridged sea ice. In contrast, the higher resolution of thin ice categories results in enhanced heat conduction and bottom growth and leads to only somewhat increased winter Antarctic sea ice volume. The spatial resolution of the…