Abstract In many modern applications, discretely-observed data may be naturally understood as a set of functions. Functional data often exhibit two confounded sources of variability: amplitude (y-axis) and phase (x-axis). The extraction of amplitude and phase, a process known as registration, is essential in exploring the underlying structure of functional data in a variety of areas, from environmental monitoring to medical imaging. Critically, such data are often gathered sequentially with new functional observations arriving over time. Despite this, existing registration procedures do not sequentially update inference based on the new data, requiring model refitting. To address these challenges, we introduce a Bayesian framework for sequential registration of functional data, which updates statistical inference as new sets of functions are assimilated. This Bayesian model-based sequential learning approach utilizes sequential Monte Carlo sampling to recursively update the alignment of observed functions while accounting for associated uncertainty. Distributed computing significantly reduces computational cost relative to refitting the model using an iterative method such as Markov chain Monte Carlo on the full data. Simulation studies and comparisons reveal that the proposed approach performs well even when the target posterior distribution has a challenging structure. We apply the proposed method to three real datasets: (1) functions of annual drought intensity near Kaweah River in California, (2) annual sea surface salinity functions near Null Island, and (3) a sequence of repeated patterns in electrocardiogram signals.
more »
« less
A Bayesian Functional Data Model for Surveys Collected under Informative Sampling with Application to Mortality Estimation Using NHANES
Abstract Functional data are often extremely high-dimensional and exhibit strong dependence structures but can often prove valuable for both prediction and inference. The literature on functional data analysis is well developed; however, there has been very little work involving functional data in complex survey settings. Motivated by physical activity monitor data from the National Health and Nutrition Examination Survey (NHANES), we develop a Bayesian model for functional covariates that can properly account for the survey design. Our approach is intended for non-Gaussian data and can be applied in multivariate settings. In addition, we make use of a variety of Bayesian modeling techniques to ensure that the model is fit in a computationally efficient manner. We illustrate the value of our approach through two simulation studies as well as an example of mortality estimation using NHANES data.
more »
« less
- Award ID(s):
- 1853096
- PAR ID:
- 10485789
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Biometrics
- Volume:
- 79
- Issue:
- 2
- ISSN:
- 0006-341X
- Format(s):
- Medium: X Size: p. 1397-1408
- Size(s):
- p. 1397-1408
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Functional data with non-smooth features (e.g., discontinuities in the functional mean and/or covariance) and monotonicity arise frequently in practice. This paper develops simultaneous inference for concurrent functional linear regression in this setting. We construct a simultaneous confidence band for a functional covariate effect of interest. Along with a Wald-type formulation, our approach is based on a powerful nonparametric likelihood ratio method. Our procedures are flexible enough to allow discontinuities in the coefficient functions and the covariance structure, while accounting for discretization of the observed trajectories under a fixed dense design. A simulation study shows that the proposed likelihood ratio-based procedure outperforms the Wald-type procedure in moderate sample sizes. We apply the proposed methods to studying the effect of age on the occupation time curve derived from wearable device data obtained in an NHANES study.more » « less
-
Abstract Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information as well as spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.more » « less
-
Abstract Causal inference practitioners have increasingly adopted machine learning techniques with the aim of producing principled uncertainty quantification for causal effects while minimizing the risk of model misspecification. Bayesian nonparametric approaches have attracted attention as well, both for their flexibility and their promise of providing natural uncertainty quantification. Priors on high‐dimensional or nonparametric spaces, however, can often unintentionally encode prior information that is at odds with substantive knowledge in causal inference—specifically, the regularization required for high‐dimensional Bayesian models to work can indirectly imply that the magnitude of the confounding is negligible. In this paper, we explain this problem and provide tools for (i) verifying that the prior distribution does not encode an inductive bias away from confounded models and (ii) verifying that the posterior distribution contains sufficient information to overcome this issue if it exists. We provide a proof‐of‐concept on simulated data from a high‐dimensional probit‐ridge regression model, and illustrate on a Bayesian nonparametric decision tree ensemble applied to a large medical expenditure survey.more » « less
-
Abstract This paper demonstrates the advantages of sharing information about unknown features of covariates across multiple model components in various nonparametric regression problems including multivariate, heteroscedastic, and semicontinuous responses. In this paper, we present a methodology which allows for information to be shared nonparametrically across various model components using Bayesian sum‐of‐tree models. Our simulation results demonstrate that sharing of information across related model components is often very beneficial, particularly in sparse high‐dimensional problems in which variable selection must be conducted. We illustrate our methodology by analyzing medical expenditure data from the Medical Expenditure Panel Survey (MEPS). To facilitate the Bayesian nonparametric regression analysis, we develop two novel models for analyzing the MEPS data using Bayesian additive regression trees—a heteroskedastic log‐normal hurdle model with a “shrink‐toward‐homoskedasticity” prior and a gamma hurdle model.more » « less
An official website of the United States government
