NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Review of Data‐Driven Discovery for Dynamic Systems

https://doi.org/10.1111/insr.12554

North, Joshua S.; Wikle, Christopher K.; Schliep, Erin M. (September 2023, International Statistical Review)

Summary Many real‐world scientific processes are governed by complex non‐linear dynamic systems that can be represented by differential equations. Recently, there has been increased interest in learning, or discovering, the forms of the equations driving these complex non‐linear dynamic systems using data‐driven approaches. In this paper, we review the current literature on data‐driven discovery for dynamic systems. We provide a categorisation of the different approaches to data‐driven discovery and a unified mathematical framework that shows the relationship between the approaches. Importantly, we discuss the role of statistics in the data‐driven discovery field, describe a possible approach by which the problem can be cast in a statistical framework, and provide avenues for future work.
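The review surveys several families of discovery methods. As one concrete illustration of the general idea, and not the authors' unified framework, sparse regression over a library of candidate terms (the sequentially thresholded least squares step popularised by SINDy) can be sketched as below. The logistic-growth test system, the candidate library, and the threshold are assumptions chosen for illustration, and exact derivatives are used so that no numerical differentiation or noise handling is needed.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (SINDy-style sparse regression).

    theta: (n_samples, n_terms) candidate terms evaluated at each sample
    dxdt:  (n_samples,) time derivative of the state at each sample
    Returns sparse coefficients; terms that were thresholded out are exactly zero.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        # Refit by least squares using only the surviving candidate terms
        xi[keep] = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)[0]
    return xi

# Hypothetical test system: logistic growth dx/dt = x - x^2 with x(0) = 0.1
t = np.linspace(0.0, 5.0, 200)
x = 1.0 / (1.0 + 9.0 * np.exp(-t))   # closed-form solution
dxdt = x - x**2                       # exact derivative, no noise
library = np.column_stack([np.ones_like(x), x, x**2, x**3])
names = ["1", "x", "x^2", "x^3"]

xi = stlsq(library, dxdt)
for name, coef in zip(names, xi):
    if coef != 0.0:
        print(f"{coef:+.3f} * {name}")
```

With noise-free derivatives the surviving terms should be x and x^2; with observed, noisy data, estimating the derivatives and choosing the threshold become the difficult parts.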
Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure

https://doi.org/10.1093/jssam/smad002

Parker, Paul A.; Holan, Scott H.; Janicki, Ryan (February 2023, Journal of Survey Statistics and Methodology)

Abstract Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and is plugged in from a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information and spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.
Full Text Available
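For context, the standard area-level (Fay-Herriot) model that the abstract describes as heteroscedastic regression with known, plugged-in variances can be sketched as follows: y_i = x_i'β + v_i + e_i, with v_i ~ N(0, σ_v²) and e_i ~ N(0, D_i), where D_i is the known design-based variance. Given σ_v², the predictor shrinks each direct estimate toward the regression fit, γ_i y_i + (1 − γ_i) x_i'β̂ with γ_i = σ_v² / (σ_v² + D_i). This is the baseline the paper moves beyond, not its conjugate hierarchical model; the function name and example data are hypothetical, and σ_v² is treated as known rather than estimated.

```python
import numpy as np

def fay_herriot_eblup(y, X, D, sigma2_v):
    """Shrinkage predictor under a Fay-Herriot area-level model.

    y: (m,) direct (design-based) estimates, X: (m, p) area covariates,
    D: (m,) known sampling variances, sigma2_v: model variance (assumed known).
    """
    w = 1.0 / (sigma2_v + D)                      # GLS weights
    Xw = X * w[:, None]
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # weighted least squares for beta
    gamma = sigma2_v / (sigma2_v + D)             # per-area shrinkage factor
    return gamma * y + (1.0 - gamma) * (X @ beta), gamma, beta

# Hypothetical example: four areas, intercept-only model
y = np.array([10.0, 12.0, 9.0, 15.0])
X = np.ones((4, 1))
D = np.array([1.0, 4.0, 1.0, 9.0])
est, gamma, beta = fay_herriot_eblup(y, X, D, sigma2_v=2.0)
print(est, gamma, beta)
```

Areas with noisier direct estimates (larger D_i) get smaller γ_i and are pulled harder toward the regression fit; the abstract's point is that treating D_i as fixed and known ignores its own sampling uncertainty.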
REDS: Random ensemble deep spatial prediction

https://doi.org/10.1002/env.2780

Daw, Ranadeep; Wikle, Christopher K. (December 2022, Environmetrics)

Abstract There has been a great deal of recent interest in the development of spatial prediction algorithms for very large datasets and/or prediction domains. These methods have primarily been developed in the spatial statistics community, but there has been growing interest in the machine learning community for such methods, primarily driven by the success of deep Gaussian process regression approaches and deep convolutional neural networks. These methods are often computationally expensive to train and implement; consequently, there has been a resurgence of interest in random projections and deep learning models based on random weights, so-called reservoir computing methods. Here, we combine several of these ideas to develop the random ensemble deep spatial (REDS) approach to predict spatial data. The procedure uses random Fourier features as inputs to an extreme learning machine (a deep neural model with random weights) and, using calibrated ensembles of outputs from this model obtained with different random weights, it provides a simple form of uncertainty quantification. The REDS method is demonstrated on simulated data and on a classic large satellite dataset.
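The three ingredients named in the REDS abstract (random Fourier features, a network whose hidden weights stay random, and an ensemble over random draws) can be combined in a minimal sketch. It is not the REDS implementation: it uses a single random hidden layer instead of a deep one, fits only the output weights by ridge regression, omits the calibration step, and the length-scale, layer sizes, ridge penalty, and test surface are all assumed values.

```python
import numpy as np

def rff(S, W, b):
    # Random Fourier features approximating a Gaussian (RBF) kernel
    return np.sqrt(2.0 / W.shape[1]) * np.cos(S @ W + b)

def fit_predict_member(S_train, y_train, S_test, rng, n_feat=100, n_hidden=200,
                       lengthscale=0.2, ridge=1e-2):
    d = S_train.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_feat))  # RBF spectral draws
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    A = rng.normal(size=(n_feat, n_hidden))   # random hidden weights, never trained
    c = rng.normal(size=n_hidden)

    def hidden(S):
        return np.tanh(rff(S, W, b) @ A + c)

    H = hidden(S_train)
    # Only the output layer is fitted, in closed form by ridge regression
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y_train)
    return hidden(S_test) @ beta

def random_ensemble_predict(S_train, y_train, S_test, n_members=20, seed=0):
    # Each member uses fresh random weights; the spread across members is a
    # simple (uncalibrated) measure of predictive uncertainty
    rng = np.random.default_rng(seed)
    preds = np.stack([fit_predict_member(S_train, y_train, S_test, rng)
                      for _ in range(n_members)])
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical synthetic field on the unit square
rng_data = np.random.default_rng(42)

def surface(S):
    return np.sin(2 * np.pi * S[:, 0]) * np.cos(2 * np.pi * S[:, 1])

S_train = rng_data.uniform(size=(300, 2))
y_train = surface(S_train) + 0.1 * rng_data.normal(size=300)
S_test = rng_data.uniform(size=(50, 2))
pred_mean, pred_sd = random_ensemble_predict(S_train, y_train, S_test)
```

Because only a linear output layer is solved for, each ensemble member costs one small linear solve, which is the computational appeal of random-weight models that the abstract points to.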
A Bayesian Functional Data Model for Surveys Collected under Informative Sampling with Application to Mortality Estimation Using NHANES

https://doi.org/10.1111/biom.13696

Parker, Paul A.; Holan, Scott H. (May 2022, Biometrics)

Abstract Functional data are often extremely high-dimensional and exhibit strong dependence structures, but they can prove valuable for both prediction and inference. The literature on functional data analysis is well developed; however, there has been very little work involving functional data in complex survey settings. Motivated by physical activity monitor data from the National Health and Nutrition Examination Survey (NHANES), we develop a Bayesian model for functional covariates that can properly account for the survey design. Our approach is intended for non-Gaussian data and can be applied in multivariate settings. In addition, we make use of a variety of Bayesian modeling techniques to ensure that the model is fit in a computationally efficient manner. We illustrate the value of our approach through two simulation studies as well as an example of mortality estimation using NHANES data.
Nonlinear time series classification using bispectrum‐based deep convolutional neural networks

https://doi.org/10.1002/asmb.2536

Parker, Paul A.; Holan, Scott H.; Ravishanker, Nalini (May 2020, Applied Stochastic Models in Business and Industry)

Abstract Time series classification using novel techniques has experienced a recent resurgence and growing interest from statisticians, subject‐domain scientists, and decision makers in business and industry. This is primarily due to the ever‐increasing amount of big and complex data produced as a result of technological advances. A motivating example is Google Trends data, which exhibit highly nonlinear behavior. Although a rich literature exists for addressing this problem, existing approaches mostly rely on first‐ and second‐order properties of the time series, since they typically assume linearity of the underlying process. These properties are often inadequate for effective classification of nonlinear time series such as Google Trends data. Given these methodological deficiencies and the abundance of nonlinear time series among real‐world phenomena, we introduce an approach that merges higher‐order spectral analysis with deep convolutional neural networks for classifying time series. The effectiveness of our approach is illustrated using simulated data and two motivating industry examples involving Google Trends data and electronic device energy consumption data.
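The higher-order spectral input can be illustrated with a direct, segment-averaged bispectrum estimate, B(k1, k2) ≈ mean over segments of X(k1) X(k2) conj(X(k1 + k2)), where X is the discrete Fourier transform of a segment. The sketch computes only this input, not the convolutional network that classifies it, and the segment length and test signals are assumptions. Three sinusoids whose phases are coupled (the third phase equals the sum of the first two) produce a bispectral peak at the coupled frequency pair, while the same sinusoids with independent phases, which have the same expected power spectrum, do not; this is the kind of nonlinear structure that first- and second-order summaries cannot detect.

```python
import numpy as np

def bispectrum(x, seg_len=64):
    """Direct bispectrum estimate averaged over non-overlapping segments.

    Returns B[k1, k2] for k1, k2 in 0 .. seg_len//2 - 1.
    """
    n_seg = len(x) // seg_len
    segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    segs = segs - segs.mean(axis=1, keepdims=True)   # remove each segment's mean
    X = np.fft.fft(segs, axis=1)
    k = np.arange(seg_len // 2)
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    triple = X[:, k1] * X[:, k2] * np.conj(X[:, (k1 + k2) % seg_len])
    return triple.mean(axis=0)

# Hypothetical test signals: sinusoids at DFT bins 8, 12 and 20 = 8 + 12
rng = np.random.default_rng(1)
seg_len, n_seg = 64, 200
t = np.arange(seg_len)

def make_series(coupled):
    out = []
    for _ in range(n_seg):
        p1, p2, p3 = rng.uniform(0.0, 2.0 * np.pi, size=3)
        if coupled:
            p3 = p1 + p2          # quadratic phase coupling
        seg = (np.cos(2 * np.pi * 8 * t / seg_len + p1)
               + np.cos(2 * np.pi * 12 * t / seg_len + p2)
               + np.cos(2 * np.pi * 20 * t / seg_len + p3)
               + 0.5 * rng.normal(size=seg_len))
        out.append(seg)
    return np.concatenate(out)

B_coupled = bispectrum(make_series(True))
B_uncoupled = bispectrum(make_series(False))
print(abs(B_coupled[8, 12]), abs(B_uncoupled[8, 12]))
```

In a classifier along the lines the abstract describes, a magnitude image such as np.abs(B) would be the kind of input passed to the network; how the authors scale or crop it is not specified here.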
Interpolating Population Distributions using Public-Use Data: An Application to Income Segregation using American Community Survey Data

https://doi.org/10.1080/01621459.2022.2126779

Simpson, Matthew; Holan, Scott H.; Wikle, Christopher K.; Bradley, Jonathan R. (January 2023, Journal of the American Statistical Association)

Full Text Available
Computationally efficient Bayesian unit-level models for non-Gaussian data under informative sampling with application to estimation of health insurance coverage

https://doi.org/10.1214/21-AOAS1524

Parker, Paul A.; Holan, Scott H.; Janicki, Ryan (June 2022, The Annals of Applied Statistics)

Full Text Available
Hierarchical Bayesian modeling of spatio-temporal area-interaction processes

https://doi.org/10.1016/j.csda.2021.107349

Chen, Jiaxun; Micheas, Athanasios C.; Holan, Scott H. (March 2022, Computational Statistics & Data Analysis)

Full Text Available
Analysis of Household Pulse Survey Public-Use Microdata via Unit-Level Models for Informative Sampling

https://doi.org/10.3390/stats5010010

Sun, Alexander; Parker, Paul A.; Holan, Scott H. (March 2022, Stats)

The Household Pulse Survey, recently released by the U.S. Census Bureau, gathers information about the respondents’ experiences regarding employment status, food security, housing, physical and mental health, access to health care, and education disruption. Design-based estimates are produced for all 50 states and the District of Columbia (DC), as well as 15 Metropolitan Statistical Areas (MSAs). Using public-use microdata, this paper explores the effectiveness of using unit-level model-based estimators that incorporate spatial dependence for the Household Pulse Survey. In particular, we consider Bayesian hierarchical model-based spatial estimates for both a binomial and a multinomial response under informative sampling. Importantly, we demonstrate that these models can be easily estimated using Hamiltonian Monte Carlo through the Stan software package. As a result, these models can readily be implemented in a production environment. For both the binomial and multinomial responses, an empirical simulation study is conducted, which compares spatial and non-spatial models. Finally, using public-use Household Pulse Survey microdata, we provide an analysis that compares both design-based and model-based estimators and demonstrates a reduction in standard errors for the model-based approaches.
Full Text Available
A Bayesian semiparametric Jolly–Seber model with individual heterogeneity: An application to migratory mallards at stopover

https://doi.org/10.1214/20-AOAS1421

Wu, Guohui; Holan, Scott H.; Avril, Alexis; Waldenström, Jonas (June 2021, The Annals of Applied Statistics)

Full Text Available
