Estimation of the mean and covariance parameters for functional data is a critical task, with local linear smoothing being a popular choice. In recent years, many scientific domains have begun producing multivariate functional data for which $p$, the number of curves per subject, is often much larger than the sample size $n$. In this setting of high-dimensional functional data, much of the developed methodology relies on preliminary estimates of the unknown mean functions and the auto- and cross-covariance functions. This paper investigates the convergence rates of local linear estimators in terms of the maximal error across components and pairs of components for mean and covariance functions, respectively, in both $L^2$ and uniform metrics. The local linear estimators utilize a generic weighting scheme that can adjust for differing numbers of discrete observations $N_{ij}$ across curves $j$ and subjects $i$, where the $N_{ij}$ vary with $n$. Particular attention is given to the equal weight per observation (OBS) and equal weight per subject (SUBJ) weighting schemes. The theoretical results utilize novel applications of concentration inequalities for functional data and demonstrate that, similar to univariate functional data, the order of the $N_{ij}$ relative to $p$ and $n$ divides high-dimensional functional data into three regimes (sparse, dense, and ultra-dense), with the high-dimensional parametric convergence rate of $\left\{\log(p)/n\right\}^{1/2}$ being attainable in the latter two.
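As a rough illustration of the weighting idea in this abstract, the sketch below computes a weighted local linear mean estimate for a single component. It is not the paper's implementation: the function names, the Epanechnikov kernel, and the restriction to one curve per subject are assumptions made for the example. The weight formulas follow the usual univariate convention, assumed here to carry over per component: OBS gives every observation weight $1/\sum_i N_i$, while SUBJ gives each observation of subject $i$ weight $1/(n N_i)$.

```python
import numpy as np

def subject_weights(counts, scheme):
    """Per-observation weight for each subject, given observation counts N_i.

    OBS:  equal weight per observation, 1 / sum_i N_i.
    SUBJ: equal weight per subject, 1 / (n * N_i).
    (Standard univariate definitions, assumed here for illustration.)
    """
    counts = np.asarray(counts, dtype=float)
    if scheme == "OBS":
        return np.full_like(counts, 1.0 / counts.sum())
    if scheme == "SUBJ":
        return 1.0 / (len(counts) * counts)
    raise ValueError(f"unknown scheme: {scheme}")

def local_linear_mean(t0, times, values, weights, h):
    """Weighted local linear estimate of the mean function at t0.

    times, values: lists of 1-D arrays, one pair per subject.
    weights: one weight per subject, applied to each of its observations.
    h: bandwidth (hypothetical tuning parameter for this sketch).
    """
    t = np.concatenate(times)
    y = np.concatenate(values)
    w = np.concatenate([np.full(len(ti), wi) for ti, wi in zip(times, weights)])
    u = (t - t0) / h
    kern = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov kernel
    sw = np.sqrt(w * kern)
    # Weighted least squares for intercept and slope; the intercept is the estimate.
    X = np.column_stack([np.ones_like(t), t - t0])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return float(beta[0])

# Example: three subjects with very different numbers of observations.
counts = [2, 5, 10]
times = [np.linspace(0, 1, m) for m in counts]
values = [2 + 3 * t for t in times]  # noiseless linear mean for illustration
for scheme in ("OBS", "SUBJ"):
    w = subject_weights(counts, scheme)
    print(scheme, local_linear_mean(0.5, times, values, w, h=0.3))
```

Because the toy data are exactly linear, both schemes recover the same value at `t0`; with noisy data and unequal $N_i$, the two schemes trade off differently between heavily and lightly sampled subjects.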
Basis expansions for functional snippets
Summary: Estimation of mean and covariance functions is fundamental for functional data analysis. While this topic has been studied extensively in the literature, a key assumption is that there are enough data in the domain of interest to estimate both the mean and covariance functions. We investigate mean and covariance estimation for functional snippets in which observations from a subject are available only in an interval of length strictly, and often much, shorter than the length of the whole interval of interest. For such a sampling plan, no data are available for direct estimation of the off-diagonal region of the covariance function. We tackle this challenge via a basis representation of the covariance function. The proposed estimator enjoys a convergence rate that is adaptive to the smoothness of the underlying covariance function, and has superior finite-sample performance in simulation studies.
- Award ID(s): 1914917
- PAR ID: 10287093
- Date Published:
- Journal Name: Biometrika
- Volume: 108
- Issue: 3
- ISSN: 0006-3444
- Page Range / eLocation ID: 709 to 726
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Accurate and efficient power demand forecasting in urban settings is essential for decisions related to planning, management, and operations in electricity supply. This task, however, is complicated by many sources of uncertainty, such as variation in weather conditions and in household or other needs, that influence the inherent stochastic and nonlinear characteristics of electricity demand. Because of its modeling flexibility and computational efficiency, a Gaussian process model is employed in this study for energy demand prediction as a function of temperature. A Gaussian process model is a Bayesian non-parametric regression method that models data using a joint Gaussian distribution with mean and covariance functions. The selected mean function is modeled as a polynomial function of temperature, whereas the covariance function is selected to reflect the actual data patterns. We employ real data sets of daily temperature and electricity demand from Austin, Texas, USA to assess the effectiveness of the proposed method for load forecasting. The accuracy of the model prediction is evaluated using metrics such as mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE) and 95% confidence interval (95% CI). A numerical study demonstrates that the proposed method has promise for energy demand prediction.
-
Functional Principal Component Analysis (FPCA) has become a widely used dimension reduction tool for functional data analysis. When additional covariates are available, existing FPCA models integrate them either in the mean function or in both the mean function and the covariance function. However, methods of the first kind are not suitable for data that display second-order variation, while those of the second kind are time-consuming and make it difficult to perform subsequent statistical analyses on the dimension-reduced representations. To tackle these issues, we introduce an eigen-adjusted FPCA model that integrates covariates in the covariance function only through its eigenvalues. In particular, different structures on the covariate-specific eigenvalues, corresponding to different practical problems, are discussed to illustrate the model’s flexibility as well as utility. To handle functional observations under …
-
Abstract: Functional data with non-smooth features (e.g., discontinuities in the functional mean and/or covariance) and monotonicity arise frequently in practice. This paper develops simultaneous inference for concurrent functional linear regression in this setting. We construct a simultaneous confidence band for a functional covariate effect of interest. Along with a Wald-type formulation, our approach is based on a powerful nonparametric likelihood ratio method. Our procedures are flexible enough to allow discontinuities in the coefficient functions and the covariance structure, while accounting for discretization of the observed trajectories under a fixed dense design. A simulation study shows that the proposed likelihood ratio-based procedure outperforms the Wald-type procedure in moderate sample sizes. We apply the proposed methods to studying the effect of age on the occupation time curve derived from wearable device data obtained in an NHANES study.
-
Abstract: This paper is motivated by studying differential brain activities to multiple experimental condition presentations in intracranial electroencephalography (iEEG) experiments. Contrasting effects of experimental conditions are often zero in most regions and nonzero in some local regions, yielding locally sparse functions. Such studies are essentially a function-on-scalar regression problem, with interest being focused not only on estimating nonparametric functions but also on recovering the function supports. We propose a weighted group bridge approach for simultaneous function estimation and support recovery in function-on-scalar mixed effect models, while accounting for heterogeneity present in functional data. We use B-splines to transform sparsity of functions to its sparse vector counterpart of increasing dimension, and propose a fast nonconvex optimization algorithm using a nested alternating direction method of multipliers (ADMM) for estimation. Large sample properties are established. In particular, we show that the estimated coefficient functions are rate optimal in the minimax sense under the $L^2$ norm and resemble a phase transition phenomenon. For support estimation, we derive a convergence rate under the norm that leads to a selection consistency property under δ-sparsity, and obtain a result under strict sparsity using a simple sufficient regularity condition. An adjusted extended Bayesian information criterion is proposed for parameter tuning. The developed method is illustrated through simulations and an application to a novel iEEG data set to study multisensory integration.

