This paper is concerned with testing normality in samples of curves and in error curves estimated from functional regression models. We propose a general paradigm based on applying multivariate normality tests to vectors of functional principal component scores. We examine the finite-sample performance of a number of such tests and select the best-performing ones. We apply them to several extensively used functional data sets and determine which can be treated as normal, possibly after a suitable transformation. We also offer practical guidance on software implementations of all the tests we study and develop a large-sample justification for the tests based on the sample skewness and kurtosis of functional principal component scores.
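The paradigm described in the abstract, projecting curves onto functional principal components and testing the resulting score vectors for normality, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes curves observed on a common grid, uses a sum of componentwise Jarque-Bera-type statistics as one possible skewness-and-kurtosis test, and treats that sum as approximately chi-square with 2p degrees of freedom. The function names `fpc_scores` and `jarque_bera_on_scores` are hypothetical.

```python
import math
import numpy as np

def fpc_scores(curves, n_components):
    """Scores of discretized curves (one curve per row) on the leading components."""
    centered = curves - curves.mean(axis=0)
    # SVD of the centered data matrix: rows of vt are discretized eigenfunctions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def jarque_bera_on_scores(scores):
    """Sum of componentwise Jarque-Bera statistics with a chi-square p-value.

    Assumption for this sketch: under normality the sum is treated as
    approximately chi-square with 2p degrees of freedom (p = number of scores).
    """
    n, p = scores.shape
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    skew = (z ** 3).mean(axis=0)                 # sample skewness per component
    excess_kurt = (z ** 4).mean(axis=0) - 3.0    # sample excess kurtosis per component
    stat = float(n * np.sum(skew ** 2 / 6.0 + excess_kurt ** 2 / 24.0))
    # Chi-square survival function, closed form for even degrees of freedom 2p.
    half = stat / 2.0
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(p))
    return stat, p_value
```

A small p-value from `jarque_bera_on_scores(fpc_scores(curves, p))` would suggest that the curves should not be treated as normal; the choice of `p` is left to the user in this sketch.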
- NSF-PAR ID:
- 10455097
- Publisher / Repository:
- Wiley-Blackwell
- Date Published:
- Journal Name:
- International Statistical Review
- Volume:
- 88
- Issue:
- 3
- ISSN:
- 0306-7734
- Page Range / eLocation ID:
- p. 677-697
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We develop tests of normality for time series of functions. The tests are related to the commonly used Jarque–Bera test. The assumption of normality has played an important role in many methodological and theoretical developments in the field of functional data analysis. Yet, no inferential procedures to verify it have been proposed so far, even for i.i.d. functions. We propose several approaches which handle two paramount challenges: (i) the unknown temporal dependence structure and (ii) the estimation of the optimal finite-dimensional projection space. We evaluate the tests via simulations and establish their large sample validity under general conditions. We obtain useful insights by applying them to pollution and intraday price curves. While the pollution curves can be treated as normal, the normality of high-frequency price curves is rejected.
-
Summary In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a connection to functional data analysis. The functional methods proposed are non-parametric and computationally straightforward as they do not involve a likelihood. We develop functional principal components analysis for this situation and demonstrate the prediction of individual trajectories from sparse observations. This method can handle missing data and leads to predictions of the functional principal component scores which serve as random effects in this model. These scores can then be used for further statistical analysis, such as inference, regression, discriminant analysis or clustering. We illustrate these non-parametric methods with longitudinal data on primary biliary cirrhosis and show in simulations that they are competitive in comparisons with generalized estimating equations and generalized linear mixed models.
-
Abstract This paper deals with analyzing structural breaks in the covariance operator of sequentially observed functional data. For this purpose, procedures are developed to segment an observed stretch of curves into periods for which second‐order stationarity may be reasonably assumed. The proposed methods are based on measuring the fluctuations of sample eigenvalues, either individually or jointly, and traces of the sample covariance operator computed from segments of the data. To implement the tests, new limit results are introduced that deal with the large‐sample behavior of vector‐valued processes built from partial sample eigenvalue estimates. These results in turn enable the calibration of the tests to a prescribed asymptotic level. Applications to Australian annual minimum temperature curves and sea surface temperature anomaly records confirm that the proposed methods work well in finite samples. The first application suggests that the variation in annual minimum temperature underwent a structural break in the 1950s, after which typical fluctuations from the generally increasing trend started to be significantly smaller.
-
Abstract Quantifying the association between components of multivariate random curves is of general interest and is a ubiquitous and basic problem that can be addressed with functional data analysis. An important application is the problem of assessing functional connectivity based on functional magnetic resonance imaging (fMRI), where one aims to determine the similarity of fMRI time courses that are recorded on anatomically separated brain regions. In the functional brain connectivity literature, the static temporal Pearson correlation has been the prevailing measure for functional connectivity. However, recent research has revealed temporally changing patterns of functional connectivity, leading to the study of dynamic functional connectivity. This motivates new similarity measures for pairs of random curves that reflect the dynamic features of functional similarity. Specifically, we introduce gradient synchronization measures in a general setting. These similarity measures are based on the concordance and discordance of the gradients between paired smooth random functions. Asymptotic normality of the proposed estimates is obtained under regularity conditions. We illustrate the proposed synchronization measures via simulations and an application to resting-state fMRI signals from the Alzheimer’s Disease Neuroimaging Initiative and they are found to improve discrimination between subjects with different disease status.
-
Summary Principal component analysis has become a fundamental tool of functional data analysis. It represents the functional data as $X_i(t) = \mu(t) + \sum_{1 \le l < \infty} \eta_{i,l} v_l(t)$, where $\mu$ is the common mean, $v_l$ are the eigenfunctions of the covariance operator and the $\eta_{i,l}$ are the scores. Inferential procedures assume that the mean function $\mu(t)$ is the same for all values of $i$. If, in fact, the observations do not come from one population, but rather their mean changes at some point(s), the results of principal component analysis are confounded by the change(s). It is therefore important to develop a methodology to test the assumption of a common functional mean. We develop such a test using quantities which can be readily computed in the R package fda. The null distribution of the test statistic is asymptotically pivotal with a well-known asymptotic distribution. The asymptotic test has excellent finite sample performance. Its application is illustrated on temperature data from England.
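The expansion of each curve into a common mean plus scores times eigenfunctions, as described in the summary above, can be illustrated with a short numerical sketch. It is not taken from the paper or from the R package fda: it assumes curves sampled on a common equispaced grid, approximates integrals by Riemann sums, and truncates the infinite sum at `n_components` terms. The function name `fpca_expansion` is hypothetical.

```python
import numpy as np

def fpca_expansion(curves, grid, n_components):
    """Truncated expansion X_i(t) ~ mu(t) + sum_l eta_{i,l} v_l(t).

    curves: array of shape (n_curves, n_grid), one discretized curve per row.
    grid:   equispaced evaluation points (an assumption of this sketch).
    """
    dt = grid[1] - grid[0]
    mu = curves.mean(axis=0)                        # estimate of the common mean
    centered = curves - mu
    cov = centered.T @ centered / curves.shape[0]   # sample covariance kernel on the grid
    eigvals, eigvecs = np.linalg.eigh(cov * dt)     # discretized covariance operator
    order = np.argsort(eigvals)[::-1][:n_components]
    v = eigvecs[:, order].T / np.sqrt(dt)           # eigenfunctions scaled to unit L2 norm
    eta = centered @ v.T * dt                       # scores: inner products <X_i - mu, v_l>
    approx = mu + eta @ v                           # truncated reconstruction of each curve
    return mu, v, eta, approx
```

If the mean changes partway through the sample, `mu` estimates a mixture of the two means and the leading rows of `v` tend to pick up the change rather than the within-population variation, which is the confounding the summary refers to.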