This content will become publicly available on March 7, 2026

Title: Empirical Likelihood in Functional Data Analysis
Functional data analysis (FDA) studies data that include infinite-dimensional functions or objects, generalizing traditional univariate or multivariate observations from each study unit. Among inferential approaches without parametric assumptions, empirical likelihood (EL) offers a principled method in that it extends the framework of parametric likelihood ratio–based inference via the nonparametric likelihood. There has been increasing use of EL in FDA due to its many favorable properties, including self-normalization and the data-driven shape of confidence regions. This article presents a review of EL approaches in FDA, starting with finite-dimensional features, then covering infinite-dimensional features. We contrast smooth and nonsmooth frameworks in FDA and show how EL has been incorporated into both of them. The article concludes with a discussion of some future research directions, including the possibility of applying EL to conformal inference.
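For orientation, and not as a quotation from the article, the classical empirical likelihood ratio for the mean of i.i.d. scalar observations X_1, ..., X_n illustrates the construction that the review extends to functional settings:

    R(\mu) = \max\Big\{ \prod_{i=1}^{n} n p_i \;:\; \sum_{i=1}^{n} p_i X_i = \mu,\ \sum_{i=1}^{n} p_i = 1,\ p_i \ge 0 \Big\}.

Under mild moment conditions, a Wilks-type result due to Owen gives -2 \log R(\mu_0) \to \chi^2_1 at the true mean \mu_0, so the region \{\mu : -2\log R(\mu) \le \chi^2_{1,1-\alpha}\} is an asymptotic confidence region that needs no explicit variance estimate and whose shape is determined by the data, the self-normalization and data-driven-shape properties highlighted in the abstract.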
Award ID(s): 2112938
PAR ID: 10627798
Author(s) / Creator(s): ;
Publisher / Repository: Annual Reviews
Date Published:
Journal Name: Annual Review of Statistics and Its Application
Volume: 12
Issue: 1
ISSN: 2326-8298
Page Range / eLocation ID: 425 to 448
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Proliferation of high-resolution imaging data in recent years has led to substantial improvements in the two popular approaches for analyzing shapes of data objects based on landmarks and/or continuous curves. We provide an expository account of elastic shape analysis of parametric planar curves representing shapes of two-dimensional (2D) objects by discussing its differences from, and commonalities with, the landmark-based approach. Particular attention is accorded to the role of reparameterization of a curve, which, in addition to rotation, scaling, and translation, represents an important shape-preserving transformation. The transition to the curve-based approach moves the mathematical setting of shape analysis from finite-dimensional non-Euclidean spaces to infinite-dimensional ones. We discuss some of the challenges associated with the infinite-dimensionality of the shape space, and illustrate the use of geometry-based methods in the computation of intrinsic statistical summaries and in the definition of statistical models on a 2D imaging dataset consisting of mouse vertebrae. We conclude with an overview of the current state-of-the-art in the field. This article is categorized under: Image and Spatial Data < Data: Types and Structure; Computational Mathematics < Applications of Computational Statistics
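As a standard illustration from the elastic shape analysis literature (not quoted from the summary above), a planar curve \beta : [0,1] \to \mathbb{R}^2 is represented by its square-root velocity function

    q(t) = \dot\beta(t) / \sqrt{\|\dot\beta(t)\|},

and the elastic shape distance between curves \beta_1 and \beta_2, after removing translation and scale, optimizes over rotations and reparameterizations:

    d(\beta_1, \beta_2) = \inf_{O \in SO(2),\ \gamma \in \Gamma} \big\| q_1 - O\,(q_2 \circ \gamma)\sqrt{\dot\gamma} \big\|_{L^2},

where \Gamma is the group of orientation-preserving diffeomorphisms of [0,1]. The optimization over the infinite-dimensional group \Gamma is one concrete source of the challenges the summary mentions.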
  2. High-dimensional inference is one of the fundamental problems in modern biomedical studies. However, existing methods do not perform satisfactorily. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the Markov neighborhood regression method so that it can be applied to statistical inference for high-dimensional generalized linear models with mixed features. The Markov neighborhood regression method is highly attractive in that it breaks high-dimensional inference problems into a series of low-dimensional inference problems. The proposed method is applied to the Cancer Cell Line Encyclopedia data for identification of the genes and mutations that are sensitive to the response of anti-cancer drugs. The numerical results favor the Markov neighborhood regression method over the existing ones.
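The following Python sketch illustrates only the general strategy described in this summary: reducing inference on a single coefficient to a low-dimensional regression on an estimated neighborhood of that variable. It is not the exact construction of the cited article; the neighborhood estimator (graphical lasso), the threshold, and the Gaussian linear model are placeholder choices.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
import statsmodels.api as sm

def markov_neighborhood_inference(X, y, j):
    """Wald-type inference for feature j's effect on y, using a low-dimensional
    regression restricted to an estimated neighborhood of feature j in a
    Gaussian graphical model (illustrative sketch only)."""
    # Estimate a sparse precision matrix for the features.
    prec = GraphicalLassoCV().fit(X).precision_
    # Neighborhood of feature j: features with nonzero estimated partial correlation.
    nbr = np.flatnonzero(np.abs(prec[j]) > 1e-8)
    cols = np.union1d([j], nbr)            # low-dimensional conditioning set
    # Ordinary least squares on the reduced design, with an intercept.
    design = sm.add_constant(X[:, cols])
    fit = sm.OLS(y, design).fit()
    pos = np.searchsorted(cols, j) + 1     # +1 accounts for the intercept column
    return fit.params[pos], fit.bse[pos], fit.pvalues[pos]

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 2.0 * X[:, 3] + rng.standard_normal(200)
print(markov_neighborhood_inference(X, y, j=3))
```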
  3. Cosmic demographics, the statistical study of populations of astrophysical objects, has long relied on tools from multivariate statistics for analyzing data comprising fixed-length vectors of object properties, as might be compiled in a tabular astronomical catalog (say, with sky coordinates and brightness measurements in a fixed number of spectral passbands). But beginning with the emergence of automated digital sky surveys, ca. 2000, astronomers began producing large collections of data with more complex structure: light curves (brightness time series) and spectra (brightness vs. wavelength). These comprise what statisticians call functional data: measurements of populations of functions. Upcoming automated sky surveys will soon provide astronomers with a flood of functional data. New methods are needed to accurately and optimally analyze large ensembles of light curves and spectra, accumulating information both along individual measured functions and across a population of such functions. Functional data analysis (FDA) provides tools for statistical modeling of functional data. Astronomical data present several challenges for FDA methodology, e.g., sparse, irregular, and asynchronous sampling, and heteroscedastic measurement error. Bayesian FDA uses hierarchical Bayesian models for function populations and is well suited to addressing these challenges. We provide an overview of astronomical functional data and some key Bayesian FDA modeling approaches, including functional mixed effects models and stochastic process models. We briefly describe a Bayesian FDA framework combining FDA and machine learning methods to build low-dimensional parametric models for galaxy spectra.
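A schematic, generic form of the functional mixed effects models mentioned above (not the article's specific specification) is

    y_{ij} = \mu(t_{ij}) + f_i(t_{ij}) + \varepsilon_{ij}, \qquad f_i \sim \mathrm{GP}(0, K), \qquad \varepsilon_{ij} \sim N(0, \sigma_{ij}^2),

where \mu is a population mean function, each f_i is a unit-level Gaussian-process deviation observed at the irregular, possibly sparse times t_{ij} (e.g., the epochs of a light curve), and the observation-specific variances \sigma_{ij}^2 accommodate heteroscedastic measurement error. Hierarchical priors on \mu, the covariance K, and the variances are what let the model pool information across the population of functions.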
  4. Big data is ubiquitous in various fields of science, engineering, medicine, the social sciences, and the humanities. It is often accompanied by a large number of variables and features. While adding much greater flexibility to modeling with an enriched feature space, ultra-high dimensional data analysis poses fundamental challenges to scalable learning and inference with good statistical efficiency. Sure independence screening is a simple and effective method for this purpose. This framework of two-scale statistical learning, consisting of large-scale screening followed by moderate-scale variable selection, introduced in Fan and Lv (2008), has been extensively investigated and extended to various model settings ranging from parametric to semiparametric and nonparametric for regression, classification, and survival analysis. This article provides an overview of the developments of sure independence screening over the past decade. These developments demonstrate the wide applicability of sure independence screening-based learning and inference for big data analysis with the desired scalability and theoretical guarantees.
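A minimal sketch of the two-scale procedure described above, assuming a linear model with the absolute marginal Pearson correlation as the screening utility and a cross-validated lasso as the moderate-scale selector; both choices are illustrative, since the screening statistic and the second-stage method vary across the extensions the article surveys.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def sis_then_select(X, y, d=None):
    """Sure independence screening followed by lasso selection (illustrative)."""
    n, p = X.shape
    if d is None:
        d = int(np.ceil(n / np.log(n)))    # common default screening size
    # Large-scale screening: rank features by absolute marginal correlation with y.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    corr = np.abs(Xc.T @ yc) / n
    keep = np.argsort(corr)[::-1][:d]      # indices of the top-d features
    # Moderate-scale selection: cross-validated lasso on the screened features.
    lasso = LassoCV(cv=5).fit(X[:, keep], y)
    selected = keep[np.flatnonzero(lasso.coef_)]
    return np.sort(selected)

# Toy usage: p >> n with three active features.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2000))
y = X[:, 0] - 2 * X[:, 10] + 0.5 * X[:, 500] + rng.standard_normal(100)
print(sis_then_select(X, y))
```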
  5. The log-Gaussian Cox process is a flexible and popular stochastic process for modeling point patterns exhibiting spatial and space-time dependence. Model fitting requires approximating stochastic integrals, which is implemented through discretization over the domain of interest. With fine-scale discretization, inference based on Markov chain Monte Carlo is computationally burdensome because of the cost of matrix decompositions and storage, such as the Cholesky, for the high-dimensional covariance matrices associated with the latent Gaussian variables. This article addresses these computational bottlenecks by combining two recent developments: (i) a data augmentation strategy, proposed for space-time Gaussian Cox processes, that is based on exact Bayesian inference and does not require fine-grid approximations of infinite-dimensional integrals, and (ii) a recently developed family of sparsity-inducing Gaussian processes, called nearest-neighbor Gaussian processes, that avoids expensive matrix computations. Our inference is delivered within the fully model-based Bayesian paradigm and does not sacrifice the richness of traditional log-Gaussian Cox processes. We apply our method to crime event data in San Francisco and investigate the recovery of the intensity surface.
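For concreteness (a generic spatial formulation, not the article's full space-time model), a log-Gaussian Cox process on a domain D has random intensity

    \lambda(s) = \exp\{ x(s)^\top \beta + w(s) \}, \qquad w(\cdot) \sim \mathrm{GP}(0, C_\theta),

and, conditional on w, the likelihood of observed event locations s_1, ..., s_N is

    L(\beta, \theta) \propto \exp\Big\{ -\int_D \lambda(s)\, ds \Big\} \prod_{i=1}^{N} \lambda(s_i).

The integral over D is the stochastic integral that standard implementations approximate on a fine grid; the two developments combined in the summary above replace that approximation with an exact data-augmentation scheme and replace w with a nearest-neighbor Gaussian process whose sparse precision structure avoids dense Cholesky factorizations.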