Abstract: The proliferation of high-resolution imaging data in recent years has led to substantial improvements in the two popular approaches for analyzing shapes of data objects, based on landmarks and/or continuous curves. We provide an expository account of elastic shape analysis of parametric planar curves representing shapes of two-dimensional (2D) objects, discussing its differences from, and commonalities with, the landmark-based approach. Particular attention is given to the role of reparameterization, which, in addition to rotation, scaling, and translation, is an important shape-preserving transformation of a curve. The transition to the curve-based approach moves the mathematical setting of shape analysis from finite-dimensional non-Euclidean spaces to infinite-dimensional ones. We discuss some of the challenges associated with the infinite-dimensionality of the shape space, and illustrate the use of geometry-based methods in the computation of intrinsic statistical summaries and in the definition of statistical models on a 2D imaging dataset consisting of mouse vertebrae. We conclude with an overview of the current state of the art in the field. This article is categorized under: Image and Spatial Data < Data: Types and Structure; Computational Mathematics < Applications of Computational Statistics.
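To make these shape-preserving transformations concrete, the standard formulation for a planar curve is sketched below; the square-root velocity notation is the usual convention in the elastic shape analysis literature, not notation taken from this abstract. A parameterized curve \(\beta:[0,1]\to\mathbb{R}^2\) has the same shape as
\[
c\,O\,(\beta\circ\gamma) + a, \qquad c>0,\ O\in SO(2),\ a\in\mathbb{R}^2,\ \gamma\in\Gamma,
\]
where \(\Gamma\) is the group of orientation-preserving diffeomorphisms of \([0,1]\) (the reparameterizations); the shape of the curve is its equivalence class under these actions. In the elastic framework one typically works with the square-root velocity function \(q(t)=\dot\beta(t)/\sqrt{\|\dot\beta(t)\|}\), under which reparameterization acts by the isometry \(q\mapsto(q\circ\gamma)\sqrt{\dot\gamma}\), which makes the infinite-dimensional quotient amenable to the geometry-based computations mentioned above.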
This content will become publicly available on March 7, 2026
Empirical Likelihood in Functional Data Analysis
Functional data analysis (FDA) studies data that include infinite-dimensional functions or objects, generalizing traditional univariate or multivariate observations from each study unit. Among inferential approaches without parametric assumptions, empirical likelihood (EL) offers a principled method in that it extends the framework of parametric likelihood ratio–based inference via the nonparametric likelihood. There has been increasing use of EL in FDA due to its many favorable properties, including self-normalization and the data-driven shape of confidence regions. This article presents a review of EL approaches in FDA, starting with finite-dimensional features, then covering infinite-dimensional features. We contrast smooth and nonsmooth frameworks in FDA and show how EL has been incorporated into both of them. The article concludes with a discussion of some future research directions, including the possibility of applying EL to conformal inference.
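As a finite-dimensional reference point for the functional extensions reviewed above, the classical empirical likelihood ratio for a mean (written in standard notation, not taken from the article) is
\[
R(\mu) \;=\; \max\Big\{\prod_{i=1}^{n} n p_i \;:\; p_i\ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i X_i = \mu\Big\},
\]
and \(-2\log R(\mu_0)\) converges in distribution to a \(\chi^2\) limit under \(H_0:\mu=\mu_0\). Confidence regions of the form \(\{\mu : -2\log R(\mu)\le \chi^2_{1-\alpha}\}\) require no plug-in variance estimate (the self-normalization noted above) and take their shape from the data.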
- Award ID(s): 2112938
- PAR ID: 10627798
- Publisher / Repository: Annual Reviews
- Date Published:
- Journal Name: Annual Review of Statistics and Its Application
- Volume: 12
- Issue: 1
- ISSN: 2326-8298
- Page Range / eLocation ID: 425 to 448
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
High-dimensional inference is one of the fundamental problems in modern biomedical studies, yet existing methods do not perform satisfactorily. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the Markov neighborhood regression method so that it can be applied to statistical inference for high-dimensional generalized linear models with mixed features. The Markov neighborhood regression method is highly attractive in that it breaks high-dimensional inference problems into a series of low-dimensional inference problems. The proposed method is applied to the Cancer Cell Line Encyclopedia data to identify genes and mutations that are sensitive to the response of anti-cancer drugs. The numerical results favor the Markov neighborhood regression method over existing ones.
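A minimal sketch of the neighborhood-regression idea described above, for a Gaussian response: inference on a single coefficient is carried out in a low-dimensional subset regression. The lasso-based selection steps, the function name `neighborhood_inference`, and the use of OLS for the subset fit are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

def neighborhood_inference(X, y, j):
    """Test H0: beta_j = 0 using X_j plus a small conditioning set."""
    n, p = X.shape
    # (i) Neighborhood proxy for X_j: nodewise lasso on the other covariates
    others = [k for k in range(p) if k != j]
    nbr_fit = LassoCV(cv=5).fit(X[:, others], X[:, j])
    nbr = [others[k] for k in np.flatnonzero(nbr_fit.coef_)]
    # (ii) Covariates relevant to the response: lasso of y on X
    y_fit = LassoCV(cv=5).fit(X, y)
    rel = [int(k) for k in np.flatnonzero(y_fit.coef_) if k != j]
    # (iii) Low-dimensional regression of y on X_j and the conditioning set,
    #       followed by a standard test on the coefficient of X_j
    subset = [j] + sorted(set(nbr) | set(rel))
    design = sm.add_constant(X[:, subset])
    res = sm.OLS(y, design).fit()   # swap in sm.GLM for non-Gaussian responses
    return res.params[1], res.pvalues[1]
```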
Big data is ubiquitous in various fields of science, engineering, medicine, the social sciences, and the humanities, and is often accompanied by a large number of variables and features. While an enriched feature space adds much greater flexibility to modeling, ultra-high dimensional data analysis poses fundamental challenges to scalable learning and inference with good statistical efficiency. Sure independence screening is a simple and effective method for this task. The two-scale statistical learning framework introduced in Fan and Lv (2008), consisting of large-scale screening followed by moderate-scale variable selection, has been extensively investigated and extended to various model settings, ranging from parametric to semiparametric and nonparametric, for regression, classification, and survival analysis. This article provides an overview of the developments of sure independence screening over the past decade. These developments demonstrate the wide applicability of sure independence screening–based learning and inference for big data analysis, with the desired scalability and theoretical guarantees.
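A minimal sketch of the two-scale recipe described above, for a linear model: marginal correlation screening followed by a moderate-scale penalized fit. The screening size `d`, the lasso second stage, and the function name `sis_then_lasso` are illustrative choices, not prescribed by the review.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def sis_then_lasso(X, y, d=None):
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))              # a commonly used screening size
    # Scale 1: rank features by absolute marginal correlation with y
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    score = np.abs(Xs.T @ ys) / n
    keep = np.argsort(-score)[:d]
    # Scale 2: variable selection on the screened, moderate-dimensional set
    fit = LassoCV(cv=5).fit(X[:, keep], y)
    return keep[np.flatnonzero(fit.coef_)]  # indices of selected variables
```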
The log-Gaussian Cox process is a flexible and popular stochastic process for modeling point patterns exhibiting spatial and space-time dependence. Model fitting requires approximation of stochastic integrals, which is implemented through discretization over the domain of interest. With fine-scale discretization, inference based on Markov chain Monte Carlo is computationally burdensome because of the cost of matrix decompositions and storage (such as the Cholesky decomposition) for the high-dimensional covariance matrices associated with the latent Gaussian variables. This article addresses these computational bottlenecks by combining two recent developments: (i) a data augmentation strategy proposed for space-time Gaussian Cox processes that is based on exact Bayesian inference and does not require fine-grid approximations of infinite-dimensional integrals, and (ii) a recently developed family of sparsity-inducing Gaussian processes, called nearest-neighbor Gaussian processes, to avoid expensive matrix computations. Our inference is delivered within the fully model-based Bayesian paradigm and does not sacrifice the richness of traditional log-Gaussian Cox processes. We apply our method to crime event data in San Francisco and investigate the recovery of the intensity surface.
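For reference, a standard specification of the model class discussed above (notation mine, not the article's): conditional on the intensity, counts in any region \(A\) of the domain \(D\) are Poisson,
\[
N(A)\mid\lambda \;\sim\; \mathrm{Poisson}\!\Big(\int_A \lambda(s)\,ds\Big),
\qquad
\log\lambda(s) \;=\; x(s)^\top\beta + w(s),
\qquad w(\cdot)\sim \mathrm{GP}\big(0,\,C_\theta(\cdot,\cdot)\big),
\]
so the likelihood involves the stochastic integral \(\int_D \lambda(s)\,ds\); the fine-grid discretization and the large covariance matrices mentioned above arise from approximating this integral and from evaluating \(w\) at many locations.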
Schwartz, Russell (Ed.). Motivation: Cells in an organism share a common evolutionary history, represented by a cell lineage tree. The cell lineage tree can be inferred from single-cell genotypes at genomic variation sites, but inference from noisy single-cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes; a key missing aspect is that real single-cell data usually have non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling based and can be very slow for large data. Results: In this article, we propose a new method called ScisTree, which infers the cell lineage tree and calls genotypes from noisy single-cell genotype data. Unlike most existing approaches, ScisTree works with genotype probabilities for individual genotypes (which can be computed by existing single-cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring the cell lineage tree and calling the genotypes that admit a so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well in terms of the accuracy of the inferred trees and is much more efficient than existing methods. The efficiency of ScisTree enables new applications, including imputation of so-called doublets. Availability and implementation: The program ScisTree is available for download at https://github.com/yufengwudcs/ScisTree. Supplementary information: Supplementary data are available at Bioinformatics online.
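In symbols, the objective stated above can be written roughly as follows; the notation is mine and is only meant to restate the abstract, not the paper's formal development. With \(P_{ij}(g)\) the caller-supplied probability of genotype \(g\) for cell \(i\) at site \(j\),
\[
\max_{G \ \text{admitting a perfect phylogeny}} \;\prod_{i=1}^{n}\prod_{j=1}^{m} P_{ij}\big(g_{ij}\big),
\]
i.e., ScisTree heuristically searches for the genotype matrix \(G\), and the associated cell lineage tree, that is consistent with the infinite sites / perfect phylogeny assumption and has maximum probability under the individualized genotype uncertainties.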
