Search for: All records

Award ID contains: 1953087


  1. Abstract

    The aim of this paper is to systematically investigate merging and ensembling methods for spatially varying coefficient mixed effects models (SVCMEM) in order to carry out integrative learning of neuroimaging data obtained from multiple biomedical studies. The "merged" approach trains a single learning model on a comprehensive dataset that encompasses information from all the studies, whereas the "ensemble" approach forms a weighted average of distinct learning models, each developed from an individual study. We systematically investigate the prediction accuracy of the merged and ensemble learners in the presence of different degrees of interstudy heterogeneity. Additionally, we establish asymptotic guidelines for deciding when to employ each model in different scenarios, along with deriving optimal weights for the ensemble learner. To validate our theoretical results, we perform extensive simulation studies. The proposed methodology is also applied to three large-scale neuroimaging studies.

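The merged-versus-ensemble comparison above can be sketched in a few lines. The sketch below uses hypothetical toy data, with ordinary least squares standing in for the paper's SVCMEM fits and equal ensemble weights rather than the optimal weights derived in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: K studies with heterogeneous slopes.
K, n = 3, 200
slopes = np.array([1.0, 1.2, 0.8])
Xs = [rng.normal(size=(n, 1)) for _ in range(K)]
ys = [b * X[:, 0] + 0.1 * rng.normal(size=n) for b, X in zip(slopes, Xs)]

def ols(X, y):
    # Ordinary least squares fit (no intercept in this toy model).
    return np.linalg.lstsq(X, y, rcond=None)[0]

# "Merged": a single model trained on the pooled data.
beta_merged = ols(np.vstack(Xs), np.concatenate(ys))

# "Ensemble": a weighted average of study-specific models.
betas = np.array([ols(X, y) for X, y in zip(Xs, ys)])
w = np.full(K, 1.0 / K)      # equal weights; the paper derives optimal ones
beta_ensemble = (w[:, None] * betas).sum(axis=0)
```

With equal study sizes and homogeneous noise both estimators land near the average slope; the regime studied in the paper is when interstudy heterogeneity grows and the two learners diverge.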
  2. Abstract

    Functional data analysis (FDA) is a fast-growing area of research and development in statistics. While most FDA literature imposes the classical $\mathbb{L}^2$ Hilbert structure on function spaces, there is an emergent need for a different, shape-based approach for analyzing functional data. This paper reviews and develops fundamental geometrical concepts that help connect traditionally diverse fields of shape and functional analyses. It showcases that focusing on shapes is often more appropriate when structural features (the number of peaks and valleys and their heights) carry salient information in the data. It recaps recent mathematical representations and associated procedures for comparing, summarizing, and testing the shapes of functions. Specifically, it discusses three tasks: shape fitting, shape fPCA, and shape regression models. The latter refers to models that separate the shapes of functions from their phases and use them individually in regression analysis. The ensuing results provide better interpretations and tend to preserve geometric structures. The paper also discusses an extension where the functions are not real-valued but manifold-valued. The article presents several examples of this shape-centric functional data analysis using simulated and real data.

    Functional data contains two components: shape (or amplitude) and phase. This paper focuses on a branch of functional data analysis (FDA), namely Shape-Based FDA, that isolates and focuses on the shapes of functions. Specifically, it develops Scalar-on-Shape (ScoSh) regression models that incorporate the shapes of predictor functions and discard their phases. This sets ScoSh models apart from traditional Scalar-on-Function (ScoF) regression models, which incorporate full predictor functions. ScoSh is motivated by object data analysis, e.g., for neuro-anatomical objects, where object morphologies are relevant and their parameterizations are arbitrary. ScoSh also differs from methods that arbitrarily pre-register data and use it in subsequent analysis. In contrast, ScoSh models perform registration during regression, using the (non-parametric) Fisher-Rao inner product and nonlinear index functions to capture complex predictor-response relationships. This formulation results in the novel concepts of "regression phase" and "regression mean" of functions. Regression phases are time-warpings of predictor functions that optimize prediction errors, and regression means are optimal regression coefficients. We demonstrate practical applications of the ScoSh model using extensive simulated and real-data examples, including predicting COVID outcomes from daily rate curves.
    Free, publicly-accessible full text available November 25, 2025
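A minimal numerical sketch of the registration-during-regression idea, using the SRVF form of the Fisher-Rao inner product and a hypothetical one-parameter warp family gamma(t) = t**a (the paper's non-parametric warping and index functions are far more general):

```python
import numpy as np

def trapz(y, t):
    # Trapezoidal rule on a nonuniform grid.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def srvf(f, t):
    # Square-root velocity function of a scalar function on grid t.
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

def scosh_score(f, beta, t, exponents=np.linspace(0.5, 2.0, 31)):
    """Grid-search the toy warp family gamma(t) = t**a and return the
    largest SRVF (Fisher-Rao) inner product between the warped predictor
    and the coefficient function; the maximizing warp plays the role of
    the 'regression phase'."""
    q_beta = srvf(beta, t)
    best = -np.inf
    for a in exponents:
        f_warp = np.interp(t ** a, t, f)   # f composed with gamma
        best = max(best, trapz(srvf(f_warp, t) * q_beta, t))
    return best
```

For f(t) = beta(t) = t on [0, 1], the identity warp (a = 1) is optimal and the score equals 1, matching the intuition that no warping is needed when the predictor already matches the coefficient.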
    Wasserstein distances form a family of metrics on spaces of probability measures that have recently seen many applications. However, statistical analysis in these spaces is complex due to the nonlinearity of Wasserstein spaces. One potential solution to this problem is Linear Optimal Transport (LOT). This method allows one to find a Euclidean embedding, called the LOT embedding, of measures in some Wasserstein spaces, but some information is lost in this embedding. So, to understand whether statistical analysis relying on LOT embeddings can make valid inferences about the original data, it is helpful to quantify how well these embeddings describe that data. To answer this question, we present a decomposition of the Fréchet variance of a set of measures in the 2-Wasserstein space, which allows one to compute the percentage of variance explained by LOT embeddings of those measures. We then extend this decomposition to the Fused Gromov-Wasserstein setting. We also present several experiments that explore the relationship between the dimension of the LOT embedding, the percentage of variance explained by the embedding, and the classification accuracy of machine learning classifiers built on the embedded data. We use the MNIST handwritten digits dataset, the IMDB-50000 dataset, and Diffusion Tensor MRI images for these experiments. Our results illustrate the effectiveness of low-dimensional LOT embeddings in terms of the percentage of variance explained and the classification accuracy of models built on the embedded data.
    Free, publicly-accessible full text available November 15, 2025
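In one dimension the LOT idea is easy to illustrate: the embedding of a measure with respect to a fixed reference reduces to its quantile function, and Euclidean distances between embeddings recover 2-Wasserstein distances exactly. A sketch with hypothetical Gaussian samples:

```python
import numpy as np

def lot_embed(samples, probs):
    """1-D LOT embedding: evaluate the empirical quantile function on a
    fixed probability grid (the transport map to a uniform reference)."""
    return np.quantile(samples, probs)

grid = np.linspace(0.005, 0.995, 100)
a = lot_embed(np.random.default_rng(1).normal(0.0, 1.0, 5000), grid)
b = lot_embed(np.random.default_rng(2).normal(2.0, 1.0, 5000), grid)

# Euclidean (RMS) distance between embeddings approximates W2; for two
# Gaussians differing only by a mean shift of 2, W2 = 2.
w2 = np.sqrt(np.mean((a - b) ** 2))
```

In higher dimensions the transport maps are no longer monotone rearrangements and the embedding loses information, which is exactly what the paper's Fréchet-variance decomposition quantifies.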
    Addressing the fundamental challenge of signal estimation from noisy data is a crucial aspect of signal processing and data analysis. Existing literature offers various estimators based on distinct observation models and criteria for estimation. This paper introduces an innovative framework that leverages topological and geometric features of the data for signal estimation. The proposed approach introduces a topological tool, the peak-persistence diagram (PPD), to analyze prominent peaks within potential solutions. Initially, the PPD estimates the unknown shape, incorporating details such as the number of internal peaks and valleys. Subsequently, a shape-constrained optimization strategy is employed to estimate the signal. This approach strikes a balance between two prior approaches: signal averaging without alignment and signal averaging with complete elastic alignment. Importantly, the proposed method provides an estimator within a statistical model where the signal is affected by both additive and warping noise. A computationally efficient procedure for implementing this solution is presented, and its effectiveness is demonstrated through simulations and real-world examples, including applications to COVID rate curves and household electricity consumption curves. The results showcase superior performance of the proposed approach compared to several current state-of-the-art techniques.
    Free, publicly-accessible full text available November 1, 2025
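The core topological ingredient, the persistence of peaks, can be computed for a discretely sampled signal with a short union-find sweep. This is a generic sketch of 1-D superlevel-set persistence, not the paper's full PPD construction:

```python
import numpy as np

def peak_persistence(y):
    """Persistence pairs of the local maxima of a 1-D signal, computed by
    a descending sweep with union-find (superlevel-set filtration).
    Returns (birth, death) pairs; prominent peaks have large birth - death."""
    y = np.asarray(y, dtype=float)
    order = np.argsort(-y)                 # process highest samples first
    parent, birth, pairs = {}, {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in order:
        i = int(i)
        parent[i], birth[i] = i, y[i]
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if roots:
            oldest = max(roots, key=lambda r: birth[r])  # elder rule
            for r in roots:
                if r != oldest:
                    pairs.append((birth[r], y[i]))       # younger peak dies
                parent[r] = oldest
            parent[i] = oldest
    pairs.append((y[int(order[0])], float(y.min())))     # global max never dies
    return pairs
```

For the toy signal [0, 2, 1, 3, 0.5, 2.5, 0] the three peaks (heights 2, 2.5, 3) receive persistences 1, 2, and 3, so thresholding persistence separates prominent peaks from noise, which is the role the PPD plays in the paper's shape estimation.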
  6. Free, publicly-accessible full text available September 30, 2025
  7. How can one analyze detailed 3D biological objects, such as neuronal and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects - each subtree has a main branch with some side branches attached - and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to the current metrics such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge collapse and node split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neuronal and botanical trees. We also demonstrate its application to various shape analysis tasks such as (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variations) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) finally synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions. 
    Free, publicly-accessible full text available April 1, 2025
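The building block of this framework, the SRVF of a single curve, is simple to compute; the tree metric additionally optimizes over reparameterizations and branch matchings, which this sketch omits:

```python
import numpy as np

def srvf(curve, t):
    """Square-Root Velocity Function q(t) = c'(t) / sqrt(||c'(t)||) of a
    curve sampled as an (n, d) array on grid t."""
    deriv = np.gradient(curve, t, axis=0)
    speed = np.linalg.norm(deriv, axis=1)
    return deriv / np.sqrt(np.maximum(speed, 1e-12))[:, None]

t = np.linspace(0.0, 1.0, 400)
helix = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t), t], axis=1)
q = srvf(helix, t)

# Translation invariance: the SRVF ignores where the curve sits in space.
q_shifted = srvf(helix + np.array([5.0, -2.0, 1.0]), t)
```

A useful sanity check is that the squared norm of q integrates to the curve's arc length; the L2 metric between SRVFs is what makes bending and stretching comparisons elastic.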
  8. Large-scale imaging studies often face challenges stemming from heterogeneity arising from differences in geographic location, instrumental setups, image acquisition protocols, study design, and latent variables that remain undisclosed. While numerous regression models have been developed to elucidate the interplay between imaging responses and relevant covariates, limited attention has been devoted to cases where the imaging responses pertain to the domain of shape. This adds complexity to the problem of imaging heterogeneity, primarily due to the unique properties inherent to shape representations, including nonlinearity, high-dimensionality, and the intricacies of quotient space geometry. To tackle this intricate issue, we propose a novel approach: a shape-on-scalar regression model that incorporates confounder adjustment. In particular, we leverage the square root velocity function to extract elastic shape representations which are embedded within the linear Hilbert space of square integrable functions. Subsequently, we introduce a shape regression model aimed at characterizing the intricate relationship between elastic shapes and covariates of interest, all while effectively managing the challenges posed by imaging heterogeneity. We develop comprehensive procedures for estimating and making inferences about the unknown model parameters. Through real-data analysis, our method demonstrates its superiority in terms of estimation accuracy when compared to existing approaches.

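As a toy stand-in for the shape-on-scalar model, one can regress SRVF-domain responses on scalar covariates pointwise. The data and the coefficient functions b0, b1 below are hypothetical, and the sketch ignores the quotient-space geometry and confounder adjustment developed in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 100, 50
t = np.linspace(0.0, 1.0, m)
z = rng.normal(size=n)                   # scalar covariate per subject

# Hypothetical model in the linear (SRVF) space:
#   q_i(t) = b0(t) + b1(t) * z_i + noise
b0, b1 = np.sin(2 * np.pi * t), t
Q = b0 + np.outer(z, b1) + 0.05 * rng.normal(size=(n, m))

# Pointwise OLS across the grid: row 0 estimates b0, row 1 estimates b1.
Z = np.column_stack([np.ones(n), z])
B_hat = np.linalg.lstsq(Z, Q, rcond=None)[0]
```

Working in the SRVF space makes the responses elements of a linear Hilbert space, which is what licenses this kind of coefficient-function estimation; the paper's contribution is doing so while handling shape invariances and imaging heterogeneity.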
    To better fit real data, this paper builds a model that accounts for both spatio-temporal correlation and heterogeneity. To overcome the "curse of dimensionality" inherent in nonparametric methods, we improve the estimation method of the single-index model and combine it with the correlation and heterogeneity of the spatio-temporal model to obtain a good estimator. Assuming that the spatio-temporal process satisfies an α-mixing condition, a nonparametric procedure is developed for estimating the variance function based on either a fully nonparametric function or a dimension-reduction structure, and the resulting estimator is consistent. A reweighted estimate of the parametric component is then obtained by taking the estimated variance function into account. The rate of convergence and the asymptotic normality of the new estimators are established under mild conditions. Simulation studies are conducted to evaluate the efficacy of the proposed methodology, and a case study on estimating the air-quality evaluation index in Nanjing is provided for illustration.

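The two-stage reweighting idea (fit a preliminary model, estimate the variance function nonparametrically from squared residuals, then refit with inverse-variance weights) can be sketched with a kernel smoother. This is a Nadaraya-Watson estimator on hypothetical heteroscedastic data, ignoring the spatio-temporal dependence and single-index structure of the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical heteroscedastic data: y = 2 x + sigma(x) * noise.
n = 2000
x = rng.uniform(0.0, 1.0, n)
sigma = 0.2 + 0.8 * x                    # true (unknown) variance function
y = 2.0 * x + sigma * rng.normal(size=n)

# Stage 1: unweighted OLS and squared residuals.
X = np.column_stack([np.ones(n), x])
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
resid2 = (y - X @ beta0) ** 2

def nw_variance(x0, x, r2, h=0.1):
    # Nadaraya-Watson kernel estimate of the conditional variance at x0.
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return w @ r2 / w.sum()

var_hat = np.array([nw_variance(xi, x, resid2) for xi in x])

# Stage 2: refit with inverse-variance weights.
W = 1.0 / var_hat
beta1 = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
```

The reweighted fit beta1 is the analogue of the paper's efficient second-stage estimator; under heteroscedasticity it has smaller asymptotic variance than the unweighted stage-1 fit.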