skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Homogeneity Tests of Covariance for High-Dimensional Functional Data with Applications to Event Segmentation
Abstract We consider inference problems for high-dimensional (HD) functional data with a dense number of T repeated measurements taken for a large number of p variables from a small number of n experimental units. The spatial and temporal dependence, high dimensionality, and dense number of repeated measurements pose theoretical and computational challenges. This paper has two aims; our first aim is to solve the theoretical and computational challenges in testing equivalence among covariance matrices from HD functional data. The second aim is to provide computationally efficient and tuning-free tools with guaranteed stochastic error control. The weak convergence of the stochastic process formed by the test statistics is established under the “large p, large T, and small n” setting. If the null is rejected, we further show that the locations of the change points can be estimated consistently. The estimator's rate of convergence is shown to depend on the data dimension, sample size, number of repeated measurements, and signal-to-noise ratio. We also show that our proposed computation algorithms can significantly reduce the computation time and are applicable to real-world data with a large number of HD-repeated measurements (e.g., functional magnetic resonance imaging (fMRI) data). Simulation results demonstrate both the finite sample performance and computational effectiveness of our proposed procedures. We observe that the empirical size of the test is well controlled at the nominal level, and the locations of multiple change points can be accurately identified. An application to fMRI data demonstrates that our proposed methods can identify event boundaries in the preface of the television series Sherlock. Code to implement the procedures is available in an R package named TechPhD.  more » « less
Award ID(s):
2152070
PAR ID:
10419797
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
79
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 3332-3344
Size(s):
p. 3332-3344
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Canonical polyadic decomposition (CPD) has been a workhorse for multimodal data analytics. This work puts forth a stochastic algorithmic framework for CPD under β-divergence, which is well-motivated in statistical learning—where the Euclidean distance is typically not preferred. Despite the existence of a series of prior works addressing this topic, pressing computational and theoretical challenges, e.g., scalability and convergence issues, still remain. In this paper, a unified stochastic mirror descent framework is developed for large-scale β-divergence CPD. Our key contribution is the integrated design of a tensor fiber sampling strategy and a flexible stochastic Bregman divergence-based mirror descent iterative procedure, which significantly reduces the computation and memory cost per iteration for various β. Leveraging the fiber sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm also ensures global convergence to a stationary point under mild conditions. Numerical results on synthetic and real data show that our framework attains significant computational saving compared with state-of-the-art methods. 
    more » « less
  2. Abstract We consider the problem of estimating multiple change points for a functional data process. There are numerous examples in science and finance in which the process of interest may be subject to some sudden changes in the mean. The process data that are not in a close vicinity of any change point can be analysed by the usual nonparametric smoothing methods. However, the data close to change points and contain the most pertinent information of structural breaks need to be handled with special care. This paper considers a half-kernel approach that addresses the inference of the total number, locations and jump sizes of the changes. Convergence rates and asymptotic distributional results for the proposed procedures are thoroughly investigated. Simulations are conducted to examine the performance of the approach, and a number of real data sets are analysed to provide an illustration. 
    more » « less
  3. The t-Distributed Stochastic Neighbor Embedding (t-SNE) is known to be a successful method at visualizing high-dimensional data, making it very popular in the machine-learning and data analysis community, especially recently. However, there are two glaring unaddressed problems: (a) Existing GPU accelerated implementations of t-SNE do not account for the poor data locality present in the computation. This results in sparse matrix computations being a bottleneck during execution, especially for large data sets. (b) Another problem is the lack of an effective stopping criterion in the literature. In this paper, we report an improved GPU implementation that uses sparse matrix re-ordering to improve t-SNE's memory access pattern and a novel termination criterion that is better suited for visualization purposes. The proposed methods result in up to 4.63 x end-to-end speedup and provide a practical stopping metric, potentially preventing the algorithm from terminating prematurely or running for an excessive amount of iterations. These developments enable high-quality visualizations and accurate analyses of complex large data sets containing up to 10 million data points and requiring thousands of iterations for convergence. 
    more » « less
  4. Stochastic computing (SC) reduces the complexity of computation by representing numbers with long streams of independent bits. However, increasing performance in SC comes with either an increase in area or a loss in accuracy. Processing in memory (PIM) computes data in-place while having high memory density and supporting bit-parallel operations with low energy consumption. In this article, we propose COSMO, an architecture for co mputing with s tochastic numbers in me mo ry, which enables SC in memory. The proposed architecture is general and can be used for a wide range of applications. It is a highly dense and parallel architecture that supports most SC encodings and operations in memory. It maximizes the performance and energy efficiency of SC by introducing several innovations: (i) in-memory parallel stochastic number generation, (ii) efficient implication-based logic in memory, (iii) novel memory bit line segmenting, (iv) a new memory-compatible SC addition operation, and (v) enabling flexible block allocation. To show the generality and efficiency of our stochastic architecture, we implement image processing, deep neural networks (DNNs), and hyperdimensional (HD) computing on the proposed hardware. Our evaluations show that running DNN inference on COSMO is 141× faster and 80× more energy efficient as compared to GPU. 
    more » « less
  5. Abstract Change‐point detection studies the problem of detecting the changes in the underlying distribution of the data stream as soon as possible after the change happens. Modern large‐scale, high‐dimensional, and complex streaming data call for computationally (memory) efficient sequential change‐point detection algorithms that are also statistically powerful. This gives rise to a computation versus statistical power trade‐off, an aspect less emphasized in the past in classic literature. This tutorial takes this new perspective and reviews several sequential change‐point detection procedures, ranging from classic sequential change‐point detection algorithms to more recent non‐parametric procedures that consider computation, memory efficiency, and model robustness in the algorithm design. Our survey also contains classic performance analysis, which provides useful techniques for analyzing new procedures. This article is categorized under:Statistical Models > Time Series ModelsAlgorithms and Computational Methods > AlgorithmsData: Types and Structure > Time Series, Stochastic Processes, and Functional Data 
    more » « less