Title: Regularized and smooth double core tensor factorization for heterogeneous data
We introduce a general tensor model suitable for data analytic tasks on heterogeneous datasets, wherein there are joint low-rank structures within groups of observations, but also discriminative structures across different groups. To capture such complex structures, a double core tensor (DCOT) factorization model is introduced together with a family of smoothing loss functions. By leveraging the proposed smoothing function, the model accurately estimates the model factors, even in the presence of missing entries. A linearized ADMM method, which avoids large tensor operations and large memory storage requirements, is employed to solve regularized versions of DCOT factorizations. Further, we theoretically establish the method's global convergence, together with consistency of the estimates of the model parameters. The effectiveness of the DCOT model is illustrated on several real-world examples including image completion, recommender systems, subspace clustering, and detecting modules in heterogeneous Omics multi-modal data, where it provides more insightful decompositions than conventional tensor methods.
Award ID(s):
1838179
PAR ID:
10397166
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Journal of machine learning research
ISSN:
1532-4435
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A tensor network is a diagram that specifies a way to "multiply" a collection of tensors together to produce another tensor (or matrix). Many existing algorithms for tensor problems (such as tensor decomposition and tensor PCA), although they are not presented this way, can be viewed as spectral methods on matrices built from simple tensor networks. In this work we leverage the full power of this abstraction to design new algorithms for certain continuous tensor decomposition problems. An important and challenging family of tensor problems comes from orbit recovery, a class of inference problems involving group actions (inspired by applications such as cryo-electron microscopy). Orbit recovery problems over finite groups can often be solved via standard tensor methods. However, for infinite groups, no general algorithms are known. We give a new spectral algorithm based on tensor networks for one such problem: continuous multi-reference alignment over the infinite group SO(2). Our algorithm extends to the more general heterogeneous case. 
  2. Ionospheric total electron content (TEC) derived from multi-frequency Global Navigation Satellite System (GNSS) signals and the relevant products have become one of the most utilized parameters in the space weather and ionospheric research community. However, there are a couple of challenges in using the global TEC map data including large data gaps over oceans and the potential of losing meso-scale ionospheric structures when applying traditional reconstruction and smoothing algorithms. In this paper, we describe and release a global TEC map database, constructed and completed based on the Madrigal TEC database with a novel video imputation algorithm called VISTA (Video Imputation with SoftImpute, Temporal smoothing and Auxiliary data). The complete TEC maps reveal important large-scale TEC structures and preserve the observed meso-scale structures. Basic ideas and the pipeline of the video imputation algorithm are introduced briefly, followed by discussions on the computational costs and fine tuning of the adopted algorithm. Discussions on potential usages of the complete TEC database are given, together with a concrete example of applying this database.
  3. Recovery of consciousness after traumatic brain injury (TBI) is heterogeneous and difficult to predict. Structures such as the thalamus and prefrontal cortex are thought to be important in facilitating consciousness. We sought to investigate whether the integrity of thalamo-prefrontal circuits, assessed via diffusion tensor imaging (DTI), was associated with the return of goal-directed behavior after severe TBI. We classified a cohort of severe TBI patients (N = 25, 20 males) into Early and Late/Never outcome groups based on their ability to follow commands within 30 days post-injury. We assessed connectivity between whole thalamus, and mediodorsal thalamus (MD), to prefrontal cortex (PFC) subregions including dorsolateral PFC (dlPFC), medial PFC (mPFC), anterior cingulate (ACC), and orbitofrontal (OFC) cortices. We found that the integrity of thalamic projections to PFC subregions (L OFC, L and R ACC, and R mPFC) was significantly associated with Early command-following. This association persisted when the analysis was restricted to prefrontal-mediodorsal (MD) thalamus connectivity. In contrast, dlPFC connectivity to thalamus was not significantly associated with command-following. Using the integrity of thalamo-prefrontal connections, we created a linear regression model that demonstrated 72% accuracy in predicting command-following after a leave-one-out analysis. Together, these data support a role for thalamo-prefrontal connectivity in the return of goal-directed behavior following TBI.
  4. Motivated by the rapid rise in statistical tools in Functional Data Analysis, we consider the Gaussian mechanism for achieving differential privacy (DP) with parameter estimates taking values in a potentially infinite-dimensional, separable Banach space. Using classic results from probability theory, we show how densities over function spaces can be utilized to achieve the desired DP bounds. This extends prior results of Hall et al. (2013) to a much broader class of statistical estimates and summaries, including "path level" summaries, nonlinear functionals, and full function releases. By focusing on Banach spaces, we provide a deeper picture of the challenges for privacy with complex data, especially the role regularization plays in balancing utility and privacy. Using an application to penalized smoothing, we highlight this balance in the context of mean function estimation. Simulations and an application to diffusion tensor imaging are briefly presented, with extensive additions included in a supplement.
  5. Tackling High-Dimensional Tensor Clustering. In the paper "Jointly Modeling and Clustering Tensors in High Dimensions," Cai, Zhang, and Sun address the challenge of jointly modeling and clustering tensors by introducing a high-dimensional tensor mixture model with heterogeneous covariances. The proposed mixture model exploits the intrinsic structures of tensor data. The authors develop a computationally efficient high-dimensional expectation conditional maximization (HECM) algorithm and show that the HECM iterates, with an appropriate initialization, converge geometrically to a neighborhood that is within statistical precision of the true parameter. The theoretical analysis is nontrivial because of the dual nonconvexity arising from both the expectation maximization-type estimation and the nonconvex objective function in the M step. They also study the convergence rate of the algorithm when the number of clusters is overspecified and when the signal-to-noise ratio diminishes with sample size. The efficacy of the proposed method is demonstrated through numerical experiments and a real-world medical data application.