

Title: Regularized and smooth double core tensor factorization for heterogeneous data
We introduce a general tensor model suitable for data analytic tasks for heterogeneous datasets, wherein there are joint low-rank structures within groups of observations, but also discriminative structures across different groups. To capture such complex structures, a double core tensor (DCOT) factorization model is introduced together with a family of smoothing loss functions. By leveraging the proposed smoothing function, the model accurately estimates the model factors, even in the presence of missing entries. A linearized ADMM method is employed to solve regularized versions of DCOT factorizations, avoiding large tensor operations and large memory storage requirements. Further, we theoretically establish its global convergence, together with consistency of the estimates of the model parameters. The effectiveness of the DCOT model is illustrated on several real-world examples including image completion, recommender systems, subspace clustering, and detecting modules in heterogeneous Omics multi-modal data, since it provides more insightful decompositions than conventional tensor methods.
Award ID(s):
1838179
NSF-PAR ID:
10397166
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of machine learning research
ISSN:
1532-4435
Sponsoring Org:
National Science Foundation
More Like this
  1. A tensor network is a diagram that specifies a way to "multiply" a collection of tensors together to produce another tensor (or matrix). Many existing algorithms for tensor problems (such as tensor decomposition and tensor PCA), although they are not presented this way, can be viewed as spectral methods on matrices built from simple tensor networks. In this work we leverage the full power of this abstraction to design new algorithms for certain continuous tensor decomposition problems. An important and challenging family of tensor problems comes from orbit recovery, a class of inference problems involving group actions (inspired by applications such as cryo-electron microscopy). Orbit recovery problems over finite groups can often be solved via standard tensor methods. However, for infinite groups, no general algorithms are known. We give a new spectral algorithm based on tensor networks for one such problem: continuous multi-reference alignment over the infinite group SO(2). Our algorithm extends to the more general heterogeneous case. 
  2. Smoothing splines provide a powerful and flexible means for nonparametric estimation and inference. With a cubic time complexity, fitting smoothing spline models to large data is computationally prohibitive. In this paper, we use the theoretical optimal eigenspace to derive a low‐rank approximation of the smoothing spline estimates. We develop a method to approximate the eigensystem when it is unknown and derive error bounds for the approximate estimates. The proposed methods are easy to implement with existing software. Extensive simulations show that the new methods are accurate, fast and compare favourably against existing methods.

     
  3. Ionospheric total electron content (TEC) derived from multi-frequency Global Navigation Satellite System (GNSS) signals and the relevant products have become one of the most utilized parameters in the space weather and ionospheric research community. However, there are a couple of challenges in using the global TEC map data including large data gaps over oceans and the potential of losing meso-scale ionospheric structures when applying traditional reconstruction and smoothing algorithms. In this paper, we describe and release a global TEC map database, constructed and completed based on the Madrigal TEC database with a novel video imputation algorithm called VISTA (Video Imputation with SoftImpute, Temporal smoothing and Auxiliary data). The complete TEC maps reveal important large-scale TEC structures and preserve the observed meso-scale structures. Basic ideas and the pipeline of the video imputation algorithm are introduced briefly, followed by discussions on the computational costs and fine tuning of the adopted algorithm. Discussions on potential usages of the complete TEC database are given, together with a concrete example of applying this database.

     
  4. Human rights data presents challenges for capture–recapture methodology. Lists of violent acts provided by many different groups create large, sparse tables of data for which saturated models are difficult to fit and for which simple models may be misspecified. We analyze data on killings and disappearances in Casanare, Colombia during the years 1998 to 2007. Our estimates differ depending on whether we choose to model marginal reporting probabilities and odds ratios or the full reporting pattern in a conditional (log-linear) model. With 2629 observed killings, a marginal model we consider estimates over 9000 killings, while conditional models we consider estimate 6000–7000 killings. The latter agree with previous estimates, also from a conditional model. We see a twofold difference between the high sample coverage estimate of over 10,000 killings and low sample coverage lower bound estimate of 5200 killings. We use a simulation study to compare marginal and conditional models with at most two-way interactions and sample coverage estimators. The simulation results together with model selection criteria lead us to believe the previous estimates of total killings in Casanare may have been biased downward, suggesting that the violence was worse than previously thought. Model specification is an important consideration when interpreting population estimates from capture–recapture analysis, and the Casanare data is a prototypical example of how that manifests.

     
  5. Recovery of consciousness after traumatic brain injury (TBI) is heterogeneous and difficult to predict. Structures such as the thalamus and prefrontal cortex are thought to be important in facilitating consciousness. We sought to investigate whether the integrity of thalamo-prefrontal circuits, assessed via diffusion tensor imaging (DTI), was associated with the return of goal-directed behavior after severe TBI. We classified a cohort of severe TBI patients (N = 25, 20 males) into Early and Late/Never outcome groups based on their ability to follow commands within 30 days post-injury. We assessed connectivity between whole thalamus, and mediodorsal thalamus (MD), to prefrontal cortex (PFC) subregions including dorsolateral PFC (dlPFC), medial PFC (mPFC), anterior cingulate (ACC), and orbitofrontal (OFC) cortices. We found that the integrity of thalamic projections to PFC subregions (L OFC, L and R ACC, and R mPFC) was significantly associated with Early command-following. This association persisted when the analysis was restricted to prefrontal-mediodorsal (MD) thalamus connectivity. In contrast, dlPFC connectivity to thalamus was not significantly associated with command-following. Using the integrity of thalamo-prefrontal connections, we created a linear regression model that demonstrated 72% accuracy in predicting command-following after a leave-one-out analysis. Together, these data support a role for thalamo-prefrontal connectivity in the return of goal-directed behavior following TBI.