Title: SOFAR: Large-Scale Association Network Learning
Many modern big data applications feature large scale in both the number of responses and the number of predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality-constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks, such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure that characterize its theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence guarantees. Both the computational and theoretical advantages of our procedure are demonstrated with several simulations and real data examples.
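For intuition, a minimal sketch of the sparse-SVD idea underlying SOFAR, assuming a single latent layer and using soft-thresholded power iterations; this is an illustrative stand-in, not the paper's SOFAR algorithm, which estimates multiple sparse layers jointly under explicit orthogonality constraints within a convexity-assisted nonconvex optimization:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm: shrink entries toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1(C, lam_u=0.1, lam_v=0.1, n_iter=100):
    """Extract one sparse singular-value layer of a coefficient matrix C."""
    # Initialize from the leading (dense) singular pair.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_u)       # sparse update of left factor
        u /= max(np.linalg.norm(u), 1e-12)
        v = soft_threshold(C.T @ u, lam_v)     # sparse update of right factor
        v /= max(np.linalg.norm(v), 1e-12)
    d = u @ C @ v                              # estimated singular value
    return u, d, v
```

Subsequent layers could be obtained by deflating C to C - d u v^T and repeating; enforcing orthogonality across layers, as SOFAR does, is what this simple heuristic does not guarantee.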
Award ID(s):
1718798 1613295
NSF-PAR ID:
10110531
Journal Name:
IEEE Transactions on Information Theory
Volume:
65
ISSN:
0018-9448
Page Range / eLocation ID:
4924-4939
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Characterization of relationships between time-varying drug exposures and adverse events (AEs) related to health outcomes represents the primary objective in postmarketing drug safety surveillance. Such surveillance increasingly utilizes large-scale longitudinal observational databases (LODs), containing time-stamped patient-level medical information, including periods of drug exposure and dates of diagnoses, for millions of patients. Statistical methods for LODs must confront computational challenges related to the scale of the data and must also address confounding and other biases that can undermine efforts to estimate effect sizes. Methods that compare on-drug with off-drug periods within patients offer specific advantages over between-patient analyses on both counts. To accomplish these aims, we extend the self-controlled case series (SCCS) method to LODs. SCCS implicitly controls for fixed multiplicative baseline covariates, since each individual acts as their own control. In addition, only exposed cases are required for the analysis, which is computationally advantageous. The standard SCCS approach is usually used to assess single drugs and therefore estimates marginal associations between individual drugs and particular AEs. Such analyses ignore confounding drugs and interactions and have the potential to give misleading results. To avoid these difficulties, we propose a regularized multiple SCCS approach that incorporates potentially thousands of time-varying confounders, such as other drugs. The approach successfully handles the high dimensionality and can provide a sparse solution via an L1 regularizer. We present details of the model and the associated optimization procedure, as well as results of empirical investigations.
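The core computational step can be pictured as an l1-penalized Poisson log-linear model over person-time intervals. The numpy sketch below is a minimal illustration of that step (proximal gradient descent with a fixed step size, both illustrative choices); it omits the SCCS conditioning that removes each patient's multiplicative baseline rate:

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def l1_poisson(X, y, log_len, lam=0.1, step=1e-4, n_iter=5000):
    """X: interval-by-drug exposure indicators; y: AE counts per interval;
    log_len: log interval lengths, used as an offset."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta + log_len)   # expected counts under current fit
        grad = X.T @ (mu - y)             # gradient of Poisson negative log-likelihood
        beta = soft_threshold(beta - step * grad, step * lam)  # l1 proximal step
    return beta
```

The l1 penalty zeroes out most drug coefficients, which is what keeps the fit feasible and interpretable with thousands of time-varying covariates.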

     
  2. Summary

    Aiming at quantifying the dependence of pairs of functional data (X,Y), we develop the concept of functional singular value decomposition for covariance and functional singular component analysis, building on the concept of ‘canonical expansion’ of compact operators in functional analysis. We demonstrate the estimation of the resulting singular values, functions and components for the practically relevant case of sparse and noise-contaminated longitudinal data and provide asymptotic consistency results. Expanding bivariate functional data into singular functions emerges as a natural extension of the popular functional principal component analysis for single processes to the case of paired processes. A natural application of the functional singular value decomposition is a measure of functional correlation. Owing to the involvement of an inverse operation, most previously considered functional correlation measures are plagued by numerical instabilities and strong sensitivity to the choice of smoothing parameters. These problems are exacerbated for the case of sparse longitudinal data, on which we focus. The functional correlation measure that is derived from the functional singular value decomposition behaves well with respect to numerical stability and statistical error, as we demonstrate in a simulation study. Practical feasibility for applications to longitudinal data is illustrated with examples from a study on aging and on-line auctions.
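When both curves are densely observed on a common grid, the construction reduces to an SVD of the discretized cross-covariance surface. A minimal sketch under that assumption (the sparse, noise-contaminated longitudinal case treated above requires smoothing-based covariance estimation instead):

```python
import numpy as np

def functional_svd(X, Y):
    """X, Y: n x T arrays of paired curves on a common grid.
    Returns singular values and discretized singular functions of the
    cross-covariance operator."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (X.shape[0] - 1)          # cross-covariance surface C(s, t)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return s, U, Vt.T                          # values; left/right singular functions
```

Because the singular values are read off directly, a functional correlation measure built from them avoids the inverse operation that destabilizes canonical-correlation-style measures.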

     
  3. Summary

    This paper addresses matrix approximation problems for matrices that are large, sparse, and/or representations of large graphs. To tackle these problems, we consider algorithms that are based primarily on coarsening techniques, possibly combined with random sampling. A multilevel coarsening technique is proposed, which utilizes a hypergraph associated with the data matrix and a graph coarsening strategy based on column matching. We consider a number of standard applications of this technique as well as a few new ones. Among standard applications, we first consider the problem of computing a partial singular value decomposition, for which a combination of sampling and coarsening yields significantly improved singular value decomposition results relative to sampling alone. We also consider the column subset selection problem, a popular low-rank approximation method used in data-related applications, and show how multilevel coarsening can be adapted for this problem. Similarly, we consider the problem of graph sparsification and show how coarsening techniques can be employed to solve it. We also establish theoretical results that characterize the approximation error obtained and the quality of the dimension reduction achieved by a coarsening step when a proper column matching strategy is employed. Numerical experiments illustrate the performance of the methods in a few applications.
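A minimal sketch of one coarsening level based on column matching (cosine similarity and the 1/sqrt(2) merge scaling are illustrative choices, not necessarily those analyzed in the paper): each column is greedily paired with its most similar unmatched column, and each pair is merged, roughly halving the column count.

```python
import numpy as np

def coarsen_columns(A):
    """One coarsening level: greedily match similar columns, merge each pair."""
    norms = np.linalg.norm(A, axis=0) + 1e-12
    S = (A / norms).T @ (A / norms)          # cosine similarity between columns
    np.fill_diagonal(S, -np.inf)
    matched, merged = set(), []
    for i in np.argsort(-S.max(axis=1)):     # visit best-matched columns first
        if i in matched:
            continue
        candidates = [k for k in np.argsort(-S[i]) if k not in matched and k != i]
        if candidates:
            j = candidates[0]
            merged.append((A[:, i] + A[:, j]) / np.sqrt(2))  # merge the matched pair
            matched.add(j)
        else:
            merged.append(A[:, i])           # leftover column passes through
        matched.add(i)
    return np.column_stack(merged)
```

Applying coarsen_columns recursively produces the multilevel hierarchy; the desired decomposition (e.g., a partial SVD) is computed on the coarsest matrix and then related back to the original columns.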

     
  4.
    In the non-negative matrix factorization (NMF) problem, the input is an m×n matrix M with non-negative entries, and the goal is to factorize it as M ≈ AW. The m × k matrix A and the k × n matrix W are both constrained to have non-negative entries. This is in contrast to the singular value decomposition, where the matrices A and W can have negative entries but must satisfy the orthogonality constraint: the columns of A are orthogonal and the rows of W are also orthogonal. The orthogonal non-negative matrix factorization (ONMF) problem imposes both the non-negativity and the orthogonality constraints, and previous work showed that it leads to better performance than NMF on many clustering tasks. We give the first constant-factor approximation algorithm for ONMF when one or both of A and W are subject to the orthogonality constraint. We also show an interesting connection to the correlation clustering problem on bipartite graphs. Our experiments on synthetic and real-world data show that our algorithm achieves similar or smaller errors compared to previous ONMF algorithms while ensuring perfect orthogonality (many previous algorithms do not satisfy the hard orthogonality constraint).
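The reason ONMF behaves like clustering: if the rows of W are non-negative and mutually orthogonal, they must have disjoint supports, so each column of M is assigned to exactly one factor. A minimal alternating heuristic illustrating this structure (not the paper's constant-factor approximation algorithm):

```python
import numpy as np

def onmf(M, k, n_iter=50, seed=0):
    """Heuristic M ~= A @ W with A >= 0 and orthogonal non-negative rows of W."""
    rng = np.random.default_rng(seed)
    A = np.abs(rng.standard_normal((M.shape[0], k)))
    for _ in range(n_iter):
        labels = np.argmax(A.T @ M, axis=0)       # assign each column to one factor
        W = np.zeros((k, M.shape[1]))
        for j in range(M.shape[1]):
            a = A[:, labels[j]]
            W[labels[j], j] = max(a @ M[:, j] / (a @ a + 1e-12), 0.0)
        for r in range(k):                        # per-factor least-squares update of A
            idx = labels == r
            if idx.any():
                w = W[r, idx]
                A[:, r] = np.maximum(M[:, idx] @ w / (w @ w + 1e-12), 0.0)
    return A, W
```

By construction, each column of W has a single nonzero entry, so W W^T is diagonal and the hard orthogonality constraint holds exactly, mirroring the "perfect orthogonality" property noted above.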
  5. Modern deep neural networks (DNNs) often require high memory consumption and impose large computational loads. To deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the product of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) on every training step. The former approach usually incurs a high accuracy loss, while the latter has low efficiency. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures a valid SVD form and avoids gradient vanishing/exploding. Low rank is encouraged by applying sparsity-inducing regularizers on the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computational load under the same accuracy, compared not only with previous factorization methods but also with state-of-the-art filter pruning methods.
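A minimal PyTorch sketch of a layer trained in decomposed form, with the two regularizers described above (penalty weights and initialization are illustrative, not the paper's exact settings):

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Linear layer kept in factored form W = U diag(s) V^T during training."""
    def __init__(self, in_features, out_features):
        super().__init__()
        U, s, Vh = torch.linalg.svd(torch.randn(out_features, in_features),
                                    full_matrices=False)   # full-rank SVD init
        self.U = nn.Parameter(U)          # out_features x r
        self.s = nn.Parameter(s)          # r singular values
        self.V = nn.Parameter(Vh.T)       # in_features x r

    def forward(self, x):
        W = self.U @ torch.diag(self.s) @ self.V.T
        return x @ W.T

    def regularizer(self, ortho=1.0, sparse=1e-3):
        # Orthogonality penalty keeps U, V close to valid singular vectors;
        # l1 penalty on s drives singular values (hence rank) toward zero.
        I = torch.eye(self.s.shape[0])
        pen = ((self.U.T @ self.U - I) ** 2).sum() + ((self.V.T @ self.V - I) ** 2).sum()
        return ortho * pen + sparse * self.s.abs().sum()
```

During training, layer.regularizer() is added to the task loss; after convergence, singular values below a threshold are pruned to obtain the final low-rank layer.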