skip to main content


This content will become publicly available on July 2, 2024

Title: Sparse High-Dimensional Matrix-Valued Graphical Model Learning from Dependent Data
We consider the problem of inferring the conditional independence graph (CIG) of a sparse, high-dimensional, stationary matrix-variate Gaussian time series. All past work on matrix graphical models assume that i.i.d. observations of matrix-variate are available. Here we allow dependent observations. We consider a sparse-group lasso based frequency-domain formulation of the problem with a Kronecker-decomposable power spectral density (PSD), and solve it via an alternating direction method of multipliers (ADMM) approach. The problem is bi-convex which is solved via flip-flop optimization. We provide sufficient conditions for local convergence in the Frobenius norm of the inverse PSD estimators to the true value. This results also yields a rate of convergence. We illustrate our approach using numerical examples.  more » « less
Award ID(s):
2040536
NSF-PAR ID:
10462219
Author(s) / Creator(s):
Date Published:
Journal Name:
2023 IEEE Statistical Signal Processing Workshop (SSP)
Page Range / eLocation ID:
344 to 348
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We consider the problem of inferring the conditional independence graph (CIG) of a high-dimensional stationary multivariate Gaussian time series. A p-variate Gaussian time series graphical model associated with an undirected graph with p vertices is defined as the family of time series that obey the conditional independence restrictions implied by the edge set of the graph. A sparse-group lasso-based frequency-domain formulation of the problem has been considered in the literature where the objective is to estimate the inverse power spectral density (PSD) of the data via optimization of a sparse-group lasso based penalized log-likelihood cost function that is formulated in the frequency-domain. The CIG is then inferred from the estimated inverse PSD. In this paper we establish sufficient conditions for consistency of the inverse PSD estimator resulting from the sparse-group graphical lasso-based approach. 
    more » « less
  2. We consider the problem of inferring the conditional independence graph (CIG) of a high-dimensional stationary multivariate Gaussian time series. A sparse-group lasso based frequency-domain formulation of the problem has been considered in the literature where the objective is to estimate the sparse inverse power spectral density (PSD) of the data. The CIG is then inferred from the estimated inverse PSD. In this paper we investigate use of a sparse-group log-sum penalty (LSP) instead of sparse-group lasso penalty. An alternating direction method of multipliers (ADMM) approach for iterative optimization of the non-convex problem is presented. We provide sufficient conditions for local convergence in the Frobenius norm of the inverse PSD estimators to the true value. This results also yields a rate of convergence. We illustrate our approach using numerical examples utilizing both synthetic and real data. 
    more » « less
  3. Abstract. Scattering codes are used to study the optical properties of polar stratospheric clouds (PSCs). Particle backscattering and depolarization coefficients can be computed with available scattering codes once the particle size distribution (PSD) is known and a suitable refractive index is assumed. However, PSCs often appear as external mixtures of supercooled ternary solution (STS) droplets, solid nitric acid trihydrate (NAT) and possibly ice particles, making the assumption of a single refractive index and a single morphology to model the scatterers questionable.Here we consider a set of 15 coincident measurements of PSCs above McMurdo Station, Antarctica, using ground-based lidar, a balloon-borne optical particle counter (OPC) and in situ observations taken by a laser backscattersonde and OPC during four balloon stratospheric flights from Kiruna, Sweden. This unique dataset of microphysical and optical observations allows us to test the performances of optical scattering models when both spherical and aspherical scatterers of different composition and, possibly, shapes are present. We consider particles as STS if their radius is below a certain threshold value Rth and NAT or possibly ice if it is above it. The refractive indices are assumed known from the literature. Mie scattering is used for the STS, assumed spherical. Scattering from NAT particles, considered spheroids of different aspect ratio (AR), is treated with T-matrix results where applicable. The geometric-optics–integral-equation approach is used whenever the particle size parameter is too large to allow for a convergence of the T-matrix method.The parameters Rth and AR of our model have been varied between 0.1 and 2 µm and between 0.3 and 3, respectively, and the calculated backscattering coefficient and depolarization were compared with the observed ones. The best agreement was found for Rth between 0.5 and 0.8 µm and for AR less than 0.55 and greater than 1.5.To further constrain the variability of AR within the identified intervals, we have sought an agreement with the experimental data by varying AR on a case-by-case basis and further optimizing the agreement by a proper choice of AR smaller than 0.55 and greater than 1.5 and Rth within the interval 0.5 and 0.8 µm. The ARs identified in this way cluster around the values 0.5 and 2.5.The comparison of the calculations with the measurements is presented and discussed. The results of this work help to set limits to the variability of the dimensions and asphericity of PSC solid particles, within the limits of applicability of our model based on the T-matrix theory of scattering and on assumptions on a common particle shape in a PSD and a common threshold radius for all the PSDs. 
    more » « less
  4. We consider the problem of inferring the conditional independence graph (CIG) of a high-dimensional stationary multivariate Gaussian time series. A sparse-group lasso-based frequency-domain formulation of the problem has been considered in the literature where the objective is to estimate the sparse inverse power spectral density (PSD) of the data via optimization of a sparse-group lasso based penalized log-likelihood cost function that is formulated in the frequency-domain. The CIG is then inferred from the estimated inverse PSD. Optimization in the previous approach was performed using an alternating minimization (AM) approach whose performance depends upon choice of a penalty parameter. In this paper we investigate an alternating direction method of multipliers (ADMM) approach for optimization to mitigate dependence on the penalty parameter. We also investigate selection of the tuning parameters based on Bayesian information criterion, and illustrate our approach using synthetic and real data. Comparisons with the "usual" i.i.d. modeling of time series for graph estimation are also provided. 
    more » « less
  5. Recently, there has been significant progress in understanding the convergence and generalization properties of gradient-based methods for training overparameterized learning models. However, many aspects including the role of small random initialization and how the various parameters of the model are coupled during gradient-based updates to facilitate good generalization, remain largely mysterious. A series of recent papers have begun to study this role for non-convex formulations of symmetric Positive Semi-Definite (PSD) matrix sensing problems which involve reconstructing a low-rank PSD matrix from a few linear measurements. The underlying symmetry/PSDness is crucial to existing convergence and generalization guarantees for this problem. In this paper, we study a general overparameterized low-rank matrix sensing problem where one wishes to reconstruct an asymmetric rectangular low-rank matrix from a few linear measurements. We prove that an overparameterized model trained via factorized gradient descent converges to the low-rank matrix generating the measurements. We show that in this setting, factorized gradient descent enjoys two implicit properties: (1) coupling of the trajectory of gradient descent where the factors are coupled in various ways throughout the gradient update trajectory and (2) an algorithmic regularization property where the iterates show a propensity towards low-rank models despite the overparameterized nature of the factorized model. These two implicit properties in turn allow us to show that the gradient descent trajectory from small random initialization moves towards solutions that are both globally optimal and generalize well. 
    more » « less