This content will become publicly available on June 27, 2025
- Award ID(s): 2304489
- PAR ID: 10536043
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Transactions on Knowledge and Data Engineering
- ISSN: 1041-4347
- Page Range / eLocation ID: 1-13
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This article proposes several advances to sparse nonnegative matrix factorization (SNMF) as a way to identify large-scale patterns in urban traffic data. The input to our model is traffic counts organized by time and location. Nonnegative matrix factorization additively decomposes this information, organized as a matrix, into a linear sum of temporal signatures. Penalty terms encourage this factorization to concentrate on only a few temporal signatures, with weights that are not too large. Our interest here is to quantify and compare the regularity of traffic behavior, particularly across different broad temporal windows. In addition to the rank and error, we adapt a measure introduced by Hoyer to quantify sparsity in the representation. Combining these, we construct several curves which quantify error as a function of rank (the number of possible signatures) and sparsity; as rank goes up and sparsity goes down, the approximation can be better and the error should decrease. Plots of several such curves corresponding to different time windows lead to a way to compare disorder/order at different time scales. In this paper, we apply our algorithms and procedures to study a taxi traffic dataset from New York City. In this dataset, we find weekly periodicity in the signatures, which gives us an additional framework for identifying outliers as significant deviations from weekly medians. We then apply our seasonal disorder analysis to the New York City traffic data over seasonal (spring, summer, winter, fall) time windows. We do find seasonal differences in traffic order.
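A minimal sketch of the error-versus-rank analysis described in this abstract, assuming a time-by-location traffic count matrix. Scikit-learn's `NMF` with its L1 penalty parameters (`alpha_W`, `alpha_H`, available in scikit-learn 1.0 and later; older releases used a single `alpha`) stands in for the paper's specific penalty terms, and `hoyer_sparsity` implements Hoyer's measure; the synthetic Poisson counts and all parameter values are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

def hoyer_sparsity(x):
    """Hoyer's sparsity measure: 0 for a flat vector, 1 for a 1-sparse vector."""
    x = np.abs(np.ravel(x))
    n = x.size
    l1, l2 = x.sum(), np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / (l2 + 1e-12)) / (np.sqrt(n) - 1)

# Synthetic stand-in for a time-by-location traffic count matrix (one week of hourly bins).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(24 * 7, 50)).astype(float)

# Error and sparsity as a function of rank under an L1 (sparsity-promoting) penalty.
for rank in (2, 4, 8):
    model = NMF(n_components=rank, init="nndsvda",
                alpha_W=0.1, alpha_H=0.1, l1_ratio=1.0,
                max_iter=500, random_state=0)
    W = model.fit_transform(counts)      # temporal signatures (time x rank)
    H = model.components_                # signature weights per location
    err = np.linalg.norm(counts - W @ H, "fro") / np.linalg.norm(counts, "fro")
    print(rank, round(err, 3), round(hoyer_sparsity(H), 3))
```

Repeating this loop over data restricted to different time windows gives one error/sparsity curve per window, which is the kind of comparison the abstract describes.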
-
Imputing missing data is a critical task in data-driven intelligent transportation systems. During recent decades, there has been considerable investment in developing various types of sensors and smart systems, including stationary devices (e.g., loop detectors) and floating vehicles equipped with global positioning system (GPS) trackers to collect large-scale traffic data. However, collected data may not include observations from all road segments in a traffic network, for reasons including sensor failure, transmission error, and the fact that GPS-equipped vehicles may not always travel through all road segments. The first step toward developing real-time traffic monitoring and disruption prediction models is to estimate missing values through a systematic data imputation process. Many of the existing data imputation methods are based on matrix completion techniques that utilize the inherent spatiotemporal characteristics of traffic data. However, these methods may not fully capture the clustered structure of the data. This paper addresses this issue by developing a novel data imputation method using PARATUCK2 decomposition. The proposed method captures both spatial and temporal information of traffic data and constructs a low-dimensional and clustered representation of traffic patterns. The identified spatiotemporal clusters are used to recover network traffic profiles and estimate missing values. The proposed method is implemented using traffic data in the road network of Manhattan in New York City. The performance of the proposed method is evaluated in comparison with two state-of-the-art benchmark methods. The outcomes indicate that the proposed method outperforms the existing state-of-the-art imputation methods in complex and large-scale traffic networks.
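The PARATUCK2 machinery itself is beyond a short snippet, but the masked low-rank completion idea it builds on can be sketched with a plain alternating-least-squares imputer. Everything below (the function name, the synthetic road-segment by time-interval matrix, the rank and regularization values) is illustrative and is a simplified stand-in, not the paper's method.

```python
import numpy as np

def masked_als_completion(X, mask, rank=5, n_iters=50, reg=1e-2, seed=0):
    """Impute missing entries of X (mask==False) with a rank-`rank` model U V^T."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    for _ in range(n_iters):
        # Solve a small ridge problem per row and per column using only observed entries.
        for i in range(n):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg * np.eye(rank), Vo.T @ X[i, obs])
        for j in range(m):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg * np.eye(rank), Uo.T @ X[obs, j])
    return U @ V.T

# Synthetic road-segment x time-interval matrix with roughly 30% missing entries.
rng = np.random.default_rng(1)
truth = rng.standard_normal((80, 4)) @ rng.standard_normal((4, 96))
mask = rng.random(truth.shape) > 0.3
observed = np.where(mask, truth, 0.0)

imputed = masked_als_completion(observed, mask, rank=4)
rmse = np.sqrt(np.mean((imputed[~mask] - truth[~mask]) ** 2))
print("imputation RMSE on held-out entries:", round(rmse, 4))
```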
-
Models in which the covariance matrix has the structure of a sparse matrix plus a low rank perturbation are ubiquitous in data science applications. It is often desirable for algorithms to take advantage of such structures, avoiding costly matrix computations that often require cubic time and quadratic storage. This is often accomplished by performing operations that maintain such structures, for example, matrix inversion via the Sherman–Morrison–Woodbury formula. In this article, we consider the matrix square root and inverse square root operations. Given a low rank perturbation to a matrix, we argue that a low-rank approximate correction to the (inverse) square root exists. We do so by establishing a geometric decay bound on the true correction's eigenvalues. We then proceed to frame the correction as the solution of an algebraic Riccati equation, and discuss how a low-rank solution to that equation can be computed. We analyze the approximation error incurred when approximately solving the algebraic Riccati equation, providing spectral and Frobenius norm forward and backward error bounds. Finally, we describe several applications of our algorithms, and demonstrate their utility in numerical experiments.
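A small numerical check of the central claim, as a sketch rather than the Riccati-based algorithm of the article: form a simple SPD matrix plus a rank-k perturbation, compute the exact correction to the square root with dense `scipy.linalg.sqrtm`, and observe that its singular values decay rapidly, so a low-rank correction is a good approximation. The diagonal base matrix and the dimensions are illustrative choices.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
n, k = 200, 3

# Simple well-conditioned SPD base matrix A and a rank-k perturbation U U^T.
A = np.diag(1.0 + rng.random(n))
U = rng.standard_normal((n, k)) / np.sqrt(n)

S0 = sqrtm(A)                 # square root of the unperturbed matrix
S1 = sqrtm(A + U @ U.T)       # dense reference square root of the perturbed matrix
correction = S1 - S0

# The singular values of the true correction decay rapidly, so a low-rank
# approximate correction to the square root exists.
sv = np.linalg.svd(correction, compute_uv=False)
print(np.round(sv[:8] / sv[0], 6))
```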
-
Modern technologies are producing a wealth of data with complex structures. For instance, in two-dimensional digital imaging, flow cytometry and electroencephalography, matrix-type covariates frequently arise when measurements are obtained for each combination of two underlying variables. To address scientific questions arising from those data, new regression methods that take matrices as covariates are needed, and sparsity or other forms of regularization are crucial owing to the ultrahigh dimensionality and complex structure of the matrix data. The popular lasso and related regularization methods hinge on the sparsity of the true signal in terms of the number of its non-zero coefficients. However, for the matrix data, the true signal is often of, or can be well approximated by, a low rank structure. As such, the sparsity is frequently in the form of low rank of the matrix parameters, which may seriously violate the assumption of the classical lasso. We propose a class of regularized matrix regression methods based on spectral regularization. A highly efficient and scalable estimation algorithm is developed, and a degrees-of-freedom formula is derived to facilitate model selection along the regularization path. Superior performance of the method proposed is demonstrated on both synthetic and real examples.
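A hedged sketch of spectral (nuclear-norm) regularized matrix regression solved by proximal gradient descent with singular value thresholding. The estimator, the synthetic trace-regression setup, and the penalty weight below are illustrative; this does not reproduce the paper's algorithm or its degrees-of-freedom formula.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def matrix_regression(X, y, lam, n_iters=500):
    """Minimize 0.5 * sum_i (y_i - <X_i, B>)^2 + lam * ||B||_* by proximal gradient."""
    n, p, q = X.shape
    Xmat = X.reshape(n, -1)                        # flatten covariates for the smooth part
    step = 1.0 / np.linalg.norm(Xmat, 2) ** 2      # 1 / Lipschitz constant of the gradient
    B = np.zeros((p, q))
    for _ in range(n_iters):
        resid = Xmat @ B.ravel() - y
        grad = (Xmat.T @ resid).reshape(p, q)
        B = svt(B - step * grad, step * lam)
    return B

# Synthetic example: a rank-2 coefficient matrix observed through noisy inner products.
rng = np.random.default_rng(0)
n, p, q = 500, 10, 10
B_true = rng.standard_normal((p, 2)) @ rng.standard_normal((2, q))
X = rng.standard_normal((n, p, q))
y = X.reshape(n, -1) @ B_true.ravel() + 0.1 * rng.standard_normal(n)

B_hat = matrix_regression(X, y, lam=20.0)
print("relative error:", round(np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true), 4))
print("numerical rank of estimate:", np.linalg.matrix_rank(B_hat, tol=1e-3))
```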
-
We study the low-rank phase retrieval problem, where our goal is to recover a $d_1\times d_2$ low-rank matrix from a series of phaseless linear measurements. This is a fourth-order inverse problem, as we are trying to recover factors of a matrix that have been observed, indirectly, through some quadratic measurements. We propose a solution to this problem using the recently introduced technique of anchored regression. This approach uses two different types of convex relaxations: we replace the quadratic equality constraints for the phaseless measurements by a search over a polytope and enforce the rank constraint through nuclear norm regularization. The result is a convex program in the space of $d_1 \times d_2$ matrices. We analyze two specific scenarios. In the first, the target matrix is rank-$1$, and the observations are structured to correspond to a phaseless blind deconvolution. In the second, the target matrix has general rank, and we observe the magnitudes of the inner products against a series of independent Gaussian random matrices. In each of these problems, we show that anchored regression returns an accurate estimate from a near-optimal number of measurements given that we have access to an anchor matrix of sufficient quality. We also show how to create such an anchor in the phaseless blind deconvolution problem from an optimal number of measurements and present a partial result in this direction for the general rank problem.
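A toy convex-programming sketch of the anchored-regression relaxation described above, written with `cvxpy` (a tool choice of this sketch, not one named in the abstract): the phaseless equality constraints become a polytope of magnitude bounds, the rank constraint becomes a nuclear-norm penalty, and the anchor here is simply a noisy copy of the truth rather than the anchor construction the paper analyzes. Dimensions, the measurement count, and the penalty weight are illustrative.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d1, d2, m = 8, 8, 200

# Rank-1 target and magnitude-only (phaseless) measurements against Gaussian matrices.
X_true = np.outer(rng.standard_normal(d1), rng.standard_normal(d2))
A = rng.standard_normal((m, d1, d2))
y = np.abs(np.einsum("kij,ij->k", A, X_true))

# Anchor: any matrix positively correlated with the target; here a crude noisy copy.
anchor = X_true + 0.5 * rng.standard_normal((d1, d2))

X = cp.Variable((d1, d2))
objective = cp.Maximize(cp.trace(anchor.T @ X) - 1.0 * cp.normNuc(X))
constraints = [cp.abs(cp.trace(A[k].T @ X)) <= y[k] for k in range(m)]  # polytope
cp.Problem(objective, constraints).solve()

err = np.linalg.norm(X.value - X_true) / np.linalg.norm(X_true)
print("relative recovery error:", round(err, 4))
```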