Model Agnostic Time Series Analysis via Matrix Estimation
We propose an algorithm to impute and forecast a time series by transforming the observed time series into a
matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing
linear regression to make predictions. At the core of our analysis is a representation result, which states that
for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this
generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to
establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link
is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done
in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for
imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of
our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise
distribution in the observations. The noise agnostic property of our approach allows us to recover the latent
states when given access only to noisy and partial observations, as in a Hidden Markov Model; e.g., recovering
the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson.
Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a
matrix estimation based method—coupled with a novel, non-standard matrix estimation error metric—to solve
the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and
real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R
libraries) in the presence of missing data as well as high levels of noise.
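As a rough illustration of the pipeline described above, the following sketch builds a Page matrix with non-overlapping columns, de-noises it with a hard rank truncation of the SVD (one simple choice of matrix estimation subroutine), and fits a linear forecaster on the de-noised rows. The window length `L` and the rank are hypothetical tuning parameters for illustration, not values from the paper:

```python
import numpy as np

def page_matrix(series, L):
    """Stack the series into non-overlapping length-L columns (a Page matrix,
    as opposed to the overlapping Hankel matrix used by classical SSA)."""
    series = np.asarray(series, dtype=float)
    T = (len(series) // L) * L          # drop the ragged tail
    return series[:T].reshape(-1, L).T  # shape (L, T // L)

def denoise(M, rank):
    """Matrix estimation via hard rank truncation of the SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def forecast_one_step(series, L, rank):
    M = page_matrix(series, L)
    # Crude imputation: fill any missing (NaN) entries with the observed mean
    # before de-noising; the SVD step then smooths the fill-in values.
    M = np.where(np.isnan(M), np.nanmean(M), M)
    M_hat = denoise(M, rank)
    # Regress the last row of the de-noised matrix on the rows above it,
    # then apply the learned weights to the most recent L - 1 observations.
    X, y = M_hat[:-1].T, M_hat[-1]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    recent = np.asarray(series[-(L - 1):], dtype=float)
    return float(recent @ beta)
```

On an exactly low-rank series (for instance a linear trend, whose Page matrix has rank 2), the truncation step is lossless and the regression recovers the next value essentially exactly.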
- PAR ID: 10078875
- Date Published:
- Journal Name: ACM SIGMETRICS Performance Evaluation Review
- Volume: 2
- Issue: 3
- ISSN: 1557-9484
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We consider sparse matrix estimation, where the goal is to estimate an n-by-n matrix from noisy observations of a small subset of its entries. We analyze the estimation error of the popularly used collaborative filtering algorithm in the sparse regime. Specifically, we propose a novel iterative variant of the algorithm, adapted to handle the setting of sparse observations. We establish that as long as the number of entries observed at random is logarithmically larger than linear in n, the estimation error with respect to the entry-wise max norm decays to zero as n goes to infinity, assuming the underlying matrix of interest has constant rank r. Our result is robust to model misspecification: if the underlying matrix is only approximately rank r, then the estimation error decays to the approximation error with respect to the [Formula: see text]-norm. In the process, we establish the algorithm's ability to handle arbitrary bounded noise in the observations.
-
Matrix completion, the problem of completing missing entries in a data matrix with low-dimensional structure (such as rank), has seen many fruitful approaches and analyses. Tensor completion is the tensor analog, which attempts to impute missing tensor entries under similar low-rank-type assumptions. In this paper, we study the tensor completion problem when the sampling pattern is deterministic and possibly non-uniform. We first propose an efficient weighted Higher Order Singular Value Decomposition (HOSVD) algorithm for the recovery of the underlying low-rank tensor from noisy observations, and then derive error bounds under a properly weighted metric. Additionally, the efficiency and accuracy of our algorithm are both tested using synthetic and real datasets in numerical simulations.
-
Low-rank Matrix Completion (LRMC) describes the problem in which we wish to recover the missing entries of a partially observed low-rank matrix. Most existing matrix completion work deals with sampling procedures that are independent of the underlying data values. While this assumption allows the derivation of nice theoretical guarantees, it seldom holds in real-world applications. In this paper, we consider various settings where the sampling mask is dependent on the underlying data values, motivated by applications in sensing, sequential decision-making, and recommender systems. Through a series of experiments, we study and compare the performance of various LRMC algorithms that were originally successful for data-independent sampling patterns.
-
We consider the problem of tensor estimation from noisy observations with possibly missing entries. A nonparametric approach to tensor completion is developed based on a new model, which we coin sign representable tensors. The model represents the signal tensor of interest using a series of structured sign tensors. Unlike earlier methods, the sign series representation effectively addresses both low- and high-rank signals, while encompassing many existing tensor models (including CP models, Tucker models, single index models, and structured tensors with repeating entries) as special cases. We provably reduce the tensor estimation problem to a series of structured classification tasks, and we develop a learning reduction machinery to empower existing low-rank tensor algorithms for more challenging high-rank estimation. Excess risk bounds, estimation errors, and sample complexities are established. We demonstrate that our approach outperforms previous methods on two datasets: one on human brain connectivity networks and the other on topic data mining.
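The sparse matrix estimation setting in the first related abstract above can be illustrated with a simple, non-iterative spectral baseline: zero-fill the unobserved entries, rescale by the inverse of the empirical sampling rate, and hard-truncate the SVD at the assumed rank. This is only a stand-in for that paper's iterative collaborative filtering algorithm, and the dimensions, rank, and sampling rate below are arbitrary choices for illustration:

```python
import numpy as np

def spectral_estimate(obs, mask, rank):
    """Zero-fill unobserved entries, rescale by 1/p, hard-truncate the SVD."""
    p = mask.mean()                        # empirical sampling rate
    filled = np.where(mask, obs, 0.0) / p  # unbiased estimate of the full matrix
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
n, r = 200, 2
A = rng.normal(size=(n, r)) @ rng.normal(size=(r, n)) / np.sqrt(n)  # rank-r truth
mask = rng.random((n, n)) < 0.5            # observe each entry w.p. 1/2
A_hat = spectral_estimate(A, mask, rank=r)
max_err = np.abs(A_hat - A).max()          # entry-wise max norm from the abstract
```

Even this one-shot estimator recovers the low-rank structure reasonably well at this sampling rate; the iterative variant described above is designed for the much sparser regime where the number of observed entries is only logarithmically larger than linear in n.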