- Award ID(s):
- 1727785
- Publication Date:
- NSF-PAR ID:
- 10310429
- Journal Name:
- ACM Transactions on Knowledge Discovery from Data
- Volume:
- 15
- Issue:
- 1
- ISSN:
- 1556-4681
- Sponsoring Org:
- National Science Foundation
More Like this
-
Matrix completion, the problem of completing missing entries in a data matrix with low-dimensional structure (such as rank), has seen many fruitful approaches and analyses. Tensor completion is the tensor analog that attempts to impute missing tensor entries from similar low-rank type assumptions. In this paper, we study the tensor completion problem when the sampling pattern is deterministic and possibly non-uniform. We first propose an efficient weighted Higher Order Singular Value Decomposition (HOSVD) algorithm for the recovery of the underlying low-rank tensor from noisy observations and then derive the error bounds under a properly weighted metric. Additionally, the efficiency and accuracy of our algorithm are both tested using synthetic and real datasets in numerical simulations.
-
Seismic data are often incomplete due to equipment malfunction, limited source and receiver placement at near and far offsets, and missing crossline data. Seismic data contain redundancies because they are repeatedly recorded over the same or adjacent subsurface regions, causing the data to have a low-rank structure. To recover missing data, one can organize the data into a multidimensional array or tensor and apply a tensor completion method. We can increase the effectiveness and efficiency of low-rank data reconstruction based on tensor singular value decomposition (tSVD) by analyzing the effect of tensor orientation and exploiting the conjugate symmetry of the multidimensional Fourier transform. In fact, these results can be generalized to any order tensor. Relating the singular values of the tSVD to those of a matrix leads to a simplified analysis, revealing that the most square orientation gives the best data structure for low-rank reconstruction. After the first step of the tSVD, a multidimensional Fourier transform, frontal slices of the tensor form conjugate pairs. For each pair, a singular value decomposition can be replaced with a much cheaper conjugate calculation, allowing for faster computation of the tSVD. Using conjugate symmetry in our improved tSVD algorithm reduces the runtime of themore »
-
We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la amore »
-
We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la amore »
-
SUMMARY Repeatedly recording seismic data over a period of months or years is one way to identify trapped oil and gas and to monitor CO2 injection in underground storage reservoirs and saline aquifers. This process of recording data over time and then differencing the images assumes the recording of the data over a particular subsurface region is repeatable. In other words, the hope is that one can recover changes in the Earth when the survey parameters are held fixed between data collection times. Unfortunately, perfect experimental repeatability almost never occurs. Acquisition inconsistencies such as changes in weather (currents, wind) for marine seismic data are inevitable, resulting in source and receiver location differences between surveys at the very least. Thus, data processing aimed at improving repeatability between baseline and monitor surveys is extremely useful. One such processing tool is regularization (or binning) that aligns multiple surveys with different source or receiver configurations onto a common grid. Data binned onto a regular grid can be stored in a high-dimensional data structure called a tensor with, for example, x and y receiver coordinates and time as indices of the tensor. Such a higher-order data structure describing a subsection of the Earth often exhibitsmore »