skip to main content


Title: Multi-version Tensor Completion for Time-delayed Spatio-temporal Data

Real-world spatio-temporal data is often incomplete or inaccurate due to various data loading delays. For example, a location-disease-time tensor of case counts can have multiple delayed updates of recent temporal slices for some locations or diseases. Recovering such missing or noisy (under-reported) elements of the input tensor can be viewed as a generalized tensor completion problem. Existing tensor completion methods usually assume that i) missing elements are randomly distributed and ii) noise for each tensor element is i.i.d. zero-mean. Both assumptions can be violated for spatio-temporal tensor data. We often observe multiple versions of the input tensor with different under-reporting noise levels. The amount of noise can be time- or location-dependent as more updates are progressively introduced to the tensor. We model such dynamic data as a multi-version tensor with an extra tensor mode capturing the data updates. We propose a low-rank tensor model to predict the updates over time. We demonstrate that our method can accurately predict the ground-truth values of many real-world tensors. We obtain up to 27.2% lower root mean-squared-error compared to the best baseline method. Finally, we extend our method to track the tensor data over time, leading to significant computational savings.

 
more » « less
Award ID(s):
2034479
NSF-PAR ID:
10298957
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI) 2021
Page Range / eLocation ID:
2906 to 2912
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. SUMMARY

    Repeatedly recording seismic data over a period of months or years is one way to identify trapped oil and gas and to monitor CO2 injection in underground storage reservoirs and saline aquifers. This process of recording data over time and then differencing the images assumes the recording of the data over a particular subsurface region is repeatable. In other words, the hope is that one can recover changes in the Earth when the survey parameters are held fixed between data collection times. Unfortunately, perfect experimental repeatability almost never occurs. Acquisition inconsistencies such as changes in weather (currents, wind) for marine seismic data are inevitable, resulting in source and receiver location differences between surveys at the very least. Thus, data processing aimed at improving repeatability between baseline and monitor surveys is extremely useful. One such processing tool is regularization (or binning) that aligns multiple surveys with different source or receiver configurations onto a common grid. Data binned onto a regular grid can be stored in a high-dimensional data structure called a tensor with, for example, x and y receiver coordinates and time as indices of the tensor. Such a higher-order data structure describing a subsection of the Earth often exhibits redundancies which one can exploit to fill in gaps caused by sampling the surveys onto the common grid. In fact, since data gaps and noise increase the rank of the tensor, seeking to recover the original data by reducing the rank (low-rank tensor-based completion) successfully fills in gaps caused by binning. The tensor nuclear norm (TNN) is defined by the tensor singular value decomposition (tSVD) which generalizes the matrix SVD. In this work we complete missing time-lapse data caused by binning using the alternating direction method of multipliers (or ADMM) to minimize the TNN. For a synthetic experiment with three parabolic events in which the time-lapse difference involves an amplitude increase in one of these events between baseline and monitor data sets, the binning and reconstruction algorithm (TNN-ADMM) correctly recovers this time-lapse change. We also apply this workflow of binning and TNN-ADMM reconstruction to a real marine survey from offshore Western Australia in which the binning onto a regular grid results in significant data gaps. The data after reconstruction varies continuously without the large gaps caused by the binning process.

     
    more » « less
  2. null (Ed.)
    Existing tensor completion formulation mostly relies on partial observations from a single tensor. However, tensors extracted from real-world data often are more complex due to: (i) Partial observation: Only a small subset of tensor elements are available. (ii) Coarse observation: Some tensor modes only present coarse and aggregated patterns (e.g., monthly summary instead of daily reports). In this paper, we are given a subset of the tensor and some aggregated/coarse observations (along one or more modes) and seek to recover the original fine-granular tensor with low-rank factorization. We formulate a coupled tensor completion problem and propose an efficient Multi-resolution Tensor Completion model (MTC) to solve the problem. Our MTC model explores tensor mode properties and leverages the hierarchy of resolutions to recursively initialize an optimization setup, and optimizes on the coupled system using alternating least squares. MTC ensures low computational and space complexity. We evaluate our model on two COVID-19 related spatio-temporal tensors. The experiments show that MTC could provide 65.20% and 75.79% percentage of fitness (PoF) in tensor completion with only 5% fine granular observations, which is 27.96% relative improvement over the best baseline. To evaluate the learned low-rank factors, we also design a tensor prediction task for daily and cumulative disease case predictions, where MTC achieves 50% in PoF and 30% relative improvements over the best baseline. 
    more » « less
  3. In many real-world applications of monitoring multivariate spatio-temporal data that are non-stationary over time, one is often interested in detecting hot-spots with spatial sparsity and temporal consistency, instead of detecting system-wise changes as in traditional statistical process control (SPC) literature. In this paper, we propose an efficient method to detect hot-spots through tensor decomposition, and our method has three steps. First, we fit the observed data into a Smooth Sparse Decomposition Tensor (SSD-Tensor) model that serves as a dimension reduction and de-noising technique: it is an additive model decomposing the original data into: smooth but non-stationary global mean, sparse local anomalies, and random noises. Next, we estimate model parameters by the penalized framework that includes Least Absolute Shrinkage and Selection Operator (LASSO) and fused LASSO penalty. An efficient recursive optimization algorithm is developed based on Fast Iterative Shrinkage Thresholding Algorithm (FISTA). Finally, we apply a Cumulative Sum (CUSUM) Control Chart to monitor model residuals after removing global means, which helps to detect when and where hot-spots occur. To demonstrate the usefulness of our proposed SSD-Tensor method, we compare it with several other methods including scan statistics, LASSO-based, PCA-based, T2-based control chart in extensive numerical simulation studies and a real crime rate dataset. 
    more » « less
  4. Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent epidemiological model regularization named STELAR. Unlike standard tensor factorization methods which cannot predict slabs ahead, STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete time difference equations of a widely adopted epidemiological model. We use latent instead of location/attribute-level epidemiological dynamics to capture common epidemic profile sub-types and improve collaborative learning and prediction. We conduct experiments using both county- and state level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic. Finally, we evaluate the predictive ability of our method and show superior performance compared to the baselines, achieving up to 21% lower root mean square error and 25% lower mean absolute error for county-level prediction. 
    more » « less
  5. Context has been recognized as an important factor to consider in personalized recommender systems. Particularly in location-based services (LBSs), a fundamental task is to recommend to a mobile user where he/she could be interested to visit next at the right time. Additionally, location-based social networks (LBSNs) allow users to share location-embedded information with friends who often co-occur in the same or nearby points-of-interest (POIs) or share similar POI visiting histories, due to the social homophily theory and Tobler’s first law of geography. So, both the time information and LBSN friendship relations should be utilized for POI recommendation. Tensor completion has recently gained some attention in time-aware recommender systems. The problem decomposes a user-item-time tensor into low-rank embedding matrices of users, items and times using its observed entries, so that the underlying low-rank subspace structure can be tracked to fill the missing entries for time-aware recommendation. However, these tensor completion methods ignore the social-spatial context information available in LBSNs, which is important for POI recommendation since people tend to share their preferences with their friends, and near things are more related than distant things. In this paper, we utilize the side information of social networks and POI locations to enhance the tensor completion model paradigm for more effective time-aware POI recommendation. Specifically, we propose a regularization loss head based on a novel social Hausdorff distance function to optimize the reconstructed tensor. We also quantify the popularity of different POIs with location entropy to prevent very popular POIs from being over-represented hence suppressing the appearance of other more diverse POIs. To address the sensitivity of negative sampling, we train the model on the whole data by treating all unlabeled entries in the observed tensor as negative, and rewriting the loss function in a smart way to reduce the computational cost. Through extensive experiments on real datasets, we demonstrate the superiority of our model over state-of-the-art tensor completion methods. 
    more » « less