skip to main content

Title: Time-varying correlation structure estimation and local-feature detection for spatio-temporal data
Spatial–temporal data arise frequently in biomedical, environmental, political and social science studies. Capturing dynamic changes of time-varying correlation structure is scientifically important in spatio-temporal data analysis. We approximate the time-varying empirical estimator of the spatial correlation matrix by groups of selected basis matrices representing substructures of the correlation matrix. After projecting the correlation structure matrix onto a space spanned by basis matrices, we also incorporate varying-coefficient model selection and estimation for signals associated with relevant basis matrices. The unique feature of the proposed method is that signals at local regions corresponding with time can be identified through the proposed penalized objective function. Theoretically, we show model selection consistency and the oracle property in detecting local signals for the varying-coefficient estimators. The proposed method is illustrated through simulation studies and brain fMRI data.
; ;
Award ID(s):
Publication Date:
Journal Name:
Journal of Multivariate Analysis
Page Range or eLocation-ID:
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Advances in ambient environmental monitoring technologies are enabling concerned communities and citizens to collect data to better understand their local environment and potential exposures. These mobile, low-cost tools make it possible to collect data with increased temporal and spatial resolution, providing data on a large scale with unprecedented levels of detail. This type of data has the potential to empower people to make personal decisions about their exposure and support the development of local strategies for reducing pollution and improving health outcomes. However, calibration of these low-cost instruments has been a challenge. Often, a sensor package is calibrated via field calibration. This involves colocating the sensor package with a high-quality reference instrument for an extended period and then applying machine learning or other model fitting technique such as multiple linear regression to develop a calibration model for converting raw sensor signals to pollutant concentrations. Although this method helps to correct for the effects of ambient conditions (e.g., temperature) and cross sensitivities with nontarget pollutants, there is a growing body of evidence that calibration models can overfit to a given location or set of environmental conditions on account of the incidental correlation between pollutant levels and environmental conditions, including diurnalmore »cycles. As a result, a sensor package trained at a field site may provide less reliable data when moved, or transferred, to a different location. This is a potential concern for applications seeking to perform monitoring away from regulatory monitoring sites, such as personal mobile monitoring or high-resolution monitoring of a neighborhood. We performed experiments confirming that transferability is indeed a problem and show that it can be improved by collecting data from multiple regulatory sites and building a calibration model that leverages data from a more diverse data set. We deployed three sensor packages to each of three sites with reference monitors (nine packages total) and then rotated the sensor packages through the sites over time. Two sites were in San Diego, CA, with a third outside of Bakersfield, CA, offering varying environmental conditions, general air quality composition, and pollutant concentrations. When compared to prior single-site calibration, the multisite approach exhibits better model transferability for a range of modeling approaches. Our experiments also reveal that random forest is especially prone to overfitting and confirm prior results that transfer is a significant source of both bias and standard error. Linear regression, on the other hand, although it exhibits relatively high error, does not degrade much in transfer. Bias dominated in our experiments, suggesting that transferability might be easily increased by detecting and correcting for bias. Also, given that many monitoring applications involve the deployment of many sensor packages based on the same sensing technology, there is an opportunity to leverage the availability of multiple sensors at multiple sites during calibration to lower the cost of training and better tolerate transfer. We contribute a new neural network architecture model termed split-NN that splits the model into two stages, in which the first stage corrects for sensor-to-sensor variation and the second stage uses the combined data of all the sensors to build a model for a single sensor package. The split-NN modeling approach outperforms multiple linear regression, traditional two- and four-layer neural networks, and random forest models. Depending on the training configuration, compared to random forest the split-NN method reduced error 0 %–11 % for NO2 and 6 %–13 % for O3.« less
  2. Summary Infrasound sensors are deployed in a variety of spatial configurations and scales for geophysical monitoring, including networks of single sensors and networks of multi-sensor infrasound arrays. Infrasound signal detection strategies exploiting these data commonly make use of inter-sensor correlation and coherence (array processing, multi-channel correlation); network-based tracking of signal features (e.g. reverse time migration); or a combination of these such as backazimuth cross-bearings for multiple arrays. Single-sensor trace-based denoising techniques offer significant potential to improve all of these various infrasound data processing strategies, but have not previously been investigated in detail. Single-sensor denoising represents a preprocessing step that could reduce the effects of ambient infrasound and wind noise in infrasound signal association and location workflows. We systematically investigate the utility of a range of single-sensor denoising methods for infrasound data processing, including noise gating, non-negative matrix factorisation, and data-adaptive Wiener filtering. For the data testbed, we use the relatively dense regional infrasound network in Alaska, which records a high rate of volcanic eruptions with signals varying in power, duration, and waveform and spectral character. We primarily use data from the 2016–2017 Bogoslof volcanic eruption, which included multiple explosions, and synthetics. The Bogoslof volcanic sequence provides an opportunity to investigatemore »regional infrasound detection, association, and location for a set of real sources with varying source spectra subject to anisotropic atmospheric propagation and varying noise levels (both incoherent wind noise and coherent ambient infrasound, primarily microbaroms). We illustrate the advantages and disadvantages of the different denoising methods in categories such as event detection, waveform distortion, the need for manual data labelling, and computational cost. For all approaches, denoising generally performs better for signals with higher SNR and with less spectral and temporal overlap between signals and noise. Microbaroms are the most globally pervasive and repetitive coherent ambient infrasound noise source, with such noise often referred to as clutter or interference. We find that denoising offers significant potential for microbarom clutter reduction. Single-channel denoising of microbaroms prior to standard array processing enhances both the quantity and bandwidth of detectable volcanic events. We find that reduction of incoherent wind noise is more challenging using the denoising methods we investigate; thus, station hardware (wind noise reduction systems) and site selection remain critical and cannot be replaced by currently available digital denoising methodologies. Overall, we find that adding single-channel denoising as a component in the processing workflow can benefit a variety of infrasound signal detection, association, and location schemes. The denoising methods can also isolate the noise itself, with utility in statistically characterizing ambient infrasound noise.« less
  3. In many real-world applications of monitoring multivariate spatio-temporal data that are non-stationary over time, one is often interested in detecting hot-spots with spatial sparsity and temporal consistency, instead of detecting system-wise changes as in traditional statistical process control (SPC) literature. In this paper, we propose an efficient method to detect hot-spots through tensor decomposition, and our method has three steps. First, we fit the observed data into a Smooth Sparse Decomposition Tensor (SSD-Tensor) model that serves as a dimension reduction and de-noising technique: it is an additive model decomposing the original data into: smooth but non-stationary global mean, sparse local anomalies, and random noises. Next, we estimate model parameters by the penalized framework that includes Least Absolute Shrinkage and Selection Operator (LASSO) and fused LASSO penalty. An efficient recursive optimization algorithm is developed based on Fast Iterative Shrinkage Thresholding Algorithm (FISTA). Finally, we apply a Cumulative Sum (CUSUM) Control Chart to monitor model residuals after removing global means, which helps to detect when and where hot-spots occur. To demonstrate the usefulness of our proposed SSD-Tensor method, we compare it with several other methods including scan statistics, LASSO-based, PCA-based, T2-based control chart in extensive numerical simulation studies and a real crimemore »rate dataset.« less
  4. The relationships between crop yields and meteorology are naturally non-stationary because of spatiotemporal heterogeneity. Many studies have examined spatial heterogeneity in the regression model, but only limited research has attempted to account for both spatial autocorrelation and temporal variation. In this article, we develop a novel spatiotemporally varying coefficient (STVC) model to understand non-stationary relationships between crop yields and meteorological variables. We compare the proposed model with variant models specialized for time or spatial, namely spatial varying coefficient (SVC) model and temporal varying coefficient (TVC) model. This study was conducted using the county-level corn yield and meteorological data, including seasonal Growing Degree Days (GDD), Killing Degree Days (KDD), Vapor Pressure Deficit (VPD), and precipitation (PCPN), from 1981 to 2018 in three Corn Belt states, including Illinois, Indiana, and Iowa. Allowing model coefficients varying in both temporal and spatial dimensions gives the best performance of STVC in simulating the corn yield responses toward various meteorological conditions. The STVC reduced the root-mean-square error to 10.64 Bu/Ac (0.72 Mg/ha) from 15.68 Bu/Ac (1.06 Mg/ha) for TVC and 16.48 Bu/Ac (1.11 Mg/ha) for SVC. Meanwhile, the STVC resulted in a higher R2 of 0.81 compared to 0.56 for SVC and 0.64 for TVC. Themore »STVC showed better performance in handling spatial dependence of corn production, which tends to cluster estimation residuals when counties are close, with the lowest Moran’s I of 0.10. Considering the spatiotemporal non-stationarity, the proposed model significantly improves the power of the meteorological data in explaining the variations of corn yields.« less
  5. In many biomedical and social science studies, it is important to identify and predict the dynamic changes of associations among network data over time. We propose a varying-coefficient model to incorporate time-varying network data, and impose a piecewise penalty function to capture local features of the network associations. The proposed approach is semi-parametric, and therefore flexible in modeling dynamic changes of association in network data problems. Furthermore, the approach can identify the time regions when dynamic changes of associations occur. To achieve a sparse network estimation at local time intervals, we implement a group penalization strategy involving parameters that overlap between groups. However, this makes the optimization process challenging for large-dimensional network data observed at many time points. We develop a fast algorithm, based on the smoothing proximal-gradient method, that is computationally efficient and accurate. We illustrate the proposed method through simulation studies and children's attention deficit hyperactivity disorder fMRI data, showing that the proposed method and algorithm recover dynamic network changes over time efficiently.