Title: Geographic Boosting Tree: Modeling Non-Stationary Spatial Data
Non-stationarity is often observed in geographic datasets. One way to explain non-stationarity is to think of it as hidden "local knowledge" that varies across space. Such data are inherently difficult to model, because a model built for one region does not necessarily fit another region where the local knowledge differs. One solution is to construct multiple local models at various locations, with each local model accounting for a sub-region within which the data remain relatively stationary. However, this approach is sensitive to the size of the data, as each local model is trained only on the subset of observations from its own region. In this paper, we present a novel approach that addresses this problem by aggregating spatially similar sub-regions into relatively large partitions. Our insight is that although local knowledge shifts over space, multiple regions may share the same local knowledge. Data from these regions can be aggregated to train a more accurate model. Experiments show that this method can handle non-stationarity and outperforms alternatives when the dataset is relatively small.
Award ID(s):
2018611 1920182 1532061 1338922
NSF-PAR ID:
10275854
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)
Page Range / eLocation ID:
1205 to 1210
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent years have seen explosive growth in human geography data, in which spatial non-stationarity is often observed, i.e., relationships between features depend on the location. For these datasets, a single global model cannot accurately describe the relationships among features that vary across space. A viable solution, adopted by many studies, is to create multiple local models instead of a global one, with each local model representing a subregion of the space. The challenge with this approach is that the local models are fitted only to nearby observations. For sparsely sampled regions, there may be too few data points to produce a high-quality model. This is especially true for human geography datasets, as human activities tend to cluster at a few locations. In this paper, we present a modeling method that addresses this problem by letting local models operate within relatively large subregions, where overlapping is allowed. Results from all local models are then fused using an inverse distance weighted approach, to minimize the impact of overlapping. Experiments showed that this method handles non-stationary geographic data very well, even when they are unevenly distributed.
  2. Geographic datasets usually exhibit spatial non-stationarity, a phenomenon in which the relationship between features varies across space. Naturally, non-stationarity can be interpreted as the underlying rule that decides how data are generated, a rule that itself changes over space. Traditional machine learning algorithms are therefore not suitable for non-stationary geographic datasets, as they only render a single global model. To solve this problem, researchers often adopt the multiple-local-model approach, which uses different models to account for different sub-regions of space. This approach has proven efficient but not optimal, as it is inherently difficult to decide the size of the sub-regions. Additionally, the fact that local models are trained on only a subset of the data limits their potential. This paper proposes an entirely different strategy that interprets non-stationarity as a lack of data and addresses it by introducing latent variables to the original dataset. Backpropagation is then used to find the best values for these latent variables. Experiments show that this method is at least as efficient as multiple-local-model-based approaches and has even greater potential.
  3. In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints to simultaneously handle spatial non-stationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated from observations exhibiting similar response-predictor relationships. As such, the proposed model not only accounts for non-stationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This aids interpretation, since the few stationary sub-processes identified capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression, and spatial segmentation methods.
  4. Soils have been heralded as a hidden resource that can be leveraged to mitigate and address some of the major global environmental challenges. Specifically, the organic carbon stored in soils, called soil organic carbon (SOC), can, through proper soil management, help offset fuel emissions, increase food productivity, and improve water quality. Because collecting data on SOC is costly and time‐consuming, little SOC data is available, although understanding the spatial variability in SOC is of fundamental importance for effective soil management. In this manuscript, we propose a modeling framework that can be used to gain a better understanding of the dependence structure of a spatial process by identifying regions within a spatial domain where the process displays the same spatial correlation range. To achieve this goal, we propose a generalization of the multiresolution approximation (M‐RA) modeling framework of Katzfuss, originally introduced as a strategy to reduce the computational burden encountered when analyzing massive spatial datasets. To allow for the possibility that the correlation of a spatial process might be characterized by a different range in different subregions of a spatial domain, we provide the M‐RA basis function weights with a two‐component mixture prior, with one of the mixture components being a shrinking prior. We call our approach the mixture M‐RA. Application of the mixture M‐RA model to both stationary and nonstationary data shows that the model can handle both types of data, can correctly establish the type of spatial dependence structure in the data (e.g., stationary versus not), and can identify regions of local stationarity.
  5. We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q, R$, but unknown and non-stationary dynamics $A_t, B_t$. The sequence of dynamics matrices can be arbitrary, but with a total variation, $V_T$, assumed to be $o(T)$ and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all $t$, we present an algorithm that achieves the optimal dynamic regret of $O(V_T^{2/5} T^{3/5})$. With piecewise constant dynamics, our algorithm achieves the optimal regret of $O(\sqrt{ST})$, where $S$ is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual multi-armed bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of $V_T$. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.
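
Related item 1 above fuses the outputs of overlapping local models with an inverse distance weighted approach. The abstract does not give the exact weighting, so the following is only a minimal sketch of generic inverse distance weighting, not the authors' implementation. The function name `idw_fuse`, the use of one center point per local model, and the `power` exponent are all assumptions made for illustration.

```python
import math


def idw_fuse(query, centers, predictions, power=2.0):
    """Fuse local-model predictions at `query` by inverse distance weighting.

    query       -- coordinates of the location being predicted (hypothetical input)
    centers     -- one representative point per local model (an assumption)
    predictions -- each local model's prediction for `query`
    power       -- distance exponent; 2.0 is a common default, not from the source
    """
    dists = [math.dist(query, c) for c in centers]

    # A model centered exactly on the query would get infinite weight,
    # so return its prediction directly.
    for d, p in zip(dists, predictions):
        if d == 0.0:
            return float(p)

    # Nearer models receive larger weights; weights are normalized by their sum.
    weights = [1.0 / d ** power for d in dists]
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)


if __name__ == "__main__":
    centers = [(0.0, 0.0), (2.0, 0.0)]
    predictions = [1.0, 3.0]
    # The query is closer to the first model, so the result leans toward 1.0.
    print(idw_fuse((0.5, 0.0), centers, predictions))
```

With this weighting, a local model far from the query contributes little, which is one way overlapping subregions can be blended without any single model dominating the overlap.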