Title: Spatially and Robustly Hybrid Mixture Regression Model for Inference of Spatial Dependence
In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex, spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints to simultaneously handle spatial non-stationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models, each estimated from observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for non-stationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups. This aids interpretation, because the few stationary sub-processes that are identified capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression, and spatial segmentation methods.
Award ID(s):
2047631 1850360
NSF-PAR ID:
10320017
Date Published:
Journal Name:
2021 IEEE International Conference on Data Mining (ICDM)
Volume:
1
Issue:
1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
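The abstract above builds on finite mixture Gaussian regression, in which a few distinct regression models are fitted to groups of observations with similar response-predictor relationships. The following is a minimal sketch of that building block only: a plain EM fit of a mixture of simple linear regressions. It does not implement the spatial constraints, the robust outlier handling, or the hypothesis test described in the abstract. The function names (`weighted_line_fit`, `em_mixture_regression`), the starting coefficients, the initial variance, and the toy data are all hypothetical choices made for illustration.

```python
import math

def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def em_mixture_regression(xs, ys, init_coefs, init_var=25.0, n_iter=50):
    """Plain EM for a mixture of simple linear regressions.

    init_coefs is a list of (intercept, slope) starting guesses, one per
    component (an assumption of this sketch, not taken from the paper).
    """
    k, n = len(init_coefs), len(xs)
    coefs = list(init_coefs)
    var = [init_var] * k
    mix = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point,
        # computed in log space to avoid underflow.
        resp = []
        for x, y in zip(xs, ys):
            logp = []
            for j in range(k):
                a, b = coefs[j]
                r = y - (a + b * x)
                logp.append(math.log(mix[j])
                            - 0.5 * math.log(2 * math.pi * var[j])
                            - r * r / (2 * var[j]))
            m = max(logp)
            w = [math.exp(lp - m) for lp in logp]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: weighted least squares, variance, and mixing weight
        # for each component.
        for j in range(k):
            ws = [r[j] for r in resp]
            a, b = weighted_line_fit(xs, ys, ws)
            coefs[j] = (a, b)
            sw = sum(ws)
            sse = sum(w * (y - a - b * x) ** 2 for w, x, y in zip(ws, xs, ys))
            var[j] = max(sse / sw, 1e-6)   # floor keeps the variance positive
            mix[j] = max(sw / n, 1e-12)    # floor keeps log(mix) defined
    return coefs, var, mix, resp

# Toy data: two groups with different response-predictor relationships,
# y = 1 + 2x and y = 30 - x, each with small alternating noise.
xs = list(range(8)) + list(range(8))
ys = ([1 + 2 * x + 0.3 * (-1) ** x for x in range(8)]
      + [30 - x - 0.3 * (-1) ** x for x in range(8)])

coefs, var, mix, resp = em_mixture_regression(
    xs, ys, init_coefs=[(0.0, 1.5), (25.0, -0.5)])
labels = [max(range(len(r)), key=lambda j: r[j]) for r in resp]
print(coefs, labels)
```

On this toy data the two fitted lines should land close to the generating ones, and the hard labels separate the two groups. EM is sensitive to its starting values, so in practice several initializations would usually be compared.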
More Like this
  1. The relationships between crop yields and meteorology are naturally non-stationary because of spatiotemporal heterogeneity. Many studies have examined spatial heterogeneity in the regression model, but only limited research has attempted to account for both spatial autocorrelation and temporal variation. In this article, we develop a novel spatiotemporally varying coefficient (STVC) model to understand non-stationary relationships between crop yields and meteorological variables. We compare the proposed model with variant models specialized for time or space, namely the spatially varying coefficient (SVC) model and the temporally varying coefficient (TVC) model. This study was conducted using county-level corn yield and meteorological data, including seasonal Growing Degree Days (GDD), Killing Degree Days (KDD), Vapor Pressure Deficit (VPD), and precipitation (PCPN), from 1981 to 2018 in three Corn Belt states: Illinois, Indiana, and Iowa. Allowing the model coefficients to vary in both the temporal and spatial dimensions gives the STVC the best performance in simulating corn yield responses to various meteorological conditions. The STVC reduced the root-mean-square error to 10.64 Bu/Ac (0.72 Mg/ha) from 15.68 Bu/Ac (1.06 Mg/ha) for TVC and 16.48 Bu/Ac (1.11 Mg/ha) for SVC. Meanwhile, the STVC resulted in a higher R² of 0.81, compared to 0.56 for SVC and 0.64 for TVC. The STVC also handled the spatial dependence of corn production better: estimation residuals tend to cluster when counties are close, and the STVC had the lowest Moran’s I, 0.10. By considering spatiotemporal non-stationarity, the proposed model significantly improves the power of the meteorological data to explain variations in corn yields.
  2. Summary

    Infants born preterm or small for gestational age have elevated rates of morbidity and mortality. Using birth certificate records in Texas from 2002 to 2004 and Environmental Protection Agency air pollution estimates, we relate the quantile functions of birth weight and gestational age to ozone exposure and multiple predictors, including parental age, race, and education level. We introduce a semi-parametric Bayesian quantile approach that models the full quantile function rather than just a few quantile levels. Our multilevel quantile function model establishes relationships between birth weight and the predictors separately for each week of gestational age and between gestational age and the predictors separately across Texas Public Health Regions. We permit these relationships to vary nonlinearly across gestational age, spatial domain and quantile level and we unite them in a hierarchical model via a basis expansion on the regression coefficients that preserves interpretability. Very low birth weight is a primary concern, so we leverage extreme value theory to supplement our model in the tail of the distribution. Gestational ages are recorded in completed weeks of gestation (integer-valued), so we present methodology for modeling quantile functions of discrete response data. In a simulation study we show that pooling information across gestational age and quantile level substantially reduces MSE of predictor effects. We find that ozone is negatively associated with the lower tail of gestational age in south Texas and across the distribution of birth weight for high gestational ages. Our methods are available in the R package BSquare.

     
  3. Abstract

    In many applications there is interest in estimating the relation between a predictor and an outcome when the relation is known to be monotone or otherwise constrained due to the physical processes involved. We consider one such application: inferring time‐resolved aerosol concentration from a low‐cost differential pressure sensor. The objective is to estimate a monotone function and make inference on the scaled first derivative of the function. We propose Bayesian nonparametric monotone regression, which uses a Bernstein polynomial basis to construct the regression function and puts a Dirichlet process prior on the regression coefficients. The base measure of the Dirichlet process is a finite mixture of a mass point at zero and a truncated normal. This construction imposes monotonicity while clustering the basis functions. Clustering the basis functions reduces the parameter space and allows the estimated regression function to be linear. With the proposed approach we can make closed‐form inference on the derivative of the estimated function, including full quantification of uncertainty. In a simulation study the proposed method performs similarly to other monotone regression approaches when the true function is wavy but performs better when the true function is linear. We apply the method to estimate time‐resolved aerosol concentration with a newly developed portable aerosol monitor. The R package bnmr is made available to implement the method.

     
  4. Abstract

    Soils have been heralded as a hidden resource that can be leveraged to mitigate and address some of the major global environmental challenges. Specifically, the organic carbon stored in soils, called soil organic carbon (SOC), can, through proper soil management, help offset fuel emissions, increase food productivity, and improve water quality. Because collecting data on SOC is costly and time‐consuming, not much data on SOC are available, although understanding the spatial variability in SOC is of fundamental importance for effective soil management. In this manuscript, we propose a modeling framework that can be used to gain a better understanding of the dependence structure of a spatial process by identifying regions within a spatial domain where the process displays the same spatial correlation range. To achieve this goal, we propose a generalization of the multiresolution approximation (M‐RA) modeling framework of Katzfuss, originally introduced as a strategy to reduce the computational burden encountered when analyzing massive spatial datasets. To allow for the possibility that the correlation of a spatial process might be characterized by a different range in different subregions of a spatial domain, we give the M‐RA basis function weights a two‐component mixture prior, with one of the mixture components being a shrinking prior. We call our approach the mixture M‐RA. Application of the mixture M‐RA model to both stationary and nonstationary data shows that the model can handle both types of data, can correctly establish the type of spatial dependence structure in the data (e.g., stationary versus not), and can identify regions of local stationarity.

     
  5. The role of perceptual organization in motion analysis has heretofore been minimal. In this work we demonstrate that the use of perceptual organization principles of temporal coherence (common fate) and spatial proximity can result in a robust motion segmentation algorithm that is able to handle drastic illumination changes, occlusion events, and multiple moving objects, without the use of object models. The adopted algorithm does not employ the traditional frame-by-frame motion analysis, but rather treats the image sequence as a single 3D spatio-temporal block of data. We describe motion using spatio-temporal surfaces, which we, in turn, describe as compositions of finite planar patches. These planar patches, referred to as temporal envelopes, capture the local nature of the motions. We detect these temporal envelopes using 3D edge detection followed by a Hough transform, and represent them with convex hulls. We present a graph-based method to group the temporal envelopes arising from one object based on Gestalt organizational principles. A probabilistic Bayesian network quantifies the saliencies of the relationships between temporal envelopes. We present results on sequences with multiple moving persons, significant occlusions, and scene illumination changes.
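Several of the abstracts listed here concern spatial dependence, and item 1 reports Moran's I of model residuals as a measure of it. As a rough illustration of what that statistic computes, here is a minimal sketch of global Moran's I for values observed at a few locations. The function name `morans_i`, the chain-shaped neighbour matrix, and the toy values are assumptions made for this example and are not taken from any of the listed papers.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a spatial weight matrix.

    weights[i][j] is the (non-negative) spatial weight between locations
    i and j, with zeros on the diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # total weight
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))  # spatial cross-products
    den = sum(d * d for d in dev)                   # total variation
    return (n / s0) * num / den

# Four locations along a line; each is a neighbour of the adjacent ones.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, -1.0, -1.0]     # similar values sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbours always differ

print(morans_i(clustered, chain))      # positive: spatial clustering
print(morans_i(alternating, chain))    # negative: spatial dispersion
```

Values near zero suggest little spatial autocorrelation, which is why a low Moran's I of the residuals is read as a sign that a model has absorbed most of the spatial dependence.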