Title: On Semiparametric Inference of Geostatistical Models via Local Karhunen–Loève Expansion

We develop a semiparametric approach to geostatistical modelling and inference. In particular, we consider a geostatistical model with additive components, where the form of the covariance function of the spatial random error is not prespecified and thus is flexible. A novel, local Karhunen–Loève expansion is developed and a likelihood-based method is devised for estimating the model parameters and conducting statistical inference. A simulation study demonstrates sound finite sample properties and a real data example is given for illustration. Finally, the theoretical properties of the estimates are explored and, in particular, consistency results are established.

Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Medium: X
Size: p. 817-832
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We develop a Bayesian model–based approach to finite population estimation accounting for spatial dependence. Our innovation here is a framework that achieves inference for finite population quantities in spatial process settings. A key distinction from the small area estimation setting is that we analyze finite populations referenced by their geographic coordinates. Specifically, we consider a two‐stage sampling design in which the primary units are geographic regions, the secondary units are point‐referenced locations, and the measured values are assumed to be a partial realization of a spatial process. Estimation of finite population quantities from geostatistical models does not account for sampling designs, which can impair inferential performance, whereas design‐based estimates ignore the spatial dependence in the finite population. We demonstrate by using simulation experiments that process‐based finite population sampling models improve model fit and inference over models that fail to account for spatial correlation. Furthermore, the process‐based models offer richer inference with spatially interpolated maps over the entire region. We reinforce these improvements and demonstrate scalable inference for groundwater nitrate levels in the population of California Central Valley wells by offering estimates of mean nitrate levels and their spatially interpolated maps.

  2. Abstract

    Geostatistical modeling for continuous point‐referenced data has extensively been applied to neuroimaging because it produces efficient and valid statistical inference. However, diffusion tensor imaging (DTI), a neuroimaging technique characterizing the brain's anatomical structure, produces a positive‐definite (p.d.) matrix for each voxel. Currently, only a few geostatistical models for p.d. matrices have been proposed because introducing spatial dependence among p.d. matrices properly is challenging. In this paper, we use the spatial Wishart process, a spatial stochastic process (random field), where each p.d. matrix‐variate random variable marginally follows a Wishart distribution, and spatial dependence between random matrices is induced by latent Gaussian processes. This process is valid on an uncountable collection of spatial locations and is almost‐surely continuous, leading to a reasonable way of modeling spatial dependence. Motivated by a DTI data set of cocaine users, we propose a spatial matrix‐variate regression model based on the spatial Wishart process. A problematic issue is that the spatial Wishart process has no closed‐form density function. Hence, we propose an approximation method to obtain a feasible Cholesky decomposition model, which we show to be asymptotically equivalent to the spatial Wishart process model. A local likelihood approximation method is also applied to achieve fast computation. The simulation studies and real data application demonstrate that the Cholesky decomposition process model produces reliable inference and improved performance, compared to other methods.

  3. Summary

    We propose several Bayesian models for modelling time-to-event data. We consider a piecewise exponential model, a fully parametric cure rate model and a semiparametric cure rate model. For each model, we derive the likelihood function and examine some of its properties for carrying out Bayesian inference with non-informative priors. We also examine model identifiability issues and give conditions which guarantee identifiability. Also, for each model, we construct a class of informative prior distributions based on historical data, i.e. data from similar previous studies. These priors, called power priors, prove to be quite useful in this context. We examine the properties of the power priors for Bayesian inference and, in particular, we study their effect on the current analysis. Tools for model comparison and model assessment are also proposed. A detailed case-study of a recently completed melanoma clinical trial conducted by the Eastern Cooperative Oncology Group is presented and the methodology proposed is demonstrated in detail.

  4. Summary

    To assess compliance with air quality regulations, the Environmental Protection Agency (EPA) must know whether a site exceeds a pre-specified level. In the case of ozone, the level for compliance is fixed at 75 parts per billion, which is high, but not extreme at all locations. We present a new space-time model for threshold exceedances based on the skew-t process. Our method incorporates a random partition to permit long-distance asymptotic independence while allowing for sites that are near one another to be asymptotically dependent, and we incorporate thresholding to allow the tails of the data to speak for themselves. We also introduce a transformed AR(1) time series to allow for temporal dependence. Finally, our model allows for high-dimensional Bayesian inference that is comparable in computation time to traditional geostatistical methods for large data sets. We apply our method to an ozone analysis for July 2005, and find that our model improves over both Gaussian and max-stable methods in terms of predicting exceedances of a high level.

  5. Abstract

    While most spatial data can be modeled with the assumption that distant points are uncorrelated, some problems require dependence at both far and short distances. We introduce a model to directly incorporate dependence in phenomena that influence a distant response. Spatial climate problems often have such modeling needs as data are influenced by local factors in addition to remote phenomena, known as teleconnections. Teleconnections arise from complex interactions between the atmosphere and ocean, of which the El Niño–Southern Oscillation teleconnection is a well‐known example. Our model extends the standard geostatistical modeling framework to account for effects of covariates observed on a spatially remote domain. We frame our model as an extension of spatially varying coefficient models. Connections to existing methods are highlighted, and further modeling needs are addressed by additionally drawing on spatial basis functions and predictive processes. Notably, our approach allows users to model teleconnected data without prespecifying teleconnection indices, which other methods often require. We adopt a hierarchical Bayesian framework to conduct inference and make predictions. The method is demonstrated by predicting precipitation in Colorado while accounting for local factors and teleconnection effects with Pacific Ocean sea surface temperatures. We show how the proposed model improves upon standard methods for estimating teleconnection effects and discuss its utility for climate applications.
