Title: Scalable inference for space‐time Gaussian Cox processes

The log‐Gaussian Cox process is a flexible and popular stochastic process for modeling point patterns exhibiting spatial and space‐time dependence. Model fitting requires approximating stochastic integrals, which is typically done by discretizing the domain of interest. With fine‐scale discretization, inference based on Markov chain Monte Carlo is computationally burdensome because of the cost of storing and decomposing (for example, by Cholesky factorization) the high‐dimensional covariance matrices associated with the latent Gaussian variables. This article addresses these computational bottlenecks by combining two recent developments: (i) a data augmentation strategy proposed for space‐time Gaussian Cox processes that is based on exact Bayesian inference and does not require fine grid approximations for infinite‐dimensional integrals, and (ii) a recently developed family of sparsity‐inducing Gaussian processes, called nearest‐neighbor Gaussian processes, that avoids expensive matrix computations. Our inference is delivered within the fully model‐based Bayesian paradigm and does not sacrifice the richness of traditional log‐Gaussian Cox processes. We apply our method to crime event data in San Francisco and investigate the recovery of the intensity surface.
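
To illustrate why nearest‐neighbor Gaussian processes avoid large matrix decompositions, here is a minimal sketch, not the authors' implementation. It evaluates a nearest‐neighbor approximation to a zero‐mean Gaussian process log‐density by conditioning each value on at most m previously ordered points, so only small m-by-m systems are solved. The exponential covariance, the purely spatial setting, the parameter names sigma2 and phi, and the function names exp_cov, nngp_logpdf and full_logpdf are all assumptions made for illustration.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two coordinate sets (an assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def nngp_logpdf(w, coords, m, sigma2=1.0, phi=3.0):
    """Nearest-neighbor approximation to the zero-mean GP log-density of w.

    Point i is conditioned only on its (at most) m nearest neighbors among
    points 0..i-1, so each step solves a system of size at most m.
    """
    logp = 0.0
    for i in range(len(w)):
        if i == 0:
            mean, var = 0.0, sigma2  # first point has no neighbors
        else:
            d = np.linalg.norm(coords[:i] - coords[i], axis=1)
            nb = np.argsort(d)[:m]  # indices of nearest earlier points
            c_nn = exp_cov(coords[nb], coords[nb], sigma2, phi)
            c_in = exp_cov(coords[i:i + 1], coords[nb], sigma2, phi)[0]
            b = np.linalg.solve(c_nn, c_in)  # kriging weights
            mean = b @ w[nb]
            var = sigma2 - c_in @ b
        logp += -0.5 * (np.log(2.0 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return logp


def full_logpdf(w, coords, sigma2=1.0, phi=3.0):
    """Dense zero-mean GP log-density via an n-by-n Cholesky factor."""
    chol = np.linalg.cholesky(exp_cov(coords, coords, sigma2, phi))
    z = np.linalg.solve(chol, w)
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (len(w) * np.log(2.0 * np.pi) + log_det + z @ z)
```

When m equals n - 1, every point conditions on all earlier points and the sketch reproduces the dense log‐density exactly; smaller m trades accuracy for cost that grows linearly in n rather than cubically.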

 
Award ID(s):
1916349
NSF-PAR ID:
10091086
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Journal of Time Series Analysis
Volume:
40
Issue:
3
ISSN:
0143-9782
Page Range / eLocation ID:
p. 269-287
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Remote sensing data have been widely used to study various geophysical processes. With the advances in remote sensing technology, massive amounts of remote sensing data are collected in space over time. Different satellite instruments typically have different footprints, measurement‐error characteristics, and data coverages. To combine data sets from different satellite instruments, we propose a dynamic fused Gaussian process (DFGP) model that enables fast statistical inference such as filtering and smoothing for massive spatio‐temporal data sets in a data‐fusion context. Based upon a spatio‐temporal‐random‐effect model, the DFGP methodology represents the underlying true process with two components: a linear combination of a small number of basis functions and random coefficients with a general covariance matrix, together with a linear combination of a large number of basis functions and Markov random coefficients. To model the underlying geophysical process at different spatial resolutions, we rely on the change‐of‐support property, which also allows efficient computations in the DFGP model. To estimate model parameters, we devise a computationally efficient stochastic expectation‐maximization algorithm to ensure its scalability for massive data sets. The DFGP model is applied to a total of 3.7 million sea surface temperature data sets in the tropical Pacific Ocean for a one‐week time period in 2010 from Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Microwave Scanning Radiometer‐Earth Observing System (AMSR‐E) instruments.

     
  2. ABSTRACT

    Viral deep-sequencing data play a crucial role in understanding disease transmission network flows, providing higher resolution than standard Sanger sequencing. To more fully utilize these rich data and account for the uncertainties in outcomes from phylogenetic analyses, we propose a spatial Poisson process model to uncover human immunodeficiency virus (HIV) transmission flow patterns at the population level. We represent pairings of individuals with viral sequence data as typed points, with coordinates representing covariates such as gender and age and point types representing the unobserved transmission statuses (linkage and direction). Points are associated with observed scores on the strength of evidence for each transmission status that are obtained through standard deep-sequence phylogenetic analysis. Our method is able to jointly infer the latent transmission statuses for all pairings and the transmission flow surface on the source-recipient covariate space. In contrast to existing methods, our framework does not require preclassification of the transmission statuses of data points, and instead learns them probabilistically through a fully Bayesian inference scheme. By directly modeling continuous spatial processes with smooth densities, our method enjoys significant computational advantages compared to previous methods that rely on discretization of the covariate space. We demonstrate that our framework can capture age structures in HIV transmission at high resolution, bringing valuable insights in a case study on viral deep-sequencing data from Southern Uganda.

     
  3. Abstract

    Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for each variable and associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and rendering better predictive inference at arbitrary locations for the spatial process. High‐dimensional multivariate spatial data, which are the theme of this article, refer to data sets where the number of spatial locations and the number of spatially dependent variables are very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but such methods for multivariate spatial processes, especially when the number of outcomes is moderately large, are limited in comparison. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the matrix‐normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models that deliver inference over a high‐dimensional parameter space including the latent spatial process. We illustrate the computational and inferential benefits of our algorithms over competing methods using simulation studies and an analysis of a massive vegetation index data set.

     
  4. Summary

    To assess compliance with air quality regulations, the Environmental Protection Agency (EPA) must know if a site exceeds a pre-specified level. In the case of ozone, the level for compliance is fixed at 75 parts per billion, which is high, but not extreme at all locations. We present a new space-time model for threshold exceedances based on the skew-t process. Our method incorporates a random partition to permit long-distance asymptotic independence while allowing for sites that are near one another to be asymptotically dependent, and we incorporate thresholding to allow the tails of the data to speak for themselves. We also introduce a transformed AR(1) time series to allow for temporal dependence. Finally, our model allows for high-dimensional Bayesian inference that is comparable in computation time to traditional geostatistical methods for large data sets. We apply our method to an ozone analysis for July 2005, and find that our model improves over both Gaussian and max-stable methods in terms of predicting exceedances of a high level.

     
  5. Abstract

    This paper introduces a new approach to inferring the second-order properties of a multivariate log Gaussian Cox process (LGCP) with a complex intensity function. We assume a semi-parametric model for the multivariate intensity function containing an unspecified complex factor common to all types of points. Given this model, we construct a second-order conditional composite likelihood to infer the pair correlation and cross pair correlation functions of the LGCP. Crucially, this likelihood does not depend on the unspecified part of the intensity function. We also introduce a cross-validation method for model selection and an algorithm for regularized inference that can be used to obtain sparse models for cross pair correlation functions. The methodology is applied to simulated data as well as data examples from microscopy and criminology. This shows how the new approach outperforms existing alternatives where the intensity functions are estimated non-parametrically.

     