

Title: Vecchia Approximations and Optimization for Multivariate Matérn Models
We describe our implementation of the multivariate Matérn model for multivariate spatial datasets, using Vecchia’s approximation and a Fisher scoring optimization algorithm. We consider various parameterizations for the multivariate Matérn that have been proposed in the literature for ensuring model validity, as well as an unconstrained model. A strength of our study is that the code is tested on many real-world multivariate spatial datasets. We use it to study the effect of ordering and conditioning in Vecchia’s approximation and the restrictions imposed by the various parameterizations. We also consider a model in which co-located nuggets are correlated across components and find that forcing this cross-component nugget correlation to be zero can have a serious impact on the other model parameters, so we suggest allowing cross-component correlation in co-located nugget terms.
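As a rough illustration of the idea behind Vecchia's approximation mentioned in the abstract, the sketch below replaces the joint Gaussian log-likelihood with a sum of conditional log densities, each conditioning on at most `m` previously ordered points. It is not the authors' implementation: it is univariate, uses an exponential covariance (Matérn with smoothness 1/2), takes the points in the order given, and picks nearest previous neighbors as the conditioning set. The function names `exp_cov`, `vecchia_loglik`, and `exact_loglik` and all parameter values are hypothetical.

```python
import numpy as np

def exp_cov(locs_a, locs_b, variance=1.0, range_=0.3):
    """Exponential covariance (Matern with smoothness 1/2); parameters are illustrative."""
    d = np.linalg.norm(locs_a[:, None, :] - locs_b[None, :, :], axis=-1)
    return variance * np.exp(-d / range_)

def vecchia_loglik(y, locs, m, nugget=1e-2):
    """Sum of conditional Gaussian log densities of y[i] given at most m
    nearest points among those earlier in the ordering (assumed zero mean)."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        if len(prev) > m:
            # Assumption: conditioning set = m nearest previously ordered points.
            d = np.linalg.norm(locs[prev] - locs[i], axis=1)
            prev = prev[np.argsort(d)[:m]]
        var = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] + nugget
        mean = 0.0
        if len(prev) > 0:
            k_pp = exp_cov(locs[prev], locs[prev]) + nugget * np.eye(len(prev))
            k_ip = exp_cov(locs[i:i + 1], locs[prev])[0]
            w = np.linalg.solve(k_pp, k_ip)   # kriging weights
            mean = w @ y[prev]                # conditional mean
            var = var - w @ k_ip              # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

def exact_loglik(y, locs, nugget=1e-2):
    """Exact zero-mean Gaussian log-likelihood, for comparison."""
    n = len(y)
    k = exp_cov(locs, locs) + nugget * np.eye(n)
    _, logdet = np.linalg.slogdet(k)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(k, y))
```

When `m` is at least `n - 1`, every point conditions on all earlier points and the sum reduces to the exact log-likelihood by the chain rule; smaller `m` trades accuracy for speed, which is where the choice of ordering and conditioning set starts to matter.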
Award ID(s):
1953088
NSF-PAR ID:
10384563
Journal Name:
Journal of Data Science
ISSN:
1680-743X
Page Range / eLocation ID:
475 to 492
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary For multivariate spatial Gaussian process models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence between the variables. This is undesirable, especially in highly multivariate settings, where popular cross-covariance functions, such as multivariate Matérn functions, suffer from a curse of dimensionality as the numbers of parameters and floating-point operations scale up in quadratic and cubic order, respectively, with the number of variables. We propose a class of multivariate graphical Gaussian processes using a general construction called stitching that crafts cross-covariance functions from graphs and ensures process-level conditional independence between variables. For the Matérn family of functions, stitching yields a multivariate Gaussian process whose univariate components are Matérn Gaussian processes, and which conforms to process-level conditional independence as specified by the graphical model. For highly multivariate settings and decomposable graphical models, stitching offers massive computational gains and parameter dimension reduction. We demonstrate the utility of the graphical Matérn Gaussian process to jointly model highly multivariate spatial data using simulation examples and an application to air-pollution modelling. 
  2. Abstract

    High spatiotemporal resolution maps of surface vegetation from remote sensing data are desirable for vegetation and disturbance monitoring. However, due to the current limitations of imaging spectrometers, remote sensing datasets of vegetation with high temporal frequency of measurements have lower spatial resolution, and vice versa. In this research, we propose a space-time dynamic linear model to fuse high temporal frequency data (MODIS) with high spatial resolution data (Landsat) to create high spatiotemporal resolution data products of a vegetation greenness index. The model incorporates the spatial misalignment of the data and models dependence within and across land cover types with a latent multivariate Matérn process. To handle the large size of the data, we introduce a fast estimation procedure and a moving window Kalman smoother to produce a daily, 30-m resolution data product with associated uncertainty.

     
  3. Abstract

    Soils have been heralded as a hidden resource that can be leveraged to mitigate and address some of the major global environmental challenges. Specifically, the organic carbon stored in soils, called soil organic carbon (SOC), can, through proper soil management, help offset fuel emissions, increase food productivity, and improve water quality. As collecting data on SOC is costly and time‐consuming, not much data on SOC are available, although understanding the spatial variability in SOC is of fundamental importance for effective soil management. In this manuscript, we propose a modeling framework that can be used to gain a better understanding of the dependence structure of a spatial process by identifying regions within a spatial domain where the process displays the same spatial correlation range. To achieve this goal, we propose a generalization of the multiresolution approximation (M‐RA) modeling framework of Katzfuss, originally introduced as a strategy to reduce the computational burden encountered when analyzing massive spatial datasets. To allow for the possibility that the correlation of a spatial process might be characterized by a different range in different subregions of a spatial domain, we provide the M‐RA basis function weights with a two‐component mixture prior, with one of the mixture components a shrinking prior. We call our approach the mixture M‐RA. Application of the mixture M‐RA model to both stationary and nonstationary data shows that the mixture M‐RA model can handle both types of data, can correctly establish the type of spatial dependence structure in the data (e.g., stationary versus not), and can identify regions of local stationarity.

     
  4. Abstract

    Statistical bias correction techniques are commonly used in climate model projections to reduce systematic biases. Among the several bias correction techniques, univariate linear bias correction (e.g., quantile mapping) is the most popular, given its simplicity. Univariate linear bias correction can accurately reproduce the observed mean of a given climate variable. However, when performed separately on multiple variables, it does not yield the observed multivariate cross‐correlation structure. In the current study, we consider the intrinsic properties of two candidate univariate linear bias‐correction approaches (simple linear regression and asynchronous regression) in estimating the observed cross‐correlation between precipitation and temperature. Two linear regression models are applied separately on both the observed and the projected variables. The analytical solution suggests that the two candidate approaches simply reproduce the cross‐correlation from the general circulation models (GCMs) in the bias‐corrected data set because of their linearity. Our study adopts two frameworks, based on the Fisher z‐transformation and bootstrapping, to provide 95% lower and upper confidence limits (referred to as the permissible bound) for the GCM cross‐correlation. Beyond the permissible bound, the raw/bias‐corrected GCM cross‐correlation differs significantly from the observed one. The two frameworks are applied to three GCMs from the CMIP5 multimodel ensemble over the coterminous United States.
We found that (a) the univariate linear techniques fail to reproduce the observed cross‐correlation in the bias‐corrected data set over 90% (30–50%) of the grid points where the multivariate skewness coefficient values are substantial (small) and statistically significantly (insignificantly) different from zero; (b) the performance of the univariate linear techniques under bootstrapping (Fisher z‐transformation) remains uniform (non‐uniform) across climate regions, months, and GCMs; (c) grid points where the observed cross‐correlation is statistically significant witness a failure fraction of around 0.2 (0.8) under the Fisher z‐transformation (bootstrapping). The importance of reproducing cross‐correlations is also discussed, along with an enquiry into the multivariate approaches that can potentially address the bias in yielding cross‐correlations.

     
  5. Abstract

    We present 0.″22-resolution Atacama Large Millimeter/submillimeter Array (ALMA) observations of CO(2−1) emission from the circumnuclear gas disk in the red nugget relic galaxy PGC 11179. The disk shows regular rotation, with projected velocities near the center of 400 km s$^{-1}$. We assume the CO emission originates from a dynamically cold, thin disk and fit gas-dynamical models directly to the ALMA data. In addition, we explore systematic uncertainties by testing the impacts of various model assumptions on our results. The supermassive black hole (BH) mass ($M_{\rm BH}$) is measured to be $M_{\rm BH} = (1.91 \pm 0.04\,[1\sigma\ \text{statistical}]\,^{+0.11}_{-0.51}\,[\text{systematic}]) \times 10^{9}\,M_\odot$, and the $H$-band stellar mass-to-light ratio $M/L_H = 1.620 \pm 0.004\,[1\sigma\ \text{statistical}]\,^{+0.211}_{-0.107}\,[\text{systematic}]\ M_\odot/L_\odot$. This $M_{\rm BH}$ is consistent with the BH mass−stellar velocity dispersion relation but over-massive compared to the BH mass−bulge luminosity relation by a factor of 3.7. PGC 11179 is part of a sample of local compact early-type galaxies that are plausible relics of $z \sim 2$ red nuggets, and its behavior relative to the scaling relations echoes that of three relic galaxy BHs previously measured with stellar dynamics. These over-massive BHs could suggest that BHs gain most of their mass before their host galaxies do. However, our results could also be explained by greater intrinsic scatter at the high-mass end of the scaling relations, or by systematic differences in gas- and stellar-dynamical methods. Additional $M_{\rm BH}$ measurements in the sample, including independent cross-checks between molecular gas- and stellar-dynamical methods, will advance our understanding of the co-evolution of BHs and their host galaxies.

     