Title: Efficient multi-scale Gaussian process regression for massive remote sensing data with satGP v0.1.2
Abstract. Satellite remote sensing provides a global view of processes on Earth that has unique benefits compared to making measurements on the ground, such as global coverage and enormous data volume. The typical downsides are spatial and temporal gaps and potentially low data quality. Meaningful statistical inference from such data requires overcoming these problems and developing efficient and robust computational tools. We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is able to handle problems of enormous size and to compute marginals and sample from the random field conditioning on at least hundreds of millions of observations. This is achieved by optimizing the computation through, e.g., randomization and splitting the problem into parallel local subproblems which aggressively discard uninformative data. We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments. We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license.
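As a rough illustration of the multi-scale covariance idea in the abstract, the sketch below sums a Matérn, an exponential, and a periodic component over a one-dimensional input and uses the result for a small zero-mean GP regression. It is not satGP's code or API: the function names, the `PARAMS` dictionary, the choice of Matérn smoothness 3/2, and all numeric values are assumptions made only for this example, and satGP itself works on spatio-temporal data with an MRF-based mean function.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
PARAMS = {
    "var_matern": 1.0, "ell_matern": 3.0,               # large-scale smooth component
    "var_exp": 0.3, "ell_exp": 0.5,                     # small-scale rough component
    "var_per": 0.5, "ell_per": 1.0, "period": 5.0,      # periodic component
}

def matern32(r, ell):
    """Matérn kernel with smoothness nu = 3/2 (closed form)."""
    s = np.sqrt(3.0) * r / ell
    return (1.0 + s) * np.exp(-s)

def exponential(r, ell):
    """Exponential kernel, i.e. Matérn with nu = 1/2."""
    return np.exp(-r / ell)

def periodic(r, ell, period):
    """Standard periodic (exp-sine-squared) kernel."""
    return np.exp(-2.0 * np.sin(np.pi * r / period) ** 2 / ell ** 2)

def multiscale_kernel(x1, x2, params):
    """Sum of the three components evaluated on pairwise distances."""
    r = np.abs(x1[:, None] - x2[None, :])
    k = params["var_matern"] * matern32(r, params["ell_matern"])
    k += params["var_exp"] * exponential(r, params["ell_exp"])
    k += params["var_per"] * periodic(r, params["ell_per"], params["period"])
    return k

def gp_posterior(x_train, y_train, x_test, params, noise_var=1e-2):
    """Posterior marginal mean and variance of a zero-mean GP."""
    K = multiscale_kernel(x_train, x_train, params)
    K += noise_var * np.eye(len(x_train))
    Ks = multiscale_kernel(x_test, x_train, params)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    prior_var = params["var_matern"] + params["var_exp"] + params["var_per"]
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, var
```

This dense Cholesky solve scales cubically with the number of observations, which is why a package aimed at hundreds of millions of observations would need the kind of local, parallel subproblems the abstract mentions; the sketch makes no attempt at that.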
Award ID(s):
1723011
PAR ID:
10206573
Journal Name:
Geoscientific Model Development
Volume:
13
Issue:
7
ISSN:
1991-9603
Page Range / eLocation ID:
3439 to 3463
Sponsoring Org:
National Science Foundation
More Like this
  1. A new proliferation of optical instruments that can be attached to towers over or within ecosystems, or ‘proximal’ remote sensing, enables a comprehensive characterization of terrestrial ecosystem structure, function, and fluxes of energy, water, and carbon. Proximal remote sensing can bridge the gap between individual plants, site‐level eddy‐covariance fluxes, and airborne and spaceborne remote sensing by providing continuous data at a high‐spatiotemporal resolution. Here, we review recent advances in proximal remote sensing for improving our mechanistic understanding of plant and ecosystem processes, model development, and validation of current and upcoming satellite missions. We provide current best practices for data availability and metadata for proximal remote sensing: spectral reflectance, solar‐induced fluorescence, thermal infrared radiation, microwave backscatter, and LiDAR. Our paper outlines the steps necessary for making these data streams more widespread, accessible, interoperable, and information‐rich, enabling us to address key ecological questions unanswerable from space‐based observations alone and, ultimately, to demonstrate the feasibility of these technologies to address critical questions in local and global ecology.
  2. We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and utilizing the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge for designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes by first embedding the manifold onto some higher dimensional Euclidean space via equivariant embeddings and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. Simulation study and real data analyses are carried out to demonstrate the utilities of our eBO framework by applying the eBO to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices. 
  3. Sustained observations are required to determine the marine plastic debris mass balance and to support effective policy for planning remedial action. However, observations currently remain scarce at the global scale. A satellite remote sensing system could make a substantial contribution to tackling this problem. Here, we make initial steps towards the potential design of such a remote sensing system by: (1) identifying the properties of marine plastic debris amenable to remote sensing methods and (2) highlighting the oceanic processes relevant to scientific questions about marine plastic debris. Remote sensing approaches are reviewed and matched to the optical properties of marine plastic debris and the relevant spatio-temporal scales of observation to identify challenges and opportunities in the field. Finally, steps needed to develop marine plastic debris detection by remote sensing platforms are proposed in terms of fundamental science as well as linkages to ongoing planning for satellite systems with similar observation requirements. 
  4. Distance covariance is a popular dependence measure for two random vectors $X$ and $Y$ of possibly different dimensions and types. Recent years have witnessed concentrated efforts in the literature to understand the distributional properties of the sample distance covariance in a high-dimensional setting, with an exclusive emphasis on the null case that $X$ and $Y$ are independent. This paper derives the first non-null central limit theorem for the sample distance covariance, and the more general sample (Hilbert–Schmidt) kernel distance covariance in high dimensions, in the distributional class of $(X,Y)$ with a separable covariance structure. The new non-null central limit theorem yields an asymptotically exact first-order power formula for the widely used generalized kernel distance correlation test of independence between $X$ and $Y$. The power formula in particular unveils an interesting universality phenomenon: the power of the generalized kernel distance correlation test is completely determined by $n\cdot \operatorname{dCor}^{2}(X,Y)/\sqrt{2}$ in the high-dimensional limit, regardless of a wide range of choices of the kernels and bandwidth parameters. Furthermore, this separation rate is also shown to be optimal in a minimax sense. The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions, which shows, among other things, that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between $X$ and $Y$.
  5. Vegetation water content (VWC) plays a key role in transpiration, plant mortality, and wildfire risk. Although land surface models now often contain plant hydraulics schemes, there are few direct VWC measurements to constrain these models at global scale. One proposed solution to this data gap is passive microwave remote sensing, which is sensitive to temporal changes in VWC. Here, we test that approach by using synthetic microwave observations to constrain VWC and surface soil moisture within the Climate Modeling Alliance Land model. We further investigate the possible utility of sub‐daily observations of VWC, which could be obtained through a satellite in geostationary orbit or combinations of multiple satellites. These high‐temporal‐resolution observations could allow for improved determination of ecosystem parameters, carbon and water fluxes, and subsurface hydraulics, relative to the currently available twice‐daily sun‐synchronous observational patterns. We find that incorporating observations at four different times in the diurnal cycle (such as could be available from two sun‐synchronous satellites) provides a significantly better constraint on water and carbon fluxes than twice‐daily observations do. For example, the root mean square error of projected evapotranspiration and gross primary productivity during drought periods was reduced by approximately 40% when using four‐times‐daily relative to twice‐daily observations. Adding hourly observations of the entire diurnal cycle did not further improve the inferred parameters and fluxes. Our comparison of observational strategies may be informative in the design of future satellite missions to study plant hydraulics, as well as when using existing remotely sensed data to study vegetation water stress response.