NSF PAR Search | NSF Public Access Repository

nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes

https://doi.org/10.1038/s41467-023-39748-z

Weber, Lukas M.; Saha, Arkajyoti; Datta, Abhirup; Hansen, Kasper D.; Hicks, Stephanie C. (December 2023, Nature Communications)

Feature selection to identify spatially variable genes or other biologically informative genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We demonstrate the performance of our method using experimental data from several technological platforms and simulations. A software implementation is available at https://bioconductor.org/packages/nnSVG.
Full Text Available
Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes

https://doi.org/10.6339/22-JDS1073

Saha, Arkajyoti; Datta, Abhirup; Banerjee, Sudipto (November 2022, Journal of Data Science)

Spatial probit generalized linear mixed models (spGLMM) with a linear fixed effect and a spatial random effect, endowed with a Gaussian Process prior, are widely used for analysis of binary spatial data. However, the canonical Bayesian implementation of this hierarchical mixed model can involve protracted Markov chain Monte Carlo sampling. Alternative approaches have been proposed that circumvent this by directly representing the marginal likelihood from spGLMM in terms of multivariate normal cumulative distribution functions (cdf). We present a direct and fast rendition of this latter approach for predictions from a spatial probit linear mixed model. We show that the covariance matrix of the cdf characterizing the marginal cdf of binary spatial data from spGLMM is amenable to approximation using Nearest Neighbor Gaussian Processes (NNGP). This facilitates a scalable prediction algorithm for spGLMM using NNGP that only involves sparse or small matrix computations and can be deployed in an embarrassingly parallel manner. We demonstrate the accuracy and scalability of the algorithm via numerous simulation experiments and an analysis of species presence-absence data.
Full Text Available
RandomForestsGLS: An R package for Random Forests for dependent data

https://doi.org/10.21105/joss.03780

Saha, Arkajyoti; Basu, Sumanta; Datta, Abhirup (February 2022, Journal of Open Source Software)

Full Text Available
Random Forests for Spatially Dependent Data

https://doi.org/10.1080/01621459.2021.1950003

Saha, Arkajyoti; Basu, Sumanta; Datta, Abhirup (June 2021, Journal of the American Statistical Association)

Full Text Available
Statistical field calibration of a low-cost PM2.5 monitoring network in Baltimore

https://doi.org/10.1016/j.atmosenv.2020.117761

Datta, Abhirup; Saha, Arkajyoti; Zamora, Misti Levy; Buehler, Colby; Hao, Lei; Xiong, Fulizi; Gentner, Drew R.; Koehler, Kirsten (December 2020, Atmospheric Environment)
Full Text Available
