NSF PAR Search | NSF Public Access Repository


Impact of Data Quality on Deep Learning Prediction of Spatial Transcriptomics from Histology Images

https://doi.org/10.1101/2025.09.04.674228

Hallinan, Caleb; Lucas, Calixto-Hope G; Fan, Jean (September 2025, bioRxiv)

Abstract Spatial transcriptomics technologies enable high-throughput quantification of gene expression at specific locations across tissue sections, facilitating insights into the spatial organization of biological processes. However, high costs associated with these technologies have motivated the development of deep learning methods to predict spatial gene expression from inexpensive hematoxylin and eosin-stained histology images. While most efforts have focused on modifying model architectures to boost predictive performance, the influence of training data quality remains largely unexplored. Here, we investigate how variation in molecular and image data quality stemming from differences in imaging (Xenium) versus sequencing (Visium) spatial transcriptomics technologies impacts deep learning-based gene expression prediction from histology images. To delineate the aspects of data quality that impact predictive performance, we conducted in silico ablation experiments, which showed that increased sparsity and noise in molecular data degraded predictive performance, while in silico rescue experiments via imputation provided only limited improvements that failed to generalize beyond the test set. Likewise, reduced image resolution can degrade predictive performance and further impacts model interpretability. Overall, our results underscore how improving data quality offers an orthogonal strategy to tuning model architecture in enhancing predictive modeling using spatial transcriptomics and emphasize the need for careful consideration of technological limitations that directly impact data quality when developing predictive methodologies.
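As a rough illustration of what an in silico ablation of molecular data quality could look like, the sketch below degrades a toy count matrix in two ways: binomial thinning to increase sparsity and additive Poisson counts to inject noise. This is not the authors' procedure; the function names (`thin_counts`, `add_poisson_noise`, `sparsity`), the toy matrix, and the parameter values are all assumptions made for illustration.

```python
import numpy as np

def thin_counts(counts, keep_prob, rng):
    """Keep each counted molecule independently with probability keep_prob."""
    return rng.binomial(counts, keep_prob)

def add_poisson_noise(counts, rate, rng):
    """Add spurious counts drawn from a Poisson distribution with the given rate."""
    return counts + rng.poisson(rate, size=counts.shape)

def sparsity(counts):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(counts == 0))

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 spots x 50 genes of Poisson-distributed counts.
counts = rng.poisson(2.0, size=(200, 50))

thinned = thin_counts(counts, keep_prob=0.25, rng=rng)   # sparser version
noisy = add_poisson_noise(counts, rate=0.5, rng=rng)     # noisier version

print(f"sparsity before: {sparsity(counts):.3f}, after thinning: {sparsity(thinned):.3f}")
```

A model trained on `thinned` or `noisy` instead of `counts` could then be compared against one trained on the original matrix to isolate the effect of each degradation.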
Free, publicly-accessible full text available September 9, 2026
scatterbar: an R package for visualizing proportional data across spatially resolved coordinates

https://doi.org/10.1093/bioinformatics/btaf047

Velazquez, Dee; Fan, Jean (January 2025, Bioinformatics)
Kelso, Janet (Ed.)

Abstract Motivation Displaying proportional data across many spatially resolved coordinates is a challenging but important data visualization task, particularly for spatially resolved transcriptomics data. Scatter pie plots are one type of commonly used data visualization for such data but present perceptual challenges that may lead to difficulties in interpretation. Increasing the visual saliency of such data visualizations can help viewers more accurately identify proportional trends and compare proportional differences across spatial locations. Results We developed scatterbar, an open-source R package that extends ggplot2, to visualize proportional data across many spatially resolved coordinates using scatter stacked bar plots. We apply scatterbar to visualize deconvolved cell-type proportions from a spatial transcriptomics dataset of the adult mouse brain to demonstrate how scatter stacked bar plots can enhance the distinguishability of proportional distributions compared to scatter pie plots. Availability and implementation scatterbar is available on CRAN https://cran.r-project.org/package=scatterbar with additional documentation and tutorials at https://jef.works/scatterbar/.
SEraster: a rasterization preprocessing framework for scalable spatial omics data analysis

https://doi.org/10.1093/bioinformatics/btae412

Aihara, Gohta; Clifton, Kalen; Chen, Mayling; Li, Zhuoyan; Atta, Lyla; Miller, Brendan F.; Satija, Rahul; Hickey, John W.; Fan, Jean (June 2024, Bioinformatics)
Martelli, Pier Luigi (Ed.)

Abstract Motivation Spatial omics data demand computational analysis but many analysis tools have computational resource requirements that increase with the number of cells analyzed. This presents scalability challenges as researchers use spatial omics technologies to profile millions of cells. Results To enhance the scalability of spatial omics data analysis, we developed a rasterization preprocessing framework called SEraster that aggregates cellular information into spatial pixels. We apply SEraster to both real and simulated spatial omics data prior to spatial variable gene expression analysis to demonstrate that such preprocessing can reduce computational resource requirements while maintaining high performance, including as compared to other down-sampling approaches. We further integrate SEraster with existing analysis tools to characterize cell-type spatial co-enrichment across length scales. Finally, we apply SEraster to enable analysis of a mouse pup spatial omics dataset with over a million cells to identify tissue-level and cell-type-specific spatially variable genes as well as spatially co-enriched cell types that recapitulate expected organ structures. Availability and implementation SEraster is implemented as an R package on GitHub (https://github.com/JEFworks-Lab/SEraster) with additional tutorials at https://JEF.works/SEraster.
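The idea of aggregating cellular information into spatial pixels can be sketched in a few lines. The example below is not SEraster's API (SEraster is an R package); it is a minimal Python illustration that assumes square pixels, a grid anchored at the minimum coordinates, and aggregation by summation. The function name `rasterize` and the toy data are hypothetical.

```python
import numpy as np

def rasterize(coords, expr, pixel_size):
    """Sum per-cell expression into square pixels.

    coords: (n_cells, 2) x/y positions; expr: (n_cells, n_genes) values.
    Returns the occupied pixel grid indices, the summed expression per
    pixel, and the number of cells that fell into each pixel.
    """
    coords = np.asarray(coords, dtype=float)
    expr = np.asarray(expr, dtype=float)
    # Integer grid index of each cell, with the grid anchored at the minimum x/y.
    grid = np.floor((coords - coords.min(axis=0)) / pixel_size).astype(int)
    pixel_ids, inverse = np.unique(grid, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    pixel_expr = np.zeros((len(pixel_ids), expr.shape[1]))
    np.add.at(pixel_expr, inverse, expr)  # accumulate rows that share a pixel
    cells_per_pixel = np.bincount(inverse, minlength=len(pixel_ids))
    return pixel_ids, pixel_expr, cells_per_pixel

# Hypothetical toy data: four cells, two genes.
coords = [[0, 0], [1, 1], [5, 5], [6, 0]]
expr = [[1, 0], [2, 1], [0, 3], [4, 4]]
pixel_ids, pixel_expr, cells_per_pixel = rasterize(coords, expr, pixel_size=5)
print(pixel_ids.tolist(), cells_per_pixel.tolist())
```

Downstream analyses then operate on one row per occupied pixel rather than one row per cell, which is where the reduction in computational cost comes from; dividing `pixel_expr` by `cells_per_pixel` would give a mean instead of a sum.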
Cross-modality mapping using image varifolds to align tissue-scale atlases to molecular-scale measures with application to 2D brain sections

https://doi.org/10.1038/s41467-024-47883-4

Stouffer, Kaitlin M.; Trouvé, Alain; Younes, Laurent; Kunst, Michael; Ng, Lydia; Zeng, Hongkui; Anant, Manjari; Fan, Jean; Kim, Yongsoo; Chen, Xiaoyin; et al (April 2024, Nature Communications)

Abstract This paper explicates a solution to building correspondences between molecular-scale transcriptomics and tissue-scale atlases. This problem arises in atlas construction and cross-specimen/technology alignment where specimens per emerging technology remain sparse and conventional image representations cannot efficiently model the high dimensions from subcellular detection of thousands of genes. We address these challenges by representing spatial transcriptomics data as generalized functions encoding position and high-dimensional feature (gene, cell type) identity. We map onto low-dimensional atlas ontologies by modeling regions as homogeneous random fields with unknown transcriptomic feature distribution. We solve simultaneously for the minimizing geodesic diffeomorphism of coordinates through LDDMM and for these latent feature densities. We map tissue-scale mouse brain atlases to gene-based and cell-based transcriptomics data from MERFISH and BARseq technologies and to histopathology and cross-species atlases to illustrate integration of diverse molecular and cellular datasets into a single coordinate system as a means of comparison and further atlas construction.
STalign: Alignment of spatial transcriptomics data using diffeomorphic metric mapping

https://doi.org/10.1038/s41467-023-43915-7

Clifton, Kalen; Anant, Manjari; Aihara, Gohta; Atta, Lyla; Aimiuwu, Osagie K.; Kebschull, Justus M.; Miller, Michael I.; Tward, Daniel; Fan, Jean (December 2023, Nature Communications)

Abstract Spatial transcriptomics (ST) technologies enable high throughput gene expression characterization within thin tissue sections. However, comparing spatial observations across sections, samples, and technologies remains challenging. To address this challenge, we develop STalign to align ST datasets in a manner that accounts for partially matched tissue sections and other local non-linear distortions using diffeomorphic metric mapping. We apply STalign to align ST datasets within and across technologies as well as to align ST datasets to a 3D common coordinate framework. We show that STalign achieves high gene expression and cell-type correspondence across matched spatial locations that is significantly improved over landmark-based affine alignments. Applying STalign to align ST datasets of the mouse brain to the 3D common coordinate framework from the Allen Brain Atlas, we highlight how STalign can be used to lift over brain region annotations and enable the interrogation of compositional heterogeneity across anatomical structures. STalign is available as an open-source Python toolkit at https://github.com/JEFworks-Lab/STalign and as Supplementary Software with additional documentation and tutorials available at https://jef.works/STalign.
VeloViz: RNA velocity-informed embeddings for visualizing cellular trajectories

https://doi.org/10.1093/bioinformatics/btab653

Atta, Lyla; Sahoo, Arpan; Fan, Jean (September 2021, Bioinformatics)
Mathelier, Anthony (Ed.)
Abstract Motivation Single-cell transcriptomics profiling technologies enable genome-wide gene expression measurements in individual cells but can currently only provide a static snapshot of cellular transcriptional states. RNA velocity analysis can help infer cell state changes using such single-cell transcriptomics data. To interpret these cell state changes inferred from RNA velocity analysis as part of underlying cellular trajectories, current approaches rely on visualization with principal components, t-distributed stochastic neighbor embedding and other 2D embeddings derived from the observed single-cell transcriptional states. However, these 2D embeddings can yield different representations of the underlying cellular trajectories, hindering the interpretation of cell state changes. Results We developed VeloViz to create RNA velocity-informed 2D and 3D embeddings from single-cell transcriptomics data. Using both real and simulated data, we demonstrate that VeloViz embeddings are able to capture underlying cellular trajectories across diverse trajectory topologies, even when intermediate cell states may be missing. By considering the predicted future transcriptional states from RNA velocity analysis, VeloViz can help visualize a more reliable representation of underlying cellular trajectories. Availability and implementation Source code is available on GitHub (https://github.com/JEFworks-Lab/veloviz) and Bioconductor (https://bioconductor.org/packages/veloviz) with additional tutorials at https://JEF.works/veloviz/. Datasets used can be found on Zenodo (https://doi.org/10.5281/zenodo.4632471). Supplementary information Supplementary data are available at Bioinformatics online.
Full Text Available
Computational challenges and opportunities in spatially resolved transcriptomic data analysis

https://doi.org/10.1038/s41467-021-25557-9

Atta, Lyla; Fan, Jean (September 2021, Nature Communications)
Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities

https://doi.org/10.1101/gr.271288.120

Miller, Brendan F.; Bambah-Mukku, Dhananjay; Dulac, Catherine; Zhuang, Xiaowei; Fan, Jean (October 2021, Genome Research)

Recent technological advances have enabled spatially resolved measurements of expression profiles for hundreds to thousands of genes in fixed tissues at single-cell resolution. However, scalable computational analysis methods able to take into consideration the inherent 3D spatial organization of cell types and nonuniform cellular densities within tissues are still lacking. To address this, we developed MERINGUE, a computational framework based on spatial autocorrelation and cross-correlation analysis to identify genes with spatially heterogeneous expression patterns, infer putative cell–cell communication, and perform spatially informed cell clustering in 2D and 3D in a density-agnostic manner using spatially resolved transcriptomic data. We applied MERINGUE to a variety of spatially resolved transcriptomic data sets including multiplexed error-robust fluorescence in situ hybridization (MERFISH), spatial transcriptomics, Slide-seq, and aligned in situ hybridization (ISH) data. We anticipate that such statistical analysis of spatially resolved transcriptomic data will facilitate our understanding of the interplay between cell state and spatial organization in tissue development and disease.
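To make the notion of spatial autocorrelation concrete, the sketch below computes Moran's I, one standard spatial autocorrelation statistic, for a single gene. The abstract does not specify which statistic or neighborhood definition MERINGUE uses, so both choices here are assumptions: cells are treated as neighbors when they lie within a fixed `radius` of each other, and the function name `morans_i` and the grid data are hypothetical.

```python
import numpy as np

def morans_i(values, coords, radius):
    """Moran's I with binary weights: cells within `radius` are neighbors."""
    x = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all cells.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = (dist <= radius).astype(float)
    np.fill_diagonal(w, 0.0)  # a cell is not its own neighbor
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical toy data: cells on a 6 x 6 unit grid.
ii, jj = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
coords = np.column_stack([ii.ravel(), jj.ravel()])

gradient = jj.ravel().astype(float)          # smooth left-to-right trend
checkerboard = ((ii + jj) % 2).ravel()       # neighbors always differ

i_gradient = morans_i(gradient, coords, radius=1.0)
i_checker = morans_i(checkerboard, coords, radius=1.0)
print(f"gradient: {i_gradient:.3f}, checkerboard: {i_checker:.3f}")
```

Values near +1 indicate that neighboring cells have similar expression, values near -1 indicate that neighbors differ systematically, and values near 0 indicate no spatial structure.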
Full Text Available
Single cell biology—a Keystone Symposia report

https://doi.org/10.1111/nyas.14692

Cable, Jennifer; Elowitz, Michael B.; Domingos, Ana I.; Habib, Naomi; Itzkovitz, Shalev; Hamidzada, Homaira; Balzer, Michael S.; Yanai, Itai; Liberali, Prisca; Whited, Jessica; et al (December 2021, Annals of the New York Academy of Sciences)

Full Text Available
Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis

https://doi.org/10.1038/nmeth.3734

Fan, Jean; Salathia, Neeraj; Liu, Rui; Kaeser, Gwendolyn E; Yung, Yun C; Herman, Joseph L; Kaper, Fiona; Fan, Jian-Bing; Zhang, Kun; Chun, Jerold; et al (January 2016, Nature Methods)

Full Text Available
