NSF PAR Search | NSF Public Access Repository

Protocol Compliance in Popular RTC Applications

https://doi.org/10.1145/3730567.3764438

Chen, Peiqing; Qiu, Peng; Liu, Zaoxing (October 2025, ACM Internet Measurement Conference (IMC))

Free, publicly-accessible full text available October 28, 2026
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images

Zhu, Sichen; Zhu, Yuchen; Tao, Molei; Qiu, Peng (April 2025, ICLR)

Free, publicly-accessible full text available April 24, 2026
scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data

https://doi.org/10.1038/s41467-024-45227-w

Zhang, Ziqi; Zhao, Xinye; Bindra, Mehak; Qiu, Peng; Zhang, Xiuwei (December 2024, Nature Communications)

Single-cell RNA-sequencing (scRNA-seq) has been widely used for disease studies, where sample batches are collected from donors under different conditions including demographic groups, disease stages, and drug treatments. It is worth noting that the differences among sample batches in such a study are a mixture of technical confounders caused by batch effect and biological variations caused by condition effect. However, current batch effect removal methods often eliminate both technical batch effect and meaningful condition effect, while perturbation prediction methods solely focus on condition effect, resulting in inaccurate gene expression predictions due to unaccounted batch effect. Here we introduce scDisInFact, a deep learning framework that models both batch effect and condition effect in scRNA-seq data. scDisInFact learns latent factors that disentangle condition effect from batch effect, enabling it to simultaneously perform three tasks: batch effect removal, condition-associated key gene detection, and perturbation prediction. We evaluate scDisInFact on both simulated and real datasets, and compare its performance with baseline methods for each task. Our results demonstrate that scDisInFact outperforms existing methods that focus on individual tasks, providing a more comprehensive and accurate approach for integrating and predicting multi-batch multi-condition single-cell RNA-sequencing data.
Full Text Available
Gene representation bias in spatial transcriptomics

https://doi.org/10.1142/S0219720024500070

Li, Xinling; Qiu, Peng (June 2024, Journal of Bioinformatics and Computational Biology)

For sequencing-based spatial transcriptomics data, the gene-spot count matrix is highly sparse, similar to scRNA-seq data. The goal of this paper is to determine whether some genes are frequently under-detected in Visium compared to bulk RNA-seq and, if so, what mechanism may underlie their under-detection in Visium. We collected paired Visium and bulk RNA-seq data for 28 human samples and 19 mouse samples, covering diverse tissue sources. We compared the two data types and observed that there indeed exists a collection of genes frequently under-detected in Visium compared to bulk RNA-seq. We performed a motif search on the last 350 bp of the frequently under-detected genes and observed that the poly(T) motif was significantly enriched in genes identified from both human and mouse data, which matches our previous finding about frequently under-detected genes in scRNA-seq. We hypothesized that the poly(T) motif may be able to form a hairpin structure with the poly(A) tails of the corresponding mRNA transcripts, making those transcripts difficult to capture during Visium library preparation.
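The abstract does not spell out how the motif search was run, so the following is only a minimal sketch of the underlying idea: scanning the last 350 bp of a gene sequence for poly(T) runs. The function names and the `min_len=10` threshold are hypothetical choices for illustration, and a plain regex scan is not a statistical enrichment test like the one the authors report.

```python
import re


def tail_poly_t_runs(seq: str, window: int = 350, min_len: int = 10) -> list[tuple[int, int]]:
    """Return (start, length) for each poly(T) run in the last `window` bases.

    `min_len` is a hypothetical threshold for what counts as a poly(T) run.
    Start positions are relative to the beginning of the tail window.
    """
    tail = seq.upper()[-window:]               # 3'-most `window` bases (or the whole sequence if shorter)
    pattern = re.compile(f"T{{{min_len},}}")   # e.g. "T{10,}": at least min_len consecutive Ts
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(tail)]


def has_tail_poly_t(seq: str, window: int = 350, min_len: int = 10) -> bool:
    """True if the sequence carries at least one poly(T) run near its 3' end."""
    return bool(tail_poly_t_runs(seq, window, min_len))
```

For example, `tail_poly_t_runs("A" * 400 + "T" * 12 + "GC" * 5)` finds a single 12-base run, while a run located only at the 5' end of a long sequence falls outside the window and is ignored. A real analysis would compare run frequencies between under-detected and background gene sets.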
Full Text Available
Quantifying the clusterness and trajectoriness of single-cell RNA-seq data

https://doi.org/10.1371/journal.pcbi.1011866

Lim, Hong Seo; Qiu, Peng (February 2024, PLOS Computational Biology)
Zhang, Shihua (Ed.)
Among existing computational algorithms for single-cell RNA-seq analysis, clustering and trajectory inference are two major types of analysis that are routinely applied. For a given dataset, clustering and trajectory inference can generate vastly different visualizations that lead to very different interpretations of the data. To address this issue, we propose multiple scores to quantify the “clusterness” and “trajectoriness” of single-cell RNA-seq data, in other words, whether the data looks like a collection of distinct clusters or a continuum of progression trajectory. The scores we introduce are based on pairwise distance distribution, persistent homology, vector magnitude, Ripley’s K, and degrees of connectivity. Using simulated datasets, we demonstrate that the proposed scores are able to effectively differentiate between cluster-like data and trajectory-like data. Using real single-cell RNA-seq datasets, we demonstrate the scores can serve as indicators of whether clustering analysis or trajectory inference is a more appropriate choice for biological interpretation of the data.
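One of the listed ingredients, Ripley's K, can be illustrated with a short sketch. This is a naive estimator on 2-D points (for example, cells in a low-dimensional embedding). It assumes no edge correction and takes the study area to be the points' bounding box. Both are simplifications chosen here, not details from the paper, and the paper's actual score built on Ripley's K is not reproduced.

```python
import math
from itertools import combinations


def ripley_k(points, r):
    """Naive Ripley's K estimate at radius r for 2-D points.

    Assumptions (not from the paper): no edge correction, and the study
    area is the axis-aligned bounding box of the points.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if area == 0:
        raise ValueError("bounding box has zero area")
    # Count unordered pairs within distance r; each pair counts twice (i->j and j->i).
    close = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) <= r)
    return area * 2 * close / (n * (n - 1))
```

Under complete spatial randomness, K(r) is approximately πr². Values well above that at small r suggest the points are grouped into tight clusters at that scale, which is the intuition behind using it as a "clusterness" signal.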
Full Text Available
Characterization of Expression-Based Gene Clusters Gives Insights into Variation in Patient Response to Cancer Therapies

https://doi.org/10.1177/11769351241271560

Neary, Bridget; Qiu, Peng (September 2024, Cancer Informatics)

Background: Transcriptomics can reveal much about cellular activity, and cancer transcriptomics have been useful in investigating tumor cell behaviors. Patterns in transcriptome-wide gene expression can be used to investigate biological mechanisms and pathways that can explain the variability in patient response to cancer therapies.
Methods: We identified gene expression patterns related to patient drug response by clustering tumor gene expression data and selecting from the resulting gene clusters those where expression of cluster genes was related to patient survival on specific drugs. We then investigated these gene clusters for biological meaning using several approaches, including identifying common genomic locations and transcription factors whose targets were enriched in these clusters and performing survival analyses to support these candidate transcription factor-drug relationships.
Results: We identified gene clusters related to drug-specific survival, and through these, we were able to associate observed variations in patient drug response to specific known biological phenomena. Specifically, our analysis implicated 2 stem cell-related transcription factors, HOXB4 and SALL4, in poor response to temozolomide in brain cancers. In addition, expression of SNRNP70 and its targets were implicated in cetuximab response by 3 different analyses, although the mechanism remains unclear. We also found evidence that 2 cancer-related chromosomal structural changes may impact drug efficacy.
Conclusion: In this study, we present the gene clusters identified and the results of our systematic analysis linking drug efficacy to specific transcription factors, which are rich sources of potential mechanistic relationships impacting patient outcomes. We also highlight the most promising of these results, which were supported by multiple analyses and by previous research. We report these findings as promising avenues for independent validation and further research into cancer treatments and patient response.
Gene representation in scRNA-seq is correlated with common motifs at the 3′ end of transcripts

https://doi.org/10.3389/fbinf.2023.1120290

Li, Xinling; Gibson, Greg; Qiu, Peng (May 2023, Frontiers in Bioinformatics)

One important characteristic of single-cell RNA sequencing (scRNA-seq) data is its high sparsity: the gene-cell count matrix contains a high proportion of zeros. This sparsity has motivated widespread discussion of dropouts and missing data, as well as imputation algorithms for scRNA-seq analysis. Here, we aim to investigate whether there exist genes that are more prone to be under-detected in scRNA-seq, and if so, what commonalities those genes may share. From public data sources, we gathered paired bulk RNA-seq and scRNA-seq data from 53 human samples, which were generated in diverse biological contexts. We derived pseudo-bulk gene expression by averaging the scRNA-seq data across cells. Comparisons of the paired bulk and pseudo-bulk gene expression profiles revealed that there indeed exists a collection of genes that are frequently under-detected in scRNA-seq compared to bulk RNA-seq. This result was robust to randomization when unpaired bulk and pseudo-bulk gene expression profiles were compared. We performed a motif search on the last 350 bp of the identified genes and observed an enrichment of the poly(T) motif. The poly(T) motif toward the tails of those genes may be able to form hairpin structures with the poly(A) tails of their mRNA transcripts, making it difficult for those transcripts to be captured during scRNA-seq library preparation. This offers a mechanistic conjecture for why certain genes may be more prone to be under-detected in scRNA-seq.
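The pseudo-bulk step described above, averaging scRNA-seq counts across cells, can be sketched in a few lines and then compared against a paired bulk profile. The comparison shown is an assumption for illustration only. Both profiles are rescaled to proportions, and genes whose log2 ratio falls at or below a hypothetical threshold of -2 are flagged. The paper's exact comparison procedure is not described here, and `under_detected`, `log2_threshold`, and `pseudocount` are illustrative names.

```python
import math


def pseudo_bulk(cell_counts):
    """Average per-gene counts across cells.

    cell_counts: list of dicts mapping gene -> count for one cell;
    genes missing from a cell are treated as zero.
    """
    genes = set().union(*cell_counts)  # every gene seen in any cell
    n = len(cell_counts)
    return {g: sum(c.get(g, 0) for c in cell_counts) / n for g in genes}


def to_proportions(profile):
    """Rescale a gene -> value profile so that the values sum to 1."""
    total = sum(profile.values())
    if total == 0:
        raise ValueError("profile has zero total expression")
    return {g: v / total for g, v in profile.items()}


def under_detected(bulk, cell_counts, log2_threshold=-2.0, pseudocount=1e-6):
    """Flag genes whose pseudo-bulk share is far below their bulk share.

    The threshold and pseudocount are hypothetical illustration values.
    """
    pb = to_proportions(pseudo_bulk(cell_counts))
    bk = to_proportions(bulk)
    flagged = []
    for g, b in bk.items():
        ratio = math.log2((pb.get(g, 0.0) + pseudocount) / (b + pseudocount))
        if ratio <= log2_threshold:
            flagged.append(g)
    return sorted(flagged)
```

With a bulk profile of equal counts for genes A, B, and C, and two cells in which C is never observed, only C is flagged. Repeating the comparison over many paired samples, and counting how often each gene is flagged, would approximate the idea of "frequently under-detected" genes.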
Full Text Available
Quantifying Cell-Type-Specific Differences of Single-Cell Datasets Using Uniform Manifold Approximation and Projection for Dimension Reduction and Shapley Additive exPlanations

https://doi.org/10.1089/cmb.2022.0366

Lim, Hong Seo; Qiu, Peng (April 2023, Journal of Computational Biology)

Full Text Available
Domain adaptation for supervised integration of scRNA-seq data

https://doi.org/10.1038/s42003-023-04668-7

Sun, Yutong; Qiu, Peng (March 2023, Communications Biology)

Large-scale scRNA-seq studies typically generate data in batches, which often induce nontrivial batch effects that need to be corrected. Given the global efforts to build cell atlases and the growing number of annotated scRNA-seq datasets, we propose a supervised strategy for scRNA-seq data integration called SIDA (Supervised Integration using Domain Adaptation), which uses cell type annotations to guide the integration of diverse batches. The supervised strategy is based on domain adaptation, a technique originally proposed in the computer vision field. We demonstrate that SIDA is able to generate comprehensive reference datasets that lead to improved accuracy in automated cell type mapping analyses.