Title: Bayesian Spectral Graph Denoising with Smoothness Prior
Here we consider the problem of denoising features associated with complex data, modeled as signals on a graph, via a smoothness prior. This is motivated in part by settings such as single-cell RNA sequencing, where the data are very high-dimensional but their structure can be captured by an affinity graph. This allows us to use ideas from graph signal processing. In particular, we present algorithms for the cases where the signal is perturbed by Gaussian noise, dropout, and uniformly distributed noise. The signals are assumed to follow a prior distribution defined in the frequency domain that favors signals which are smooth across the edges of the graph. By pairing this prior distribution with our three models of noise generation, we propose maximum a posteriori (MAP) estimates of the true signal in the presence of noisy data and provide algorithms for computing them. Finally, we demonstrate the algorithms’ ability to restore signals effectively from white noise on image data and from severe dropout in single-cell RNA sequencing data.
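The abstract does not spell out the exact prior or solvers, so the following is only a minimal sketch of how the three MAP problems can look under stated assumptions. It assumes a zero-mean Gaussian prior whose precision is proportional to the graph Laplacian L (a common way to favor smoothness, since x^T L x sums w_ij (x_i − x_j)^2 over edges), a path graph standing in for the affinity graph, and dropout with known positions. The names path_laplacian, map_gaussian, map_dropout, and map_uniform are hypothetical and are not the authors' code.

```python
import numpy as np


def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph on n nodes.
    Stands in for the affinity graph (illustrative choice, not from the paper)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def map_gaussian(y, L, gamma):
    """Gaussian noise y = x + N(0, sigma^2 I) with an ASSUMED prior
    p(x) ∝ exp(-x^T L x / (2 tau^2)); gamma = sigma^2 / tau^2.
    The MAP (I + gamma L)^{-1} y is applied in the graph Fourier basis,
    where frequency k is scaled by 1 / (1 + gamma * lambda_k)."""
    lam, U = np.linalg.eigh(L)                 # graph frequencies, Fourier basis
    y_hat = U.T @ y                            # graph Fourier transform
    return U @ (y_hat / (1.0 + gamma * lam))   # low-pass shrinkage, then invert


def map_dropout(y, observed, L, gamma):
    """ASSUMED dropout model: dropped positions are known (observed == False)
    and carry no information; observed entries have Gaussian noise.
    The MAP solves (M + gamma L) x = M y with M = diag(observed), which is
    nonsingular on a connected graph with at least one observed entry."""
    M = np.diag(observed.astype(float))
    return np.linalg.solve(M + gamma * L, M @ y)


def map_uniform(y, L, a, n_iter=2000):
    """Uniform noise y = x + u, u_i ~ U[-a, a], with the same ASSUMED prior:
    the likelihood is flat on the box |x_i - y_i| <= a and zero outside,
    so the MAP minimizes x^T L x over that box (projected gradient descent)."""
    step = 1.0 / (2.0 * np.linalg.eigvalsh(L).max())  # 1 / Lipschitz const. of 2Lx
    x = y.astype(float)                                # start at the data (feasible)
    for _ in range(n_iter):
        x = np.clip(x - step * 2.0 * (L @ x), y - a, y + a)
    return x


rng = np.random.default_rng(0)
n = 50
L = path_laplacian(n)
x_true = np.sin(np.linspace(0.0, np.pi, n))   # signal that is smooth along the path

y_g = x_true + 0.3 * rng.standard_normal(n)   # Gaussian noise
x_g = map_gaussian(y_g, L, gamma=5.0)

observed = rng.random(n) > 0.5                # roughly half the entries kept
y_d = np.where(observed, y_g, 0.0)            # dropped entries read as zero
x_d = map_dropout(y_d, observed, L, gamma=5.0)

a = 0.5
y_u = x_true + rng.uniform(-a, a, n)          # uniformly distributed noise
x_u = map_uniform(y_u, L, a)

for name, noisy, est in [("gaussian", y_g, x_g), ("dropout", y_d, x_d),
                         ("uniform", y_u, x_u)]:
    print(name, np.linalg.norm(noisy - x_true), np.linalg.norm(est - x_true))
```

Under these assumptions the Gaussian case reduces to a fixed low-pass filter in the graph frequency domain, the known-mask dropout case to one sparse-friendly linear solve, and the uniform case to a box-constrained quadratic program, because a flat likelihood leaves only the prior to choose among feasible signals.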
Award ID(s):
2327211
PAR ID:
10536540
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-6929-8
Page Range / eLocation ID:
1 to 6
Subject(s) / Keyword(s):
RNA; Gaussian noise; Noise reduction; Signal processing algorithms; White noise; Signal processing; Data models
Format(s):
Medium: X
Location:
Princeton, NJ, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Nie, Qing (Ed.)
    Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets. 
  2. Recent advances in biochemistry and single-cell RNA sequencing (scRNA-seq) have allowed us to monitor biological systems at single-cell resolution. However, the low capture of mRNA material within individual cells often leads to inaccurate quantification of genetic material. Consequently, a significant number of expression values are reported as missing; these are often referred to as dropouts. To overcome this challenge, we develop a novel imputation method, named single-cell Imputation via Subspace Regression (scISR), that can reliably recover the dropout values of scRNA-seq data. The scISR method first uses a hypothesis-testing technique to identify zero-valued entries that are most likely affected by dropout events and then estimates the dropout values using a subspace regression model. Our comprehensive evaluation using 25 publicly available scRNA-seq datasets and various simulation scenarios against five state-of-the-art methods demonstrates that scISR outperforms other imputation methods in recovering scRNA-seq expression profiles. scISR consistently improves the quality of cluster analysis regardless of dropout rates, normalization techniques, and quantification schemes. The source code of scISR can be found on GitHub at https://github.com/duct317/scISR .
  3. The accuracy of biosensor ratio imaging is limited by the signal-to-noise ratio. Signals can be weak when biosensor concentrations must be limited to avoid cell perturbation. This can be especially problematic in imaging of low-volume regions, e.g., along the cell edge. The cell edge is an important imaging target in studies of cell motility. We show how the division of fluorescence intensities with a low signal-to-noise ratio at the cell edge creates specific artifacts due to background subtraction and division by small numbers, and that simply improving the accuracy of background subtraction cannot address these issues. We propose a new approach where, rather than simply subtracting background from the numerator and denominator, we subtract a noise correction factor (NCF) from the numerator only. This NCF can be derived from the analysis of the noise distribution in the background near the cell edge or from ratio measurements in cell regions where the signal-to-noise ratio is high. We first test the performance of the method by examining two noninteracting fluorophores distributed evenly in cells, which generated a uniform ratio that could serve as ground truth. We then analyzed actual protein activities reported by a single-chain biosensor for the guanine exchange factor (GEF) Asef, and a dual-chain biosensor for the GTPase Cdc42. The reduction of edge artifacts revealed persistent Asef activity in a narrow band (∼640 nm wide) immediately adjacent to the cell edge. For Cdc42, the NCF method revealed an artifact that would have been obscured by traditional background subtraction approaches.
  4. Baseband processing algorithms often require knowledge of the noise power, signal power, or signal-to-noise ratio (SNR). In practice, these parameters are typically unknown and must be estimated. Furthermore, the mean-square error (MSE) is a desirable metric to be minimized in a variety of estimation and signal recovery algorithms. However, the MSE cannot be used directly, as it depends on the true signal, which is generally unknown to the estimator. In this paper, we propose novel blind estimators for the average noise power, average receive signal power, SNR, and MSE. The proposed estimators can be computed at low complexity and rely solely on the large-dimensional and sparse nature of the processed data. Our estimators can be used (i) to quickly track some of the key system parameters while avoiding additional pilot overhead, (ii) to design low-complexity nonparametric algorithms that require such quantities, and (iii) to accelerate more sophisticated estimation or recovery algorithms. We conduct a theoretical analysis of the proposed estimators for a Bernoulli complex Gaussian (BCG) prior, and we demonstrate their efficacy via synthetic experiments. We also provide three application examples that deviate from the BCG prior in millimeter-wave multi-antenna and cell-free wireless systems, for which we develop nonparametric denoising algorithms that improve channel-estimation accuracy with a performance comparable to denoisers that assume perfect knowledge of the system parameters.
  5. Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in individual cells. Due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard analysis methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc combines network-regularized non-negative matrix factorization with a procedure for handling zero inflation in transcript count matrices. The matrix factorization results in a low-dimensional representation of the transcript count matrix, which imputes gene abundance for both zero and non-zero entries and can be used to cluster cells. The network regularization leverages prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close in the low-dimensional representation. We show that netNMF-sc outperforms existing methods on simulated and real scRNA-seq data, with increasing advantage at higher dropout rates (e.g., above 60%). Furthermore, we show that the results from netNMF-sc, including estimation of gene-gene covariance, are robust to the choice of network, with more representative networks leading to greater performance gains.