Title: From local to global gene co-expression estimation using single-cell RNA-seq data
ABSTRACT In genomics studies, the investigation of gene relationships often brings important biological insights. Large heterogeneous datasets now pose new challenges for statisticians because gene relationships are often local: they change from one sample point to another, may exist only in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique that characterizes gene relationships at the single-cell level, under the name of cell-specific gene networks. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure, the averaged Local Density Gap (aLDG), that accumulates local dependence and can detect any nonlinear, nonmonotone relationship. Together with a consistent nonparametric estimator, we establish its robustness at both the population and empirical levels. Then, we show that averaging the cell-specific gene relationship over mini-batches determined by some external structure information (e.g., a spatial or temporal factor) better highlights meaningful local structure change points. We explore the application of aLDG and its mini-batch variant in many scenarios, including pairwise gene relationship estimation, bifurcating point detection in cell trajectories, and spatial transcriptomics structure visualization. Both simulations and real data analysis show that aLDG outperforms existing dependence measures.
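The idea of averaging a per-cell local dependence statistic can be illustrated with a small sketch. This is not the paper's aLDG estimator: it is a simplified stand-in that assumes rank-based box windows, a hypergeometric-style normalization of the local count gap, and a threshold of 2.0. The function names `local_gap_scores` and `averaged_local_gap`, the window fraction `frac`, and the threshold are all hypothetical choices made for illustration only.

```python
import numpy as np


def local_gap_scores(x, y, frac=0.1):
    """Per-cell local dependence score (illustrative, not the paper's estimator).

    For each cell i, count the other cells that fall in a rank window around
    x_i, around y_i, and around both, then compare the joint count with what
    independence would predict.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = max(1, int(frac * n / 2))  # half-width of the window, in ranks

    # Ranks make the windows insensitive to the marginal scale of each gene.
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    near_x = np.abs(rx[:, None] - rx[None, :]) <= k
    near_y = np.abs(ry[:, None] - ry[None, :]) <= k
    np.fill_diagonal(near_x, False)  # a cell is not its own neighbor
    np.fill_diagonal(near_y, False)

    m = n - 1  # number of other cells
    n_x = near_x.sum(axis=1).astype(float)
    n_y = near_y.sum(axis=1).astype(float)
    n_xy = (near_x & near_y).sum(axis=1).astype(float)

    # Gap between the observed joint count and its expectation under
    # independence, scaled by the hypergeometric standard deviation.
    gap = n_xy - n_x * n_y / m
    var = n_x * n_y * (m - n_x) * (m - n_y) / (m ** 2 * (m - 1))
    return np.where(var > 0, gap / np.sqrt(np.maximum(var, 1e-12)), 0.0)


def averaged_local_gap(x, y, frac=0.1, threshold=2.0):
    """Fraction of cells whose local score exceeds the (assumed) threshold."""
    return float(np.mean(local_gap_scores(x, y, frac) > threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=600)
    y_independent = rng.normal(size=600)
    y_quadratic = x ** 2 + 0.05 * rng.normal(size=600)  # nonmonotone in x
    print("independent:", averaged_local_gap(x, y_independent))
    print("quadratic:  ", averaged_local_gap(x, y_quadratic))
```

In this sketch the quadratic pair, whose relationship is nonmonotone, should score much higher than the independent pair, which is the kind of behavior the abstract attributes to accumulating local dependence. The pairwise rank comparison uses memory quadratic in the number of cells, so it is only suitable for small examples.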
Award ID(s):
2015492
PAR ID:
10494809
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
80
Issue:
1
ISSN:
0006-341X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A rich body of knowledge links biodiversity to ecosystem functioning (BEF), but it is primarily focused on small scales. We review the current theory and identify six expectations for scale dependence in the BEF relationship: (1) a nonlinear change in the slope of the BEF relationship with spatial scale; (2) a scale‐dependent relationship between ecosystem stability and spatial extent; (3) coexistence within and among sites will result in a positive BEF relationship at larger scales; (4) temporal autocorrelation in environmental variability affects species turnover and thus the change in BEF slope with scale; (5) connectivity in metacommunities generates nonlinear BEF and stability relationships by affecting population synchrony at local and regional scales; (6) spatial scaling in food web structure and diversity will generate scale dependence in ecosystem functioning. We suggest directions for synthesis that combine approaches in metaecosystem and metacommunity ecology and integrate cross‐scale feedbacks. Tests of this theory may combine remote sensing with a generation of networked experiments that assess effects at multiple scales. We also show how anthropogenic land cover change may alter the scaling of the BEF relationship. New research on the role of scale in BEF will guide policy linking the goals of managing biodiversity and ecosystems.
  2. Small area estimation models are critical for dissemination and understanding of important population characteristics within sub-domains that often have limited sample size. The classic Fay-Herriot model is perhaps the most widely used approach to generate such estimates. However, a limiting assumption of this approach is that the latent true population quantity has a linear relationship with the given covariates. Through the use of random weight neural networks, we develop a Bayesian hierarchical extension of this framework that allows for estimation of nonlinear relationships between the true population quantity and the covariates. We illustrate our approach through an empirical simulation study as well as an analysis of median household income for census tracts in the state of California. 
  3. Background: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. Results: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. Conclusion: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
  4. This study addresses COVID-19 testing as a nonlinear sampling problem, aiming to uncover the dependence of the true infection count in the population on COVID-19 testing metrics such as testing volume and positivity rates. Employing an artificial neural network, we explore the relationship among daily confirmed case counts, testing data, population statistics, and the actual daily case count. The trained artificial neural network is evaluated in in-sample, out-of-sample, and several hypothetical scenarios. A substantial focus of this paper lies in the estimation of the daily true case count, which serves as the output set of our training process. To achieve this, we implement a regularized backcasting technique that utilizes death counts and the infection fatality ratio (IFR), since death statistics and serological surveys (which provide the IFR) are more reliable COVID-19 data sources. Addressing the impact of factors such as age distribution, vaccination, and emerging variants on the IFR time series is a pivotal aspect of our analysis. We expect our study to enhance our understanding of the genuine implications of the COVID-19 pandemic, subsequently benefiting mitigation strategies.
  5. Rosen, D (Ed.)
    This paper proposes a new test for a change point in the mean of high-dimensional data based on the spatial sign and self-normalization. The test is easy to implement with no tuning parameters, robust to heavy-tailedness, and theoretically justified with both fixed-n and sequential asymptotics under both the null and the alternatives, where n is the sample size. We demonstrate that the fixed-n asymptotics provide a better approximation to the finite sample distribution and thus should be preferred in both testing and testing-based estimation. To estimate the number and locations of change points when multiple change points are present, we propose to combine the p-value under the fixed-n asymptotics with the seeded binary segmentation (SBS) algorithm. Through numerical experiments, we show that the spatial sign based procedures are robust with respect to heavy-tailedness and strong coordinate-wise dependence, whereas their non-robust counterparts proposed in Wang et al. (2022) [28] appear to under-perform. A real data example is also provided to illustrate the robustness and broad applicability of the proposed test and its corresponding estimation algorithm.