Title: A regularized Bayesian Dirichlet-multinomial regression model for integrating single-cell-level omics and patient-level clinical study data
ABSTRACT The abundance of various cell types can vary significantly among patients with different phenotypes and even among those with the same phenotype. Recent scientific advancements provide mounting evidence that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. Additionally, the model employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model successfully uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases.
Award ID(s):
2210912
PAR ID:
10569540
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
81
Issue:
1
ISSN:
0006-341X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Organelles play important roles in human health and disease, such as maintaining homeostasis, regulating growth and aging, and generating energy. Organelle diversity in cells not only exists between cell types but also between individual cells. Therefore, studying the distribution of organelles at the single-cell level is important to understand cellular function. Mesenchymal stem cells are multipotent cells that have been explored as a therapeutic method for treating a variety of diseases. Studying how organelles are structured in these cells can answer questions about their characteristics and potential. Herein, rapid multiplexed immunofluorescence (RapMIF) was performed to understand the spatial organization of 10 organelle proteins and the interactions between them in the bone marrow (BM) and umbilical cord (UC) mesenchymal stem cells (MSCs). Spatial correlations, colocalization, clustering, statistical tests, texture, and morphological analyses were conducted at the single cell level, shedding light onto the interrelations between the organelles and comparisons of the two MSC subtypes. Such analytics toolsets indicated that UC MSCs exhibited higher organelle expression and spatially spread distribution of mitochondria accompanied by several other organelles compared to BM MSCs. This data-driven single-cell approach provided by rapid subcellular proteomic imaging enables personalized stem cell therapeutics. 
  2. Abstract Bispecific antibodies (BsAbs) represent an emerging class of immunotherapy, but inefficiency in the current discovery has limited their broad clinical availability. Here we report a high throughput, agnostic, single-cell-based functional screening pipeline, comprising molecular and cell engineering for efficient generation of BsAb library cells, followed by functional interrogation at the single-cell level to identify and sort positive clones and downstream sequence identification and functionality characterization. Using a CD19xCD3 bispecific T cell engager (BiTE) as a model, we demonstrate that our single-cell platform possesses a high throughput screening efficiency of up to one and a half million variant library cells per run and can isolate rare functional clones at a low abundance of 0.008%. Using a complex CD19xCD3 BiTE-expressing cell library with approximately 22,300 unique variants comprising combinatorially varied scFvs, connecting linkers and VL/VH orientations, we have identified 98 unique clones, including extremely rare ones (~ 0.001% abundance). We also discovered BiTEs that exhibit novel properties and insights to design variable preferences for functionality. We expect our single-cell platform to not only increase the discovery efficiency of new immunotherapeutics, but also enable identifying generalizable design principles based on an in-depth understanding of the inter-relationships between sequence, structure, and function. 
  3. Abstract Summary: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcodes (CB) and UMI selection and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. Summary CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens—“thresholded regression”—exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV (“GLM-based errors-in-variables”), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g. Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights. 
  5. Abstract Lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expressions and lineage barcodes in single cells, which allows for inference of cell lineage and cell types at the whole organism level. While most state-of-the-art methods for lineage reconstruction utilize only the lineage barcode data, methods that incorporate gene expressions are emerging. Effectively incorporating the gene expression data requires a reasonable model of how gene expression data changes along generations of divisions. Here, we present LinRace (Lineage Reconstruction with asymmetric cell division model), which integrates lineage barcode and gene expression data using an asymmetric cell division model and infers cell lineages and ancestral cell states using Neighbor-Joining and maximum-likelihood heuristics. On both simulated and real data, LinRace outputs more accurate cell division trees than existing methods. With inferred ancestral states, LinRace can also show how a progenitor cell generates a large population of cells with various functionalities.