NSF PAR Search | NSF Public Access Repository

Optimal marker gene selection for cell type discrimination in single cell analyses

https://doi.org/10.1038/s41467-021-21453-4

Dumitrascu, Bianca; Villar, Soledad; Mixon, Dustin G.; Engelhardt, Barbara E. (February 2021, Nature Communications)

Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set with a hierarchy of cell types as labels, the markers found by our method improve recovery of the cell type hierarchy with fewer markers than existing methods, using a computationally efficient and principled optimization.
Sparse multi-output Gaussian processes for online medical time series prediction

https://doi.org/10.1186/s12911-020-1069-4

Cheng, Li-Fang; Dumitrascu, Bianca; Darnell, Gregory; Chivers, Corey; Draugelis, Michael; Li, Kai; Engelhardt, Barbara E (December 2020, BMC Medical Informatics and Decision Making)

Causal network inference from gene transcriptional time-series response to glucocorticoids

https://doi.org/10.1371/journal.pcbi.1008223

Lu, Jonathan; Dumitrascu, Bianca; McDowell, Ian C.; Jo, Brian; Barrera, Alejandro; Hong, Linda K.; Leichter, Sarah M.; Reddy, Timothy E.; Engelhardt, Barbara E. (January 2021, PLOS Computational Biology)
Leslie, Christina S. (Ed.)
Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2,768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS .
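The Granger-causality idea underlying the abstract above can be illustrated with a minimal sketch. This is not the BETS implementation: it uses plain least squares with a single lag on two synthetic series, and omits elastic net regularization, bootstrapping, and stability selection. The function names (`fit_rss`, `granger_gain`) and the synthetic data are assumptions made for illustration only.

```python
import random

def fit_rss(target, cols):
    """Residual sum of squares of a no-intercept least-squares fit
    of target on one or two predictor columns."""
    if len(cols) == 1:
        (u,) = cols
        suu = sum(a * a for a in u)
        b = sum(a * t for a, t in zip(u, target)) / suu
        return sum((t - b * a) ** 2 for a, t in zip(u, target))
    u, v = cols
    suu = sum(a * a for a in u)
    svv = sum(c * c for c in v)
    suv = sum(a * c for a, c in zip(u, v))
    suy = sum(a * t for a, t in zip(u, target))
    svy = sum(c * t for c, t in zip(v, target))
    det = suu * svv - suv * suv
    # Closed-form solution of the 2x2 normal equations.
    bu = (suy * svv - svy * suv) / det
    bv = (svy * suu - suy * suv) / det
    return sum((t - bu * a - bv * c) ** 2 for a, c, t in zip(u, v, target))

def granger_gain(cause, effect):
    """Fraction of the restricted model's residual error removed by
    adding the lag-1 value of `cause` as a predictor of `effect`."""
    target = effect[1:]
    own_lag = effect[:-1]
    cause_lag = cause[:-1]
    restricted = fit_rss(target, [own_lag])
    full = fit_rss(target, [own_lag, cause_lag])
    return 1.0 - full / restricted

# Synthetic example (hypothetical data): x drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 300)]

gain_x_to_y = granger_gain(x, y)  # large: past x helps predict y
gain_y_to_x = granger_gain(y, x)  # small: past y does not help predict x
```

A large gain in one direction and a negligible gain in the other is what would suggest a directed edge from x to y; a method such as BETS additionally has to control for many candidate genes at once, which is where regularization and stability selection come in.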
Patient-Specific Effects of Medication Using Latent Force Models with Gaussian Processes

Cheng, Li-Fang; Dumitrascu, Bianca; Zhang, Michael; Chivers, Corey; Draugelis, Michael; Li, Kai; Engelhardt, Barbara E. (January 2020, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, Palermo, Italy. PMLR)

netNMF-sc: Leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis

Elyanow, Rebecca; Dumitrascu, Bianca; Engelhardt, Barbara E; Raphael, Benjamin J. (January 2019, Research in Computational Biology (RECOMB))

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in individual cells. Due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard analysis methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc combines network-regularized non-negative matrix factorization with a procedure for handling zero inflation in transcript count matrices. The matrix factorization results in a low-dimensional representation of the transcript count matrix, which imputes gene abundance for both zero and non-zero entries and can be used to cluster cells. The network regularization leverages prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close in the low-dimensional representation. We show that netNMF-sc outperforms existing methods on simulated and real scRNA-seq data, with increasing advantage at higher dropout rates (e.g., above 60%). Furthermore, we show that the results from netNMF-sc, including estimation of gene-gene covariance, are robust to choice of network, with more representative networks leading to greater performance gains.
Statistical tests for detecting variance effects in quantitative trait studies

https://doi.org/10.1093/bioinformatics/bty565

Dumitrascu, Bianca; Darnell, Gregory; Ayroles, Julien; Engelhardt, Barbara E; Hancock, John (July 2018, Bioinformatics)

PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Dumitrascu, Bianca; Feng, Karen; Engelhardt, Barbara E. (January 2018, Neural Information Processing - Letters and Reviews)

We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Pólya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Pólya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Pólya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits.
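For readers unfamiliar with Thompson Sampling, the explore/exploit loop the abstract above builds on can be sketched in its simplest form. This is deliberately not PG-TS: it is the non-contextual Beta-Bernoulli variant, with no context features, no logistic model, and no Pólya-Gamma augmentation or Gibbs sampling. The function name, the arm reward probabilities, and the number of rounds are all assumptions chosen for illustration.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on a non-contextual bandit.

    true_probs holds the (hidden) success probability of each arm and is
    used only to simulate binary rewards. Returns the pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible reward rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then play the arm with the best draw.
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: draws[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the chosen arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical three-armed bandit; the last arm is the best one.
pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
```

In this conjugate setting the posterior update is a simple count increment. In the logistic contextual setting the posterior has no closed form, which is the gap the abstract says Pólya-Gamma augmentation and Gibbs sampling are meant to address.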
Bayesian nonparametric discovery of isoforms and individual specific quantification

https://doi.org/10.1038/s41467-018-03402-w

Aguiar, Derek; Cheng, Li-Fang; Dumitrascu, Bianca; Mordelet, Fantine; Pai, Athma A.; Engelhardt, Barbara E. (December 2018, Nature Communications)
