NSF PAR Search | NSF Public Access Repository

Joint analysis of expression levels and histological images identifies genes associated with tissue morphology

https://doi.org/10.1038/s41467-021-21727-x

Ash, Jordan T.; Darnell, Gregory; Munro, Daniel; Engelhardt, Barbara E. (March 2021, Nature Communications)

Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.
Optimal marker gene selection for cell type discrimination in single cell analyses

https://doi.org/10.1038/s41467-021-21453-4

Dumitrascu, Bianca; Villar, Soledad; Mixon, Dustin G.; Engelhardt, Barbara E. (February 2021, Nature Communications)

Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set given a hierarchy of cell types as labels, the markers found by our method improve the recovery of the cell type hierarchy with fewer markers than existing methods using a computationally efficient and principled optimization.
Brain kernel: A new spatial covariance function for fMRI data

https://doi.org/10.1016/j.neuroimage.2021.118580

Wu, Anqi; Nastase, Samuel A.; Baldassano, Christopher A.; Turk-Browne, Nicholas B.; Norman, Kenneth A.; Engelhardt, Barbara E.; Pillow, Jonathan W. (December 2021, NeuroImage)

Full Text Available
A self-exciting point process to study multicellular spatial signaling patterns

https://doi.org/10.1073/pnas.2026123118

Verma, Archit; Jena, Siddhartha G.; Isakov, Danielle R.; Aoki, Kazuhiro; Toettcher, Jared E.; Engelhardt, Barbara E. (August 2021, Proceedings of the National Academy of Sciences)

Multicellular organisms rely on spatial signaling among cells to drive their organization, development, and response to stimuli. Several models have been proposed to capture the behavior of spatial signaling in multicellular systems, but existing approaches fail to capture both the autonomous behavior of single cells and the interactions of a cell with its neighbors simultaneously. We propose a spatiotemporal model of dynamic cell signaling based on Hawkes processes—self-exciting point processes—that model the signaling processes within a cell and spatial couplings between cells. With this cellular point process (CPP), we capture both the single-cell pathway activation rate and the magnitude and duration of signaling between cells relative to their spatial location. Furthermore, our model captures tissues composed of heterogeneous cell types with different bursting rates and signaling behaviors across multiple signaling proteins. We apply our model to epithelial cell systems that exhibit a range of autonomous and spatial signaling behaviors basally and under pharmacological exposure. Our model identifies known drug-induced signaling deficits, characterizes signaling changes across a wound front, and generalizes to multichannel observations.
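As a rough illustration of the kind of model the abstract describes, the sketch below evaluates the conditional intensity of a spatially coupled Hawkes process for one cell. It is not the authors' CPP implementation: the exponential decay in time, the exponential falloff with distance, and every name (`hawkes_intensity`, `self_weight`, `neighbor_weight`, `decay`, `length_scale`) are assumptions chosen only to make the idea concrete.

```python
import math


def hawkes_intensity(cell, t, events, positions, base_rate,
                     self_weight, neighbor_weight, decay, length_scale):
    """Event rate of `cell` at time `t` given all past events.

    Assumed form (hypothetical, for illustration only):
      rate = base_rate
             + sum over past events of weight * exp(-decay * (t - s))
    where weight is `self_weight` for the cell's own events and
    `neighbor_weight * exp(-distance / length_scale)` for other cells.
    """
    rate = base_rate
    for other, times in events.items():
        if other == cell:
            weight = self_weight
        else:
            distance = math.dist(positions[cell], positions[other])
            weight = neighbor_weight * math.exp(-distance / length_scale)
        # Only events strictly before t can excite the process.
        rate += sum(weight * math.exp(-decay * (t - s)) for s in times if s < t)
    return rate


# Toy tissue: cell "a" fires once at t = 1.0; "b" is close to it, "c" is far.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0)}
events = {"a": [1.0], "b": [], "c": []}
params = dict(base_rate=0.1, self_weight=0.5, neighbor_weight=0.8,
              decay=1.0, length_scale=2.0)

for cell in positions:
    print(cell, round(hawkes_intensity(cell, 1.5, events, positions, **params), 4))
```

In this toy setup the nearby cell's rate rises more than the distant cell's after the event, and each rate relaxes back toward the base rate over time, which is the qualitative behavior the abstract attributes to signaling between cells relative to their spatial location.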
Full Text Available
Causal network inference from gene transcriptional time-series response to glucocorticoids

https://doi.org/10.1371/journal.pcbi.1008223

Lu, Jonathan; Dumitrascu, Bianca; McDowell, Ian C.; Jo, Brian; Barrera, Alejandro; Hong, Linda K.; Leichter, Sarah M.; Reddy, Timothy E.; Engelhardt, Barbara E. (January 2021, PLOS Computational Biology)
Leslie, Christina S. (Ed.)
Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2,768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.
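The combination named in the abstract (lagged regression in the Granger sense, an elastic net penalty, and selection frequencies over bootstrap samples) can be sketched on a two-gene toy series. This is not the BETS code: the lag of one time step, the row-wise bootstrap, the penalty values, the small coordinate-descent solver, and all function names are assumptions made for illustration.

```python
import numpy as np


def elastic_net(X, y, alpha=0.4, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            shrunk = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0)
            beta[j] = shrunk / (X[:, j] @ X[:, j] / n + alpha * (1.0 - l1_ratio))
    return beta


def standardize(M):
    return (M - M.mean(axis=0)) / M.std(axis=0)


def selection_frequency(series, n_boot=100, seed=0):
    """freq[i, j]: share of bootstrap fits in which gene i at time t-1
    receives a nonzero coefficient when predicting gene j at time t."""
    rng = np.random.default_rng(seed)
    past, present = series[:-1], series[1:]
    n, g = past.shape
    freq = np.zeros((g, g))
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample (past, present) pairs
        X, Y = standardize(past[rows]), standardize(present[rows])
        for target in range(g):
            freq[:, target] += (elastic_net(X, Y[:, target]) != 0)
    return freq / n_boot


# Toy data (hypothetical): gene 0 drives gene 1 with a one-step delay.
rng = np.random.default_rng(1)
T = 200
a, b = np.zeros(T), np.zeros(T)
for t in range(1, T):
    a[t] = 0.3 * a[t - 1] + rng.normal()
    b[t] = 0.8 * a[t - 1] + 0.5 * rng.normal()

freq = selection_frequency(np.column_stack([a, b]))
print(freq)
```

Edges whose selection frequency stays high across resamples would be kept as candidate causal links; the sign of the fitted coefficient is what would distinguish an activating effect from an inhibitory one in a scheme like this.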
Full Text Available
Sparse multi-output Gaussian processes for online medical time series prediction

https://doi.org/10.1186/s12911-020-1069-4

Cheng, Li-Fang; Dumitrascu, Bianca; Darnell, Gregory; Chivers, Corey; Draugelis, Michael; Li, Kai; Engelhardt, Barbara E. (December 2020, BMC Medical Informatics and Decision Making)

Full Text Available
Measuring the predictability of life outcomes with a scientific mass collaboration

https://doi.org/10.1073/pnas.1915006117

Salganik, Matthew J.; Lundberg, Ian; Kindel, Alexander T.; Ahearn, Caitlin E.; Al-Ghoneim, Khaled; Almaatouq, Abdullah; Altschul, Drew M.; Brand, Jennie E.; Carnegie, Nicole Bohme; Compton, Ryan James; et al. (April 2020, Proceedings of the National Academy of Sciences)

How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.
Full Text Available
Defining admissible rewards for high-confidence policy evaluation in batch reinforcement learning

https://doi.org/10.1145/3368555.3384450

Prasad, Niranjani; Engelhardt, Barbara; Doshi-Velez, Finale (April 2020, ACM Conference on Health, Inference, and Learning)

Full Text Available
The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution

https://doi.org/10.1016/j.cell.2020.03.053

Rozenblatt-Rosen, Orit; Regev, Aviv; Oberdoerffer, Philipp; Nawy, Tal; Hupalowska, Anna; Rood, Jennifer E.; Ashenberg, Orr; Cerami, Ethan; Coffey, Robert J.; Demir, Emek; et al. (April 2020, Cell)

Full Text Available
Patient-Specific Effects of Medication Using Latent Force Models with Gaussian Processes

Cheng, L-F; Dumitrascu, B; Zhang, MM; Chivers, C; Draugelis, ME; Li, K; Engelhardt, BE. (January 2020, Proceedings of the International Workshop on Artificial Intelligence and Statistics)

Full Text Available
