Abstract Although multiple high-performing epigenetic aging clocks exist, few are based directly on gene expression. Such transcriptomic aging clocks allow us to extract age-associated genes directly. However, most existing transcriptomic clocks model a subset of genes and are limited in their ability to predict novel biomarkers. With the growing popularity of single-cell sequencing, there is a need for robust single-cell transcriptomic aging clocks. Moreover, clocks have yet to be applied to investigate the elusive phenomenon of sex differences in aging. We introduce TimeFlies, a pan-cell-type scRNA-seq aging clock for the Drosophila melanogaster head. TimeFlies uses deep learning to classify the donor age of cells based on genome-wide gene expression profiles. Using explainability methods, we identified key marker genes contributing to the classification, with lncRNAs showing up as highly enriched among predicted biomarkers. The top biomarker gene across cell types is lncRNA:roX1, a regulator of X chromosome dosage compensation, a pathway previously identified as a top biomarker of aging in the mouse brain. We validated this finding experimentally, showing a decrease in survival probability in the absence of roX1 in vivo. Furthermore, we trained sex-specific TimeFlies clocks and noted significant differences in model predictions and explanations between male and female clocks, suggesting that different pathways drive aging in males and females.
MousiPLIER: A Mouse Pathway-Level Information Extractor Model
High-throughput gene expression profiling measures individual gene expression across conditions. However, genes are regulated in complex networks, not as individual entities, limiting the interpretability of gene expression data. Machine learning models that incorporate prior biological knowledge are a powerful tool to extract meaningful biology from gene expression data. Pathway-level information extractor (PLIER) is an unsupervised machine learning method that defines biological pathways by leveraging the vast amount of published transcriptomic data. PLIER converts gene expression data into known pathway gene sets, termed latent variables (LVs), to substantially reduce data dimensionality and improve interpretability. In the current study, we trained the first mouse PLIER model on 190,111 mouse brain RNA-sequencing samples, the greatest amount of training data ever used by PLIER. We then validated the mousiPLIER approach in a study of microglia and astrocyte gene expression across mouse brain aging. mousiPLIER identified biological pathways that are significantly associated with aging, including one latent variable (LV41) corresponding to striatal signal. To gain further insight into the genes contained in LV41, we performed k-means clustering on the training data to identify studies that respond strongly to LV41. We found that the variable was relevant to striatum and aging across the scientific literature. Finally, we built a Web server (http://mousiplier.greenelab.com/) for users to easily explore the learned latent variables. Taken together, this study defines mousiPLIER as a method to uncover meaningful biological processes in mouse brain transcriptomic studies.
- Award ID(s): 2238125
- PAR ID: 10532429
- Publisher / Repository: eNeuro
- Date Published:
- Journal Name: eNeuro
- Volume: 11
- Issue: 6
- ISSN: 2373-2822
- Page Range / eLocation ID: ENEURO.0313-23.2024
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The rapid growth of diverse -omics datasets has made multiomics data integration crucial in cancer research. This study adapts the expectation–maximization routine for the joint latent variable modeling of multiomics patient profiles. By combining this approach with traditional biological feature selection methods, this study optimizes latent distribution, enabling efficient patient clustering from well-studied cancer types with reduced computational expense. The proposed optimization subroutines enhance survival analysis and improve runtime performance. This article presents a framework for distinguishing cancer subtypes and identifying potential biomarkers for breast cancer. Key insights into individual subtype expression and function were obtained through differentially expressed gene analysis and pathway enrichment for BRCA patients. The analysis compared 302 tumor samples to 113 normal samples across 60,660 genes. The highly upregulated gene COL10A1, promoting breast cancer progression and poor prognosis, and the consistently downregulated gene CDG300LG, linked to brain metastatic cancer, were identified. Pathway enrichment analysis revealed similarities in cellular matrix organization pathways across subtypes, with notable differences in functions like cell proliferation regulation and endocytosis by host cells. GO Semantic Similarity analysis quantified gene relationships in each subtype, identifying potential biomarkers like MATN2, similar to COL10A1. These insights suggest deeper relationships within clusters and highlight personalized treatment potential based on subtypes.
-
Abstract Spatial transcriptomics (ST) technologies enable high throughput gene expression characterization within thin tissue sections. However, comparing spatial observations across sections, samples, and technologies remains challenging. To address this challenge, we develop STalign to align ST datasets in a manner that accounts for partially matched tissue sections and other local non-linear distortions using diffeomorphic metric mapping. We apply STalign to align ST datasets within and across technologies as well as to align ST datasets to a 3D common coordinate framework. We show that STalign achieves high gene expression and cell-type correspondence across matched spatial locations that is significantly improved over landmark-based affine alignments. Applying STalign to align ST datasets of the mouse brain to the 3D common coordinate framework from the Allen Brain Atlas, we highlight how STalign can be used to lift over brain region annotations and enable the interrogation of compositional heterogeneity across anatomical structures. STalign is available as an open-source Python toolkit at https://github.com/JEFworks-Lab/STalign and as Supplementary Software with additional documentation and tutorials available at https://jef.works/STalign.
-
Abstract The endoplasmic reticulum (ER) houses sensors that respond to environmental stress and underlie plants' adaptive responses. These sensors transduce signals that lead to changes in nuclear gene expression. The ER to nuclear signaling pathways are primarily attributed to the unfolded protein response (UPR) and are also integrated with a wide range of development, hormone, immune, and stress signaling pathways. Understanding the role of the UPR in signaling network mechanisms that associate with particular phenotypes is crucially important. While UPR‐associated genes are the subject of ongoing investigations in a few model plant systems, most remain poorly annotated, hindering the identification of candidates across plant species. This open‐source curated database provides a centralized resource of peer-reviewed knowledge of ER to nuclear signaling pathways for the plant community. We provide a UPRome interactive viewer for users to navigate through the pathways and to access annotated information. The plant ER UPRome website is located at http://uprome.tamu.edu. We welcome contributions from the researchers studying the ER UPR to incorporate additional genes into the database through the “contact us” page.
-
Differential polyadenylation sites (PAs) critically regulate gene expression, but their cell type–specific usage and spatial distribution in the brain have not been systematically characterized. Here, we present Infernape, which infers and quantifies PA usage from single-cell and spatial transcriptomic data and show its application in the mouse brain. Infernape uncovers alternative intronic PAs and 3′-UTR lengthening during cortical neurogenesis. Progenitor–neuron comparisons in the excitatory and inhibitory neuron lineages show overlapping PA changes in embryonic brains, suggesting that the neural proliferation–differentiation axis plays a prominent role. In the adult mouse brain, we uncover cell type–specific PAs and visualize such events using spatial transcriptomic data. Over two dozen neurodevelopmental disorder–associated genes such as Csnk2a1 and Mecp2 show differential PAs during brain development. This study presents Infernape to identify PAs from scRNA-seq and spatial data, and highlights the role of alternative PAs in neuronal gene regulation.

