

Title: The landscape of gene loss and missense variation across the mammalian tree informs on gene essentiality
ABSTRACT
Background: The degree of gene and sequence preservation across species provides valuable insights into the relative necessity of genes from the perspective of natural selection. Here, we developed two novel interspecies metrics spanning 462 mammalian species, GISMO (Gene identity score of mammalian orthologs) and GISMO-mis (GISMO-missense), to quantify gene loss and missense variation over millions of years of evolution. GISMO is a measure of gene loss across mammals weighted by evolutionary distance relative to humans, whereas GISMO-mis quantifies the ratio of missense to synonymous variants across mammalian species for a given gene.
Rationale: Despite large sample sizes, current human constraint metrics are still not well calibrated for short genes. Analyzing more than 100 million years of evolution across hundreds of mammals can identify the most essential genes and improve gene-disease association. Beyond human genetics, these metrics provide measures of gene constraint that can further enable mammalian genetics research.
Results: Our analyses showed that both metrics are strongly correlated with measures of human gene constraint for loss-of-function, missense, and copy number dosage derived from over a million human samples, which highlights the power of interspecies constraint. Importantly, neither GISMO nor GISMO-mis is strongly correlated with coding sequence length. Therefore, both metrics can identify novel constrained genes that were too short for existing human constraint metrics to capture. We also found that GISMO scores capture rare variant association signals across a range of phenotypes associated with decreased fecundity, such as schizophrenia, autism, and neurodevelopmental disorders. Moreover, common variant heritability of disease traits is highly enriched in the most constrained deciles of both metrics, further underscoring the biological relevance of these metrics in identifying functionally important genes.
We further showed that, for copy number variants in the UK Biobank, both scores have the lowest duplication and deletion rates in their most constrained deciles, suggesting that they may be important metrics of dosage sensitivity. We additionally demonstrated that GISMO improves prioritization of recessive disorder genes and captures homozygous selection.
Conclusions: Overall, we demonstrate that the most constrained genes for gene loss and missense variation capture the largest fraction of heritability, and that GISMO can help prioritize recessive disorder genes and identify the most conserved genes across the mammalian tree.
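The abstract does not give formulas for either score. As a rough illustration only, the following sketch computes a toy distance-weighted retention score in the spirit of GISMO and a pooled missense-to-synonymous ratio in the spirit of GISMO-mis. The weighting scheme (divergence time from humans), the pseudocount, the hypothetical OrthologRecord type, and all example values are assumptions made for this sketch, not the authors' definitions or data.

```python
from dataclasses import dataclass


@dataclass
class OrthologRecord:
    """Hypothetical per-species record for one human gene (illustration only)."""
    species: str
    divergence_mya: float   # assumed weight: divergence time from humans (million years)
    ortholog_present: bool  # False if the gene is inferred to be lost in this species
    missense: int           # missense differences relative to the human sequence
    synonymous: int         # synonymous differences relative to the human sequence


def distance_weighted_retention(records: list[OrthologRecord]) -> float:
    """Toy GISMO-like score: distance-weighted fraction of species retaining the gene.

    Assumption: each species is weighted by its divergence time from humans, so
    retention across deeper splits counts more. Returns a value in [0, 1], where
    1.0 means the gene is retained in every species.
    """
    total_weight = sum(r.divergence_mya for r in records)
    if total_weight <= 0:
        raise ValueError("at least one species needs a positive weight")
    retained = sum(r.divergence_mya for r in records if r.ortholog_present)
    return retained / total_weight


def missense_synonymous_ratio(records: list[OrthologRecord], pseudocount: float = 1.0) -> float:
    """Toy GISMO-mis-like score: pooled missense / synonymous ratio.

    Only species that retain the gene contribute variant counts. The pseudocount
    (an assumption) keeps the ratio finite when no synonymous changes are seen.
    Lower values suggest stronger constraint against amino-acid change.
    """
    present = [r for r in records if r.ortholog_present]
    missense = sum(r.missense for r in present)
    synonymous = sum(r.synonymous for r in present)
    return (missense + pseudocount) / (synonymous + pseudocount)


if __name__ == "__main__":
    # Illustrative toy values only; not data from the study.
    gene = [
        OrthologRecord("chimpanzee", 6.0, True, 0, 2),
        OrthologRecord("mouse", 87.0, True, 3, 40),
        OrthologRecord("cow", 94.0, False, 0, 0),
        OrthologRecord("elephant", 99.0, True, 5, 55),
    ]
    print(round(distance_weighted_retention(gene), 3))  # 0.671
    print(round(missense_synonymous_ratio(gene), 3))    # 0.092
```

Weighting by divergence time is one simple way to make "weighted by evolutionary distance" concrete; a real implementation could instead use branch lengths from a phylogeny or a different direction of weighting.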
Award ID(s): 2022007
PAR ID: 10539112
Publisher / Repository: bioRxiv
Institution: bioRxiv
Sponsoring Org: National Science Foundation
More like this
  1. Abstract: Although multiple high-performing epigenetic aging clocks exist, few are based directly on gene expression. Such transcriptomic aging clocks allow age-associated genes to be extracted directly. However, most existing transcriptomic clocks model only a subset of genes and are limited in their ability to predict novel biomarkers. With the growing popularity of single-cell sequencing, there is a need for robust single-cell transcriptomic aging clocks. Moreover, clocks have yet to be applied to investigate the elusive phenomenon of sex differences in aging. We introduce TimeFlies, a pan-cell-type scRNA-seq aging clock for the Drosophila melanogaster head. TimeFlies uses deep learning to classify the donor age of cells based on genome-wide gene expression profiles. Using explainability methods, we identified key marker genes contributing to the classification, with lncRNAs highly enriched among the predicted biomarkers. The top biomarker gene across cell types is lncRNA:roX1, a regulator of X chromosome dosage compensation, a pathway previously identified as a top biomarker of aging in the mouse brain. We validated this finding experimentally, showing a decrease in survival probability in the absence of roX1 in vivo. Furthermore, we trained sex-specific TimeFlies clocks and noted significant differences in model predictions and explanations between male and female clocks, suggesting that different pathways drive aging in males and females.
  2. Abstract: RNA molecules adopt complex structures that perform essential biological functions across all forms of life, making them promising candidates for therapeutic applications. However, our ability to design new RNA structures remains limited by an incomplete understanding of their folding principles. While global metrics such as the minimum free energy are widely used, they are at odds with naturally occurring structures and incompatible with established design rules. Here, we introduce local stability compensation (LSC), the principle that RNA folding is governed by the local balance between destabilizing loops and their stabilizing adjacent stems, challenging the focus on global energetic optimization. Analysis of over 100,000 RNA structures revealed that LSC signatures are particularly pronounced in bulges and their adjacent stems, with distinct patterns across different RNA families that align with their biological functions. To validate LSC experimentally, we systematically analyzed thousands of RNA variants using DMS chemical mapping. Our results demonstrate that stem folding, as measured by reactivity, correlates with LSC (R² = 0.458 for hairpin loops) and that instabilities have no significant effect on the folding of distal stems. These findings demonstrate that LSC can be a guiding principle for understanding RNA function and for the rational design of custom RNAs.
  3. Abstract: Lipid nanoparticles (LNPs) are the most clinically advanced nonviral RNA-delivery vehicles, though challenges remain in fully understanding how LNPs interact with biological systems. In vivo, proteins form an associated corona on LNPs that redefines their physicochemical properties and influences delivery outcomes. Despite its importance, the LNP protein corona is challenging to study owing to the technical difficulty of selectively recovering soft nanoparticles from biological samples. Herein, we developed a quantitative, label-free, mass spectrometry-based proteomics approach to characterize the protein corona on LNPs. Critically, this protein corona isolation workflow avoids artifacts introduced by the presence of endogenous nanoparticles in human biofluids. We used continuous density gradient ultracentrifugation to isolate protein-LNP complexes and mass spectrometry to identify proteins, normalizing to the protein composition of the biofluid alone. With this approach, we quantify proteins consistently enriched in the LNP corona, including vitronectin, C-reactive protein, and alpha-2-macroglobulin. We explore the impact of these corona proteins on cell uptake and mRNA expression in HepG2 human liver cells and find that, surprisingly, increased cell uptake does not correlate with increased mRNA expression, likely due in part to protein corona-induced lysosomal trafficking of LNPs. Our results underscore the need to consider the protein corona in the design of LNP-based therapeutics.
  4. Abstract: Microbes can be programmed to record participation in gene transfer by coding biological recording devices into mobile DNA. Upon DNA uptake, these devices transcribe a catalytic RNA (cat-RNA) that binds to conserved sequences within ribosomal RNA (rRNA) and performs a trans-splicing reaction that adds a barcode to the rRNA. Existing cat-RNA designs were generated to be broad host range, providing no control over which organisms were barcoded. To achieve control over the organisms barcoded by cat-RNA, we created a program called Ribodesigner that uses input sets of rRNA sequences to create designs with varying specificities. We show how this algorithm can be used to identify designs that enable kingdom-wide barcoding, or selective barcoding of specific taxonomic groups within a kingdom. We use Ribodesigner to create cat-RNA designs that target Pseudomonadales while avoiding Enterobacterales, and we compare the performance of one design to a cat-RNA that was previously found to be broad host range. When conjugated into a mixture of Escherichia coli and Pseudomonas putida, the new design shows increased selectivity compared with a broad host range cat-RNA. Ribodesigner is expected to aid in developing cat-RNAs that store information within user-defined sets of microbes in environmental communities for gene transfer studies.
  5. Abstract: Pollen function is critical for successful plant reproduction and crop productivity, and it is important to develop accessible methods to quantitatively analyze pollen performance to enhance reproductive resilience. Here we introduce TubeTracker, a method to quantify key parameters of pollen performance such as time to pollen grain germination, pollen tube tip velocity, and pollen tube survival. TubeTracker integrates manual and automatic image processing routines, and its graphical user interface allows the user to interact with the software to make manual corrections of automated steps. TubeTracker does not depend on the training data sets required by machine learning approaches and thus can be implemented immediately using readily available imaging systems. Furthermore, TubeTracker is an excellent tool to produce the pollen performance data sets necessary to take advantage of emerging AI-based methods to fully automate analysis. We tested TubeTracker and found it to be accurate in measuring pollen tube germination and pollen tube tip elongation across multiple cultivars of tomato.
     Graphical abstract: graphical user interface of TubeTracker showing all supported functionalities.
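Item 5 lists pollen tube tip velocity among the parameters TubeTracker quantifies. As a minimal sketch, assuming the tip has already been located in each frame as a (time, x, y) position, velocity can be computed as below. This is not TubeTracker's code; the TipPoint layout, the minute time unit, and the microns_per_px calibration parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One tracked tip position: (time in minutes, x in pixels, y in pixels).
TipPoint = Tuple[float, float, float]


def tip_speeds(track: Sequence[TipPoint], microns_per_px: float) -> List[float]:
    """Tip speed (micrometres per minute) between each pair of consecutive frames."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("time stamps must be strictly increasing")
        # Straight-line tip displacement between frames, converted to micrometres.
        step_um = math.hypot(x1 - x0, y1 - y0) * microns_per_px
        speeds.append(step_um / dt)
    return speeds


def mean_tip_velocity(track: Sequence[TipPoint], microns_per_px: float) -> float:
    """Total tip path length divided by total elapsed time (micrometres per minute)."""
    if len(track) < 2:
        raise ValueError("need at least two tracked positions")
    speeds = tip_speeds(track, microns_per_px)  # also validates the time stamps
    dts = [t1 - t0 for (t0, _, _), (t1, _, _) in zip(track, track[1:])]
    elapsed = track[-1][0] - track[0][0]
    # Time-weighted mean of per-frame speeds, equal to path length / elapsed time.
    return sum(s * dt for s, dt in zip(speeds, dts)) / elapsed
```

Weighting by frame interval matters when frames are unevenly spaced: for the track [(0, 0, 0), (1, 10, 0), (3, 20, 0)] at 1 µm per pixel, the per-frame speeds are 10 and 5 µm/min, so a plain average gives 7.5 µm/min, whereas path length over elapsed time gives 20/3 ≈ 6.67 µm/min.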