Title: Accelerating Biological Insight for Understudied Genes
Synopsis: The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The majority of proteins are annotated based on sequence conservation with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well-documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which causes individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing that is inherently slow in current biological research.
Award ID(s):
2111069
NSF-PAR ID:
10315258
Journal Name:
Integrative and Comparative Biology
ISSN:
1540-7063
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Introduction

    Eukaryotic life depends on the functional elements encoded by both the nuclear genome and organellar genomes, such as those contained within the mitochondria. The content, size, and structure of the mitochondrial genome vary across organisms, with potentially large implications for phenotypic variance and resulting evolutionary trajectories. Among yeasts in the subphylum Saccharomycotina, extensive differences have been observed in various species relative to the model yeast Saccharomyces cerevisiae, but mitochondrial genome sampling across many groups has been scarce, even as hundreds of nuclear genomes have become available.

    Methods

    By extracting mitochondrial assemblies from existing short-read genome sequence datasets, we have greatly expanded both the number of available genomes and the coverage across sparsely sampled clades.

    Results

    Comparison of 353 yeast mitochondrial genomes revealed that, while size and GC content were fairly consistent across species, those in the genera Metschnikowia and Saccharomyces trended larger, while several species in the order Saccharomycetales, which includes S. cerevisiae, exhibited lower GC content. Extreme examples for both size and GC content were scattered throughout the subphylum. All mitochondrial genomes shared a core set of protein-coding genes for Complexes III, IV, and V, but they varied in the presence or absence of mitochondrially-encoded canonical Complex I genes. We traced the loss of Complex I genes to a major event in the ancestor of the orders Saccharomycetales and Saccharomycodales, but we also observed several independent losses in the orders Phaffomycetales, Pichiales, and Dipodascales. In contrast to prior hypotheses based on smaller-scale datasets, comparison of evolutionary rates in protein-coding genes showed no bias towards elevated rates among aerobically fermenting (Crabtree/Warburg-positive) yeasts. Mitochondrial introns were widely distributed, but they were highly enriched in some groups. The majority of mitochondrial introns were poorly conserved within groups, but several were shared within groups, between groups, and even across taxonomic orders, which is consistent with horizontal gene transfer, likely involving homing endonucleases acting as selfish elements.

    Discussion

    As the number of available fungal nuclear genomes continues to expand, the methods described here to retrieve mitochondrial genome sequences from these datasets will prove invaluable to ensuring that studies of fungal mitochondrial genomes keep pace with their nuclear counterparts.

     
  2. Summary

    Spirodela polyrhiza is a fast‐growing aquatic monocot with highly reduced morphology, genome size and number of protein‐coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158‐Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome‐wide physical maps combined with high‐coverage short‐read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela‐specific microRNAs, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together, our results reveal a genome that has undergone reduction, likely through eliminating non‐essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large‐scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family.

     
  3. Introduction

    A major challenge in genomics is discerning which bases among billions alter organismal phenotypes and affect health and disease risk. Evidence of past selective pressure on a base, whether highly conserved or fast evolving, is a marker of functional importance. Bases that are unchanged in all mammals may shape phenotypes that are essential for organismal health. Bases that are evolving quickly in some species, or changed only in species that share an adaptive trait, may shape phenotypes that support survival in specific niches. Identifying bases associated with exceptional capacity for cellular recovery, such as in species that hibernate, could inform therapeutic discovery.

    Rationale

    The power and resolution of evolutionary analyses scale with the number and diversity of species compared. By analyzing genomes for hundreds of placental mammals, we can detect which individual bases in the genome are exceptionally conserved (constrained) and likely to be functionally important in both coding and noncoding regions. By including species that represent all orders of placental mammals and aligning genomes using a method that does not require designating humans as the reference species, we explore unusual traits in other species.

    Results

    Zoonomia’s mammalian comparative genomics resources are the most comprehensive and statistically well-powered produced to date, with a protein-coding alignment of 427 mammals and a whole-genome alignment of 240 placental mammals representing all orders. We estimate that at least 10.7% of the human genome is evolutionarily conserved relative to neutrally evolving repeats and identify about 101 million significantly constrained single bases (false discovery rate < 0.05). We cataloged 4552 ultraconserved elements at least 20 bases long that are identical in more than 98% of the 240 placental mammals. Many constrained bases have no known function, illustrating the potential for discovery using evolutionary measures. Eighty percent are outside protein-coding exons, and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Constrained bases tend to vary less within human populations, which is consistent with purifying selection. Species threatened with extinction have few substitutions at constrained sites, possibly because severely deleterious alleles have been purged from their small populations. By pairing Zoonomia’s genomic resources with phenotype annotations, we find genomic elements associated with phenotypes that differ between species, including olfaction, hibernation, brain size, and vocal learning. We associate genomic traits, such as the number of olfactory receptor genes, with physical phenotypes, such as the number of olfactory turbinals. By comparing hibernators and nonhibernators, we implicate genes involved in mitochondrial disorders, protection against heat stress, and longevity in this physiologically intriguing phenotype. Using a machine learning–based approach that predicts tissue-specific cis-regulatory activity in hundreds of species using data from just a few, we associate changes in noncoding sequence with traits for which humans are exceptional: brain size and vocal learning.

    Conclusion

    Large-scale comparative genomics opens new opportunities to explore how genomes evolved as mammals adapted to a wide range of ecological niches and to discover what is shared across species and what is distinctively human. High-quality data for consistently defined phenotypes are necessary to realize this potential. Through partnerships with researchers in other fields, comparative genomics can address questions in human health and basic biology while guiding efforts to protect the biodiversity that is essential to these discoveries.

    Figure caption: Comparing genomes from 240 species to explore the evolution of placental mammals. Our new phylogeny (black lines) has alternating gray and white shading, which distinguishes mammalian orders (labeled around the perimeter). Rings around the phylogeny annotate species phenotypes. Seven species with diverse traits are illustrated, with black lines marking their branch in the phylogeny. Sequence conservation across species is described at the top left. Image credit: K. Morrill.
  4. Genome sequencing has uncovered tremendous sequence variation within and between species. In plants, in addition to large variations in genome size, a great deal of sequence polymorphism is also evident in several large multi-gene families, including those involved in the ubiquitin-26S proteasome protein degradation system. However, the biological function of this sequence variation is not yet clear. In this work, we explicitly demonstrated a single origin of retroposed Arabidopsis Skp1-Like (ASK) genes using an improved phylogenetic analysis. Taking advantage of the 1,001 genomes project, we provide several lines of polymorphism evidence showing both adaptive and degenerative evolutionary processes in ASK genes. Yeast two-hybrid quantitative interaction assays further suggested that recent neutral changes in the ASK2 coding sequence weakened its interactions with some F-box proteins. The trend that highly polymorphic upstream regions of ASK1 yield high levels of expression implied negative expression regulation of ASK1 by an as-yet-unknown transcriptional suppression mechanism, which may contribute to the polymorphic roles of Skp1-CUL1-F-box complexes. Taken together, this study provides new evolutionary evidence to guide future functional genomic studies of SCF-mediated protein ubiquitylation.
  5. Medema, Marnix (Ed.)
    ABSTRACT The scale of post-transcriptional regulation and the implications of its interplay with other forms of regulation in environmental acclimation are underexplored for organisms of the domain Archaea. Here, we have investigated the scale of post-transcriptional regulation in the extremely halophilic archaeon Halobacterium salinarum NRC-1 by integrating the transcriptome-wide locations of transcript processing sites (TPSs) and SmAP1 binding, the genome-wide locations of antisense RNAs (asRNAs), and the consequences of RNase_2099C knockout on the differential expression of all genes. This integrated analysis has discovered that 54% of all protein-coding genes in the genome of this haloarchaeon are likely targeted by multiple mechanisms for putative post-transcriptional processing and regulation, with about 20% of genes likely being regulated by combinatorial schemes involving SmAP1, asRNAs, and RNase_2099C. Comparative analysis of mRNA levels (transcriptome sequencing [RNA-Seq]) and protein levels (sequential window acquisition of all theoretical fragment ion spectra mass spectrometry [SWATH-MS]) for 2,579 genes over four phases of batch culture growth in complex medium generated additional evidence for the conditional post-transcriptional regulation of 7% of all protein-coding genes. We demonstrate that post-transcriptional regulation may act to fine-tune specialized and rapid acclimation to stressful environments, e.g., as a switch to turn on gas vesicle biogenesis to promote vertical relocation under anoxic conditions and modulate the frequency of transposition by insertion sequence (IS) elements of the IS200/IS605, IS4, and ISH3 families. Findings from this study are provided as an atlas in a public Web resource (https://halodata.systemsbiology.net).

    IMPORTANCE While the transcriptional regulation landscape of archaea has been extensively investigated, we currently have limited knowledge about post-transcriptional regulation and its driving mechanisms in this domain of life. In this study, we collected and integrated omics data from multiple sources and technologies to infer post-transcriptionally regulated genes and the putative mechanisms modulating their expression at the protein level in Halobacterium salinarum NRC-1. The results suggest that post-transcriptional regulation may drive environmental acclimation by regulating hallmark biological processes. To foster discoveries by other research groups interested in the topic, we extended our integrated data to the public in the form of an interactive atlas (https://halodata.systemsbiology.net).