Title: Accelerating Biological Insight for Understudied Genes
Synopsis: The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The majority of proteins are annotated based on sequence conservation with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well-documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which cause individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing that is inherently slow in current biological research.
Award ID(s):
2111069
PAR ID:
10315258
Journal Name:
Integrative and Comparative Biology
ISSN:
1540-7063
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This beginner’s guide is intended for plant biologists new to network analysis. Here, we introduce key concepts and resources for researchers interested in incorporating network analysis into research, either as a stand-alone component for generating hypotheses or as a framework for examining and visualizing experimental results. Network analysis provides a powerful tool to predict gene functions. Advances in and reduced costs for systems biology techniques, such as genomics, transcriptomics, and proteomics, have generated abundant -omics data for plants; however, the functional annotation of plant genes lags. Therefore, predictions from network analysis can be a starting point to annotate genes and ultimately elucidate genotype-phenotype relationships. In this paper, we introduce networks and compare network-building resources available for plant biologists, including databases and software for network analysis. We then compare four databases available for plant biologists in more detail: AraNet, GeneMANIA, ATTED-II, and STRING. AraNet and GeneMANIA are functional association networks, ATTED-II is a gene coexpression database, and STRING is a protein-protein interaction database. AraNet and ATTED-II are plant-specific databases that can analyze multiple plant species, whereas GeneMANIA builds networks for Arabidopsis thaliana and non-plant species, and STRING for multiple species. Finally, we compare the performance of the four databases in predicting known and probable gene functions of the A. thaliana Nuclear Factor-Y (NF-Y) genes. We conclude that plant biologists have an invaluable resource in these databases and discuss how users can decide which type of database to use depending on their research question.
  2. The origin of new genes has long been a central interest of evolutionary biologists. However, their novelty means that they evade reconstruction by the classical tools of evolutionary modelling. This evasion of deep ancestral investigation necessitates intensive study of model species within well‐sampled, recently diversified, clades. One such clade is the model genus Neurospora, members of which lack recent gene duplications. Several Neurospora species are comprehensively characterized organisms apt for studying the evolution of lineage‐specific genes (LSGs). Using gene synteny, we documented that 78% of Neurospora LSG clusters are located adjacent to the telomeres featuring extensive tracts of non‐coding DNA and duplicated genes. Here, we report several instances of LSGs that are likely from regional rearrangements and potentially from gene rebirth. To broadly investigate the functions of LSGs, we assembled transcriptomics data from 68 experimental data points and identified co‐regulatory modules using Weighted Gene Correlation Network Analysis, revealing that LSGs are widely but peripherally involved in known regulatory machinery for diverse functions. The ancestral status of the LSG mas‐1, a gene with roles in cell‐wall integrity and cellular sensitivity to antifungal toxins, was investigated in detail alongside its genomic neighbours, indicating that it arose from an ancient lysophospholipase precursor that is ubiquitous in lineages of the Sordariomycetes. Our discoveries illuminate a “rummage region” in the N. crassa genome that enables the formation of new genes and functions to arise via gene duplication and relocation, followed by fast mutation and recombination facilitated by sequence repeats and unconstrained non‐coding sequences.
  3. Genome sequencing has uncovered tremendous sequence variation within and between species. In plants, in addition to large variations in genome size, a great deal of sequence polymorphism is also evident in several large multi-gene families, including those involved in the ubiquitin-26S proteasome protein degradation system. However, the biological function of this sequence variation is not yet clear. In this work, we explicitly demonstrated a single origin of retroposed Arabidopsis Skp1-Like (ASK) genes using an improved phylogenetic analysis. Taking advantage of the 1,001 genomes project, we here provide several lines of polymorphism evidence showing both adaptive and degenerative evolutionary processes in ASK genes. Yeast two-hybrid quantitative interaction assays further suggested that recent neutral changes in the ASK2 coding sequence weakened its interactions with some F-box proteins. The trend that highly polymorphic upstream regions of ASK1 yield high levels of expression implied negative expression regulation of ASK1 by an as-yet-unknown transcriptional suppression mechanism, which may contribute to the polymorphic roles of Skp1-CUL1-F-box complexes. Taken together, this study provides new evolutionary evidence to guide future functional genomic studies of SCF-mediated protein ubiquitylation.
  4. John Davey; Lisa Nagy; Elizabeth Jockusch; Julia Bowsher (Ed.)
    Clade-specific (a.k.a. lineage-specific) genes are very common and found at all taxonomic levels and in all clades examined. They can arise by duplication of previously existing genes, which can involve partial truncations or combinations with other protein domains or regulatory sequences. They can also evolve de novo from non-coding sequences, leading to potentially truly novel protein domains. Finally, since clade-specific genes are generally defined by lack of sequence homology with other proteins, they can also arise by sequence evolution that is rapid enough that previous sequence homology can no longer be detected. In such cases, where the rapid evolution is followed by constraint, we consider them to be ontologically non-novel but likely novel at a functional level. In general, clade-specific genes have received less attention from biologists but there are increasing numbers of fascinating examples of their roles in important traits. Here we review some selected recent examples, and argue that attention to clade-specific genes is an important corrective to the focus on the conserved developmental regulatory toolkit that has been the habit of evo-devo as a field. Finally, we discuss questions that arise about the evolution of clade-specific genes, and how these might be addressed by future studies. We highlight the hypothesis that clade-specific genes are more likely to be involved in synapomorphies that arose in the stem group where they appeared, compared to other genes.
  5. Long noncoding RNA (lncRNA) genes outnumber protein coding genes in the human genome and the majority remain uncharacterized. A major difficulty in generalizing understanding of lncRNA function is the dearth of gross sequence conservation, both for lncRNAs across species and for lncRNAs that perform similar functions within a species. Machine learning based methods which harness vast amounts of information on RNAs are increasingly used to impute certain biological characteristics. This includes interactions with proteins that are important mediators of RNA function, thus enabling the generation of knowledge in contexts for which experimental data are lacking. Here, we applied a natural language-based machine learning approach that enabled us to identify RNA binding protein interactions in lncRNA transcripts, using only RNA sequence as an input. We found that this predictive method is a powerful approach to infer conserved binding across species as distant as human and opossum, even in the absence of sequence conservation, thus informing on sequence-function relationships for these poorly understood RNAs. 