

Title: G2PMineR: A Genome to Phenome Literature Review Approach
There is a gap in the conceptual framework linking genes to phenotypes (G2P) for non-model organisms, as most non-model organisms do not yet have genomic resources readily available. To address this, researchers often perform literature reviews to understand G2P linkages by curating a list of likely gene candidates, hinging upon other studies already conducted in closely related systems. Sifting through hundreds to thousands of articles is a cumbersome task that slows down the scientific process and may introduce bias into a study. To fill this gap, we created G2PMineR, a free and open source literature mining tool developed specifically for G2P research. This R package uses automation to make the G2P review process efficient and unbiased, while also generating hypothesized associations between genes and phenotypes within a taxonomical framework. We applied the package to a literature review for drought-tolerance in plants. The analysis provides biologically meaningful results within the known framework of drought tolerance in plants. Overall, the package is useful for conducting literature reviews for genome to phenome projects, and also has broad appeal to scientists investigating a wide range of study systems as it can conduct analyses under the auspices of three different kingdoms (Plantae, Animalia, and Fungi).
Award ID(s):
1757324
NSF-PAR ID:
10336229
Journal Name:
Genes
Volume:
12
Issue:
2
ISSN:
2073-4425
Page Range / eLocation ID:
293
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Severe drought conditions and extreme weather events are increasing worldwide with climate change, threatening the persistence of native plant communities and ecosystems. Many studies have investigated the genomic basis of plant responses to drought. However, the extent of this research throughout the plant kingdom is unclear, particularly among species critical for the sustainability of natural ecosystems. This study aimed to broaden our understanding of genome-to-phenome (G2P) connections in drought-stressed plants and identify focal taxa for future research. Bioinformatics pipelines were developed to mine and link information from databases and abstracts from 7730 publications. This approach identified 1634 genes involved in drought responses among 497 plant taxa. Most (83.30%) of these species have been classified for human use, and most G2P interactions have been described within model organisms or crop species. Our analysis identifies several gaps in G2P research literature and database connectivity, with 21% of abstracts being linked to gene and taxonomy data in NCBI. Abstract text mining was more successful at identifying potential G2P pathways, with 34% of abstracts containing gene, taxa, and phenotype information. Expanding G2P studies to include non-model plants, especially those that are adapted to drought stress, will help advance our understanding of drought responsive G2P pathways. 
  2. Resurrection plants have an extraordinary ability to survive extreme water loss but still revive full metabolic activity when rehydrated. These plants are useful models to understand the complex biology of vegetative desiccation tolerance. Despite extensive studies of resurrection plants, many details underlying the mechanisms of desiccation tolerance remain unexplored. To summarize the progress in resurrection plant research and identify unexplored questions, we conducted a systematic review of 15 model angiosperm resurrection plants. This systematic review provides an overview of publication trends on resurrection plants, the geographical distribution of species and studies, and the methodology used. Using the Preferred Reporting Items for Systematic reviews and Meta-Analyses protocol we surveyed all publications on resurrection plants from 2000 to 2020. This yielded 185 empirical articles that matched our selection criteria. The most investigated plants were Craterostigma plantagineum (17.5%), Haberlea rhodopensis (13.7%), Xerophyta viscosa (reclassified as X. schlechteri) (11.9%), Myrothamnus flabellifolia (8.5%), and Boea hygrometrica (8.1%), with all other species accounting for less than 8% of publications. The majority of studies have been conducted in South Africa, Bulgaria, Germany, and China, but there are contributions from across the globe. Most studies were led by researchers working within the native range of the focal species, but some international and collaborative studies were also identified. The number of annual publications fluctuated, with a large but temporary increase in 2008. Many studies have employed physiological and transcriptomic methodologies to investigate the leaves of resurrection plants, but there was a paucity of studies on roots and only one metagenomic study was recovered. 
Based on these findings we suggest that future research focuses on resurrection plant roots and microbiome interactions to explore microbial communities associated with these plants, and their role in vegetative desiccation tolerance. 
  3. High throughput CRISPR screens are revolutionizing the way scientists unravel the genetic underpinnings of novel and evolved phenotypes. One of the critical challenges in accurately assessing screening outcomes is accounting for the variability in sgRNA cutting efficiency. Poorly active guides targeting genes essential to screening conditions obscure the growth defects that are expected from disrupting them. Here, we develop acCRISPR, an end-to-end pipeline that identifies essential genes in pooled CRISPR screens using sgRNA read counts obtained from next-generation sequencing. acCRISPR uses experimentally determined cutting efficiencies for each guide in the library to provide an activity correction to the screening outcomes, thus determining the fitness effect of disrupted genes. This is accomplished by calculating an optimization metric that quantifies the tradeoff between guide activity and library coverage, which is maximized to accurately classify genes essential to screening conditions. CRISPR-Cas9 and -Cas12a screens were carried out in the non-conventional oleaginous yeast Yarrowia lipolytica to determine a high-confidence set of essential genes for growth under glucose, a common carbon source used for the industrial production of oleochemicals. acCRISPR was also used in gain- and loss-of-function screens under high salt and low pH conditions to identify known and novel genes that were related to stress tolerance. Collectively, this work presents an experimental-computational framework for CRISPR-based functional genomics studies that may be expanded to other non-conventional organisms of interest. 
  4. High throughput CRISPR screens are revolutionizing the way scientists unravel the genetic underpinnings of engineered and evolved phenotypes. One of the critical challenges in accurately assessing screening outcomes is accounting for the variability in sgRNA cutting efficiency. Poorly active guides targeting genes essential to screening conditions obscure the growth defects that are expected from disrupting them. Here, we develop acCRISPR, an end-to-end pipeline that identifies essential genes in pooled CRISPR screens using sgRNA read counts obtained from next-generation sequencing. acCRISPR uses experimentally determined cutting efficiencies for each guide in the library to provide an activity correction to the screening outcomes via calculation of an optimization metric, thus determining the fitness effect of disrupted genes. CRISPR-Cas9 and -Cas12a screens were carried out in the non-conventional oleaginous yeast Yarrowia lipolytica, and acCRISPR was used to determine a high-confidence set of essential genes for growth under glucose, a common carbon source used for the industrial production of oleochemicals. acCRISPR was also used in screens quantifying relative cellular fitness under high salt conditions to identify genes that were related to salt tolerance. Collectively, this work presents an experimental-computational framework for CRISPR-based functional genomics studies that may be expanded to other non-conventional organisms of interest.

  5. Determining how genes are associated with traits in plants and other organisms is a major challenge in modern biology. The unPAK project – undergraduates phenotyping Arabidopsis knockouts – has generated phenotype data for thousands of non‐lethal insertion mutation lines within a single Arabidopsis thaliana genomic background. The focal phenotypes examined by unPAK are complex macroscopic fitness‐related traits, which have ecological, evolutionary and agricultural importance. These phenotypes are placed in the context of the wild‐type and also natural accessions (phytometers), and standardized for environmental differences between assays. Data from the unPAK project are used to describe broad patterns in the phenotypic consequences of insertion mutation, and to identify individual mutant lines with distinct phenotypes as candidates for further study. Inclusion of undergraduate researchers is at the core of unPAK activities, and an important broader impact of the project is providing students an opportunity to obtain research experience.
