skip to main content


Title: nknown to Known: Advancing Knowledge of Coral Gene Function
Given the catastrophic changes befalling coral reefs, understanding coral gene function is essential to advance reef conservation. This has proved challenging due to the paucity of genomic data and genetic tools available for corals. Recently, CRISPR/Cas9 gene editing was applied to these species; however, a major bottleneck is the identification and prioritization of candidate genes for manipulation. This issue is exacerbated by the many unknown ('dark') coral genes that may play key roles in the stress response. We review the use of gene coexpression networks that incorporate both known and unknown genes to identify targets for reverse genetic analysis. This approach also provides a framework for the annotation of dark genes in established interaction networks to improve our fundamental knowledge of coral gene function.  more » « less
Award ID(s):
1756616
NSF-PAR ID:
10166049
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Trends in genetics
Volume:
36
Issue:
2
ISSN:
1357-4817
Page Range / eLocation ID:
93-104
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis -regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans . For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individual of diverse ancestries) in human erythroid progenitors. In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguished between causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function are a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans , measure dosage effects, and decipher gene-regulatory networks. 
    more » « less
  2. Predicting gene functions from genome sequence alone has been difficult, and the functions of a large fraction of plant genes remain unknown. However, leveraging the vast amount of currently available gene expression data has the potential to facilitate our understanding of plant gene functions, especially in determining complex traits. Gene coexpression networks—created by integrating multiple expression datasets—connect genes with similar patterns of expression across multiple conditions. Dense gene communities in such networks, commonly referred to as modules, often indicate that the member genes are functionally related. As such, these modules serve as tools for generating new testable hypotheses, including the prediction of gene function and importance. Recently, we have seen a paradigm shift from the traditional “global” to more defined, context-specific coexpression networks. Such coexpression networks imply genetic correlations in specific biological contexts such as during development or in response to a stress. In this short review, we highlight a few recent studies that attempt to fill the large gaps in our knowledge about cellular functions of plant genes using context-specific coexpression networks. 
    more » « less
  3. Background Corals, which form the foundation of biodiverse reef ecosystems, are under threat from warming oceans. Reefs provide essential ecological services, including food, income from tourism, nutrient cycling, waste removal, and the absorption of wave energy to mitigate erosion. Here, we studied the coral thermal stress response using network methods to analyze transcriptomic and polar metabolomic data generated from the Hawaiian rice coral Montipora capitata . Coral nubbins were exposed to ambient or thermal stress conditions over a 5-week period, coinciding with a mass spawning event of this species. The major goal of our study was to expand the inventory of thermal stress-related genes and metabolites present in M. capitata and to study gene-metabolite interactions. These interactions provide the foundation for functional or genetic analysis of key coral genes as well as provide potentially diagnostic markers of pre-bleaching stress. A secondary goal of our study was to analyze the accumulation of sex hormones prior to and during mass spawning to understand how thermal stress may impact reproductive success in M. capitata . Methods M. capitata was exposed to thermal stress during its spawning cycle over the course of 5 weeks, during which time transcriptomic and polar metabolomic data were collected. We analyzed these data streams individually, and then integrated both data sets using MAGI (Metabolite Annotation and Gene Integration) to investigate molecular transitions and biochemical reactions. Results Our results reveal the complexity of the thermal stress phenome in M. capitata , which includes many genes involved in redox regulation, biomineralization, and reproduction. The size and number of modules in the gene co-expression networks expanded from the initial stress response to the onset of bleaching. The later stages involved the suppression of metabolite transport by the coral host, including a variety of sodium-coupled transporters and a putative ammonium transporter, possibly as a response to reduction in algal productivity. The gene-metabolite integration data suggest that thermal treatment results in the activation of animal redox stress pathways involved in quenching molecular oxygen to prevent an overabundance of reactive oxygen species. Lastly, evidence that thermal stress affects reproductive activity was provided by the downregulation of CYP-like genes and the irregular production of sex hormones during the mass spawning cycle. Overall, redox regulation and metabolite transport are key components of the coral animal thermal stress phenome. Mass spawning was highly attenuated under thermal stress, suggesting that global climate change may negatively impact reproductive behavior in this species. 
    more » « less
  4. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution. Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic] 
    more » « less
  5. Abstract Background Dinoflagellates in the family Symbiodiniaceae are important photosynthetic symbionts in cnidarians (such as corals) and other coral reef organisms. Breakdown of the coral-dinoflagellate symbiosis due to environmental stress (i.e. coral bleaching) can lead to coral death and the potential collapse of reef ecosystems. However, evolution of Symbiodiniaceae genomes, and its implications for the coral, is little understood. Genome sequences of Symbiodiniaceae remain scarce due in part to their large genome sizes (1–5 Gbp) and idiosyncratic genome features. Results Here, we present de novo genome assemblies of seven members of the genus Symbiodinium , of which two are free-living, one is an opportunistic symbiont, and the remainder are mutualistic symbionts. Integrating other available data, we compare 15 dinoflagellate genomes revealing high sequence and structural divergence. Divergence among some Symbiodinium isolates is comparable to that among distinct genera of Symbiodiniaceae. We also recovered hundreds of gene families specific to each lineage, many of which encode unknown functions. An in-depth comparison between the genomes of the symbiotic Symbiodinium tridacnidorum (isolated from a coral) and the free-living Symbiodinium natans reveals a greater prevalence of transposable elements, genetic duplication, structural rearrangements, and pseudogenisation in the symbiotic species. Conclusions Our results underscore the potential impact of lifestyle on lineage-specific gene-function innovation, genome divergence, and the diversification of Symbiodinium and Symbiodiniaceae. The divergent features we report, and their putative causes, may also apply to other microbial eukaryotes that have undergone symbiotic phases in their evolutionary history. 
    more » « less