
Title: Computational and Experimental Analysis of Genetic Variants
Genomics has grown exponentially over the last decade. Common variants are associated with physiological changes through statistical strategies such as genome-wide association studies (GWAS) and quantitative trait loci (QTL) analysis. Rare variants are associated with diseases through extensive filtering tools, including population genomics and trio-based sequencing (parents and probands). However, these genomic associations require follow-up analyses to narrow down causal variants, identify the genes they influence, and determine the resulting physiological changes. Large quantities of data exist that can be used to connect variants to gene changes, cell types, protein pathways, clinical phenotypes, and animal models that establish physiological genomics. These data, combined with bioinformatics approaches including evolutionary analysis, structural insights, and gene regulation, can yield testable hypotheses for the mechanisms of genomic variants. Molecular biology, biochemistry, cell culture, CRISPR editing, and animal models can then test these hypotheses to establish molecular variant mechanisms. Variant characterization can be a significant component of educating future professionals in undergraduate, graduate, or medical training programs, teaching the basic concepts and terminology of genetics while students learn independent research hypothesis design. This article goes through the computational and experimental analysis strategies of variant characterization and provides examples of these tools applied in publications. © 2022 American Physiological Society. Compr Physiol 12:3303-3336, 2022.
Journal Name:
Comprehensive Physiology
Page Range / eLocation ID:
3303-3336
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the age of genomics, public understanding of complex scientific knowledge is critical. To combat reductionistic views, it is necessary to generate and organize educational material and data that keep pace with advances in genomics. The view that CCR5 is solely the receptor for HIV gave rise to demand to remove the gene in patients to create host HIV resistance, underestimating the broader roles and complex genetic inheritance of CCR5. A program aimed at providing research projects to undergraduates, known as CODE, has been expanded to build educational material for genes such as CCR5 in a rapid approach, exposing students and trainees to large bioinformatics databases and previous experiments for broader data to challenge commitment to biological reductionism. Our students organize expression databases, query environmental responses, assess genetic factors, generate protein models/dynamics, and profile evolutionary insights into a protein such as CCR5. The knowledgebase generated in the initiative opens the door for public educational information and tools (molecular videos, 3D printed models, and handouts), classroom materials, and strategy for future genetic ideas that can be distributed in formal, semiformal, and informal educational environments. This work highlights that many factors are missing from the reductionist view of CCR5, including the role of missense variants or expression of CCR5 with neurological phenotypes and the role of CCR5 and the delta32 variant in complex critical care patients with sepsis. When connected to genomic stories in the news, these tools offer critically needed Ethical, Legal, and Social Implication (ELSI) education to combat biological reductionism. 
  2. INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis-regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans. For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify the magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individuals of diverse ancestries) in human erythroid progenitors. In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguish causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function is a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans, measure dosage effects, and decipher gene-regulatory networks.
  3. Abstract

    Evolutionary mechanisms that underlie the origins of coloniality among organisms are diverse. Some animal colonies may be comprised strictly of clonal individuals formed from asexual budding or comprised of a chimera of clonal and sexually produced individuals that fuse secondarily. This investigation focuses on select members of the lophophorates and entoprocts whose evolutionary relationships remain enigmatic even in the age of genomics. Using transcriptomic data sets, two coloniality‐based hypotheses are tested in a phylogenetic context to find candidate genes showing evidence of positive selection and potentially convergent molecular signatures among solitary species and taxa‐forming colonies from aggregate groups or clonal budding. Approximately 22% of the 387 orthogroups tested showed evidence of positive selection in at least one of the three branch‐site tests (CODEML, BUSTED, and aBSREL). Only 12 genes could be reliably associated with a developmental function related to traits linked with coloniality, neuroanatomy, or ciliary fields. Genes testing for both positive selection and convergent molecular characters include orthologues of Radial spoke head, Elongation translation initiation factors, SEC13, and Immediate early response gene5. Maximum likelihood analyses included here resulted in tree topologies typical of other phylogenetic investigations based on wider genomic information. Further genomic and experimental evidence will be needed to resolve whether a solitary ancestor with multiciliated cells that formed aggregate groups gave rise to colonial forms in bryozoans (and perhaps the entoprocts) or that the morphological differences exhibited by phoronids and brachiopods represent trait modifications from a colonial ancestor.

  4. Abstract Background

    Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models.


    To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype–phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradient to prioritize multimodal features for various phenotypes.


    We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for Schizophrenia and 0.73 for cognitive impairment in Alzheimer’s disease).


    We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.

  5. Brennan, Richard Gerald (Ed.)
    ABSTRACT

    Microbial extracellular proteins and metabolites provide valuable information concerning how microbes adapt to changing environments. In cyanobacteria, dynamic acclimation strategies involve a variety of regulatory mechanisms, with ferric uptake regulator proteins being key players in this process. In the nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120, FurC (PerR) is a global regulator that modulates the peroxide response and several genes involved in photosynthesis and nitrogen metabolism. To investigate the possible role of FurC in shaping the extracellular environment of Anabaena, the extracellular metabolites and proteins of a furC-overexpressing variant were compared with those of the wild-type strain. There were 96 differentially abundant proteins, 78 of which were found for the first time in the extracellular fraction of Anabaena. While these proteins belong to different functional categories, most of them are predicted to be secreted or have a peripheral location. Several stress-related proteins, including PrxA, flavodoxin, and the Dps homolog All1173, accumulated in the exoproteome of furC-overexpressing cells, while decreased levels of FurA and a subset of membrane proteins, including several export proteins and amiC gene products, responsible for nanopore formation, were detected. Direct repression by FurC of some of those genes, including amiC1 and amiC2, could account for odd septal nanopore formation and impaired intercellular molecular transfer observed in the furC-overexpressing variant. Assessment of the exometabolome from both strains revealed the release of two peptidoglycan fragments in furC-overexpressing cells, namely 1,6-anhydro-N-acetyl-β-D-muramic acid (anhydroMurNAc) and its associated disaccharide (β-D-GlcNAc-(1-4)-anhydroMurNAc), suggesting alterations in peptidoglycan breakdown and recycling.

    IMPORTANCE

    Cyanobacteria are ubiquitous photosynthetic prokaryotes that can adapt to environmental stresses by modulating their extracellular contents. Measurements of the organization and composition of the extracellular milieu provide useful information about cyanobacterial adaptive processes, which can potentially lead to biomimetic approaches to stabilizing biological systems against adverse conditions. Anabaena sp. strain PCC 7120 is a multicellular, nitrogen-fixing cyanobacterium whose intercellular molecular exchange is mediated by septal junctions that traverse the septal peptidoglycan through nanopores. FurC (PerR) is an essential transcriptional regulator in Anabaena, which modulates the response to several stresses. Here, we show that furC-overexpressing cells exhibit a modified exoproteome and release peptidoglycan fragments. Phenotypically, important alterations in nanopore formation and cell-to-cell communication were observed. Our results expand the roles of FurC to the modulation of cell-wall biogenesis and recycling, as well as intercellular molecular transfer.
