Human-specific segmental duplications (HSDs) contain millions of base pairs of sequence unique to the human genome, including genes that shape neurodevelopment. Despite their young age (<6 million years), HSD genes exhibit widespread regulatory divergence, with paralog-specific expression patterns documented across a variety of tissues and cell types. Using long-read expression and epigenomic data, we show that human-specific paralogs tend to have lower activity than the shared, ancestral ones. To systematically characterize the cis-regulatory elements (CREs) within HSDs and understand patterns of regulatory change in recently evolved gene families, we conducted a massively parallel reporter assay of 7,160 human duplicated and chimpanzee orthologous sequences in lymphoblastoid (GM12878) and neuroblastoma (SH-SY5Y) cell lines. A large proportion (14–24%) of sequences exhibited differential activity relative to the chimpanzee ortholog (or between human paralogs), mostly with small fold-differences. Combining measured activity levels across all assayed sequences, predicted differences in cis-regulatory activity correlated with mRNA levels in SH-SY5Y. Differentially active CREs were validated for CHRFAM7A, HYDIN2, and SRGAP2C that may contribute to paralog-specific expression patterns and thereby to human-specific traits. While we find some changes in CRE activity shared between duplicate paralogs likely driving regulatory divergence in gene expression, consideration of non-shared adjacent sequences to duplications suggests a larger role for altered genome positional effects. In all, this work suggests that functional divergence of duplicated CREs contributes moderately to regulatory divergence of HSD genes and uncovers enhancers that are candidate drivers of human-specific regulatory patterns.
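The abstract above does not say how reporter activity or differential activity was computed. As a minimal sketch only, one common MPRA convention scores each sequence by the log2 ratio of RNA to DNA barcode counts and compares orthologs by the difference of those scores. The function names, the pseudocount, and the counts below are illustrative assumptions, not the authors' pipeline.

```python
import math


def log2_activity(rna_count, dna_count, pseudocount=1.0):
    """Log2 RNA/DNA ratio for one sequence (assumed convention, not from the source)."""
    return math.log2((rna_count + pseudocount) / (dna_count + pseudocount))


def log2_fold_difference(human_counts, chimp_counts, pseudocount=1.0):
    """Human-minus-chimpanzee difference in log2 activity.

    Each argument is a hypothetical (rna_count, dna_count) pair.
    """
    human_rna, human_dna = human_counts
    chimp_rna, chimp_dna = chimp_counts
    return (log2_activity(human_rna, human_dna, pseudocount)
            - log2_activity(chimp_rna, chimp_dna, pseudocount))


# Made-up counts: the human sequence has a higher RNA/DNA ratio than the ortholog.
difference = log2_fold_difference((7, 1), (3, 1))
```

A positive value would indicate higher activity for the human sequence under these assumptions. A real analysis would also need replicates and a statistical test before calling a sequence differentially active.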
Machine-guided design of cell-type-targeting cis-regulatory elements
Abstract: Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body [1–3]. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for these intended purposes has arisen naturally. Here we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell-type specificity. We take advantage of innovations in deep neural network modelling of CRE activity across three cell types, efficient in silico optimization and massively parallel reporter assays to design and empirically test thousands of CREs [4–8]. Through large-scale in vitro validation, we show that synthetic sequences are more effective at driving cell-type-specific expression in three cell lines compared with natural sequences from the human genome and achieve specificity in analogous tissues when tested in vivo. Synthetic sequences exhibit distinct motif vocabulary associated with activity in the on-target cell type and a simultaneous reduction in the activity of off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs from massively parallel reporter assay models and demonstrate the required literacy to write fit-for-purpose regulatory code.
- Award ID(s): 1829135
- PAR ID: 10574097
- Publisher / Repository: Nature
- Date Published:
- Journal Name: Nature
- Volume: 634
- Issue: 8036
- ISSN: 0028-0836
- Page Range / eLocation ID: 1211 to 1220
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis-regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans. For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individuals of diverse ancestries) in human erythroid progenitors. In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguish causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function is a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans, measure dosage effects, and decipher gene-regulatory networks.
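The abstract above defines cis target genes as those within 500 kb of the targeted element. A minimal sketch of that window filter follows. It assumes distance is measured from the variant position to a gene's transcription start site, which the abstract does not state, and all gene names and coordinates are hypothetical.

```python
CIS_WINDOW_BP = 500_000  # "within 500 kb", as stated in the abstract


def is_cis_candidate(variant, gene, window=CIS_WINDOW_BP):
    """True if the gene is on the same chromosome and within the window.

    variant: (chromosome, position); gene: (name, chromosome, tss).
    Measuring distance to the TSS is an assumption made for illustration.
    """
    variant_chrom, variant_pos = variant
    _name, gene_chrom, gene_tss = gene
    return variant_chrom == gene_chrom and abs(gene_tss - variant_pos) <= window


def cis_candidates(variant, genes, window=CIS_WINDOW_BP):
    """Names of genes in the cis window, nearest first."""
    hits = [g for g in genes if is_cis_candidate(variant, g, window)]
    hits.sort(key=lambda g: abs(g[2] - variant[1]))
    return [g[0] for g in hits]


# Hypothetical variant and genes, not taken from the study.
variant = ("chr1", 1_000_000)
genes = [
    ("GENE_A", "chr1", 1_400_000),  # 400 kb away: inside the window
    ("GENE_B", "chr1", 1_600_000),  # 600 kb away: outside the window
    ("GENE_C", "chr2", 1_000_000),  # different chromosome
    ("GENE_D", "chr1", 950_000),    # 50 kb away: the closest gene
]
nearby = cis_candidates(variant, genes)
```

Sorting by distance puts the closest gene first, which matches the abstract's observation that the target gene was in most cases the closest one. The filter only lists candidates; the study identified targets from perturbation data, not from distance.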
-
SUMMARY Cis‐regulatory elements (CREs) are important sequences for gene expression and for plant biological processes such as development, evolution, domestication, and stress response. However, studying CREs in plant genomes has been challenging. The totipotent nature of plant cells, coupled with the inability to maintain plant cell types in culture and the inherent technical challenges posed by the cell wall, has limited our understanding of how plant cell types acquire and maintain their identities and respond to the environment via CRE usage. Advances in single‐cell epigenomics have revolutionized the identification of cell‐type‐specific CREs. These new technologies have the potential to significantly advance our understanding of plant CRE biology and shed light on how the regulatory genome gives rise to diverse plant phenomena. However, there are significant biological and computational challenges associated with analyzing single‐cell epigenomic datasets. In this review, we discuss the historical and foundational underpinnings of plant single‐cell research, the challenges and common pitfalls in the analysis of plant single‐cell epigenomic data, and biological challenges unique to plants. Additionally, we discuss how the application of single‐cell epigenomic data in various contexts stands to transform our understanding of the importance of CREs in plant genomes.
-
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
-
Kopp, Artyom (Ed.) Animal traits develop through the expression and action of numerous regulatory and realizator genes that comprise a gene regulatory network (GRN). For each GRN, its underlying patterns of gene expression are controlled by cis-regulatory elements (CREs) that bind activating and repressing transcription factors. These interactions drive cell-type and developmental stage-specific transcriptional activation or repression. Most GRNs remain incompletely mapped, and a major barrier to this daunting task is CRE identification. Here, we used an in silico method to identify predicted CREs (pCREs) that comprise the GRN which governs sex-specific pigmentation of Drosophila melanogaster. Through in vivo assays, we demonstrate that many pCREs activate expression in the correct cell-type and developmental stage. We employed genome editing to demonstrate that two CREs control the pupal abdomen expression of trithorax, whose function is required for the dimorphic phenotype. Surprisingly, trithorax had no detectable effect on this GRN’s key trans-regulators, but shapes the sex-specific expression of two realizator genes. Comparison of sequences orthologous to these CREs supports an evolutionary scenario where these trithorax CREs predated the origin of the dimorphic trait. Collectively, this study demonstrates how in silico approaches can shed novel insights on the GRN basis for a trait’s development and evolution.

