Title: Identifying and retargeting transcriptional hot spots in the human genome
Abstract

Mammalian cell line development requires streamlined methodologies that will reduce both the cost and time to identify candidate cell lines. Improvements in site‐specific genomic editing techniques can result in flexible, predictable, and robust cell line engineering. However, an outstanding question in the field is the specific site of integration. Here, we seek to identify productive loci within the human genome that will result in stable, high expression of heterologous DNA. Using an unbiased, random integration approach and a green fluorescent reporter construct, we identify ten single‐integrant, recombinant human cell lines that exhibit stable, high‐level expression. From these cell lines, eight unique corresponding integration loci were identified. These loci are concentrated in non‐protein coding regions or intronic regions of protein coding genes. Expression mapping of the surrounding genes reveals minimal disruption of endogenous gene expression. Finally, we demonstrate that targeted de novo integration at one of the identified loci, the 12th exon‐intron region of the GRIK1 gene on chromosome 21, results in superior expression and stability compared to the standard, illegitimate integration approach at levels approaching 4‐fold. The information identified here along with recent advances in site‐specific genomic editing techniques can lead to expedited cell line development.

 
NSF-PAR ID:
10236635
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Biotechnology Journal
Volume:
11
Issue:
8
ISSN:
1860-6768
Format(s):
Medium: X Size: p. 1100-1109
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The Chinese hamster ovary (CHO) cell lines that are used to produce commercial quantities of therapeutic proteins commonly exhibit a decrease in productivity over time in culture, a phenomenon termed production instability. Random integration of the transgenes encoding the protein of interest into locations in the CHO genome that are vulnerable to genetic and epigenetic instability often causes production instability through copy number loss and silencing of expression. Several recent publications have shown that these cell line development challenges can be overcome by using site‐specific integration (SSI) technology to insert the transgenes at genomic loci, often called “hotspots,” that are transcriptionally permissive and have enhanced stability relative to the rest of the genome. However, extensive characterization of the CHO epigenome is needed to identify hotspots that maintain their desirable epigenetic properties in an industrial bioprocess environment and maximize transcription from a single integrated transgene copy. To this end, the epigenomes and transcriptomes of two distantly related cell lines, an industrially relevant monoclonal antibody‐producing cell line and its parental CHO‐K1 host, were characterized using high throughput chromosome conformation capture and RNAseq to analyze changes in the epigenome that occur during cell line development and associated changes in system‐wide gene expression. In total, 10.9% of the CHO genome contained transcriptionally permissive three‐dimensional chromatin structures with enhanced genetic and epigenetic stability relative to the rest of the genome. These safe harbor regions also showed good agreement with published CHO epigenome data, demonstrating that this method was suitable for finding genomic regions with epigenetic markers of active and stable gene expression. 
These regions significantly reduce the genomic search space when looking for CHO hotspots with widespread applicability and can guide future studies with the goal of maximizing the potential of SSI technology in industrial production CHO cell lines.

     
  2. Abstract

    Efforts to leverage clustered regularly interspaced short palindromic repeats/CRISPR‐associated protein 9 (CRISPR/Cas9) for targeted genomic modifications in mammalian cells are limited by low efficiencies and heterogeneous outcomes. To aid method optimization, we developed an all‐in‐one reporter system, including a novel superfolder orange fluorescent protein (sfOrange), to simultaneously quantify gene disruption, site‐specific integration (SSI), and random integration (RI). SSI strategies that utilize different donor plasmid formats and Cas9 nuclease variants were evaluated for targeting accuracy and efficiency in Chinese hamster ovary cells. Double‐cut and double‐nick donor formats significantly improved targeting accuracy by 2.3–8.3‐fold and 19–22‐fold, respectively, compared to standard circular donors. Notably, Cas9‐mediated donor linearization was associated with increased RI events, whereas donor nicking minimized RI without sacrificing SSI efficiency and avoided low‐fidelity outcomes. A screen of 10 molecules that modulate the major mammalian DNA repair pathways identified two inhibitors that further enhance targeting accuracy and efficiency to achieve SSI in 25% of transfected cells without selection. The optimized methods integrated transgene expression cassettes with 96% efficiency at a single locus and with 53%–55% efficiency at two loci simultaneously in selected clones. The CRISPR‐based tools and methods developed here could inform the use of CRISPR/Cas9 in mammalian cell lines, accelerate mammalian cell line engineering, and support advanced recombinant protein production applications.

     
  3. INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis-regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans. For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individuals of diverse ancestries) in human erythroid progenitors. 
In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguish causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function is a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans, measure dosage effects, and decipher gene-regulatory networks. 
  4. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. 
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution. Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic] 
  5. Abstract

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein‐coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR‐based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease‐regulating functions (e.g. Ovar‐DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene‐targeted SNP discovery and subsequent SNP chip genotyping using low‐quality samples in a nonmodel species.

     