

Title: Systematic identification of safe harbor regions in the CHO genome through a comprehensive epigenome analysis
Abstract

The Chinese hamster ovary (CHO) cell lines that are used to produce commercial quantities of therapeutic proteins commonly exhibit a decrease in productivity over time in culture, a phenomenon termed production instability. Random integration of the transgenes encoding the protein of interest into locations in the CHO genome that are vulnerable to genetic and epigenetic instability often causes production instability through copy number loss and silencing of expression. Several recent publications have shown that these cell line development challenges can be overcome by using site‐specific integration (SSI) technology to insert the transgenes at genomic loci, often called “hotspots,” that are transcriptionally permissive and have enhanced stability relative to the rest of the genome. However, extensive characterization of the CHO epigenome is needed to identify hotspots that maintain their desirable epigenetic properties in an industrial bioprocess environment and maximize transcription from a single integrated transgene copy. To this end, the epigenomes and transcriptomes of two distantly related cell lines, an industrially relevant monoclonal antibody‐producing cell line and its parental CHO‐K1 host, were characterized using high throughput chromosome conformation capture and RNAseq to analyze changes in the epigenome that occur during cell line development and associated changes in system‐wide gene expression. In total, 10.9% of the CHO genome contained transcriptionally permissive three‐dimensional chromatin structures with enhanced genetic and epigenetic stability relative to the rest of the genome. These safe harbor regions also showed good agreement with published CHO epigenome data, demonstrating that this method was suitable for finding genomic regions with epigenetic markers of active and stable gene expression. 
These regions significantly reduce the genomic search space when looking for CHO hotspots with widespread applicability and can guide future studies with the goal of maximizing the potential of SSI technology in industrial production CHO cell lines.

 
Award ID(s):
1736123
NSF-PAR ID:
10236245
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Biotechnology and Bioengineering
Volume:
118
Issue:
2
ISSN:
0006-3592
Page Range / eLocation ID:
p. 659-675
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Chinese hamster ovary (CHO) cell lines are widely used to manufacture biopharmaceuticals. However, CHO cells are not an optimal expression host due to the intrinsic plasticity of the CHO genome. Genome plasticity can lead to chromosomal rearrangements, transgene exclusion, and phenotypic drift. A poorly understood contributor to CHO cell line instability is extrachromosomal circular DNA (eccDNA) and its role in gene expression and regulation. EccDNAs can facilitate ultra-high gene expression and are found in many eukaryotes, including humans, yeast, and plants. EccDNA confers genetic heterogeneity, providing selective advantages to individual cells in response to dynamic environments. In CHO cell cultures, maintaining genetic homogeneity is critical to ensuring consistent productivity and product quality. Understanding eccDNA structure, function, and microevolutionary dynamics under various culture conditions could reveal potential engineering targets for cell line optimization. In this study, eccDNA sequences were investigated at the beginning and end of two-week fed-batch cultures in an ambr® 250 bioreactor under control and lactate-stressed conditions. This work characterized the structure and function of eccDNA in a CHO-K1 clone. Gene annotation identified 1551 unique eccDNA genes, including cancer driver genes and genes involved in protein production. Furthermore, RNA-seq data were integrated to identify transcriptionally active eccDNA genes.
  2. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments.

    RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution.

    RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events.

    CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans.

    Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  3. Abstract

    Chinese hamster ovary (CHO) cells are essential to biopharmaceutical manufacturing, and production instability, the loss of productivity over time, is a long‐standing challenge in the industry. Accurate prediction of cell line stability could enable efficient screening to identify clones suitable for manufacturing, saving significant time and costs. DNA repair genes may offer biomarkers to address this need. In this study, over 40 cell lines representing various host lineages from three companies/organizations were evaluated for expression of five DNA repair genes (Fam35a, Lig4, Palb2, Pari, and Xrcc6). Expression measured in cells with less than 30 population doubling levels (PDLs) was correlated to stability profiles at 60+ PDL. Principal component analysis identified markers that separate stable and unstable CHO‐DG44 cell lines. Notably, two genes, Lig4 and Xrcc6, showed higher expression in unstable CHO‐DG44 cell lines with copy number loss identified as the mechanism of production instability. Expression levels across all cell ages showed that lower DNA repair gene expression was associated with increased cell age. Collectively, DNA repair genes provide critical insight into the long‐term behavior of CHO cells, and their expression levels have the potential to predict cell line stability in certain cases.

     
  4. Abstract

    Targeted gene knockout and site‐specific integration (SSI) are powerful genome editing techniques to improve the development of industrially relevant Chinese hamster ovary (CHO) cell lines. However, past efforts to perform SSI in CHO cells are characterized by low efficiencies. Moreover, numerous strategies proposed to boost SSI efficiency in mammalian cell types have yet to be evaluated head to head or in combination to appreciably boost efficiencies in CHO. To enable systematic and rapid optimization of genome editing methods, the SSIGNAL (site‐specific integration and genome alteration) reporter system is developed. This tool can analyze CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 (CRISPR‐associated protein 9)‐mediated disruption activity alone or in conjunction with SSI efficiency. The reporter system uses green and red dual‐fluorescence signals to indicate genotype states within four days following transfection, facilitating rapid data acquisition via standard flow cytometry instrumentation. In addition to describing the design and development of the system, two of its applications are demonstrated by first comparing transfection conditions to maximize CRISPR/Cas9 activity and subsequently assessing the efficiency of several promising SSI strategies. Due to its sensitivity and versatility, the SSIGNAL reporter system may serve as a tool to advance genome editing technology.

     
  5. Abstract

    Mammalian cell line development requires streamlined methodologies that will reduce both the cost and time to identify candidate cell lines. Improvements in site‐specific genomic editing techniques can result in flexible, predictable, and robust cell line engineering. However, an outstanding question in the field is the specific site of integration. Here, we seek to identify productive loci within the human genome that will result in stable, high expression of heterologous DNA. Using an unbiased, random integration approach and a green fluorescent reporter construct, we identify ten single‐integrant, recombinant human cell lines that exhibit stable, high‐level expression. From these cell lines, eight unique corresponding integration loci were identified. These loci are concentrated in non‐protein coding regions or intronic regions of protein coding genes. Expression mapping of the surrounding genes reveals minimal disruption of endogenous gene expression. Finally, we demonstrate that targeted de novo integration at one of the identified loci, the 12th exon‐intron region of the GRIK1 gene on chromosome 21, results in superior expression and stability compared to the standard, illegitimate integration approach at levels approaching 4‐fold. The information identified here along with recent advances in site‐specific genomic editing techniques can lead to expedited cell line development.

     