

Search for: All records

Award ID contains: 1736123


  1. Abstract

    Background

    To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the quality scoring of genome assemblies and does not require an existing reference genome for accuracy assessment.

    Results

    EvalDNA calculates a list of quality metrics from an assembled sequence and applies a model created from supervised machine learning methods to integrate various metrics into a comprehensive quality score. A well-tested, accurate model for scoring mammalian genome sequences is provided as part of EvalDNA. This random forest regression model evaluates an assembled sequence based on continuity, completeness, and accuracy, and was able to explain 86% of the variation in reference-based quality scores within the testing data. EvalDNA was applied to human chromosome 14 assemblies from the GAGE study to rank genome assemblers and to compare EvalDNA to two other quality evaluation tools. In addition, EvalDNA was used to evaluate several genome assemblies of the Chinese hamster genome to help establish a better reference genome for the biopharmaceutical manufacturing community. EvalDNA was also used to assess more recent human assemblies from the QUAST-LG study completed in 2018, and its ability to score bacterial genomes was examined through application on bacterial assemblies from the GAGE-B study.

    Conclusions

    EvalDNA enables scientists to easily identify the best available genome assembly for their organism of interest without requiring a reference assembly. EvalDNA sets itself apart from other quality assessment tools by producing a quality score that enables direct comparison among assemblies from different species.
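The Results above report that the model explained 86% of the variation in reference-based quality scores. A minimal sketch of how such a figure is commonly computed follows, assuming the metric is the coefficient of determination (R²); the abstract does not name the metric, and the function name and the sample scores below are hypothetical, not taken from EvalDNA.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: the fraction of variance in y_true
    that is accounted for by y_pred (assumed metric, see lead-in)."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation around the mean of the true scores.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference-based scores and model predictions.
reference_scores = [1.0, 2.0, 3.0]
perfect_predictions = [1.0, 2.0, 3.0]
mean_only_predictions = [2.0, 2.0, 2.0]

perfect_fit = r_squared(reference_scores, perfect_predictions)
mean_fit = r_squared(reference_scores, mean_only_predictions)
```

Under this reading, a value of 0.86 would mean that the predictions leave 14% of the variance in the reference-based scores unexplained.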

     
  2. Abstract

    The biomanufacturing industry is advancing toward continuous processes that will involve longer culture durations and older cell ages. These upstream trends may bring unforeseen challenges for downstream purification due to fluctuations in host cell protein (HCP) levels. To understand the extent of HCP expression instability exhibited by Chinese hamster ovary (CHO) cells over these time scales, an industry‐wide consortium collaborated to develop a study to characterize age‐dependent changes in HCP levels across 30, 60, and 90 cell doublings, representing a period of approximately 60 days. A monoclonal antibody (mAb)‐producing cell line with bulk productivity up to 3 g/L in a bioreactor was aged in parallel with its parental CHO‐K1 host. Subsequently, both cell types at each age were cultivated in an automated bioreactor system to generate harvested cell culture fluid (HCCF) for HCP analysis. More than 1500 HCPs were quantified using complementary proteomic techniques, two‐dimensional electrophoresis (2DE) and liquid chromatography coupled with tandem mass spectrometry (LC‐MS/MS). While up to 13% of proteins showed variable expression with age, more changes were observed when comparing the two cell lines, with up to 47% of HCPs differentially expressed. A small subset (50 HCPs) with age‐dependent expression were previously reported to be problematic as high‐risk and/or difficult‐to‐remove impurities; however, the vast majority of these were downregulated with age. Our findings suggest that HCP expression changes over this time scale may not be as dramatic, nor pose as great a challenge to downstream processing, as originally expected, but that monitoring of variably expressed problematic HCPs remains critical.

     
  3. Abstract

    The ambr250 high-throughput bioreactor platform was adopted to provide a highly controlled environment for a project investigating genome instability in Chinese hamster ovary (CHO) cells, where genome instability leads to lower protein productivity. Development of the baseline (control) and stressed process conditions highlighted the need to control critical process parameters, including the proportional, integral, and derivative (PID) control loops. Process parameters that are often considered scale-independent include dissolved oxygen (DO) and pH; however, these parameters were observed to be sensitive to PID settings. For many bioreactors, control loops are cascaded such that the manipulated variables are adjusted concurrently. Conversely, for the ambr250 bioreactor system, the control levels are segmented and implemented sequentially. Consequently, each control level must be tuned independently, as the PID settings are independent by control level. For the CHO cell studies, the initial PID settings did not result in a robust process: lactate levels were elevated because the pH was above the setpoint for most of the experiment. After several PID tuning iterations, new PID settings were found that could respond appropriately to routine feed and antifoam additions. Furthermore, these new PID settings resulted in more robust cell growth and increased protein productivity. This work highlights the need to describe PID gains and manipulated variable ranges, as profoundly different outcomes can result from the same feeding protocol. Additionally, improved process models are needed to allow process simulations and tuning. Thus, these tuning experiments support the idea that PID settings should be fully described in bioreactor publications to allow for better reproducibility of results.
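To illustrate why the PID gains and the manipulated variable range both shape the outcome, here is a minimal sketch of a generic discrete-time PID controller with output clamping. It is a textbook form, not the ambr250 control scheme; the class name, field names, gains, and setpoint values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Generic discrete PID controller (illustrative, not vendor-specific)."""
    kp: float        # proportional gain
    ki: float        # integral gain
    kd: float        # derivative gain
    out_min: float   # lower bound of the manipulated variable
    out_max: float   # upper bound of the manipulated variable
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        # Accumulate the error over time for the integral term.
        self.integral += error * dt
        # No derivative contribution on the first call (no previous error).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the allowed range of the manipulated variable.
        return min(self.out_max, max(self.out_min, raw))


# Hypothetical usage: a proportional-only loop with output limited to 0..1.
controller = PID(kp=2.0, ki=0.0, kd=0.0, out_min=0.0, out_max=1.0)
output_below_setpoint = controller.update(setpoint=7.0, measured=6.5, dt=1.0)
output_above_setpoint = controller.update(setpoint=7.0, measured=7.5, dt=1.0)
```

Two controllers given the same setpoint and the same measurements can produce different outputs if their gains or output limits differ, which is consistent with the abstract's call to report both.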

     
  4. Abstract

    Human‐induced pluripotent stem cells (iPSCs) hold the promise to improve cell‐based therapies. Yet, to meet rising demands and become clinically impactful, high‐quality iPSCs must be generated in sufficient quantity, a task that exceeds current capabilities. In this study, K3 iPSC cultures were examined using parallel‐labeling metabolic flux analysis (13C‐MFA) to quantify intracellular fluxes at relevant bioprocessing stages: glucose concentrations representative of initial media concentrations and high lactate concentrations representative of fed‐batch culture conditions, prior to and after bolus glucose feeds. The glucose and lactate concentrations are also representative of concentrations that might be encountered at different locations within 3D cell aggregates. Furthermore, a novel method was developed to allow the isotopic tracer [U‐13C3] lactate to be used in the 13C‐MFA model. The results indicated that high extracellular lactate concentrations decreased glucose consumption and lactate production, while glucose concentrations alone did not affect rates of aerobic glycolysis. Moreover, for the high lactate cultures, lactate was used as a metabolic substrate to support oxidative mitochondrial metabolism. These results demonstrate that iPSCs have metabolic flexibility and possess the capacity to metabolize lactate to support exponential growth, and that high lactate concentrations alone do not adversely impact iPSC proliferation.

     
  5. Abstract

    The Chinese hamster ovary (CHO) cell lines that are used to produce commercial quantities of therapeutic proteins commonly exhibit a decrease in productivity over time in culture, a phenomenon termed production instability. Random integration of the transgenes encoding the protein of interest into locations in the CHO genome that are vulnerable to genetic and epigenetic instability often causes production instability through copy number loss and silencing of expression. Several recent publications have shown that these cell line development challenges can be overcome by using site‐specific integration (SSI) technology to insert the transgenes at genomic loci, often called “hotspots,” that are transcriptionally permissive and have enhanced stability relative to the rest of the genome. However, extensive characterization of the CHO epigenome is needed to identify hotspots that maintain their desirable epigenetic properties in an industrial bioprocess environment and maximize transcription from a single integrated transgene copy. To this end, the epigenomes and transcriptomes of two distantly related cell lines, an industrially relevant monoclonal antibody‐producing cell line and its parental CHO‐K1 host, were characterized using high throughput chromosome conformation capture and RNAseq to analyze changes in the epigenome that occur during cell line development and associated changes in system‐wide gene expression. In total, 10.9% of the CHO genome contained transcriptionally permissive three‐dimensional chromatin structures with enhanced genetic and epigenetic stability relative to the rest of the genome. These safe harbor regions also showed good agreement with published CHO epigenome data, demonstrating that this method was suitable for finding genomic regions with epigenetic markers of active and stable gene expression. 
    These regions significantly reduce the genomic search space when looking for CHO hotspots with widespread applicability and can guide future studies with the goal of maximizing the potential of SSI technology in industrial production CHO cell lines.

     
  6. Abstract

    The Chinese hamster genome serves as a reference genome for the study of Chinese hamster ovary (CHO) cells, the preferred host system for biopharmaceutical production. Recent re‐sequencing of the Chinese hamster genome resulted in the RefSeq PICR meta‐assembly, a set of highly accurate scaffolds that filled over 95% of the gaps in previous assembly versions. However, these scaffolds did not reach chromosome‐scale due to the absence of long‐range scaffolding information during the meta‐assembly process. Here, long‐range scaffolding of the PICR Chinese hamster genome assembly was performed using high‐throughput chromosome conformation capture (Hi‐C). This process resulted in a new “PICRH” genome, where 97% of the genome is contained in 11 mega‐scaffolds corresponding to the Chinese hamster chromosomes (2n = 22) and the total number of scaffolds is reduced by three‐fold from 1,830 scaffolds in PICR to 647 in PICRH. Continuity was improved while preserving accuracy, leading to quality scores higher than recent builds of mouse chromosomes and comparable to human chromosomes. The PICRH genome assembly will be an indispensable tool for designing advanced genetic engineering strategies in CHO cells and enabling systematic examination of genomic and epigenomic instability through comparative analysis of CHO cell lines on a common set of chromosomal coordinates.

     
  7. Free, publicly-accessible full text available July 1, 2025
  8. Transcription factor (TF)–promoter pairs have been repurposed from native hosts to provide tools to measure intracellular biochemical production titer and dynamically control gene expression. Most often, native TF–promoter systems require rigorous screening to obtain desirable characteristics optimized for biotechnological applications. High-throughput techniques may provide a rational and less labor-intensive strategy to engineer user-defined TF–promoter pairs using fluorescence-activated cell sorting and deep sequencing methods (sort-seq). Based on the designed promoter library’s distribution characteristics, we elucidate sequence–function interactions between the TF and DNA. In this work, we use the sort-seq method to study the sequence–function relationship of a σ54-dependent, butanol-responsive TF–promoter pair, BmoR-PBMO derived from Thauera butanivorans, at the nucleotide level to improve biosensor characteristics, specifically the dynamic range. Activities of promoters from a mutagenized PBMO library were sorted based on gfp expression and subsequently deep sequenced to correlate site-specific sequences with changes in dynamic range. We identified site-specific mutations that increase the sensor output. A double mutant and a single mutant in the PBMO promoter, CA(129,130)TC and G(205)A, increased the dynamic range 4-fold and 1.65-fold, respectively, compared with the native system. In addition, sort-seq identified essential sites required for the proper function of the σ54-dependent promoter biosensor in the context of the host. This work can enable high-throughput screening methods for strain development.
  9. Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades. Bibliographic mapping and classification of relevant research studies will be essential for identifying research gaps and trends in the literature. To qualitatively and quantitatively understand the CHO literature, we have conducted topic modeling using a CHO bioprocess bibliome manually compiled in 2016, and compared the topics uncovered by the Latent Dirichlet Allocation (LDA) models with the human labels of the CHO bibliome. The results show a significant overlap between the manually selected categories and computationally generated topics, and reveal the machine-generated topic-specific characteristics. To identify relevant CHO bioprocessing papers from new scientific literature, we have developed a supervised learning model, Logistic Regression, to identify specific article topics and evaluated the results using three CHO bibliome datasets: the Bioprocessing set, the Glycosylation set, and the Phenotype set. The use of top terms as features supports the explainability of document classification results to yield insights on new CHO bioprocessing papers.