-
Finley, Stacey D (Ed.)
In experiments, the distributions of mRNA or protein numbers in single cells are often fitted to the random telegraph model which includes synthesis and decay of mRNA or protein, and switching of the gene between active and inactive states. While commonly used, this model does not describe how fluctuations are influenced by crucial biological mechanisms such as feedback regulation, non-exponential gene inactivation durations, and multiple gene activation pathways. Here we investigate the dynamical properties of four relatively complex gene expression models by fitting their steady-state mRNA or protein number distributions to the simple telegraph model. We show that despite the underlying complex biological mechanisms, the telegraph model with three effective parameters can accurately capture the steady-state gene product distributions, as well as the conditional distributions in the active gene state, of the complex models. Some effective parameters are reliable and can reflect realistic dynamic behaviors of the complex models, while others may deviate significantly from their real values in the complex models. The effective parameters can also be applied to characterize the capability for a complex model to exhibit multimodality. Using additional information such as single-cell data at multiple time points, we provide an effective method of distinguishing the complex models from the telegraph model. Furthermore, using measurements under varying experimental conditions, we show that fitting the mRNA or protein number distributions to the telegraph model may even reveal the underlying gene regulation mechanisms of the complex models. The effectiveness of these methods is confirmed by analysis of single-cell data for E. coli and mammalian cells. All these results are robust with respect to cooperative transcriptional regulation and extrinsic noise. In particular, we find that faster relaxation speed to the steady state results in more precise parameter inference under large extrinsic noise.
Free, publicly-accessible full text available May 14, 2025
-
Abstract Objective. The safe delivery of electrical current to neural tissue depends on many factors, yet previous methods for predicting tissue damage rely on only a few stimulation parameters. Here, we report the development of a machine learning approach that could lead to a more reliable method for predicting electrical stimulation-induced tissue damage by incorporating additional stimulation parameters. Approach. A literature search was conducted to build an initial database of tissue response information after electrical stimulation, categorized as either damaging or non-damaging. Subsequently, we used ordinal encoding and random forest for feature selection, and investigated four machine learning models for classification: Logistic Regression, K-nearest Neighbor, Random Forest, and Multilayer Perceptron. Finally, we compared the results of these models against the accuracy of the Shannon equation. Main results. We compiled a database with 387 unique stimulation parameter combinations collected from 58 independent studies conducted over a period of 47 years, with 195 (51%) categorized as non-damaging and 190 (49%) categorized as damaging. The features selected for building our model with a Random Forest algorithm were: waveform shape, geometric surface area, pulse width, frequency, pulse amplitude, charge per phase, charge density, current density, duty cycle, daily stimulation duration, daily number of pulses delivered, and daily accumulated charge. The Shannon equation yielded an accuracy of 63.9% using a k value of 1.79. In contrast, the Random Forest algorithm was able to robustly predict whether a set of stimulation parameters was classified as damaging or non-damaging with an accuracy of 88.3%. Significance. This novel Random Forest model can facilitate more informed decision making in the selection of neuromodulation parameters for both research studies and clinical practice. This study represents the first approach to use machine learning in the prediction of stimulation-induced neural tissue damage, and lays the groundwork for neurostimulation driven by machine learning models.
-
Csikász-Nagy, Attila (Ed.)
The cell cycle consists of a series of orchestrated events controlled by molecular sensing and feedback networks that ultimately drive the duplication of total DNA and the subsequent division of a single parent cell into two daughter cells. The ability to block the cell cycle and synchronize cells within the same phase has helped to clarify the factors that control cell cycle progression and the properties of each individual phase. Intriguingly, when cells are released from a synchronized state, they do not maintain synchronized cell division and rapidly become asynchronous. The rate of cellular desynchronization and the factors that control it remain largely unknown. In this study, using a combination of experiments and simulations, we investigate the desynchronization properties of cervical cancer cells (HeLa) starting from the G1/S boundary following double-thymidine block. Propidium iodide (PI) DNA staining was used to perform flow cytometry cell cycle analysis at regular 8-hour intervals, and a custom auto-similarity function was used to assess the desynchronization and quantify the convergence to an asynchronous state. In parallel, we developed a single-cell phenomenological model that returns the DNA amount across the cell cycle stages and fitted the parameters using experimental data. Simulations of populations of cells reveal that the cell cycle desynchronization rate is primarily sensitive to the variability of cell cycle duration within a population. To validate the model prediction, we introduced lipopolysaccharide (LPS) to increase cell cycle noise. Indeed, we observed an increase in cell cycle variability under LPS stimulation in HeLa cells, accompanied by an enhanced rate of cell cycle desynchronization. Our results show that the desynchronization rate of artificially synchronized in-phase cell populations can be used as a proxy for the degree of variance in cell cycle periodicity, an underexplored axis in cell cycle research.
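The link between cycle-duration variability and desynchronization rate can be illustrated with a toy simulation. The sketch below is not the authors' phenomenological model or their auto-similarity function. It assumes, purely for illustration, that every cell keeps one fixed cycle duration drawn from a log-normal distribution, and it measures synchrony with a circular order parameter. The 20-hour mean duration, the coefficients of variation, and all function names are hypothetical.

```python
import cmath
import math
import random


def draw_cycle_durations(n_cells, mean_hours, cv, rng):
    """Draw one fixed cycle duration per cell from a log-normal distribution.

    The log-normal parameters are set so that the distribution has the
    requested mean and coefficient of variation (cv). This distribution
    is an assumption of the sketch, not something stated in the abstract.
    """
    sigma2 = math.log(1.0 + cv * cv)
    mu = math.log(mean_hours) - 0.5 * sigma2
    sigma = math.sqrt(sigma2)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_cells)]


def synchrony(durations, t_hours):
    """Circular order parameter R in [0, 1] at time t_hours.

    Each cell's phase is the fraction of its current cycle completed.
    R = 1 means all cells share one phase; R near 0 means the phases
    are spread around the whole cycle.
    """
    total = 0j
    for duration in durations:
        phase = (t_hours / duration) % 1.0
        total += cmath.exp(2j * math.pi * phase)
    return abs(total) / len(durations)


def synchrony_after(cv, t_hours, n_cells=2000, mean_hours=20.0, seed=0):
    """Synchrony of a population released in phase at t = 0 (hypothetical values)."""
    rng = random.Random(seed)
    durations = draw_cycle_durations(n_cells, mean_hours, cv, rng)
    return synchrony(durations, t_hours)


# Compare a low-variability and a high-variability population after 48 hours.
low_noise = synchrony_after(cv=0.05, t_hours=48.0)
high_noise = synchrony_after(cv=0.20, t_hours=48.0)
```

Under these assumptions the population with the larger coefficient of variation retains less synchrony at the same time point, which is the qualitative behavior the abstract describes.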
-
Solitary fibrous tumor (SFT) is a rare soft-tissue sarcoma. This nonhereditary cancer is the result of an environmental intrachromosomal gene fusion between NAB2 and STAT6 on chromosome 12, which fuses the activation domain of STAT6 with the repression domain of NAB2. Currently, there is no approved chemotherapy regimen for SFTs. The best response to available pharmaceuticals is a partial response or stable disease for several months. The purpose of this study is to investigate the potential of RNA-based therapies for the treatment of SFTs. Specifically, in vitro SFT cell models were engineered to harbor the characteristic NAB2–STAT6 fusion using the CRISPR/SpCas9 system. Cell migration as well as multiple cancer-related signaling pathways were increased in the engineered cells as compared to the fusion-absent parent cells. The SFT cell models were then used for evaluating the targeting efficacies of NAB2–STAT6 fusion-specific antisense oligonucleotides (ASOs) and CRISPR/CasRx systems. Our results showed that fusion-specific ASO treatments caused a 58% reduction in expression of fusion transcripts and a 22% reduction in cell proliferation after 72 h in vitro. Similarly, the AAV2-mediated CRISPR/CasRx system led to a 59% reduction in fusion transcript expression in vitro, and a 55% reduction in xenograft growth after 29 days ex vivo.
-
Abstract Two common hemoglobinopathies, sickle cell disease (SCD) and β-thalassemia, arise from genetic mutations within the β-globin gene. In this work, we identified a 500-bp motif (Fetal Chromatin Domain, FCD) upstream of the human γ-globin locus and showed that the removal of this motif using CRISPR technology reactivates the expression of γ-globin. Next, we present two different cell morphology-based machine learning approaches that can be used to identify human blood cells (KU-812) that harbor CRISPR-mediated FCD genetic modifications. Three candidate models from the first approach, which uses a multilayer perceptron algorithm (MLP 20-26, MLP 26-18, and MLP 30-26) and flow cytometry-derived cellular data, yielded 0.83 precision, 0.80 recall, 0.82 accuracy, and 0.90 area under the ROC (receiver operating characteristic) curve when predicting the edited cells. In comparison, the candidate model from the second approach, which uses deep learning (T2D5) and DIC microscopy-derived imaging data, performed with lower accuracy (0.80) and ROC AUC (0.87). We envision that equivalent machine learning-based models can complement currently available genotyping protocols for specific genetic modifications which result in morphological changes in human cells.
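For readers less familiar with the area under the ROC curve reported above, the following sketch shows one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name and the example scores are hypothetical and are not taken from the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise ranking.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties contribute one half.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for edited (positive) and unedited
# (negative) cells; illustrative only, not data from the study.
edited = [0.9, 0.8, 0.4]
unedited = [0.7, 0.3, 0.2]
auc = roc_auc(edited, unedited)  # 8 of the 9 pairs are ranked correctly
```

The pairwise form is quadratic in the number of cells, so it suits small illustrations; rank-based implementations are the usual choice for large flow cytometry data sets.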
-
Abstract Herein, we implement and assess machine learning architectures to ascertain models that differentiate healthy from apoptotic cells using exclusively forward (FSC) and side (SSC) scatter flow cytometry information. To generate training data, colorectal cancer HCT116 cells were subjected to miR-34a treatment and then classified using a conventional Annexin V/propidium iodide (PI)-staining assay. The apoptotic cells were defined as Annexin V-positive cells, which include early and late apoptotic cells, necrotic cells, as well as other dying or dead cells. In addition to fluorescent signal, we collected cell size and granularity information from the FSC and SSC parameters. Both parameters are subdivided into area, height, and width, thus providing a total of six numerical features that informed and trained our models. A collection of logistic regression, random forest, k-nearest neighbor, multilayer perceptron, and support vector machine models was trained and tested for classification performance in predicting cell states using only the six aforementioned numerical features. Out of 1046 candidate models, a multilayer perceptron was chosen with 0.91 live precision, 0.93 live recall, 0.92 live f value and 0.97 live area under the ROC curve when applied to standardized data. We discuss and highlight differences in classifier performance and compare the results to the standard practice of forward and side scatter gating, typically performed to select cells based on size and/or complexity. We demonstrate that our model, a ready-to-use module for any flow cytometry-based analysis, can provide automated, reliable, and stain-free classification of healthy and apoptotic cells using exclusively size and granularity information.
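As a consistency check on the reported metrics, the sketch below computes an F score from precision and recall. It assumes that the "f value" in the abstract is the F1 score, the harmonic mean of precision and recall; the abstract does not say so explicitly. The function name is hypothetical.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision < 0.0 or recall < 0.0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        # Both inputs are zero, so the score is defined as zero here.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator


# Live-cell precision and recall as reported in the abstract.
live_f1 = f_score(0.91, 0.93)
```

Under the F1 assumption, `live_f1` rounds to 0.92, which agrees with the live f value quoted in the abstract.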