Title: BoostMEC: predicting CRISPR-Cas9 cleavage efficiency through boosting models
Abstract
Background: In the CRISPR-Cas9 system, the efficiency of genetic modifications has been found to vary depending on the single guide RNA (sgRNA) used. A variety of sgRNA properties have been found to be predictive of CRISPR cleavage efficiency, including the position-specific sequence composition of sgRNAs, global sgRNA sequence properties, and thermodynamic features. While prevalent existing deep learning-based approaches provide competitive prediction accuracy, a more interpretable model is desirable to help understand how different features may contribute to CRISPR-Cas9 cleavage efficiency.
Results: We propose a gradient boosting approach, utilizing LightGBM to develop an integrated tool, BoostMEC (Boosting Model for Efficient CRISPR), for the prediction of wild-type CRISPR-Cas9 editing efficiency. We benchmark BoostMEC against 10 popular models on 13 external datasets and show its competitive performance.
Conclusions: BoostMEC can provide state-of-the-art predictions of CRISPR-Cas9 cleavage efficiency for sgRNA design and selection. Relying on direct and derived sequence features of sgRNA sequences and based on conventional machine learning, BoostMEC maintains an advantage over other state-of-the-art CRISPR efficiency prediction models that are based on deep learning through its ability to produce more interpretable feature insights and predictions.
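To make the feature types named in the abstract concrete, the following is a minimal sketch of how position-specific sequence composition and one global sequence property (GC content) might be turned into a numeric feature row. It is an illustration only and not BoostMEC's actual feature set: the function names (`one_hot_positions`, `gc_content`, `featurize`) and the choice of features are assumptions, and thermodynamic features are omitted.

```python
from typing import List

NUCLEOTIDES = "ACGT"  # fixed column order for the one-hot encoding


def one_hot_positions(seq: str) -> List[int]:
    """Position-specific encoding: four 0/1 indicators per nucleotide position."""
    features: List[int] = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected character in sequence: {base!r}")
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def gc_content(seq: str) -> float:
    """Global sequence property: fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def featurize(seq: str) -> List[float]:
    """Hypothetical feature row: one-hot positions followed by GC content."""
    return one_hot_positions(seq) + [gc_content(seq)]


if __name__ == "__main__":
    # A made-up 20-nt guide sequence, used only to show the row length.
    row = featurize("GACGTTACCGGATCAGTCCA")
    print(len(row))  # 4 indicators per position plus one global feature
```

Rows of this kind, one per sgRNA, could then be passed to a gradient boosting library together with measured efficiencies. Because each column corresponds to a named property (a base at a position, or GC content), per-feature importance from a tree model can be read directly, which is the kind of interpretability the abstract refers to.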
Award ID(s):
1764421
PAR ID:
10427889
Journal Name:
BMC Bioinformatics
Volume:
23
Issue:
1
ISSN:
1471-2105
Sponsoring Org:
National Science Foundation
More Like this
  1. We report the development of post-transcriptional chemical methods that enable control over CRISPR–Cas9 gene editing activity both in in vitro assays and in living cells. We show that an azide-substituted acyl imidazole reagent (NAI-N3) efficiently acylates CRISPR single guide RNAs (sgRNAs) in 20 minutes in buffer. Poly-acylated (“cloaked”) sgRNA was completely inactive in DNA cleavage with Cas9 in vitro, and activity was quantitatively restored after phosphine treatment. Delivery of cloaked sgRNA and Cas9 mRNA into HeLa cells was enabled by the use of charge-altering releasable transporters (CARTs), which outperformed commercial transfection reagents in transfecting sgRNA co-complexed with Cas9 encoding functional mRNA. Genomic DNA cleavage in the cells by CRISPR–Cas9 was efficiently restored after treatment with phosphine to remove the blocking acyl groups. Our results highlight the utility of reversible RNA acylation as a novel method for temporal control of genome-editing function.
  2. Macrophages are key effectors of host defense and metabolism, making them promising targets for transient genetic therapy. Gene editing through the delivery of Cas9‐ribonucleoprotein (RNP) provides multiple advantages over gene delivery–based strategies for introducing CRISPR machinery to the cell. There are, however, significant physiological, cellular, and intracellular barriers to the effective delivery of the Cas9 protein and guide RNA (sgRNA) that have, to date, restricted in vivo Cas9 protein–based approaches to local/topical delivery applications. Described herein is a new nanoassembled platform featuring coengineered nanoparticles and Cas9 protein that has been developed to provide efficient Cas9‐sgRNA delivery and concomitant CRISPR editing through systemic tail‐vein injection into mice, achieving >8% gene editing efficiency in macrophages of the liver and spleen.
  3. The past decade has witnessed a rapid evolution in identifying more versatile clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) nucleases and their functional variants, as well as in developing precise CRISPR/Cas-derived genome editors. The programmable and robust features of the genome editors provide an effective RNA-guided platform for fundamental life science research and subsequent applications in diverse scenarios, including biomedical innovation and targeted crop improvement. One of the most essential principles is to guide alterations in genomic sequences or genes in the intended manner without undesired off-target impacts, which strongly depends on the efficiency and specificity of single guide RNA (sgRNA)-directed recognition of targeted DNA sequences. Recent advances in empirical scoring algorithms and machine learning models have facilitated sgRNA design and off-target prediction. In this review, we first briefly introduce the different features of CRISPR/Cas tools that should be taken into consideration to achieve specific purposes. Second, we focus on the computer-assisted tools and resources that are widely used in designing sgRNAs and analyzing CRISPR/Cas-induced on- and off-target mutations. Third, we provide insights into the limitations of available computational tools, which should help researchers in this field optimize them further. Lastly, we suggest a simple but effective workflow for choosing and applying web-based resources and tools for CRISPR/Cas genome editing.
  4. The canonical CRISPR-Cas9 genome editing technique has profoundly impacted the fields of plant biology, biotechnology, and crop improvement. Because non-homologous end joining (NHEJ) is usually considered to generate random indels, its high mutation efficiency is generally not pertinent to precise editing. Homology-directed repair (HDR) can mediate precise editing with supplied donor DNA, but it suffers from extremely low efficiency in higher plants. Therefore, precision editing in plants will be facilitated by the ability to predict NHEJ repair outcomes and to improve HDR efficiency. Here, we report that NHEJ-mediated single nucleotide insertion at different rice genes is predictable based on DNA sequences at the target loci. Three mutation prediction tools (inDelphi, FORECasT, and SPROUT) have been validated in the rice plant system. We also evaluated the chimeric guide RNA (cgRNA) and Cas9-Retron precISe Parallel Editing via homologY (CRISPEY) strategies to facilitate donor template supply for improving HDR efficiency in Nicotiana benthamiana and rice. However, neither cgRNA nor CRISPEY improved plant HDR editing efficiency in this study. Interestingly, our data indicate that tethering a 200–250-nucleotide-long sequence to either the 5′ or 3′ end of the guide RNA did not significantly affect Cas9 cleavage activity.
  5. Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local- and global-interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely user-friendliness and faithfulness.