skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Yang, Yiben"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Background In the CRISPR-Cas9 system, the efficiency of genetic modifications has been found to vary depending on the single guide RNA (sgRNA) used. A variety of sgRNA properties have been found to be predictive of CRISPR cleavage efficiency, including the position-specific sequence composition of sgRNAs, global sgRNA sequence properties, and thermodynamic features. While prevalent existing deep learning-based approaches provide competitive prediction accuracy, a more interpretable model is desirable to help understand how different features may contribute to CRISPR-Cas9 cleavage efficiency. Results We propose a gradient boosting approach, utilizing LightGBM to develop an integrated tool, BoostMEC (Boosting Model for Efficient CRISPR), for the prediction of wild-type CRISPR-Cas9 editing efficiency. We benchmark BoostMEC against 10 popular models on 13 external datasets and show its competitive performance. Conclusions BoostMEC can provide state-of-the-art predictions of CRISPR-Cas9 cleavage efficiency for sgRNA design and selection. Relying on direct and derived sequence features of sgRNA sequences and based on conventional machine learning, BoostMEC maintains an advantage over other state-of-the-art CRISPR efficiency prediction models that are based on deep learning through its ability to produce more interpretable feature insights and predictions. 
    more » « less