skip to main content


Title: ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides
Abstract Motivation

Anti-cancer peptides (ACPs) have recently emerged as promising therapeutic agents for cancer treatment. Due to the avalanche of protein sequence data in the post-genomic era, there is an urgent need to develop automated computational methods to enable fast and accurate identification of novel ACPs within the vast number of candidate proteins and peptides.

Results

To address this, we propose a novel predictor named Anti-Cancer peptide Predictor with Feature representation Learning (ACPred-FL) for accurate prediction of ACPs based on sequence information. More specifically, we develop an effective feature representation learning model, with which we can extract and learn a set of informative features from a pool of support vector machine-based models trained using sequence-based feature descriptors. By doing so, the class label information of data samples is fully utilized. To improve the feature representation, we further employ a two-step feature selection technique, resulting in a most informative five-dimensional feature vector for the final peptide representation. Experimental results show that such five features provide the most discriminative power for identifying ACPs than currently available feature descriptors, highlighting the effectiveness of the proposed feature representation learning approach. The developed ACPred-FL method significantly outperforms state-of-the-art methods.

Availability and implementation

The web-server of ACPred-FL is available at http://server.malab.cn/ACPred-FL.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
more » « less
NSF-PAR ID:
10393444
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
34
Issue:
23
ISSN:
1367-4803
Page Range / eLocation ID:
p. 4007-4016
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. While the functional proteolysis regulated by proteases plays a central role in the ‘life and death’ cellular processes, many of the corresponding substrates and their cleavage sites were not found yet. Availability of accurate predictors of the substrates and cleavage sites would facilitate understanding of proteases’ functions and physiological roles. Deep learning is a promising approach for the development of accurate predictors of substrate cleavage events.

    Results

    We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave uses protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. High predictive performance of our models stems from the use of high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and the application of transfer learning, multiple kernels and attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms these methods in predicting caspase and matrix metalloprotease substrate-cleavage sites.

    Availability and implementation

    The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  2. Abstract Motivation

    Finding driver genes that are responsible for the aberrant proliferation rate of cancer cells is informative for both cancer research and the development of targeted drugs. The established experimental and computational methods are labor-intensive. To make algorithms feasible in real clinical settings, methods that can predict driver genes using less experimental data are urgently needed.

    Results

    We designed an effective feature selection method and used Support Vector Machines (SVM) to predict the essentiality of the potential driver genes in cancer cell lines with only 10 genes as features. The accuracy of our predictions was the highest in the Broad-DREAM Gene Essentiality Prediction Challenge. We also found a set of genes whose essentiality could be predicted much more accurately than others, which we called Accurately Predicted (AP) genes. Our method can serve as a new way of assessing the essentiality of genes in cancer cells.

    Availability and implementation

    The raw data that support the findings of this study are available at Synapse. https://www.synapse.org/#! Synapse: syn2384331/wiki/62825. Source code is available at GitHub. https://github.com/GuanLab/DREAM-Gene-Essentiality-Challenge.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  3. Abstract Motivation

    Food-derived bioactive peptides (FBPs) have demonstrated their significance in pharmaceuticals, diets and nutraceuticals, benefiting public health and global ecology. While significant efforts have been made to discover FBPs and to elucidate the underlying bioactivity mechanisms, there is lack of a systemic study of sequence–structure–activity relationship of FBPs in a large dataset.

    Results

    Here, we construct a database of food-derived bioactive peptides (DFBP), containing a total of 6276 peptide entries in 31 types from different sources. Further, we develop a series of analysis tools for function discovery/repurposing, traceability, multifunctional bioactive exploration and physiochemical property assessment of peptides. Finally, we apply this database and data-mining techniques to discover new FBPs as potential drugs for cardiovascular diseases. The DFBP serves as a useful platform for not only the fundamental understanding of sequence–structure–activity of FBPs but also the design, discovery, and repurposing of peptide-based drugs, vaccines, materials and food ingredients.

    Availability and implementation

    DFBP service can be accessed freely via http://www.cqudfbp.net/. All data are incorporated into the article and its online supplementary material.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  4. Abstract Motivation

    Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates.

    Results

    In this work, we utilize deep learning to recognize antimicrobial activity. We propose a neural network model with convolutional and recurrent layers that leverage primary sequence composition. Results show that the proposed model outperforms state-of-the-art classification models on a comprehensive dataset. By utilizing the embedding weights, we also present a reduced-alphabet representation and show that reasonable AMP recognition can be maintained using nine amino acid types.

    Availability and implementation

    Models and datasets are made freely available through the Antimicrobial Peptide Scanner vr.2 web server at www.ampscanner.com.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Abstract Motivation

    Predicting pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) in triple-negative breast cancer (TNBC) patients accurately is direly needed for clinical decision making. pCR is also regarded as a strong predictor of overall survival. In this work, we propose a deep learning system to predict pCR to NAC based on serial pathology images stained with hematoxylin and eosin and two immunohistochemical biomarkers (Ki67 and PHH3). To support human prior domain knowledge-based guidance and enhance interpretability of the deep learning system, we introduce a human knowledge-derived spatial attention mechanism to inform deep learning models of informative tissue areas of interest. For each patient, three serial breast tumor tissue sections from biopsy blocks were sectioned, stained in three different stains and integrated. The resulting comprehensive attention information from the image triplets is used to guide our prediction system for prognostic tissue regions.

    Results

    The experimental dataset consists of 26 419 pathology image patches of 1000×1000 pixels from 73 TNBC patients treated with NAC. Image patches from randomly selected 43 patients are used as a training dataset and images patches from the rest 30 are used as a testing dataset. By the maximum voting from patch-level results, our proposed model achieves a 93% patient-level accuracy, outperforming baselines and other state-of-the-art systems, suggesting its high potential for clinical decision making.

    Availability and implementation

    The codes, the documentation and example data are available on an open source at: https://github.com/jkonglab/PCR_Prediction_Serial_WSIs_biomarkers

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less