Abstract Sensitivity analysis is a popular feature selection approach employed to identify the important features in a dataset. In sensitivity analysis, each input feature is perturbed one-at-a-time and the response of the machine learning model is examined to determine the feature's rank. Note that the existing perturbation techniques may lead to inaccurate feature ranking due to their sensitivity to perturbation parameters. This study proposes a novel approach that involves the perturbation of input features using a complex-step. The implementation of complex-step perturbation in the framework of deep neural networks as a feature selection method is provided in this paper, and its efficacy in determining important features for real-world datasets is demonstrated. Furthermore, the filter-based feature selection methods are employed, and the results obtained from the proposed method are compared. While the results obtained for the classification task indicated that the proposed method outperformed other feature ranking methods, in the case of the regression task, it was found to perform more or less similar to that of other feature ranking methods.
more »
« less
Network-based drug sensitivity prediction
Abstract Background Drug sensitivity prediction and drug responsive biomarker selection on high-throughput genomic data is a critical step in drug discovery. Many computational methods have been developed to serve this purpose including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network on drug sensitivity prediction is investigated in this study. Methods In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed and both models integrate gene network information directly into neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at https://github.com/compbiolabucf/drug-sensitivity-prediction . Results In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance compared to deep neural network model; (4) the prediction performance is drug dependent and it may relate to the drug’s mechanism of action. Conclusions Network-based feature selection method and prediction models improve the performance of the drug response prediction. The relations between the genomic features are more robust and stable compared to the correlation between each individual genomic feature and the drug response in high dimension and low sample size genomic datasets.
more »
« less
- Award ID(s):
- 1755761
- PAR ID:
- 10282797
- Date Published:
- Journal Name:
- BMC Medical Genomics
- Volume:
- 13
- Issue:
- S11
- ISSN:
- 1755-8794
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Quantitative structure–activity relationship (QSAR) modeling has become a critical tool in drug design. Recently proposed Topological Regression (TR), a computationally efficient and highly interpretable QSAR model that maps distances in the chemical domain to distances in the activity domain, has shown predictive performance comparable to state-of-the-art deep learning-based models. However, TR’s dependence on simple random sampling-based anchor selection and utilization of radial basis function for response reconstruction constrain its interpretability and predictive capacity. To address these limitations, we propose Adaptive Topological Regression (AdapToR) with adaptive anchor selection and optimization-based reconstruction. We evaluated AdapToR on the NCI60 GI50 dataset, which consists of over 50,000 drug responses across 60 human cancer cell lines, and compared its performance to Transformer CNN, Graph Transformer, TR, and other baseline models. The results demonstrate that AdapToR outperforms competing QSAR models for drug response prediction with significantly lower computational cost and greater interpretability as compared to deep learning-based models.more » « less
-
While machine learning models perform well on offline data, assessing their performance in real-world, resource-constrained environments-considering accuracy, prediction time, power consumption, and memory usage-is crucial for practical applications. This research implements a mobile-based Human Activity Recognition solution to classify three postures-sitting, standing, and walking-using smartphone sensors, specifically accelerometer, gyroscope, and magnetometer. Time-domain features extracted from these sensors were used, with Random Forest employed for feature selection. One traditional machine learning model, Logistic Regression, and one deep learning model, Convolutional Neural Network, were trained and deployed via an Android application for real-time evaluation. While the Convolutional Neural Network achieved higher accuracy and better memory efficiency, Logistic Regression demonstrated faster prediction times during real-time use. Both models showed reduced accuracy for standing and walking postures in real-world conditions, emphasizing the challenges of deploying machine learning models in dynamic environments. This study highlights the importance of evaluating machine learning models in real-world settings to ensure reliability and efficiency, particularly in resource-constrained environments.more » « less
-
Deep learning models are being adopted and applied across various critical medical tasks, yet they are primarily trained to provide point predictions without providing degrees of confidence. Medical practitioner’s trustworthiness of deep learning models is increased when paired with uncertainty estimations. Conformal Prediction has emerged as a promising method to pair machine learning models with prediction intervals, allowing for a view of the model’s uncertainty. However, popular uncertainty estimation methods for conformal prediction fail to provide highly accurate heteroskedastic intervals. In this paper, we propose a method to estimate the uncertainty of each sample by calculating the variance obtained from a Deep Regression Forest. We show that the deep regression forest variance improves the efficiency and coverage of normalized inductive conformal prediction when applied on an anti-cancer drug sensitivity prediction task.more » « less
-
DNA methylation is a process that can affect gene accessibility and therefore gene expression. In this study, a machine learning pipeline is proposed for the prediction of breast cancer and the identification of significant genes that contribute to the prediction. The current study utilized breast cancer methylation data from The Cancer Genome Atlas (TCGA), specifically the TCGA-BRCA dataset. Feature engineering techniques have been utilized to reduce data volume and make deep learning scalable. A comparative analysis of the proposed approach on Illumina 27K and 450K methylation data reveals that deep learning methodologies for cancer prediction can be coupled with feature selection models to enhance prediction accuracy. Prediction using 450K methylation markers can be accomplished in less than 13 s with an accuracy of 98.75%. Of the list of 685 genes in the feature selected 27K dataset, 578 were mapped to Ensemble Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in five biological processes and one molecular function. Of the list of 1572 genes in the feature selected 450K data set, 1290 were mapped to Ensemble Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in 95 biological processes and 17 molecular functions. Seven oncogene/tumor suppressor genes were common between the 27K and 450K feature selected gene sets. These genes were RTN4IP1, MYO18B, ANP32A, BRF1, SETBP1, NTRK1, and IGF2R. Our bioinformatics deep learning workflow, incorporating imputation and data balancing methods, is able to identify important methylation markers related to functionally important genes in breast cancer with high accuracy compared to deep learning or statistical models alone.more » « less
An official website of the United States government

