Abstract Sensitivity analysis is a popular feature selection approach employed to identify the important features in a dataset. In sensitivity analysis, each input feature is perturbed one at a time and the response of the machine learning model is examined to determine the feature's rank. However, existing perturbation techniques may lead to inaccurate feature ranking because they are sensitive to the perturbation parameters. This study proposes a novel approach that perturbs the input features using a complex step. The implementation of complex-step perturbation in the framework of deep neural networks as a feature selection method is provided in this paper, and its efficacy in determining important features for real-world datasets is demonstrated. Furthermore, filter-based feature selection methods are employed, and their results are compared with those of the proposed method. While the results obtained for the classification task indicated that the proposed method outperformed other feature ranking methods, in the case of the regression task it performed comparably to the other feature ranking methods.
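The complex-step idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-neuron `model`, its weights, the step size `h`, and the helper names `complex_step_sensitivity` and `rank_features` are all assumptions made for illustration. The sketch relies only on the standard complex-step derivative approximation, Im(f(x + ih)) / h, which involves no subtraction and therefore tolerates a very small step.

```python
import cmath

def model(x, weights, bias):
    """Toy one-neuron 'network' (hypothetical): tanh(w . x + b), complex-safe."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return cmath.tanh(z)

def complex_step_sensitivity(f, x, h=1e-20):
    """Return |df/dx_j| for each feature j via a complex-step perturbation.

    Each feature is perturbed one at a time by i*h; the derivative estimate
    is Im(f(x + i*h*e_j)) / h, which has no subtractive cancellation.
    """
    sens = []
    for j in range(len(x)):
        xp = [complex(v) for v in x]
        xp[j] += complex(0.0, h)
        sens.append(abs(f(xp).imag / h))
    return sens

def rank_features(samples, f):
    """Average sensitivities over samples; return feature indices, most important first."""
    n = len(samples[0])
    totals = [0.0] * n
    for x in samples:
        for j, s in enumerate(complex_step_sensitivity(f, x)):
            totals[j] += s
    mean = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda j: mean[j], reverse=True)
    return order, mean

# Hypothetical weights: feature 1 has zero weight, so it should rank last.
weights = [2.0, 0.0, -0.5]
bias = 0.1
samples = [[0.1, 0.4, -0.2], [0.0, -0.3, 0.5], [-0.2, 0.9, 0.1]]
order, mean = rank_features(samples, lambda x: model(x, weights, bias))
print(order)
```

In a real deep network the same perturbation would require every layer and activation to accept complex inputs, which is an implementation detail this sketch does not address.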
                    
                            
                            Network-based drug sensitivity prediction
                        
                    
    
Abstract
Background Drug sensitivity prediction and drug-responsive biomarker selection on high-throughput genomic data is a critical step in drug discovery. Many computational methods have been developed to serve this purpose, including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network in drug sensitivity prediction is investigated in this study.
Methods In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed; both integrate gene network information directly into the neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at https://github.com/compbiolabucf/drug-sensitivity-prediction .
Results In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance than the deep neural network model; (4) the prediction performance is drug dependent and may relate to the drug’s mechanism of action.
Conclusions The network-based feature selection method and prediction models improve the performance of drug response prediction. The relations between the genomic features are more robust and stable than the correlation between each individual genomic feature and the drug response in high-dimensional, low-sample-size genomic datasets.
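The contrast drawn in this abstract, between ranking genes by their Pearson correlation with drug response and using relations among genes, can be sketched as follows. This is an illustration only and not the authors' method (see the linked repository for that): the thresholded correlation network, the degree-based ranking, the threshold value, and the toy gene names and values are all assumptions.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def rank_by_response(expr, response):
    """Baseline: rank genes by |Pearson r| between expression and drug response."""
    scores = {g: abs(pearson(v, response)) for g, v in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)

def coexpression_network(expr, threshold=0.8):
    """Assumed construction: link two genes when |Pearson r| >= threshold."""
    genes = list(expr)
    edges = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for other in genes[i + 1:]:
            if abs(pearson(expr[g], expr[other])) >= threshold:
                edges[g].add(other)
                edges[other].add(g)
    return edges

def rank_by_degree(edges):
    """Assumed network criterion: genes with more co-expression neighbours rank first."""
    return sorted(edges, key=lambda g: len(edges[g]), reverse=True)

# Toy data (hypothetical): expression of four genes across five cell lines.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "G3": [1.1, 1.9, 3.2, 3.9, 5.1],
    "G4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
response = [0.9, 0.7, 0.6, 0.3, 0.1]

edges = coexpression_network(expr)
print(rank_by_response(expr, response))
print(rank_by_degree(edges))
```

The network ranking uses only gene-gene relations and never looks at the response, which is the property the conclusions above describe as more stable when samples are few.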
        
    
                            - Award ID(s):
- 1755761
- PAR ID:
- 10282797
- Date Published:
- Journal Name:
- BMC Medical Genomics
- Volume:
- 13
- Issue:
- S11
- ISSN:
- 1755-8794
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            While machine learning models perform well on offline data, assessing their performance in real-world, resource-constrained environments (considering accuracy, prediction time, power consumption, and memory usage) is crucial for practical applications. This research implements a mobile-based Human Activity Recognition solution to classify three postures (sitting, standing, and walking) using smartphone sensors, specifically the accelerometer, gyroscope, and magnetometer. Time-domain features extracted from these sensors were used, with Random Forest employed for feature selection. One traditional machine learning model, Logistic Regression, and one deep learning model, a Convolutional Neural Network, were trained and deployed via an Android application for real-time evaluation. While the Convolutional Neural Network achieved higher accuracy and better memory efficiency, Logistic Regression demonstrated faster prediction times during real-time use. Both models showed reduced accuracy for standing and walking postures in real-world conditions, emphasizing the challenges of deploying machine learning models in dynamic environments. This study highlights the importance of evaluating machine learning models in real-world settings to ensure reliability and efficiency, particularly in resource-constrained environments.
- 
            DNA methylation is a process that can affect gene accessibility and therefore gene expression. In this study, a machine learning pipeline is proposed for the prediction of breast cancer and the identification of significant genes that contribute to the prediction. The current study utilized breast cancer methylation data from The Cancer Genome Atlas (TCGA), specifically the TCGA-BRCA dataset. Feature engineering techniques have been utilized to reduce data volume and make deep learning scalable. A comparative analysis of the proposed approach on Illumina 27K and 450K methylation data reveals that deep learning methodologies for cancer prediction can be coupled with feature selection models to enhance prediction accuracy. Prediction using 450K methylation markers can be accomplished in less than 13 s with an accuracy of 98.75%. Of the list of 685 genes in the feature-selected 27K dataset, 578 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in five biological processes and one molecular function. Of the list of 1572 genes in the feature-selected 450K dataset, 1290 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in 95 biological processes and 17 molecular functions. Seven oncogene/tumor suppressor genes were common between the 27K and 450K feature-selected gene sets. These genes were RTN4IP1, MYO18B, ANP32A, BRF1, SETBP1, NTRK1, and IGF2R. Our bioinformatics deep learning workflow, incorporating imputation and data balancing methods, is able to identify important methylation markers related to functionally important genes in breast cancer with high accuracy compared to deep learning or statistical models alone.
- 
            Abstract A large number of genetic variations have been identified as associated with Alzheimer’s disease (AD) and related quantitative traits. However, the majority of existing studies focused on single types of omics data, lacking the power to generate a community including multi-omic markers and their functional connections. Because of this, the immense value of multi-omics data on AD has attracted much attention. Leveraging genomic, transcriptomic and proteomic data, and their backbone network through functional relations, we proposed a modularity-constrained logistic regression model to mine the association between disease status and a group of functionally connected multi-omic features, i.e. single-nucleotide polymorphisms (SNPs), genes and proteins. This new model was applied to real data collected from the frontal cortex tissue in the Religious Orders Study and Memory and Aging Project cohort. Compared with other state-of-the-art methods, it provided the best overall prediction performance during cross-validation. This new method helped identify a group of densely connected SNPs, genes and proteins predictive of AD status. These SNPs are mostly expression quantitative trait loci in the frontal region. The brain-wide gene expression profiles of these genes and proteins were highly correlated with the brain activation map of ‘vision’, a brain function partly controlled by the frontal cortex. These genes and proteins were also found to be associated with amyloid deposition, cortical volume and average thickness of frontal regions. Taken together, these results suggested a potential pathway underlying the development of AD from SNPs to gene expression, protein expression and ultimately brain functional and structural changes.
- 
            Abstract Automated manufacturing feature recognition is a crucial link between computer-aided design and manufacturing, facilitating process selection and other downstream tasks in computer-aided process planning. While various methods, such as graph-based, rule-based, and neural network approaches, have been proposed for automatic feature recognition, they suffer from poor scalability or computational inefficiency. Recently, voxel-based convolutional neural networks have shown promise in solving these challenges but incur a tradeoff between computational cost and feature resolution. This paper investigates a computationally efficient sparse voxel-based convolutional neural network for manufacturing feature recognition, specifically, an octree-based sparse voxel convolutional neural network. This model is trained on a large-scale manufacturing feature dataset, and its performance is compared to a voxel-based feature recognition model (FeatureNet). The results indicate that the octree-based model yields higher feature recognition accuracy (99.5% on the test dataset) with 44% lower graphics processing unit (GPU) memory consumption than a voxel-based model of comparable resolution. In addition, increasing the resolution of the octree-based model enables recognition of finer manufacturing features. These results indicate that a sparse voxel-based convolutional neural network is a computationally efficient deep learning model for manufacturing feature recognition to enable process planning automation. Moreover, the sparse voxel-based neural network demonstrated comparable performance to a boundary representation-based feature recognition neural network, achieving similar accuracy in single-feature recognition without having access to the exact 3D shape descriptors.
 An official website of the United States government
				
			 
					 
					
 
                                    