NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

AdapTor: Adaptive Topological Regression for quantitative structure–activity relationship modeling

https://doi.org/10.1186/s13321-025-01071-8

Mao, Yixiang; Ghosh, Souparno; Pal, Ranadip (August 2025, Journal of Cheminformatics)

Abstract Quantitative structure–activity relationship (QSAR) modeling has become a critical tool in drug design. Recently proposed Topological Regression (TR), a computationally efficient and highly interpretable QSAR model that maps distances in the chemical domain to distances in the activity domain, has shown predictive performance comparable to state-of-the-art deep learning-based models. However, TR’s dependence on simple random sampling-based anchor selection and utilization of radial basis function for response reconstruction constrain its interpretability and predictive capacity. To address these limitations, we propose Adaptive Topological Regression (AdapToR) with adaptive anchor selection and optimization-based reconstruction. We evaluated AdapToR on the NCI60 GI50 dataset, which consists of over 50,000 drug responses across 60 human cancer cell lines, and compared its performance to Transformer CNN, Graph Transformer, TR, and other baseline models. The results demonstrate that AdapToR outperforms competing QSAR models for drug response prediction with significantly lower computational cost and greater interpretability as compared to deep learning-based models.
more » « less
Full Text Available
Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling

https://doi.org/10.1038/s41467-024-49372-0

Zhang, Ruibo; Nolte, Daniel; Sanchez-Villalobos, Cesar; Ghosh, Souparno; Pal, Ranadip (December 2024, Nature Communications)

Abstract Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against the predictive performance of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve equal, if not better, performance than the deep learning-based QSAR models and provide better intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.
more » « less
Full Text Available
Federated learning framework integrating REFINED CNN and Deep Regression Forests

https://doi.org/10.1093/bioadv/vbad036

Nolte, Daniel; Bazgir, Omid; Ghosh, Souparno; Pal, Ranadip (March 2023, Bioinformatics Advances)
Rattray, Magnus (Ed.)
Abstract SummaryPredictive learning from medical data incurs additional challenge due to concerns over privacy and security of personal data. Federated learning, intentionally structured to preserve high level of privacy, is emerging to be an attractive way to generate cross-silo predictions in medical scenarios. However, the impact of severe population-level heterogeneity on federated learners is not well explored. In this article, we propose a methodology to detect presence of population heterogeneity in federated settings and propose a solution to handle such heterogeneity by developing a federated version of Deep Regression Forests. Additionally, we demonstrate that the recently conceptualized REpresentation of Features as Images with NEighborhood Dependencies CNN framework can be combined with the proposed Federated Deep Regression Forests to provide improved performance as compared to existing approaches. Availability and implementationThe Python source code for reproducing the main results are available on GitHub: https://github.com/DanielNolte/FederatedDeepRegressionForests. Contactranadip.pal@ttu.edu Supplementary informationSupplementary data are available at Bioinformatics Advances online.
more » « less
Full Text Available
Efficient Normalized Conformal Prediction and Uncertainty Quantification for Anti-Cancer Drug Sensitivity Prediction with Deep Regression Forests

https://doi.org/10.1109/EMBC53108.2024.10782378

Nolte, Daniel; Ghosh, Souparno; Pal, Ranadip (July 2024, IEEE)

Deep learning models are being adopted and applied across various critical medical tasks, yet they are primarily trained to provide point predictions without providing degrees of confidence. Medical practitioner’s trustworthiness of deep learning models is increased when paired with uncertainty estimations. Conformal Prediction has emerged as a promising method to pair machine learning models with prediction intervals, allowing for a view of the model’s uncertainty. However, popular uncertainty estimation methods for conformal prediction fail to provide highly accurate heteroskedastic intervals. In this paper, we propose a method to estimate the uncertainty of each sample by calculating the variance obtained from a Deep Regression Forest. We show that the deep regression forest variance improves the efficiency and coverage of normalized inductive conformal prediction when applied on an anti-cancer drug sensitivity prediction task.
more » « less
Full Text Available
Predicting binding affinities of emerging variants of SARS-CoV-2 using spike protein sequencing data: observations, caveats and recommendations

https://doi.org/10.1093/bib/bbac128

Zhang, Ruibo; Ghosh, Souparno; Pal, Ranadip (May 2022, Briefings in Bioinformatics)

Abstract Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein–protein interactions among SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence among these variants. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and are tasked with predicting the binding affinity of the in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and test protein sequences are sufficiently similar to the training sequence. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM embedding and CNN-based predictive model show superior performance.
more » « less
Full Text Available
Investigation of REFINED CNN ensemble learning for anti-cancer drug sensitivity prediction

https://doi.org/10.1093/bioinformatics/btab336

Bazgir, Omid; Ghosh, Souparno; Pal, Ranadip (July 2021, Bioinformatics)
null (Ed.)
Abstract Motivation Anti-cancer drug sensitivity prediction using deep learning models for individual cell line is a significant challenge in personalized medicine. Recently developed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) CNN (Convolutional Neural Network)-based models have shown promising results in improving drug sensitivity prediction. The primary idea behind REFINED-CNN is representing high dimensional vectors as compact images with spatial correlations that can benefit from CNN architectures. However, the mapping from a high dimensional vector to a compact 2D image depends on the a priori choice of the distance metric and projection scheme with limited empirical procedures guiding these choices. Results In this article, we consider an ensemble of REFINED-CNN built under different choices of distance metrics and/or projection schemes that can improve upon a single projection based REFINED-CNN model. Results, illustrated using NCI60 and NCI-ALMANAC databases, demonstrate that the ensemble approaches can provide significant improvement in prediction performance as compared to individual models. We also develop the theoretical framework for combining different distance metrics to arrive at a single 2D mapping. Results demonstrated that distance-averaged REFINED-CNN produced comparable performance as obtained from stacking REFINED-CNN ensemble but with significantly lower computational cost. Availability and implementation The source code, scripts, and data used in the paper have been deposited in GitHub (https://github.com/omidbazgirTTU/IntegratedREFINED). Supplementary information Supplementary data are available at Bioinformatics online.
more » « less
Full Text Available

Search for: All records