Search for: All records

Creators/Authors contains: "Sharma, Alok"

  1. TP53 is a tumor-suppressor gene involved in regulating apoptosis, DNA repair, and genomic stability. Mutations in TP53 are implicated in approximately half of all detected cancers, including breast, lung, colorectal, and ovarian cancers, making it a significant target for therapeutic interventions. Many pharmaceutical drugs aim to restore TP53 function, and there is a need for predictive tools to assess how compounds may affect TP53 expression. In this study, we propose a new ensemble machine-learning model to predict the direction of TP53 relative gene expression in response to pharmaceutical compounds. Our model utilizes molecular fingerprints, descriptors, and scaffold-based features extracted from SMILES representations of compounds and concatenated into a single feature vector. Trained on our newly generated benchmark dataset based on the Connectivity Map (CMap) database, with class imbalance addressed by the Synthetic Minority Over-sampling Technique (SMOTE), our model achieves an accuracy of 62.9%, a sensitivity of 93.9%, a specificity of 40.3%, and a Matthews Correlation Coefficient (MCC) of 0.39. As the first TP53 gene-regulation predictor of its kind, our study serves as a proof of concept that paves the way for future investigation. GenReP as a stand-alone predictor, its source code, and our newly generated benchmark dataset are publicly available.
  2. Protein–peptide interactions are fundamental to numerous cellular processes and are linked to diseases like cancer when disrupted. Understanding these interactions is critical for both functional genomics and drug discovery. Despite the growing availability of protein–peptide complexes, experimental methods to study them remain resource-intensive and costly. While computational approaches offer a complementary solution, their predictive accuracy is often inadequate. To overcome these limitations, we present PepENS, an ensemble model combining deep learning and traditional machine learning techniques that integrates both structural and sequence-based features from primary protein sequences. By leveraging half-sphere exposure, position-specific scoring matrices from multiple-sequence alignments, and embeddings from a pre-trained protein language model, PepENS outperforms state-of-the-art methods. It achieves a precision of 0.596 and an AUC of 0.860 on the Dataset 1 test set, and a precision of 0.539 and an AUC of 0.846 on the Dataset 2 test set. These results improve on state-of-the-art methods in precision and AUC by 2.8% and 0.5%, respectively, on Dataset 1, and by 2.3% and 2.4%, respectively, on Dataset 2. The PepENS software and associated datasets are available at https://doi.org/10.6084/m9.figshare.28490012.v2.
  3. New AI-designed RF pulses increase bandwidth and sensitivity for 1H-15N HSQC spectra of metabolites.
  4. Phosphorylation is an important posttranslational modification of proteins in which a phosphate group is added to an amino acid side chain after translation in the ribosome. It is vital for coordinating cellular functions, such as regulating metabolism, proliferation, apoptosis, subcellular trafficking, and other crucial physiological processes. Phosphorylation prediction in a microbial organism can assist in understanding pathogenesis and host–pathogen interaction, drug and antibody design, and antimicrobial agent development. Experimental methods for identifying phosphorylation sites are costly, slow, and tedious. Hence, low‐cost and high‐speed computational approaches are highly desirable. This paper presents a new deep learning tool called DeepPhoPred for predicting microbial phospho‐serine (pS), phospho‐threonine (pT), and phospho‐tyrosine (pY) sites. DeepPhoPred incorporates a two‐headed convolutional neural network architecture with squeeze‐and‐excitation blocks followed by fully connected layers that jointly learn significant features from the peptide's structural and evolutionary information to predict phosphorylation sites. Our empirical results demonstrate that DeepPhoPred significantly outperforms the existing microbial phosphorylation site predictors with its highly efficient deep‐learning architecture. DeepPhoPred as a standalone predictor, all of its source code, and our employed datasets are publicly available at https://github.com/faisalahm3d/DeepPhoPred.
  5. Protein–peptide interactions play a crucial role in various cellular processes and are implicated in abnormal cellular behaviors leading to diseases such as cancer. Therefore, understanding these interactions is vital for both functional genomics and drug discovery efforts. Despite a significant increase in the availability of protein–peptide complexes, experimental methods for studying these interactions remain laborious, time-consuming, and expensive. Computational methods offer a complementary approach but often fall short in terms of prediction accuracy. To address these challenges, we introduce PepCNN, a deep learning-based prediction model that incorporates structural and sequence-based information from primary protein sequences. By utilizing a combination of half-sphere exposure, position-specific scoring matrices from a multiple-sequence alignment tool, and embeddings from a pre-trained protein language model, PepCNN outperforms state-of-the-art methods in terms of specificity, precision, and AUC. The PepCNN software and datasets are publicly available at https://github.com/abelavit/PepCNN.git.
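
Several of the abstracts above report accuracy, sensitivity, specificity, precision, and the Matthews Correlation Coefficient (MCC). As a rough illustration of how these figures relate to one another, the sketch below computes them from the four confusion-matrix counts of a binary classifier. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from any of the listed studies, and the listed tools may compute their metrics differently.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not part of any listed tool.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of actual positives predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: fraction of predicted positives that are actually positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # MCC ranges from -1 to 1 and uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Made-up counts: a classifier that favors the positive class.
m = binary_metrics(tp=90, fp=60, tn=40, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, sensitivity is high while specificity is low, which is why a single balanced measure such as MCC is often reported alongside accuracy on imbalanced data.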