
Title: Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest combined with SMOTE+ENN, a resampling method that pairs the Synthetic Minority Oversampling Technique (SMOTE) with the Edited Nearest Neighbor (ENN) method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing of the twelve toxicity endpoints, with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also to elucidate therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
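The SMOTE+ENN resampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which software was used, and the function names, the toy two-dimensional data, and the parameter values below are all hypothetical. The sketch oversamples the minority (active) class by interpolating between minority neighbours, then removes any sample whose label disagrees with the majority vote of its nearest neighbours.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a sampled
    minority point and one of its k nearest minority neighbours.
    Assumes the minority class has at least two points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority vote of their
    k nearest neighbours (ENN applied to every class)."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        vote = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if vote == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count, then clean
    the combined set with ENN."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return edited_nearest_neighbours(X_bal, y_bal, k=k)


# Hypothetical toy data: 8 inactive (0) and 3 active (1) compounds,
# each described by two made-up features.
X = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.9), (1.0, 0.4),
     (0.7, 0.7), (0.3, 0.5), (0.9, 0.1), (0.2, 0.3),
     (5.0, 5.0), (5.5, 5.2), (5.1, 4.6)]
y = [0] * 8 + [1] * 3

X_res, y_res = smote_enn(X, y)
print(Counter(y_res))
```

In a pipeline like the one described, the resampled set would then be passed to a classifier such as a random forest; that step is omitted here. In practice a maintained library (for example, the `SMOTEENN` class in imbalanced-learn) would normally be preferred over a hand-written version.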
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Organic aerosols generated from the smoldering combustion of wood critically impact air quality and health for billions of people worldwide; yet, the links between the chemical components and the optical or biological effects of woodsmoke aerosol (WSA) are still poorly understood. In this work, an untargeted analysis of the molecular composition of smoldering WSA, generated in a controlled environment from nine types of heartwood fuels (African mahogany, birch, cherry, maple, pine, poplar, red oak, redwood, and walnut), identified several hundred compounds using gas chromatography mass spectrometry (GC-MS) and nano-electrospray high-resolution mass spectrometry (HRMS) with tandem multistage mass spectrometry (MSn). The effects of WSA on cell toxicity as well as gene expression dependent on the aryl hydrocarbon receptor (AhR) and estrogen receptor (ER) were characterized with cellular assays, and the visible mass absorption coefficients (MACvis) of WSA were measured with ultraviolet–visible spectroscopy. The WSAs studied in this work have significant levels of biological and toxicological activity, with exposure levels in both an outdoor and indoor environment similar to or greater than those of other toxicants. A correlation between the HRMS molecular composition and aerosol properties found that phenolic compounds from the oxidative decomposition of lignin are the main drivers of aerosol effects, while the cellulose decomposition products play a secondary role; e.g., levoglucosan is anticorrelated with multiple effects. Polycyclic aromatic hydrocarbons (PAHs) are not expected to form at the combustion temperature in this work, nor were they observed above the detection limit; thus, biological and optical properties of the smoldering WSA are not attributed to PAHs. Syringyl compounds tend to correlate with cell toxicity, while the more conjugated molecules (including several compounds assigned to dimers) have higher AhR activity and MACvis. The negative correlation between cell toxicity and AhR activity suggests that the toxicity of smoldering WSA to cells is not mediated by the AhR. Both mass-normalized biological outcomes have a statistically significant dependence on the degree of combustion of the wood. In addition, our observations support the fact that the visible light absorption of WSA is at least partially due to charge transfer effects in aerosols, as previously suggested. Finally, MACvis has no correlation with toxicity or receptor signaling, suggesting that key chromophores in this work are not biologically active on the endpoints tested.
  2. Abstract Background

    Autosomal dominant polycystic kidney disease (ADPKD) is one of the most prevalent monogenic human diseases. It is mostly caused by pathogenic variants in PKD1 or PKD2 genes that encode interacting transmembrane proteins polycystin-1 (PC1) and polycystin-2 (PC2). Among many pathogenic processes described in ADPKD, those associated with cAMP signaling, inflammation, and metabolic reprogramming appear to regulate the disease manifestations. Tolvaptan, a vasopressin receptor-2 antagonist that regulates cAMP pathway, is the only FDA-approved ADPKD therapeutic. Tolvaptan reduces renal cyst growth and kidney function loss, but it is not tolerated by many patients and is associated with idiosyncratic liver toxicity. Therefore, additional therapeutic options for ADPKD treatment are needed.


    As drug repurposing of FDA-approved drug candidates can significantly decrease the time and cost associated with traditional drug discovery, we used the computational approach signature reversion to detect inversely related drug response gene expression signatures from the Library of Integrated Network-Based Cellular Signatures (LINCS) database and identified compounds predicted to reverse disease-associated transcriptomic signatures in three publicly available Pkd2 kidney transcriptomic data sets of mouse ADPKD models. We focused on a pre-cystic model for signature reversion, as it was less impacted by confounding secondary disease mechanisms in ADPKD, and then compared the resulting candidates’ target differential expression in the two cystic mouse models. We further prioritized these drug candidates based on their known mechanism of action, FDA status, targets, and by functional enrichment analysis.


    With this in-silico approach, we prioritized 29 unique drug targets differentially expressed in Pkd2 ADPKD cystic models and 16 prioritized drug repurposing candidates that target them, including bromocriptine and mirtazapine, which can be further tested in-vitro and in-vivo.


    Collectively, these results indicate drug targets and repurposing candidates that may effectively treat pre-cystic as well as cystic ADPKD.

  3. Abstract Over the last 2 decades, the zebrafish (Danio rerio) has emerged as a stellar model for unraveling molecular signaling events mediated by the aryl hydrocarbon receptor (AHR), an important ligand-activated receptor found in all eumetazoan animals. Zebrafish have 3 AHRs (AHR1a, AHR1b, and AHR2), and studies have demonstrated the diversity of both the endogenous and toxicological functions of the zebrafish AHRs. In this contemporary review, we first highlight the evolution of the zebrafish ahr genes, and the characteristics of the receptors including developmental and adult expression, their endogenous and inducible roles, and the predicted ligands from homology modeling studies. We then review the toxicity of a broad spectrum of AHR ligands across multiple life stages (early stage and adult), discuss their transcriptomic and epigenetic mechanisms of action, and report on any known interactions between the AHRs and other signaling pathways. Through this article, we summarize the promising research that furthers our understanding of the complex AHR pathway through the extensive use of zebrafish as a model, coupled with a large array of molecular techniques. As much of the research has focused on the functions of AHR2 during development and the mechanism of TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) toxicity, we illustrate the need to address the considerable knowledge gap in our understanding of both the mechanistic roles of AHR1a and AHR1b, and the diverse modes of toxicity of the various AHR ligands.
  4. Many intracellular signaling pathways are composed of molecular switches, proteins that transition between two states: on and off. Typically, signaling is initiated when an external stimulus activates its cognate receptor that, in turn, causes downstream switches to transition from off to on using one of the following mechanisms: activation, in which the transition rate from the off state to the on state increases; derepression, in which the transition rate from the on state to the off state decreases; and concerted, in which activation and derepression operate simultaneously. We use mathematical modeling to compare these signaling mechanisms in terms of their dose–response curves, response times, and abilities to process upstream fluctuations. Our analysis elucidates several operating principles for molecular switches. First, activation increases the sensitivity of the pathway, whereas derepression decreases sensitivity. Second, activation generates response times that decrease with signal strength, whereas derepression causes response times to increase with signal strength. These opposing features allow the concerted mechanism to not only show dose–response alignment, but also to decouple the response time from stimulus strength. However, these potentially beneficial properties come at the expense of increased susceptibility to upstream fluctuations. We demonstrate that these operating principles also hold when the models are extended to include additional features, such as receptor removal, kinetic proofreading, and cascades of switches. In total, we show how the architecture of molecular switches governs their response properties. We also discuss the biological implications of our findings.
  5. Abstract

    The outbreak of Zika virus (ZIKV) in 2016 created a worldwide health emergency that demands urgent research efforts on understanding the virus biology and developing therapeutic strategies. Here, we present a time-resolved chemical proteomic strategy to track the early-stage entry of ZIKV into host cells. ZIKV was labeled on its surface with a chemical probe, which carries a photocrosslinker to covalently link virus-interacting proteins in living cells on UV exposure at different time points, and a biotin tag for subsequent enrichment and mass spectrometric identification of the receptor or other host proteins critical for virus internalization. We identified Neural Cell Adhesion Molecule (NCAM1) as a potential ZIKV receptor and further validated it through overexpression, knockout, and inhibition of NCAM1 in Vero cells and human glioblastoma cells U-251 MG. Collectively, the strategy can serve as a universal tool to map virus entry pathways and uncover key interacting proteins.