Abstract. Organic aerosols generated from the smoldering combustion of wood critically impact air quality and health for billions of people worldwide; yet, the links between the chemical components and the optical or biological effects of woodsmoke aerosol (WSA) are still poorly understood. In this work, an untargeted analysis of the molecular composition of smoldering WSA, generated in a controlled environment from nine types of heartwood fuels (African mahogany, birch, cherry, maple, pine, poplar, red oak, redwood, and walnut), identified several hundred compounds using gas chromatography mass spectrometry (GC-MS) and nano-electrospray high-resolution mass spectrometry (HRMS) with tandem multistage mass spectrometry (MSn). The effects of WSA on cell toxicity as well as gene expression dependent on the aryl hydrocarbon receptor (AhR) and estrogen receptor (ER) were characterized with cellular assays, and the visible mass absorption coefficients (MACvis) of WSA were measured with ultraviolet–visible spectroscopy. The WSAs studied in this work have significant levels of biological and toxicological activity, with exposure levels in both an outdoor and indoor environment similar to or greater than those of other toxicants. A correlation between the HRMS molecular composition and aerosol properties found that phenolic compounds from the oxidative decomposition of lignin are the main drivers of aerosol effects, while the cellulose decomposition products play a secondary role; e.g., levoglucosan is anticorrelated with multiple effects. Polycyclic aromatic hydrocarbons (PAHs) are not expected to form at the combustion temperature in this work, nor were they observed above the detection limit; thus, biological and optical properties of the smoldering WSA are not attributed to PAHs. Syringyl compounds tend to correlate with cell toxicity, while the more conjugated molecules (including several compounds assigned to dimers) have higher AhR activity and MACvis. The negative correlation between cell toxicity and AhR activity suggests that the toxicity of smoldering WSA to cells is not mediated by the AhR. Both mass-normalized biological outcomes have a statistically significant dependence on the degree of combustion of the wood. In addition, our observations support the fact that the visible light absorption of WSA is at least partially due to charge transfer effects in aerosols, as previously suggested. Finally, MACvis has no correlation with toxicity or receptor signaling, suggesting that key chromophores in this work are not biologically active on the endpoints tested.
Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest with the combination of Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), which is a resampling method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing twelve toxicity endpoints with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. 
Our work elucidates significant relationships between the protein-compound interactions computed using CANDO and the biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also to elucidate therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
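As a rough illustration of the SMOTE+ENN resampling idea mentioned in the abstract above, the following standard-library Python sketch oversamples the minority class by interpolation (SMOTE) and then drops samples whose label disagrees with their nearest neighbours (ENN). It is not the authors' pipeline: the function names, the toy two-dimensional data, and the choice of k = 3 are assumptions made purely for illustration.

```python
import math
import random
from collections import Counter

# A sample is a (features, label) pair; features is a tuple of floats.


def nearest(sample, pool, k):
    """Return up to k samples from `pool` closest to `sample`, excluding `sample` itself."""
    others = [p for p in pool if p is not sample]
    return sorted(others, key=lambda p: math.dist(sample[0], p[0]))[:k]


def smote(samples, minority_label, n_new, k, rng):
    """Add n_new synthetic minority samples, each interpolated between a
    random minority sample and one of its k nearest minority neighbours."""
    minority = [s for s in samples if s[1] == minority_label]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbour
        features = tuple(b + gap * (n - b) for b, n in zip(base[0], neighbour[0]))
        synthetic.append((features, minority_label))
    return samples + synthetic


def edited_nearest_neighbours(samples, k):
    """Keep only samples whose label matches the majority vote of their k nearest neighbours."""
    kept = []
    for s in samples:
        votes = Counter(n[1] for n in nearest(s, samples, k))
        if votes.most_common(1)[0][0] == s[1]:
            kept.append(s)
    return kept


def smote_enn(samples, minority_label, k=3, seed=0):
    """Oversample the minority class up to the largest class size, then clean with ENN."""
    counts = Counter(label for _, label in samples)
    n_new = max(counts.values()) - counts[minority_label]
    balanced = smote(samples, minority_label, n_new, k, random.Random(seed))
    return edited_nearest_neighbours(balanced, k)


# Hypothetical toy data: six inactive (0) and three active (1) compounds.
majority = [((0.0, 0.0), 0), ((0.5, 0.2), 0), ((0.1, 0.6), 0),
            ((0.4, 0.4), 0), ((0.7, 0.1), 0), ((0.2, 0.9), 0)]
minority = [((5.0, 5.0), 1), ((5.5, 5.2), 1), ((5.1, 5.6), 1)]
resampled = smote_enn(majority + minority, minority_label=1)
print(Counter(label for _, label in resampled))
```

In practice the resampled set would be passed to a classifier such as a random forest; resampling should be applied to training folds only, so that the evaluation data keep their original class balance.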
- Award ID(s): 1910492
- PAR ID: 10352773
- Date Published:
- Journal Name: Molecules
- Volume: 27
- Issue: 9
- ISSN: 1420-3049
- Page Range / eLocation ID: 3021
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract. Over the last 2 decades, the zebrafish (Danio rerio) has emerged as a stellar model for unraveling molecular signaling events mediated by the aryl hydrocarbon receptor (AHR), an important ligand-activated receptor found in all eumetazoan animals. Zebrafish have 3 AHRs (AHR1a, AHR1b, and AHR2), and studies have demonstrated the diversity of both the endogenous and toxicological functions of the zebrafish AHRs. In this contemporary review, we first highlight the evolution of the zebrafish ahr genes, and the characteristics of the receptors including developmental and adult expression, their endogenous and inducible roles, and the predicted ligands from homology modeling studies. We then review the toxicity of a broad spectrum of AHR ligands across multiple life stages (early stage and adult), discuss their transcriptomic and epigenetic mechanisms of action, and report on any known interactions between the AHRs and other signaling pathways. Through this article, we summarize the promising research that furthers our understanding of the complex AHR pathway through the extensive use of zebrafish as a model, coupled with a large array of molecular techniques. As much of the research has focused on the functions of AHR2 during development and the mechanism of TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin) toxicity, we illustrate the need to address the considerable knowledge gap in our understanding of both the mechanistic roles of AHR1a and AHR1b, and the diverse modes of toxicity of the various AHR ligands.
-
Abstract. Zebrafish (Danio rerio) are a popular vertebrate model for high-throughput toxicity testing, serving as a model for embryonic development and disease etiology. However, standardized protocols using zebrafish tend to explore pathologies and behaviors at the organism level rather than at the organ-specific level. This study investigates the effects of chemical exposures on pancreatic function in whole-embryo zebrafish by integrating network analysis and machine learning, leveraging widely available datasets to probe an organ-specific effect. We compiled transcriptomics data for zebrafish exposed to 53 exposures from 25 unique chemicals, including halogenated organic compounds, pesticides/herbicides, endocrine-disrupting chemicals, pharmaceuticals, parabens, and solvents. All raw sequencing data were processed through a uniform bioinformatics pipeline for re-analysis and quality control, identifying differentially expressed genes and altered pathways related to pancreatic function and development. Clustering analysis revealed 5 distinct clusters of chemical exposures with similar impacts on pancreatic pathways, with gene co-expression network analysis identifying key driver genes within these clusters, providing insights into potential biomarkers of chemical-induced pancreatic toxicity. Machine learning was utilized to identify chemical properties that influence pancreatic pathway response, including average mass and biodegradation half-life. The random forest model achieved robust performance (4-fold cross-validation accuracy: 74%) over eXtreme Gradient Boosting, support vector machine, and multiclass logistic regression. This integrative approach enhances our understanding of the relationships between chemical properties and biological responses in a target organ, supporting the use of zebrafish whole embryos as a high-throughput vertebrate model. This computational workflow can be leveraged to investigate the complex effects of other exposures on organ-specific development.
-
Abstract. Background: Cannabis sativa L., with a rich history of traditional medicinal use, has garnered significant attention in contemporary research for its potential therapeutic applications in various human diseases, including pain, inflammation, cancer, and osteoarthritis. However, the specific molecular targets and mechanisms underlying the synergistic effects of its diverse phytochemical constituents remain elusive. Understanding these mechanisms is crucial for developing targeted, effective cannabis-based therapies. Methods: To investigate the molecular targets and pathways involved in the synergistic effects of cannabis compounds, we utilized DRIFT, a deep learning model that leverages attention-based neural networks to predict compound-target interactions. We considered both whole plant extracts and specific plant-based formulations. Predicted targets were then mapped to the Reactome pathway database to identify the biological processes affected. To facilitate the prediction of molecular targets and associated pathways for any user-specified cannabis formulation, we developed CANDI (Cannabis-derived compound Analysis and Network Discovery Interface), a web-based server. This platform offers a user-friendly interface for researchers and drug developers to explore the therapeutic potential of cannabis compounds. Results: Our analysis using DRIFT and CANDI successfully identified numerous molecular targets of cannabis compounds, many of which are involved in pathways relevant to pain, inflammation, cancer, and other diseases. The CANDI server enables researchers to predict the molecular targets and affected pathways for any specific cannabis formulation, providing valuable insights for developing targeted therapies. Conclusions: By combining computational approaches with knowledge of traditional cannabis use, we have developed the CANDI server, a tool that allows us to harness the therapeutic potential of cannabis compounds for the effective treatment of various disorders. By bridging traditional pharmaceutical development with cannabis-based medicine, we propose a novel approach for botanical-based treatment modalities.
-
Background/Objectives: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to solely predict based on metabolic pathways. However, there are many other types of pathways in cells and organisms that are of interest to biologists. Methods: While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in the KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset. Results: The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098. Conclusions: This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.
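For readers unfamiliar with the metric reported above, the short Python sketch below computes the Matthews correlation coefficient from a binary confusion matrix and reproduces the "roughly 6%" relative improvement from the two mean MCC values quoted in the abstract. The function names and the example confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from the four cells of a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def relative_improvement(new, old):
    """Fractional gain of `new` over `old`."""
    return (new - old) / old


# Hypothetical confusion matrix, for illustration only.
example_mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(f"example MCC: {example_mcc:.3f}")

# Mean MCC values quoted in the abstract: 0.847 (all pathways) vs 0.800 (metabolic only).
gain = relative_improvement(0.847, 0.800)
print(f"relative improvement: {gain:.1%}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it more informative than accuracy when positive compound-pathway pairs are much rarer than negative ones.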

