Facing the continuous emergence of new psychoactive substances (NPS) and their threat to public health, more effective methods for NPS prediction and identification are critical. In this study, the pharmacological affinity fingerprints (
- Award ID(s):
- 2018427
- Publication Date:
- NSF-PAR ID:
- 10368740
- Journal Name:
- Journal of Cheminformatics
- Volume:
- 14
- Issue:
- 1
- ISSN:
- 1758-2946
- Publisher:
- Springer Science + Business Media
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract In this study, we developed a novel algorithm to improve the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structure similarity with a set of training compounds, while the extra computational cost is neglectable. Two popular docking methods, Glide and AutoDock Vina were adopted as the original scoring functions to be processed with our new algorithm and similar improvement performance was achieved. Predicted binding affinities were compared against experimental data from ChEMBL and DUD-E databases. 11 representative drug receptors from diverse drug target categories were applied to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and the four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets especially when CSE = S 4 (S is the Tanimoto structural similarity) and FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and 0.39 to 0.71 for the Glide and AutoDock vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similaritymore »
-
Abstract Background In Alzheimer’s Diseases (AD) research, multimodal imaging analysis can unveil complementary information from multiple imaging modalities and further our understanding of the disease. One application is to discover disease subtypes using unsupervised clustering. However, existing clustering methods are often applied to input features directly, and could suffer from the curse of dimensionality with high-dimensional multimodal data. The purpose of our study is to identify multimodal imaging-driven subtypes in Mild Cognitive Impairment (MCI) participants using a multiview learning framework based on Deep Generalized Canonical Correlation Analysis (DGCCA), to learn shared latent representation with low dimensions from 3 neuroimaging modalities.
Results DGCCA applies non-linear transformation to input views using neural networks and is able to learn correlated embeddings with low dimensions that capture more variance than its linear counterpart, generalized CCA (GCCA). We designed experiments to compare DGCCA embeddings with single modality features and GCCA embeddings by generating 2 subtypes from each feature set using unsupervised clustering. In our validation studies, we found that amyloid PET imaging has the most discriminative features compared with structural MRI and FDG PET which DGCCA learns from but not GCCA. DGCCA subtypes show differential measures in 5 cognitive assessments, 6 brain volume measures, and conversion to AD patterns. Inmore »
Conclusion Overall, DGCCA is able to learn effective low dimensional embeddings from multimodal data by learning non-linear projections. MCI subtypes generated from DGCCA embeddings are different from existing early and late MCI groups and show most similarity with those identified by amyloid PET features. In our validation studies, DGCCA subtypes show distinct patterns in cognitive measures, brain volumes, and are able to identify AD genetic markers. These findings indicate the promise of the imaging-driven subtypes and their power in revealing disease structures beyond early and late stage MCI.
-
Abstract Background Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognizes the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from the predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. This representatives of the entire data facilitates better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
Results The topology relevant curated data that we obtain provides an improvement in shallow learning as well as deep learning based supervised classifications. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able tomore »
Conclusions In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes entailed in similar processes.
-
Abstract This paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using the apatite chemistry as a template, we track through the use of persistent homology the topological connectivity of input crystal chemistry descriptors on defining similarity between different stoichiometries of apatites. It is shown that TDA automatically identifies a hierarchical classification scheme within apatites based on the commonality of the number of discrete coordination polyhedra that constitute the structural building units common among the compounds. This information is presented in the form of a visualization scheme of a barcode of homology classifications, where the persistence of similarity between compounds is tracked. Unlike traditional perspectives of structure maps, this new “Materials Barcode” schema serves as an automated exploratory machine learning tool that can uncover structural associations from crystal chemistry databases, as well as to achieve a more nuanced insight into what defines similarity among homologous compounds.
-
Abstract Magnetic transition metals (mTM = Cr, Mn, Fe, Co, and Ni) and their complex compounds (oxides, hydroxides, and oxyhydroxides) are highly important material platforms for diverse technologies, where electrochemical phase diagrams with respect to electrode potential and solution pH can be used to effectively understand their corrosion and oxidation behaviors in relevant aqueous environments. Many previous decades-old mTM–Pourbaix diagrams are inconsistent with various direct electrochemical observations, because experimental complexities associated with extracting reliable free energies of formation (Δ
f G ) lead to inaccuracies in the data used for modeling. Here, we develop a high-throughput simulation approach based on density-functional theory (DFT), which quickly screens structures and compounds using efficient DFT methods and calculates accurate Δf G values, using high-level exchange-correlation functions to obtain ab initio Pourbaix diagrams in comprehensive and close agreement with various important electrochemical, geological, and biomagnetic observations reported over the last few decades. We also analyze the microscopic mechanisms governing the chemical trends among the Δf G values and Pourbaix diagrams to further understand the electrochemical behaviors of mTM-based materials. Last, we provide probability profiles at variable electrode potential and solution pH to show quantitatively the likely coexistence of multiple-phase areas and diffuse phase boundaries.