Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract BackgroundNetwork propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. ResultsWe design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. ConclusionsWe examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.more » « less
-
The insulin receptor is a membrane protein responsible for the regulation of nutrient balance; and therefore, it is an attractive target in the treatment of diabetes and metabolic syndrome. Pharmacology of the insulin receptor involves two distinct mechanisms: (1) activation of the receptor by insulin mimetics that bind in the extracellular domain and (2) inhibition of the receptor TK enzymatic activity in the cytoplasmic domain. While a complete structural picture of the full‐length receptor comprising the entire sequence covering extracellular, transmembrane, juxtamembrane and cytoplasmic domains is still elusive, recent progress through cryoelectron microscopy has made it possible to describe the initial insulin ligand binding events at atomistic detail. We utilize this opportunity to obtain structural insights into the pharmacology of the insulin receptor. To this end, we conducted a comprehensive docking study of known ligands to the new structures of the receptor. Through this approach, we provide an in‐depth, structure‐based review of human insulin receptor pharmacology in light of the new structures. LINKED ARTICLESThis article is part of a themed issue on Structure Guided Pharmacology of Membrane Proteins (BJP 75th Anniversary). To view the other articles in this section visithttp://onlinelibrary.wiley.com/doi/10.1111/bph.v179.14/issuetocmore » « less
-
Chest X-ray (CXR) analysis plays an important role in patient treatment. As such, a multitude of machine learning models have been applied to CXR datasets attempting automated analysis. However, each patient has a differing number of images per angle, and multi-modal learning should deal with the missing data for specific angles and times. Furthermore, the large dimensionality of multi-modal imaging data with the shapes inconsistent across the dataset introduces the challenges in training. In light of these issues, we propose the Fast Multi-Modal Support Vector Machine (FMMSVM) which incorporates modality-specific factorization to deal with missing CXRs in the specific angle. Our model is able to adjust the fine-grained details in feature extraction and we provide an efficient optimization algorithm scalable to a large number of features. In our experiments, FMMSVM shows clearly improved classification performance.more » « less
-
Chest X-rays are commonly used for diagnosing and characterizing lung diseases, but the complex morphological patterns in radiographic appearances can challenge clinicians in making accurate diagnoses. To address this challenge, various learning methods have been developed for algorithm-aided disease detection and automated diagnosis. However, most existing methods fail to account for the heterogeneous variability in longitudinal imaging records and the presence of missing or inconsistent temporal data. In this paper, we propose a novel longitudinal learning framework that enriches inconsistent imaging data over sequential time points by leveraging 2D Principal Component Analysis (2D-PCA) and a robust adaptive loss function. We also derive an efficient solution algorithm that ensures both objective and sequence convergence for the non-convex optimization problem. Our experiments on the CheXpert dataset demonstrate improved performance in capturing indicative abnormalities in medical images and achieving satisfactory diagnoses. We believe that our method will be of significant interest to the research community working on medical image analysis.more » « less
-
The COVID-19 pandemic caused by SARS-CoV-2 has emphasized the importance of studying virus-host protein-protein interactions (PPIs) and drug-target interactions (DTIs) to discover effective antiviral drugs. While several computational algorithms have been developed for this purpose, most of them overlook the interplay pathways during infection along PPIs and DTIs. In this paper, we present a novel multipartite graph learning approach to uncover hidden binding affinities in PPIs and DTIs. Our method leverages a comprehensive biomolecular mechanism network that integrates protein-protein, genetic, and virus-host interactions, enabling us to learn a new graph that accurately captures the underlying connected components. Notably, our method identifies clustering structures directly from the new graph, eliminating the need for post-processing steps. To mitigate the detrimental effects of noisy or outlier data in sparse networks, we propose a robust objective function that incorporates the L2,p-norm and a constraint based on the pth-order Ky-Fan norm applied to the graph Laplacian matrix. Additionally, we present an efficient optimization method tailored to our framework. Experimental results demonstrate the superiority of our approach over existing state-of-the-art techniques, as it successfully identifies potential repurposable drugs for SARS-CoV-2, offering promising therapeutic options for COVID-19 treatment.more » « less
-
Multi-instance learning (MIL) handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting for classifying bags which contain any number of instances. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper, we present a novel primal–dual multi-instance support vector machine that can operate efficiently on large-scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers. The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are broadly used to optimize MIL algorithms based on SVMs. In addition, we improve our derivation to include an additional optimization designed to avoid solving a least-squares problem in our algorithm, which increases the utility of our approach to handle a large number of features as well as bags. Finally, we derive a kernel extension of our approach to learn nonlinear decision boundaries for enhanced classification capabilities. We apply our approach to both synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method.more » « less
-
Histopathological image analysis is critical in cancer diagnosis and treatment. Due to the huge size of histopathological images, most existing works analyze the whole slide pathological image (WSI) as a bag and its patches are considered as instances. However, these approaches are limited to analyzing the patches in a fixed shape, while the malignant lesions can form varied shapes. To address this challenge, we propose the Multi-Instance Multi-Shape Support Vector Machine (MIMSSVM) to analyze the multiple images (instances) jointly where each instance consists of multiple patches in varied shapes. In our approach, we can identify the varied morphologic abnormalities of nuclei shapes from the multiple images. In addition to the multi-instance multi-shape learning capability, we provide an efficient algorithm to optimize the proposed model which scales well to a large number of features. Our experimental results show the proposed MIMSSVM method outperforms the existing SVM and recent deep learning models in histopathological classification. The proposed model also identifies the tissue segments in an image exhibiting an indication of an abnormality which provides utility in the early detection of malignant tumors.more » « less
-
Linear discriminant analysis (LDA) is widely used for dimensionality reduction under supervised learning settings. Traditional LDA objective aims to minimize the ratio of squared Euclidean distances that may not perform optimally on noisy data sets. Multiple robust LDA objectives have been proposed to address this problem, but their implementations have two major limitations. One is that their mean calculations use the squared l2-norm distance to center the data, which is not valid when the objective does not use the Euclidean distance. The second problem is that there is no generalized optimization algorithm to solve different robust LDA objectives. In addition, most existing algorithms can only guarantee the solution to be locally optimal, rather than globally optimal. In this paper, we review multiple robust loss functions and propose a new and generalized robust objective for LDA. Besides, to better remove the mean value within data, our objective uses an optimal way to center the data through learning. As one important algorithmic contribution, we derive an efficient iterative algorithm to optimize the resulting non-smooth and non-convex objective function. We theoretically prove that our solution algorithm guarantees that both the objective and the solution sequences converge to globally optimal solutions at a sub-linear convergence rate. The experimental results demonstrate the effectiveness of our new method, achieving significant improvements compared to the other competing methods.more » « less
-
Multi-instance learning (MIL) is an area of machine learning that handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting and is able to classify bags which can contain any number of instances. This property allows MIL to be naturally applied to solve the problems in a wide variety of real-world applications from computer vision to healthcare. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper we present a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) derivation and implementation that can operate efficiently on large scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers (ADMM). The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are generally used to optimize MIL algorithms based on SVMs. In addition, we modify our derivation to include an additional optimization designed to avoid solving a least-squares problem during our algorithm; this optimization increases the utility of our approach to handle a large number of features as well as bags. Finally, we apply our approach to synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method. We end our discussion with an extension of our approach to handle non-linear decision boundaries. Code and data for our methods are available online at: https://github.com/minds-mines/pdMISVM.jl.more » « less