
Title: Coupled matrix–matrix and coupled tensor–matrix completion methods for predicting drug–target interactions
Abstract Predicting the interactions between drugs and targets plays an important role in new drug discovery and drug repurposing (also known as drug repositioning). Novel and efficient prediction approaches are needed to avoid the costly and laborious process of determining drug–target interactions (DTIs) by experiment alone. These computational prediction approaches should be capable of identifying potential DTIs in a timely manner. Matrix factorization methods have proven to be the most reliable group of methods. Here, we first propose a matrix factorization-based method termed ‘Coupled Matrix–Matrix Completion’ (CMMC). Next, to utilize the more comprehensive information provided in different databases and to incorporate multiple types of scores for drug–drug similarities and target–target relationships, we extend CMMC to ‘Coupled Tensor–Matrix Completion’ (CTMC) by considering drug–drug and target–target similarity/interaction tensors. Results: Evaluation on two benchmark datasets, DrugBank and TTD, shows that CTMC outperforms the matrix-factorization-based methods GRMF, $L_{2,1}$-GRMF, NRLMF and NRLMF$\beta$. Based on the evaluation, CMMC and CTMC outperform these methods in terms of area under the curve, F1 score, sensitivity and specificity, with a considerably shorter run time.
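The abstract does not give the CMMC objective, so the following is only a generic illustration of the coupling idea, not the authors' algorithm. It is a minimal sketch assuming a partially observed drug–target matrix `Y` whose low-rank factors `U` (drugs) and `V` (targets) are also asked to reproduce a symmetric drug–drug similarity matrix `S_d` and a symmetric target–target similarity matrix `S_t`. The function names, the squared-error objective, the plain gradient-descent solver and all hyperparameters (`alpha`, `lam`, `lr`, `rank`) are assumptions made for this example.

```python
import numpy as np


def coupled_loss(U, V, Y, mask, S_d, S_t, alpha, lam):
    """Illustrative objective: masked fit to Y plus coupling to S_d and S_t."""
    fit = 0.5 * np.sum((mask * (U @ V.T - Y)) ** 2)
    sim = 0.5 * alpha * (np.sum((U @ U.T - S_d) ** 2) + np.sum((V @ V.T - S_t) ** 2))
    reg = 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return fit + sim + reg


def coupled_mf(Y, mask, S_d, S_t, rank=2, alpha=0.5, lam=0.01,
               lr=0.01, iters=1000, seed=0):
    """Gradient descent on the illustrative coupled objective.

    Assumes mask is binary (1 = observed) and S_d, S_t are symmetric.
    Returns the factors and the loss value recorded at every iteration.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    history = [coupled_loss(U, V, Y, mask, S_d, S_t, alpha, lam)]
    for _ in range(iters):
        R = mask * (U @ V.T - Y)  # residual on observed entries only
        grad_U = R @ V + 2 * alpha * (U @ U.T - S_d) @ U + lam * U
        grad_V = R.T @ U + 2 * alpha * (V @ V.T - S_t) @ V + lam * V
        U = U - lr * grad_U
        V = V - lr * grad_V
        history.append(coupled_loss(U, V, Y, mask, S_d, S_t, alpha, lam))
    return U, V, history


# Synthetic toy data (hypothetical sizes: 8 drugs, 6 targets, rank 2).
rng = np.random.default_rng(1)
U0 = rng.uniform(size=(8, 2))
V0 = rng.uniform(size=(6, 2))
Y_full = U0 @ V0.T
S_d = U0 @ U0.T
S_t = V0 @ V0.T
mask = (rng.uniform(size=Y_full.shape) < 0.7).astype(float)  # ~70% observed

U, V, history = coupled_mf(Y_full * mask, mask, S_d, S_t)
scores = U @ V.T  # predicted scores for every drug-target pair, including hidden ones
```

In this sketch the similarity matrices share the factors `U` and `V` with the interaction matrix, which is what lets side information influence the scores of unobserved pairs. A tensor variant in the spirit of CTMC would replace `S_d` and `S_t` with stacks of several similarity matrices; that extension is not shown here.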
Journal Name: Briefings in Bioinformatics
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Knowledge graphs (KGs) are powerful tools that codify relational behaviour between entities in knowledge bases. KGs can simultaneously model many different types of subject-predicate-object and higher-order relations. As such, they offer a flexible modeling framework that has been applied to many areas, including biology and pharmacology – most recently, in the fight against COVID-19. The flexibility of KG modeling is both a blessing and a challenge from the learning point of view. In this paper we propose a novel coupled tensor-matrix framework for KG embedding. We leverage tensor factorization tools to learn concise representations of entities and relations in knowledge bases and employ these representations to perform drug repurposing for COVID-19. Our proposed framework is principled, elegant, and achieves 100% improvement over the best baseline in the COVID-19 drug repurposing task using a recently developed biological KG.
  2. Hyperspectral super-resolution refers to the task of fusing a hyperspectral image (HSI) and a multispectral image (MSI) in order to produce a super-resolution image (SRI) that has high spatial and spectral resolution. Popular methods leverage matrix factorization that models each spectral pixel as a convex combination of spectral signatures belonging to a few endmembers. These methods are considered state-of-the-art, but several challenges remain. First, multiband images are naturally three dimensional (3-d) signals, while matrix methods usually ignore the 3-d structure, which is prone to information losses. Second, these methods do not provide identifiability guarantees under which the reconstruction task is feasible. Third, a tacit assumption is that the degradation operators from SRI to MSI and HSI are known - which is hardly the case in practice. Recently [1], [2] proposed a coupled tensor factorization approach to handle these issues. In this work we propose a hybrid model that combines the benefits of tensor and matrix factorization approaches. We also develop a new algorithm that is mathematically simple, enjoys identifiability under relaxed conditions and is completely agnostic of the spatial degradation operator. Experimental results with real hyperspectral data showcase the effectiveness of the proposed approach. 
  3. An extensively studied phenomenon of the past few years in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we further investigate this phenomenon by narrowing our focus to deep matrix factorization, where we reveal surprising low-dimensional structures in the learning dynamics when the target matrix is low-rank. Specifically, we show that the evolution of gradient descent starting from arbitrary orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, despite the fact that all parameters are updated throughout training. From this, we provide rigorous justification for low-rank training in a specific, yet practical setting. In particular, we demonstrate that we can construct compressed factorizations that are equivalent to full-width, deep factorizations throughout training for solving low-rank matrix completion problems efficiently. 
  4. The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC’s infinitesimal generator (rate) matrix. Motivated by the derivative’s extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a “blessing of dimensionality” result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.

  5. Techniques of matrix completion aim to impute a large portion of missing entries in a data matrix through a small portion of observed ones. In practice, prior information and special structures are usually employed in order to improve the accuracy of matrix completion. In this paper, we propose a unified nonconvex optimization framework for matrix completion with linearly parameterized factors. In particular, by introducing a condition referred to as Correlated Parametric Factorization, we conduct a unified geometric analysis for the nonconvex objective by establishing uniform upper bounds for low-rank estimation resulting from any local minimizer. Perhaps surprisingly, the condition of Correlated Parametric Factorization holds for important examples including subspace-constrained matrix completion and skew-symmetric matrix completion. The effectiveness of our unified nonconvex optimization method is also empirically illustrated by extensive numerical simulations. 