
Search for: All records

Creators/Authors contains: "Singh, Rohit"


  1. Abstract Summary

    Computational methods to predict protein–protein interaction (PPI) typically segregate into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms.

    Availability and implementation

    https://topsyturvy.csail.mit.edu.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  2. Topological Data Analysis is a machine learning method that summarizes the topological features of a space. Persistent Homology (PH) can identify these topological features as they persist within a point cloud, that is, as they persist with respect to the connectedness of the point cloud at increasing distances. The utility of PH is apparent in several fields including bioinformatics, network security, and object classification. However, the memory complexity of PH limits its application to relatively small point clouds and to low-dimensional topological feature identification. For this reason, numerous approaches to optimize and approximate PH have been introduced to provide results over large point clouds. One solution, Partitioned Persistent Homology (PPH), has shown favorable approximation on a single node with significant performance improvement. However, the single-node approach is limited by the available system memory, leading to the need for a distributed approach that provides additional resources, especially memory. This paper studies a distributed version of PPH for use with large point clouds over a high-performance compute cluster. Experimental results of the distributed algorithm against previous studies are presented along with the scalability of the distributed library.
  3. Although security games have attracted intensive research attention over the past years, few existing works consider how information from local communities would affect the game. In this paper, we introduce a new player -- a strategic informant, who can observe and report upcoming attacks -- to the defender-attacker security game setting. Characterized by a private type, the informant has his own utility structure, which drives his strategic behavior. We model the game as a 3-player extensive-form game and propose a novel solution concept of Strong Stackelberg-perfect Bayesian equilibrium. To compute the optimal defender strategy, we first show that although the informant can have infinitely many types in general, the optimal defense plan can only include a finite (exponential) number of different patrol strategies. We then prove that there exists a defense plan with only a linear number of patrol strategies that achieves the defender's optimal utility, which significantly reduces the computational burden and allows us to solve the game in polynomial time using linear programming. Finally, we conduct extensive experiments to show the effect of the strategic informant and demonstrate the effectiveness of our algorithm.

  4. The plasma membranes of cells are thin viscous sheets in which some transmembrane proteins have two-dimensional mobility and some are immobilized. Previous studies have shown that immobile proteins retard the short-time diffusivity of mobile particles through hydrodynamic interactions and that steric effects of immobile proteins reduce the long-time diffusivity in a model that neglects hydrodynamic interactions. We present a rigorous derivation of the long-time diffusivity of a single mobile protein interacting hydrodynamically and thermodynamically with an array of immobile proteins subject to periodic boundary conditions. This method is based on a finite element method (FEM) solution of the probability density of the mobile protein diffusing with a position-dependent mobility determined through a multipole solution of Stokes equations. The simulated long-time diffusivity in square arrays decreases as the spacing in the array approaches the particle size in a manner consistent with a lubrication analysis. In random arrays, steric effects lead to a percolation threshold volume fraction above which long-time diffusion is arrested. The FEM/multipole approach is used to compute the long-time diffusivity far away from this threshold. An approximate analysis of mobile protein diffusion through a network of pores connected by bonds with resistances determined by the FEM/multipole calculations is then used to explore higher immobile area fractions and to evaluate the finite simulation cell size scaling behaviour of diffusion near the percolation threshold. Surprisingly, the ratio of the long-time diffusivity to the spatially averaged short-time diffusivity in these two-dimensional fixed arrays is higher in the presence of hydrodynamic interactions than in their absence. Finally, the implications of this work are discussed, including the possibility of using the methods developed here to investigate more complex diffusive phenomena observed in cell membranes.
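
The percolation threshold mentioned in the last abstract can be illustrated with a much simpler stand-in model. The sketch below is not the FEM/multipole method of that paper: it assumes a square lattice in which each site is blocked by an immobile obstacle with a given probability, and it only asks whether free sites still connect one side of the cell to the other. The function names, the lattice model, and the non-periodic boundaries are all illustrative assumptions.

```python
import random
from collections import deque


def random_obstacles(n, blocked_fraction, seed=0):
    """Return an n x n grid; True marks a site held by an immobile obstacle."""
    rng = random.Random(seed)
    return [[rng.random() < blocked_fraction for _ in range(n)] for _ in range(n)]


def spans_left_to_right(grid):
    """Breadth-first search over free sites from column 0 to the last column."""
    n_rows, n_cols = len(grid), len(grid[0])
    # Start from every free site in the leftmost column.
    queue = deque((r, 0) for r in range(n_rows) if not grid[r][0])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if c == n_cols - 1:
            return True  # a free path crosses the cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < n_rows and 0 <= cc < n_cols
            if inside and not grid[rr][cc] and (rr, cc) not in seen:
                seen.add((rr, cc))
                queue.append((rr, cc))
    return False  # obstacles cut the cell in two: long-range motion is arrested


def crossing_probability(n, blocked_fraction, trials=50):
    """Fraction of random obstacle layouts that still allow a crossing."""
    hits = sum(
        spans_left_to_right(random_obstacles(n, blocked_fraction, seed=t))
        for t in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for fraction in (0.2, 0.4, 0.6):
        print(fraction, crossing_probability(40, fraction))
```

In this toy model the crossing probability falls from near one to near zero over a narrow range of blocked fractions. That is the lattice analogue of the threshold above which long-time diffusion is arrested, but it says nothing about the hydrodynamic effects the paper studies.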