NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


Persistence Homology of Proximity Hyper-Graphs for Higher Dimensional Big Data

https://doi.org/10.1109/BigData55660.2022.10020926

Singh, Rohit P.; Wilsey, Philip A. (December 2022, IEEE International Conference on Big Data)

Persistent Homology (PH) is a method of Topological Data Analysis that analyzes the topological structure of data to help data scientists infer relationships in the data and make informed decisions. A significant component in the computation of PH is the construction and use of a complex that represents the topological structure of the data. Some complex types are fast to construct but space-inefficient, whereas others are space-efficient but costly to construct. Unfortunately, no existing complex type is both fast to construct and compact. This paper works to broaden the scope of PH to support the computation of low-dimensional homologies (H0–H10) in high-dimensional big data. In particular, this paper exploits the desirable properties of the Vietoris–Rips Complex (VR-Complex) and the Delaunay Complex to construct a sparsified complex. The VR-Complex uses a distance matrix to quickly generate a complex up to the desired homology dimension. In contrast, the Delaunay Complex works at the dimensionality of the data to generate a sparsified complex. While construction of the VR-Complex is fast, its size grows exponentially with the size and dimension of the data set; the Delaunay Complex, in contrast, is significantly smaller for any given data dimension. However, its construction requires computing a Delaunay Triangulation, which has high computational complexity. As a result, it is difficult to construct a Delaunay Complex for data in dimensions d > 6 that contains more than a few hundred points. The techniques in this paper enable a topology-preserving sparsification of k-simplices (where k ≪ d) to quickly generate a reduced, sparsified complex sufficient to compute homologies up to the k-subspace, irrespective of the data dimensionality d.
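To make the size problem of the VR-Complex concrete, here is a minimal Python sketch (not the authors' sparsification method) of a Vietoris–Rips complex built from pairwise distances at a single scale `eps`, expanded only up to a chosen simplex dimension. A set of points spans a simplex exactly when every pair lies within `eps`, so the complex is the clique complex of the `eps`-neighbourhood graph. The function name `vietoris_rips` and its parameters are illustrative assumptions. Note that computing H_k requires simplices up to dimension k+1.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, eps, max_dim):
    """Return {k: list of k-simplices} of the VR complex at scale eps, up to max_dim.

    Hypothetical helper for illustration: a simplex is any vertex set whose
    pairwise distances are all <= eps (the clique complex of the eps-graph).
    """
    n = len(points)
    # Distance matrix, stored as adjacency sets of the eps-neighbourhood graph.
    nbrs = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if dist(points[i], points[j]) <= eps:
            nbrs[i].add(j)
            nbrs[j].add(i)

    simplices = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        nxt = []
        for s in simplices[k - 1]:
            # Vertices adjacent to every vertex of s can extend it to a k-simplex.
            common = set.intersection(*(nbrs[v] for v in s))
            # Only add vertices larger than s's last (largest) vertex, so each
            # simplex is generated exactly once, as an increasing tuple.
            nxt.extend(s + (v,) for v in sorted(common) if v > s[-1])
        simplices[k] = nxt
    return simplices
```

For six points that are all mutually within `eps` (for example the standard basis vectors of R^6 with `eps = 1.5`), every subset is a simplex, so the counts for k = 0..3 are the binomial coefficients 6, 15, 20, 15. This worst case illustrates why the VR-Complex grows so quickly with the number of points and the target dimension.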
Full Text Available
Polytopal Complex Construction and Use in Persistent Homology

https://doi.org/10.1109/ICDMW58026.2022.00087

Singh, Rohit P.; Wilsey, Philip A. (November 2022, ICDM Workshop on High Dimensional Data Mining)

Topological Data Analysis (TDA) is a data mining technique that characterizes the topological features of data. Persistent Homology (PH) is an important tool of TDA that has been applied to a wide range of applications. However, its time and space complexities motivate the need for new methods to compute the PH of high-dimensional data. An important, and memory-intensive, element in the computation of PH is the complex constructed from the input data. In general, PH tools use and focus on optimizing simplicial complexes; cubical complexes are studied less frequently. This paper develops a method to construct polytopal complexes (complexes built from any mix of convex polytopes) in ℝⁿ for any dimension n. In general, polytopal complexes are significantly smaller than simplicial or cubical complexes. This paper includes an experimental assessment of the impact that polytopal complexes have on the memory complexity and output results of a PH computation.
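As a small illustration of why a complex built from general convex polytopes can be smaller than a simplicial one, the sketch below compares a single d-cube, treated as one polytope, with one standard way of triangulating it: the Kuhn (Freudenthal) triangulation, whose simplices are the chains of 0/1 vertices under coordinate-wise order. The choice of triangulation is our own assumption for illustration and is not taken from the paper; both function names are hypothetical.

```python
from math import comb


def cube_face_count(d: int) -> int:
    """Nonempty faces of the d-cube [0,1]^d, kept as a single polytope.

    A k-face fixes d-k coordinates to 0 or 1 and leaves k coordinates free,
    giving comb(d, k) * 2**(d-k) faces of dimension k (3**d in total).
    """
    return sum(comb(d, k) * 2 ** (d - k) for k in range(d + 1))


def kuhn_triangulation_face_count(d: int) -> int:
    """Nonempty simplices of the Kuhn triangulation of [0,1]^d.

    Vertices are 0/1 vectors encoded as bitmasks; simplices are exactly the
    strict chains u0 < u1 < ... under subset inclusion, counted here by
    dynamic programming over each chain's largest element.
    """
    ending_at = {}
    for mask in range(1 << d):  # every proper subset of mask is numerically smaller
        total = 1  # the chain containing only `mask`
        sub = mask
        while sub:
            sub = (sub - 1) & mask  # next smaller subset of mask, ending at 0
            total += ending_at[sub]
        ending_at[mask] = total
    return sum(ending_at.values())
```

For d = 2 the square has 9 faces while its two-triangle triangulation has 11 simplices; for d = 3 the counts are 27 and 51. The gap widens with d, which is the kind of saving the abstract attributes to polytopal complexes.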
Full Text Available
D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions

https://doi.org/10.1016/j.cels.2021.08.010

Sledzieski, Samuel; Singh, Rohit; Cowen, Lenore; Berger, Bonnie (October 2021, Cell Systems)

Full Text Available
Topsy-Turvy: integrating a global view into sequence-based PPI prediction

https://doi.org/10.1093/bioinformatics/btac258

Singh, Rohit; Devkota, Kapil; Sledzieski, Samuel; Berger, Bonnie; Cowen, Lenore (June 2022, Bioinformatics)

Summary: Computational methods to predict protein–protein interaction (PPI) typically fall into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model, which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components and other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms. Availability and implementation: https://topsyturvy.csail.mit.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Post-Fabrication Microarchitecture

https://doi.org/10.1145/3466752.3480119

Kumar, Chanchal; Seshadri, Anirudh; Chaudhary, Aayush; Bhawalkar, Shubham; Singh, Rohit; Rotenberg, Eric (October 2021, MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture)

Full Text Available
Distributed Computation of Persistent Homology from Partitioned Big Data

https://doi.org/10.1109/Cluster48925.2021.00050

Malott, Nicholas O.; Verma, Rishi R.; Singh, Rohit P.; Wilsey, Philip A. (September 2021, IEEE International Conference on Cluster Computing)

Topological Data Analysis is a machine learning method that summarizes the topological features of a space. Persistent Homology (PH) can identify these topological features as they persist within a point cloud, with persistence measured with respect to the connectedness of the point cloud at increasing distances. The utility of PH is apparent in several fields, including bioinformatics, network security, and object classification. However, the memory complexity of PH limits its application to relatively small point clouds for low-dimensional topological feature identification. For this reason, numerous approaches to optimize and approximate PH have been introduced to provide results over large point clouds. One solution, Partitioned Persistent Homology (PPH), has shown favorable approximation on a single node with significant performance improvement. However, the single-node approach is limited by the available system memory, leading to the need for a distributed approach that provides additional (especially memory) resources. This paper studies a distributed version of PPH for use with large point clouds on a high-performance compute cluster. Experimental results comparing the distributed algorithm against previous studies are presented, along with the scalability of the distributed library.
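The idea of features "persisting with respect to the connectedness of the point cloud at increasing distances" can be seen most simply in dimension 0 (H0). The sketch below is a standard union-find (Kruskal-style) computation of H0 persistence for a Vietoris–Rips filtration; it only illustrates the concept and is not the PPH or distributed algorithm from the paper. The function name `h0_persistence` is a hypothetical choice.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (H0).

    Every point is born at scale 0. Edges are processed by increasing length;
    when an edge of length r first joins two components, one component dies
    at r. The last surviving component is reported with death = inf.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies at r
            parent[ri] = rj
            pairs.append((0.0, r))
    if points:
        pairs.append((0.0, float("inf")))
    return pairs
```

For two well-separated clusters such as `[(0, 0), (1, 0), (10, 0), (11, 0)]`, the result is `[(0.0, 1.0), (0.0, 1.0), (0.0, 9.0), (0.0, inf)]`. The long-lived bar ending at 9.0 reflects the two-cluster structure. The all-pairs edge list is exactly the quadratic memory cost that motivates partitioned and distributed approaches.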
Full Text Available
The combined hydrodynamic and thermodynamic effects of immobilized proteins on the diffusion of mobile transmembrane proteins

https://doi.org/10.1017/jfm.2019.592

Singh, Rohit R.; Sangani, Ashok S.; Daniel, Susan; Koch, Donald L. (October 2019, Journal of Fluid Mechanics)

The plasma membranes of cells are thin viscous sheets in which some transmembrane proteins have two-dimensional mobility and some are immobilized. Previous studies have shown that immobile proteins retard the short-time diffusivity of mobile particles through hydrodynamic interactions and that steric effects of immobile proteins reduce the long-time diffusivity in a model that neglects hydrodynamic interactions. We present a rigorous derivation of the long-time diffusivity of a single mobile protein interacting hydrodynamically and thermodynamically with an array of immobile proteins subject to periodic boundary conditions. This method is based on a finite element method (FEM) solution of the probability density of the mobile protein diffusing with a position-dependent mobility determined through a multipole solution of Stokes equations. The simulated long-time diffusivity in square arrays decreases as the spacing in the array approaches the particle size in a manner consistent with a lubrication analysis. In random arrays, steric effects lead to a percolation threshold volume fraction above which long-time diffusion is arrested. The FEM/multipole approach is used to compute the long-time diffusivity far away from this threshold. An approximate analysis of mobile protein diffusion through a network of pores connected by bonds with resistances determined by the FEM/multipole calculations is then used to explore higher immobile area fractions and to evaluate the finite simulation cell size scaling behaviour of diffusion near the percolation threshold. Surprisingly, the ratio of the long-time diffusivity to the spatially averaged short-time diffusivity in these two-dimensional fixed arrays is higher in the presence of hydrodynamic interactions than in their absence. Finally, the implications of this work are discussed, including the possibility of using the methods developed here to investigate more complex diffusive phenomena observed in cell membranes.
Full Text Available
When to Follow the Tip: Security Games with Strategic Informants

https://doi.org/10.24963/ijcai.2020/52

Shen, Weiran; Chen, Weizhe; Huang, Taoan; Singh, Rohit; Fang, Fei (July 2020, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence)

Although security games have attracted intensive research attention over the past years, few existing works consider how information from local communities would affect the game. In this paper, we introduce a new player, a strategic informant who can observe and report upcoming attacks, to the defender-attacker security game setting. Characterized by a private type, the informant has his own utility structure, which leads to strategic behavior. We model the game as a 3-player extensive-form game and propose a novel solution concept of Strong Stackelberg-perfect Bayesian equilibrium. To compute the optimal defender strategy, we first show that although the informant can have infinitely many types in general, the optimal defense plan can only include a finite (exponential) number of different patrol strategies. We then prove that there exists a defense plan with only a linear number of patrol strategies that achieves the defender's optimal utility, which significantly reduces the computational burden and allows us to solve the game in polynomial time using linear programming. Finally, we conduct extensive experiments to show the effect of the strategic informant and demonstrate the effectiveness of our algorithm.
Full Text Available
Modulated orientation-sensitive terahertz spectroscopy

https://doi.org/10.1364/PRJ.4.0000A1

Singh, Rohit; George, Deepu Koshy; Bae, Chejin; Niessen, K. A.; Markelz, A. G. (June 2016, Photonics Research)
