skip to main content


Title: GPU-Accelerated Computation of Vietoris-Rips Persistence Barcodes
The computation of Vietoris-Rips persistence barcodes is both execution-intensive and memory-intensive. In this paper, we study the computational structure of Vietoris-Rips persistence barcodes, and identify several unique mathematical properties and algorithmic opportunities with connections to the GPU. Mathematically and empirically, we look into the properties of apparent pairs, which are independently identifiable persistence pairs comprising up to 99% of persistence pairs. We give theoretical upper and lower bounds of the apparent pair rate and model the average case. We also design massively parallel algorithms to take advantage of the very large number of simplices that can be processed independently of each other. Having identified these opportunities, we develop a GPU-accelerated software for computing Vietoris-Rips persistence barcodes, called Ripser++. The software achieves up to 30x speedup over the total execution time of the original Ripser and also reduces CPU-memory usage by up to 2.0x. We believe our GPU-acceleration based efforts open a new chapter for the advancement of topological data analysis in the post-Moore's Law era.  more » « less
Award ID(s):
1718450
NSF-PAR ID:
10171713
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Symposium of Computational Geometry (SoCG 2020)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We study a family of invariants of compact metric spaces that combines the Curvature Sets defined by Gromov in the 1980 s with Vietoris–Rips Persistent Homology. For given integers$$k\ge 0$$k0and$$n\ge 1$$n1we consider the dimensionkVietoris–Rips persistence diagrams ofallsubsets of a given metric space with cardinality at mostn. We call these invariantspersistence setsand denote them as$${\textbf{D}}_{n,k}^{\textrm{VR}}$$Dn,kVR. We first point out that this family encompasses the usual Vietoris–Rips diagrams. We then establish that (1) for certain range of values of the parametersnandk, computing these invariants is significantly more efficient than computing the usual Vietoris–Rips persistence diagrams, (2) these invariants have very good discriminating power and, in many cases, capture information that is imperceptible through standard Vietoris–Rips persistence diagrams, and (3) they enjoy stability properties analogous to those of the usual Vietoris–Rips persistence diagrams. We precisely characterize some of them in the case of spheres and surfaces with constant curvature using a generalization of Ptolemy’s inequality. We also identify a rich family of metric graphs for which$${\textbf{D}}_{4,1}^{\textrm{VR}}$$D4,1VRfully recovers their homotopy type by studying split-metric decompositions. Along the way we prove some useful properties of Vietoris–Rips persistence diagrams using Mayer–Vietoris sequences. These yield a geometric algorithm for computing the Vietoris–Rips persistence diagram of a spaceXwith cardinality$$2k+2$$2k+2with quadratic time complexity as opposed to the much higher cost incurred by the usual algebraic algorithms relying on matrix reduction.

     
    more » « less
  2. In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation of GPU. Existing graph-processing frameworks mainly use a single combination in the entire execution for a given application, but we have observed their variable and suboptimal performance. In this paper, we present SEP-Graph, a highly efficient software framework for graph-processing on GPU. The hybrid execution mode is automatically switched among three pairs of parameters, with an objective to achieve the shortest execution time in each iteration. We also apply a set of optimizations to SEP-Graph, considering the characteristics of graph algorithms and underlying GPU architectures. We show the effectiveness of SEP-Graph based on our intensive and comparative performance evaluation on NVIDIA 1080, P100, and V100 GPUs. Compared with existing and representative GPU graph-processing framework Groute and Gunrock, SEP-Graph can reduce execution time up to 45.8 times and 39.4 times. 
    more » « less
  3. Outlier detection (OD) is a key machine learning task for finding rare and deviant data samples, with many time-critical applications such as fraud detection and intrusion detection. In this work, we propose TOD, the first tensor-based system for efficient and scalable outlier detection on distributed multi-GPU machines. A key idea behind TOD is decomposing complex OD applications into a small collection of basic tensor algebra operators. This decomposition enables TOD to accelerate OD computations by leveraging recent advances in deep learning infrastructure in both hardware and software. Moreover, to deploy memory-intensive OD applications on modern GPUs with limited on-device memory, we introduce two key techniques. First, provable quantization speeds up OD computations and reduces its memory footprint by automatically performing specific floating-point operations in lower precision while provably guaranteeing no accuracy loss. Second, to exploit the aggregated compute resources and memory capacity of multiple GPUs, we introduce automatic batching , which decomposes OD computations into small batches for both sequential execution on a single GPU and parallel execution across multiple GPUs. TOD supports a diverse set of OD algorithms. Evaluation on 11 real-world and 3 synthetic OD datasets shows that TOD is on average 10.9X faster than the leading CPU-based OD system PyOD (with a maximum speedup of 38.9X), and can handle much larger datasets than existing GPU-based OD systems. In addition, TOD allows easy integration of new OD operators, enabling fast prototyping of emerging and yet-to-discovered OD algorithms. 
    more » « less
  4. The performance of modern Big Data frameworks, e.g. Spark, depends greatly on high-speed storage and shuffling, which impose a significant memory burden on production data centers. In many production situations, the persistence and shuffling intensive applications can suffer a major performance loss due to lack of memory. Thus, the common practice is usually to over-allocate the memory assigned to the data workers for production applications, which in turn reduces overall resource utilization. One efficient way to address the dilemma between the performance and cost efficiency of Big Data applications is through data center computing resource disaggregation. This paper proposes and implements a system that incorporates the Spark Big Data framework with a novel in-memory distributed file system to achieve memory disaggregation for data persistence and shuffling. We address the challenge of optimizing performance at affordable cost by co-designing the proposed in-memory distributed file system with large-volume DIMM-based persistent memory (PMEM) and RDMA technology. The disaggregation design allows each part of the system to be scaled independently, which is particularly suitable for cloud deployments. The proposed system is evaluated in a production-level cluster using real enterprise-level Spark production applications. The results of an empirical evaluation show that the system can achieve up to a 3.5- fold performance improvement for shuffle-intensive applications with the same amount of memory, compared to the default Spark setup. Moreover, by leveraging PMEM, we demonstrate that our system can effectively increase the memory capacity of the computing cluster with affordable cost, with a reasonable execution time overhead with respect to using local DRAM only. 
    more » « less
  5. Integrated CPU-GPU architecture provides excellent acceleration capabilities for data parallel applications on embedded platforms while meeting the size, weight and power (SWaP) requirements. However, sharing of main memory between CPU applications and GPU kernels can severely affect the execution of GPU kernels and diminish the performance gain provided by GPU. For example, in the NVIDIA Jetson TX2 platform, an integrated CPU-GPU architecture, we observed that, in the worst case, the GPU kernels can suffer as much as 3X slowdown in the presence of co-running memory intensive CPU applications. In this paper, we propose a software mechanism, which we call BWLOCK++, to protect the performance of GPU kernels from co-scheduled memory intensive CPU applications. 
    more » « less