The computation of Vietoris-Rips persistence barcodes is both execution-intensive and memory-intensive. In this paper, we study the computational structure of Vietoris-Rips persistence barcodes, and identify several unique mathematical properties and algorithmic opportunities with connections to the GPU. Mathematically and empirically, we look into the properties of apparent pairs, which are independently identifiable persistence pairs comprising up to 99% of persistence pairs. We give theoretical upper and lower bounds of the apparent pair rate and model the average case. We also design massively parallel algorithms to take advantage of the very large number of simplices that can be processed independently of each other. Having identified these opportunities, we develop GPU-accelerated software for computing Vietoris-Rips persistence barcodes, called Ripser++. The software achieves up to 30x speedup over the total execution time of the original Ripser and also reduces CPU-memory usage by up to 2.0x. We believe our GPU-acceleration-based efforts open a new chapter for the advancement of topological data analysis in the post-Moore's Law era.
Computing by Programmable Particles.
This is a chapter in the book "Distributed Computing by Mobile Entities: Current Research in Moving and Computing", Springer Nature. The vision for programmable matter is to realize a physical substance that is scalable, versatile, instantly reconfigurable, safe to handle, and robust to failures. Programmable matter could be deployed in a variety of domains to address a wide gamut of problems, including applications in construction, environmental science, synthetic biology, and space exploration. However, there are considerable engineering and computational challenges that must be overcome before such a system could be implemented. Towards developing efficient algorithms for novel programmable matter behaviors, the amoebot model for self-organizing particle systems and its variant, hybrid programmable matter, provide formal computational frameworks that facilitate rigorous algorithmic research. In this chapter, we discuss distributed algorithms under these models for shape formation, shape recognition, object coating, compression, shortcut bridging, and separation, in addition to some underlying algorithmic primitives.
- Sponsoring Org: National Science Foundation
More Like this
Scalable Signal Data Processing for Measuring Functional Connectivity in Epilepsy Neurological Disorder
The accurate characterization of how different brain structures interact in terms of both structural and functional networks is an area of active research in neuroscience. A better understanding of these interactions can potentially lead to targeted treatments and improved therapies for many neurological disorders, such as epilepsy, which alone affects over 65 million people worldwide. The study of functional connectivity networks in epilepsy, which is characterized by abnormalities in brain electrical activity, will help to provide new insights into the onset and progression of this complex neurological disorder. In this chapter, we discuss statistical signal processing techniques and their use in determining functional connectivity among brain regions exhibiting epileptic activity. We also discuss computational challenges associated with deriving functional connectivity measures from neurological Big Data, and we introduce our highly scalable signal processing pipeline for quantifying functional connectivity with the goal of addressing these challenges and potentially advancing understanding of the underlying mechanisms of epilepsy. This pipeline makes use of a novel signal data format that facilitates storing and retrieving data in a distributed computing environment. We conclude the chapter by describing our current activities and proposed plans for improving our computational pipeline, such as the inclusion of biomedical ontologies ...
Population protocols are a popular model of distributed computing, in which randomly-interacting agents with little computational power cooperate to jointly perform computational tasks. Inspired by developments in molecular computation, and in particular DNA computing, recent algorithmic work has focused on the complexity of solving simple yet fundamental tasks in the population model, such as leader election (which requires convergence to a single agent in a special “leader” state), and majority (in which agents must converge to a decision as to which of two possible initial states had higher initial count). Known results point towards an inherent trade-off between the time complexity of such algorithms and the space complexity, i.e., the size of the memory available to each agent. In this paper, we explore this trade-off and provide new upper and lower bounds for majority and leader election. First, we prove a unified lower bound, which relates the space available per node with the time complexity achievable by a protocol: for instance, our result implies that any protocol solving either of these tasks for n agents using O(log log n) states must take Ω(n/polylog n) expected time. This is the first result to characterize time complexity for protocols which employ super-constant number of ...
Phylogenetic networks extend phylogenetic trees to allow for modeling reticulate evolutionary processes such as hybridization. They take the shape of a rooted, directed, acyclic graph, and when parameterized with evolutionary parameters, such as divergence times and population sizes, they form a generative process of molecular sequence evolution. Early work on computational methods for phylogenetic network inference focused exclusively on reticulations and sought networks with the fewest number of reticulations to fit the data. As processes such as incomplete lineage sorting (ILS) could be at play concurrently with hybridization, work in the last decade has shifted to computational approaches for phylogenetic network inference in the presence of ILS. In such a short period, significant advances have been made on developing and implementing such computational approaches. In particular, parsimony, likelihood, and Bayesian methods have been devised for estimating phylogenetic networks and associated parameters using estimated gene trees as data. Use of those inference methods has been augmented with statistical tests for specific hypotheses of hybridization, like the D-statistic. Most recently, Bayesian approaches for inferring phylogenetic networks directly from sequence data were developed and implemented. In this chapter, we survey such advances and discuss model assumptions as well as methods’ strengths and limitations.
Cliques and their generalizations are frequently used to model “tightly knit” clusters in graphs and identifying such clusters is a popular technique used in graph-based data mining. One such model is the s-club, which is a vertex subset that induces a subgraph of diameter at most s. This model has found use in a variety of fields because low-diameter clusters have practical significance in many applications. As this property is not hereditary on vertex-induced subgraphs, the diameter of a subgraph could increase upon the removal of some vertices and the subgraph could even become disconnected. For example, star graphs have diameter two but can be disconnected by removing the central vertex. The pursuit of a fault-tolerant extension of the s-club model has spawned two variants that we study in this article: robust s-clubs and hereditary s-clubs. We analyze the complexity of the verification and optimization problems associated with these variants. Then, we propose cut-like integer programming formulations for both variants whenever possible and investigate the separation complexity of the cut-like constraints. We demonstrate through our extensive computational experiments that the algorithmic ideas we introduce enable us to solve the problems to optimality on benchmark instances with several thousand vertices. This ...
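
The star-graph example in the s-club abstract above can be checked with a small sketch. This is only an illustration of the definition (a vertex subset whose induced subgraph has diameter at most s), not the verification or integer programming method of that article; the helper names `induced_diameter` and `is_s_club` and the adjacency-dictionary representation are assumptions made here for illustration.

```python
from collections import deque


def induced_diameter(adj, vertices):
    """Diameter of the subgraph induced by `vertices`.

    `adj` maps each vertex to the set of its neighbours (hypothetical
    representation). Returns float("inf") if the induced subgraph is
    disconnected, and 0 for an empty vertex set.
    """
    vs = set(vertices)
    if not vs:
        return 0
    worst = 0
    for src in vs:
        # Breadth-first search restricted to the chosen vertex subset.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in vs and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(vs):
            return float("inf")  # some vertex is unreachable from src
        worst = max(worst, max(dist.values()))
    return worst


def is_s_club(adj, vertices, s):
    """True if `vertices` induces a subgraph of diameter at most s."""
    return induced_diameter(adj, vertices) <= s


# Star graph: centre 0 joined to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

whole_star = is_s_club(star, [0, 1, 2, 3, 4], 2)   # the full star is a 2-club
leaves_only = is_s_club(star, [1, 2, 3, 4], 2)     # centre removed: disconnected
```

The second call shows why the property is not hereditary: dropping the single central vertex leaves the leaves with no edges between them, so the remaining subset is no longer a 2-club.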