We introduce the ParClusterers Benchmark Suite (PCBS), a collection of highly scalable parallel graph clustering algorithms and benchmarking tools that streamline comparing different graph clustering algorithms and implementations. The benchmark includes clustering algorithms that target a wide range of modern clustering use cases, including community detection, classification, and dense subgraph mining. The benchmark toolkit makes it easy to run and evaluate multiple instances of different clustering algorithms with respect to both running time and quality. We evaluate the PCBS algorithms empirically and find that they deliver state-of-the-art quality and running time. In terms of running time, they are on average over 4x faster than the fastest library we compared against. In terms of quality, the correlation clustering algorithm [Shi et al., VLDB'21] optimizing the LambdaCC objective, which has no direct counterpart in other libraries, delivers the highest quality on the majority of the datasets we used.
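PCBS itself ships as a C++ toolkit, so the snippet below is only a minimal Python sketch of the time-versus-quality methodology described above: it times two off-the-shelf networkx clusterers on one graph and scores each clustering by modularity. The clusterer choices and the graph are illustrative assumptions, not part of PCBS.

```python
# Minimal sketch of the time-vs-quality comparison PCBS automates:
# run several clusterers on one graph, record wall-clock time and a
# quality score (modularity here). Illustrative only; PCBS is C++.
import time
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

clusterers = {
    "louvain": lambda g: community.louvain_communities(g, seed=1),
    "label_prop": lambda g: list(community.label_propagation_communities(g)),
}

for name, cluster in clusterers.items():
    start = time.perf_counter()
    clusters = cluster(G)
    elapsed = time.perf_counter() - start
    quality = community.modularity(G, clusters)
    print(f"{name}: {elapsed:.4f}s, modularity={quality:.3f}")
```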
Poster: The Problem-Based Benchmark Suite (PBBS), V2
The Problem-Based Benchmark Suite (PBBS) is a set of benchmark problems designed for comparing algorithms, implementations, and platforms. For each problem, the suite defines the problem in terms of its input-output relationship and supplies a set of input instances along with input generators, a default implementation, code for checking correctness or accuracy, and a timing harness. The suite makes it possible to compare different algorithms, platforms (e.g., GPU vs. CPU), and implementations using different programming languages or libraries. The purpose is to better understand how well a wide variety of problems parallelize and which techniques and algorithms are most effective. The suite was first announced in 2012 with 14 benchmark problems. Here we describe some significant updates. In particular, we have added nine new benchmarks from a mix of problems in text processing, computational geometry, and machine learning. We have further optimized the default implementations; several are the fastest available for multicore CPUs, often achieving near-perfect speedup on the 72-core machine we test them on. The suite now also supplies significantly larger default test instances, as well as a broader variety, with many derived from real-world data.
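As a rough illustration of the per-problem structure described above (definition via the input-output relationship, input generator, default implementation, checker, timing harness), here is a minimal Python sketch for a comparison-sorting problem. PBBS itself is a C++ suite; all names below are invented for illustration.

```python
# Illustrative sketch of the PBBS per-problem structure: generator,
# default implementation, correctness checker, and timing harness.
import random
import time

def generate_input(n, seed=0):
    """Input generator: a random integer sequence."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

def default_impl(xs):
    """Default implementation: the problem is defined only by its
    input-output relation, so any correct sort qualifies."""
    return sorted(xs)

def check(xs, ys):
    """Checker: the output must be a sorted permutation of the input."""
    return ys == sorted(xs)

xs = generate_input(1_000_000)
start = time.perf_counter()
ys = default_impl(xs)
print(f"time: {time.perf_counter() - start:.3f}s, ok: {check(xs, ys)}")
```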
- PAR ID: 10416517
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- Page Range / eLocation ID: 445–447
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The emergence of quantum computers as a new computational paradigm has been accompanied by speculation concerning the scope and timeline of their anticipated revolutionary changes. While quantum computing is still in its infancy, the variety of architectures used to implement quantum computations makes it difficult to reliably measure and compare performance. This problem motivates our introduction of SupermarQ, a scalable, hardware-agnostic quantum benchmark suite that uses application-level metrics to measure performance. SupermarQ is the first attempt to systematically apply techniques from classical benchmarking methodology to the quantum domain. We define a set of feature vectors to quantify coverage, select applications from a variety of domains to ensure the suite is representative of real workloads, and collect benchmark results from the IBM, IonQ, and AQT@LBNL platforms. Looking forward, we envision that quantum benchmarking will encompass a large cross-community effort built on open-source, constantly evolving benchmark suites. We introduce SupermarQ as an important step in this direction.
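The Python sketch below illustrates the feature-vector idea on a toy gate-list circuit representation. The two features shown (two-qubit-gate fraction and a parallelism measure) are simplified stand-ins for SupermarQ's actual feature definitions, and the circuit encoding is an assumption of this sketch, not SupermarQ's API.

```python
# Simplified sketch of SupermarQ-style feature vectors: describe a
# circuit by hardware-agnostic structural features. The real suite
# defines six features; the two below are simplified stand-ins.
from collections import defaultdict

# A circuit as a list of gates, each touching one or two qubits.
circuit = [("h", 0), ("cx", 0, 1), ("cx", 1, 2), ("rz", 2), ("cx", 0, 2)]

def entanglement_ratio(gates):
    """Fraction of gates that act on two qubits."""
    return sum(len(g) == 3 for g in gates) / len(gates)

def parallelism(gates, n_qubits):
    """Gate count relative to circuit depth, scaled to [0, 1]:
    0 means fully serial, 1 means every layer uses every qubit."""
    level_of = defaultdict(int)
    depth = 0
    for g in gates:
        level = max(level_of[q] for q in g[1:]) + 1
        for q in g[1:]:
            level_of[q] = level
        depth = max(depth, level)
    return (len(gates) / depth - 1) / (n_qubits - 1)

print([entanglement_ratio(circuit), parallelism(circuit, 3)])
```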
The numerical performance of algorithms can be studied using fixed test sets or procedures that generate problem instances. This paper proposes various methods for generating linear, semidefinite, and second-order cone optimization problems. Specifically, we are interested in problem instances with a known optimal solution, a known optimal partition, a specific interior solution, or all of these together. In the proposed problem generators, different characteristics of optimization problems, including dimension, size, condition number, degeneracy, optimal partition, and sparsity, can be chosen to facilitate comprehensive computational experiments. We also develop procedures that generate instances with a maximally complementary optimal solution and a predetermined optimal partition, yielding challenging semidefinite and second-order cone optimization problems. The generated instances enable us to evaluate efficient interior-point methods for conic optimization problems.
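As a minimal sketch of the known-optimal-solution idea in the linear case, the code below picks an optimal partition up front, builds a strictly complementary primal-dual pair, and then derives b and c from it. The construction is standard; the dimensions and random distributions are arbitrary choices of this sketch, not the paper's generators.

```python
# Generate a linear program with a known optimal solution: choose a
# primal-dual pair satisfying complementary slackness, then set
# b = A x and c = A^T y + s so that x is optimal by construction.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 3, 6                                   # constraints, variables

A = rng.standard_normal((m, n))
basis = rng.choice(n, size=m, replace=False)  # predetermined optimal partition
nonbasis = np.setdiff1d(np.arange(n), basis)

x = np.zeros(n)                               # primal solution, zero off the basis
x[basis] = rng.uniform(1, 2, m)
y = rng.standard_normal(m)                    # arbitrary dual multipliers
s = np.zeros(n)                               # dual slacks, zero on the basis
s[nonbasis] = rng.uniform(1, 2, n - m)

b = A @ x                                     # x is feasible for Ax = b, x >= 0
c = A.T @ y + s                               # (y, s) is dual feasible

# Verify with a solver: min c^T x s.t. Ax = b, x >= 0 recovers x.
res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
print(np.allclose(res.x, x, atol=1e-6), res.fun, c @ x)
```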
There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly available real-world graph (the Hyperlink Web graph, with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work that handles the Hyperlink graph uses distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically efficient parallel graph algorithms can scale to the largest publicly available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically efficient parallel algorithms for 20 important graph problems. We also present the interfaces, optimizations, and graph processing techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly available as the Graph Based Benchmark Suite (GBBS).
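GBBS is a C++ library whose algorithms are written against parallel primitives such as edgeMap over vertex subsets; the Python sketch below is only a sequential illustration of that frontier-based pattern for BFS, one of the benchmark problems.

```python
# Sequential sketch of the frontier-based pattern that GBBS's parallel
# primitives express: BFS processes one frontier per round, and each
# round's edge traversal is what a parallel edgeMap would distribute.
def bfs_levels(adj, src):
    level = {src: 0}
    frontier = [src]
    depth = 0
    while frontier:
        depth += 1
        # In GBBS this loop is a parallel edgeMap over the frontier.
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```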
Multi-Agent Path Finding (MAPF) is a well-studied problem with many existing optimal algorithms capable of solving a wide variety of instances, each with its own strengths and weaknesses. While for some instances the fastest algorithm can be easily determined, not enough is known about their performance to predict the fastest algorithm for every MAPF instance, or what makes some instances more difficult than others. There is no clear answer as to which features dominate the hardness of MAPF instances. In this work, we study how betweenness centrality affects the empirical difficulty of MAPF instances. To that end, we benchmark the largest and most complete optimal MAPF algorithm portfolio to date. We analyze the algorithms' performance independently and as part of the portfolio, and discuss how betweenness centrality can be used to improve estimations of algorithm performance on a given instance of MAPF.
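As a small illustration of the instance feature studied here, the sketch below computes betweenness centrality on a hypothetical grid map with a single doorway, using networkx; such bottleneck vertices, which many shortest paths must cross, are the kind of structure the feature captures.

```python
# Sketch of a MAPF-style instance feature: betweenness centrality of
# map cells on a 4-connected grid. A wall with one doorway funnels all
# left-right shortest paths through a small bottleneck region.
import networkx as nx

G = nx.grid_2d_graph(8, 8)
# Carve a wall at column 4, leaving one doorway cell at (4, 3).
G.remove_nodes_from((4, y) for y in range(8) if y != 3)

bc = nx.betweenness_centrality(G)
top = sorted(bc, key=bc.get, reverse=True)[:3]
print([(v, round(bc[v], 3)) for v in top])  # doorway and flanking cells lead
```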