One of the major challenges in parallelization is the difficulty of improving application scalability with conventional techniques. HPX provides efficient, scalable parallelism by significantly reducing node starvation and effective latencies while controlling overheads. In this paper, we present a new highly scalable, distributed N-Body application built on a future-based algorithm implemented with HPX. The main difference between this algorithm and prior art is that a future-based request buffer is used between nodes, along each spatial direction, to send and receive data to and from remote nodes, which helps remove synchronization barriers. HPX's asynchronous programming model further improves parallel performance. Using HPX to parallelize octree construction on a single node and force computation across distributed nodes improves scalability by about 45% on average compared to an equivalent OpenMP implementation and by 28% compared to a hybrid MPI+OpenMP implementation [1], for one billion particles running on up to 128 nodes with 20 cores each.
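HPX expresses this pattern in C++, but the core idea is language-neutral. Below is a minimal, hypothetical Python sketch (using `concurrent.futures` rather than HPX's actual API) of a per-direction future-based request buffer: one receive is posted per spatial direction, local work overlaps with the in-flight requests, and each neighbor's data is consumed as its future resolves, with no global barrier. All function names are illustrative stand-ins, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

DIRECTIONS = ["x-", "x+", "y-", "y+", "z-", "z+"]

def fetch_remote_particles(direction):
    """Hypothetical stand-in for an asynchronous receive of boundary
    particles from the neighboring node in the given direction."""
    return []  # a real implementation would complete a network receive

def compute_forces(local_particles, other_particles):
    """Hypothetical stand-in for the pairwise force kernel."""
    pass

def timestep(executor, local_particles):
    # Post one request per spatial direction up front; each returns a
    # future immediately, so communication is never separated from
    # computation by a global barrier.
    requests = {d: executor.submit(fetch_remote_particles, d)
                for d in DIRECTIONS}

    # Purely local work overlaps with the in-flight requests.
    compute_forces(local_particles, local_particles)

    # Consume each direction's data as soon as its future resolves.
    for d, fut in requests.items():
        compute_forces(local_particles, fut.result())

with ThreadPoolExecutor(max_workers=6) as ex:
    timestep(ex, local_particles=[])
```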
PREEMPT: Scalable Epidemic Interventions Using Submodular Optimization on Multi-GPU Systems
Preventing and slowing the spread of epidemics is achieved through techniques such as vaccination and social distancing. Given practical limitations on the number of vaccines and the cost of administration, optimization becomes a necessity. Previous approaches using mathematical programming methods have proven effective but are limited by their computational costs. In this work, we present PREEMPT, a new approach to intervention that maximizes the influence of vaccinated nodes on the network. We prove submodular properties of our method's objective function, which enables an efficient greedy approximation strategy. Building on this, we present a new parallel algorithm based on greedy hill climbing for PREEMPT, along with an efficient parallel implementation for distributed CPU-GPU heterogeneous platforms. Our results demonstrate that PREEMPT achieves a significant reduction (up to 6.75×) in the percentage of people infected and up to a 98% reduction in the peak of the infection on a city-scale network. We also show strong scaling of PREEMPT on up to 128 nodes of the Summit supercomputer. Our parallel implementation significantly reduces time to solution, from hours to minutes on large networks. This work represents a first-of-its-kind effort in parallelizing greedy hill climbing and applying it to devising effective interventions for epidemics.
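Submodularity of the objective is what makes greedy hill climbing attractive here: for monotone submodular functions, the greedy strategy carries the classic (1 - 1/e) approximation guarantee (Nemhauser et al., 1978). Below is a minimal sketch of the serial loop that such an approach parallelizes, assuming a hypothetical `influence` oracle; the marginal-gain evaluations in the inner loop are mutually independent, which is what a distributed CPU-GPU implementation can farm out. This is an illustration of the standard technique, not PREEMPT's actual code.

```python
def greedy_hill_climb(nodes, influence, budget):
    """Select up to `budget` seed nodes by repeatedly adding the node
    with the largest marginal gain. `influence` is a hypothetical
    oracle returning the objective value of a seed set."""
    selected = set()
    nodes = list(nodes)
    for _ in range(budget):
        base = influence(selected)
        # These marginal-gain evaluations are independent of one
        # another, so a parallel implementation can distribute them.
        best = max((v for v in nodes if v not in selected),
                   key=lambda v: influence(selected | {v}) - base)
        selected.add(best)
    return selected

# Toy monotone submodular objective: coverage of closed neighborhoods.
adj = {1: {2, 3}, 2: {1}, 3: {4}, 4: {3}}
cover = lambda s: len(set().union(*({v} | adj[v] for v in s))) if s else 0
print(greedy_hill_climb(adj, cover, budget=2))  # -> {1, 3}
```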
- PAR ID: 10213737
- Date Published:
- Journal Name: International Conference for High Performance Computing, Networking, Storage and Analysis
- ISSN: 2167-4337
- Page Range / eLocation ID: 765-799
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Evaluating AMR parsing accuracy involves comparing pairs of AMR graphs. The major evaluation metric, SMATCH (Cai and Knight, 2013), searches for one-to-one mappings between the nodes of two AMRs with a greedy hill-climbing algorithm, which leads to search errors. We propose SEMBLEU, a robust metric that extends BLEU (Papineni et al., 2002) to AMRs. It does not suffer from search errors and considers non-local correspondences in addition to local ones. SEMBLEU is fully content-driven and punishes situations where a system's output does not preserve most information from the input. Preliminary experiments at both the sentence and corpus levels show that SEMBLEU has slightly higher consistency with human judgments than SMATCH. Our code is available at http://github.com/freesunshine0316/sembleu.
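SEMBLEU's exact n-gram extraction follows the cited paper and repository; as a rough illustration of the idea only, the sketch below flattens each graph into a bag of short labeled-path n-grams and scores BLEU-style clipped precision over them. The toy graphs and `:argN` edge labels are illustrative assumptions.

```python
from collections import Counter

def path_ngrams(graph, max_nodes=2):
    """Bag of labeled-path n-grams. `graph` maps a node label to a list
    of (edge_label, child_label) pairs; an n-gram is the sequence of
    node and edge labels along a path with up to `max_nodes` nodes."""
    grams = Counter()

    def walk(node, path, depth):
        grams[tuple(path)] += 1
        if depth < max_nodes:
            for edge, child in graph.get(node, ()):
                walk(child, path + [edge, child], depth + 1)

    for node in graph:
        walk(node, [node], 1)
    return grams

def clipped_precision(hyp_grams, ref_grams):
    """BLEU-style modified precision: each hypothesis n-gram count is
    clipped by its count in the reference."""
    matched = sum(min(c, ref_grams[g]) for g, c in hyp_grams.items())
    return matched / max(1, sum(hyp_grams.values()))

ref = {"want": [(":arg0", "boy"), (":arg1", "go")],
       "boy": [], "go": [(":arg0", "boy")]}
hyp = {"want": [(":arg0", "boy")], "boy": [], "go": []}
print(clipped_precision(path_ngrams(hyp), path_ngrams(ref)))
```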
- Efficient contact tracing and isolation is an effective strategy to control epidemics, as seen in the Ebola epidemic and the COVID-19 pandemic. An important consideration in contact tracing is the budget on the number of individuals asked to quarantine; the budget is limited for socioeconomic reasons (e.g., a limited number of contact tracers). Here, we present a Markov Decision Process (MDP) framework for the problem of using contact tracing to reduce the size of an outbreak while limiting the number of people quarantined. We formulate each step of the MDP as a combinatorial problem, MinExposed, which we demonstrate is NP-Hard. Next, we develop two approximation algorithms: one based on rounding the solutions of a linear program, and a greedy algorithm based on choosing nodes with a high (weighted) degree. A key feature of the greedy algorithm is that it does not need complete information about the underlying social contact network, making it implementable in practice. Using simulations over realistic networks, we show how these algorithms can help bend the epidemic curve with a limited number of isolated individuals.
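As a rough illustration of the degree-based greedy heuristic described above (not the paper's full MinExposed formulation), the sketch below isolates the budget-many candidates with the largest weighted degree, using only each candidate's local neighborhood; the `weight` function is a hypothetical stand-in for edgewise exposure risk.

```python
import heapq

def greedy_quarantine(adj, candidates, budget, weight=lambda u, v: 1.0):
    """Choose up to `budget` candidate nodes to isolate, greedily by
    (weighted) degree. Needs only each candidate's neighborhood in
    `adj`, not the full network."""
    score = {u: sum(weight(u, v) for v in adj.get(u, ())) for u in candidates}
    return heapq.nlargest(budget, candidates, key=score.__getitem__)

# Example: isolate the 2 highest-degree contacts of known cases.
adj = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a", "d"}, "d": {"a", "c"}}
print(greedy_quarantine(adj, candidates=["b", "c", "d"], budget=2))
```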
- Persistent homology (PH) matrix reduction is an important tool for data analytics in many application areas. Due to its highly irregular execution patterns, it is challenging to achieve high efficiency in parallel processing for increasingly large data sets. In this paper, we introduce HYPHA, a HYbrid Persistent Homology matrix reduction Accelerator, to make parallel processing highly efficient on both GPU and multicore. The essential foundation of our algorithm design and implementation is the separation of SIMT and MIMD parallelism in PH matrix reduction. With this separation, we can perform massive parallel scanning operations on the GPU very quickly, collecting rich information from an input boundary matrix for further parallel reduction operations on multicore with high efficiency. The HYPHA framework may provide general-purpose guidance for high-performance computing on multiple hardware accelerators. To the best of our knowledge, HYPHA achieves the highest performance in PH matrix reduction. Our experiments show speedups of up to 116× over the standard PH algorithm. Compared to state-of-the-art parallel PH software packages, such as PHAT and DIPHA, HYPHA outperforms their fastest PH matrix reduction algorithms by a factor of up to 2.3×.
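For context on what is being accelerated, the sketch below is the standard serial PH column reduction over Z/2, with each column stored as a set of row indices and a column's "low" defined as its largest nonzero row; HYPHA's hybrid SIMT/MIMD scheduling of this computation is not shown.

```python
def reduce_boundary(columns):
    """Standard left-to-right persistent-homology column reduction over
    Z/2. `columns` is the boundary matrix as a list of sets of row
    indices, ordered by filtration."""
    low_owner = {}  # pivot row -> index of the column that owns it
    for j, col in enumerate(columns):
        while col:
            low = max(col)
            if low not in low_owner:
                low_owner[low] = j  # column j is reduced; `low` is its pivot
                break
            # Z/2 column addition is symmetric difference with the owner.
            col ^= columns[low_owner[low]]
    return columns  # persistence pairs can be read off the pivots

# Example: boundary matrix of a filtered triangle
# (3 vertices, then 3 edges, then 1 face).
cols = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
print(reduce_boundary(cols))
```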