Parallelism in Randomized Incremental Algorithms
In this article, we show that many sequential randomized incremental algorithms are in fact parallel. We consider algorithms for several problems, including Delaunay triangulation, linear programming, closest pair, smallest enclosing disk, least-element lists, and strongly connected components. We analyze the dependencies between iterations in an algorithm and show either that the dependence structure is shallow with high probability, or that, by violating some dependencies, the structure becomes shallow without significantly increasing the work. We identify three types of algorithms based on their dependencies and present a framework for analyzing each type. Using the framework gives work-efficient polylogarithmic-depth parallel algorithms for most of the problems that we study. This article presents the first incremental Delaunay triangulation algorithm with optimal work and polylogarithmic depth. This result is important, since most implementations of parallel Delaunay triangulation use the incremental approach. Our results also improve bounds on strongly connected components and least-element lists and significantly simplify parallel algorithms for several problems.
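To make the incremental pattern concrete, here is a minimal sequential sketch of one of the problems named above: Welzl-style randomized incremental smallest enclosing disk. This is not the article's parallel algorithm; it only illustrates the structure being analyzed. Points are inserted in a random order, and an iteration that changes the current disk depends on earlier iterations, while an iteration whose point already lies inside the disk is cheap and independent.

```python
import random

def circle_from(points):
    """Exact circle through 0, 1, 2, or 3 boundary points (non-collinear for 3)."""
    if not points:
        return ((0.0, 0.0), 0.0)
    if len(points) == 1:
        return (points[0], 0.0)
    if len(points) == 2:
        (ax, ay), (bx, by) = points
        cx, cy = (ax + bx) / 2, (ay + by) / 2
        return ((cx, cy), ((ax - cx) ** 2 + (ay - cy) ** 2) ** 0.5)
    (ax, ay), (bx, by), (cx, cy) = points
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))  # circumcenter formula
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ((ux, uy), ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5)

def in_circle(p, circle, eps=1e-9):
    (cx, cy), r = circle
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= (r + eps) ** 2

def smallest_enclosing_disk(points):
    pts = points[:]
    random.shuffle(pts)            # the random insertion order drives the analysis
    disk = circle_from([])
    for i, p in enumerate(pts):
        if not in_circle(p, disk):  # p outside: this iteration depends on the past
            disk = circle_from([p])
            for j, q in enumerate(pts[:i]):
                if not in_circle(q, disk):
                    disk = circle_from([p, q])
                    for s in pts[:j]:
                        if not in_circle(s, disk):
                            disk = circle_from([p, q, s])
        # otherwise the iteration leaves the disk unchanged (no dependency)
    return disk
```

With high probability only O(log n) insertions actually change the disk, which is the kind of shallow dependence structure the framework exploits.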
 NSF-PAR ID: 10222586
 Journal Name: Journal of the ACM
 Volume: 67
 Issue: 5
 Page Range or eLocation-ID: 1 to 27
 ISSN: 0004-5411
 Sponsoring Org: National Science Foundation
More Like this


Obeid, I.; Selesnick, I.; Picone, J. (Ed.) The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer-grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment: performance metrics such as error rates should be identical, and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) the same job run on the same processor should produce the same results each time it is run; (2) a job run on a CPU and GPU should produce…
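The replicability criteria above hinge on controlled random seeding. The toy sketch below, using only the Python standard library, illustrates criterion (1): a "training job" driven by an explicitly seeded private RNG reproduces its result bit-for-bit. The function name and loss model are hypothetical stand-ins; a real TensorFlow setup would additionally seed the framework (e.g. via tf.random.set_seed).

```python
import random

def train_run(seed, steps=1000):
    """Hypothetical stand-in for a training job: accumulates pseudo-random
    'gradient' noise. A real job would also seed NumPy/TensorFlow."""
    rng = random.Random(seed)  # private RNG avoids interference between jobs
    loss = 1.0
    for _ in range(steps):
        loss *= (1.0 - 0.001 * rng.random())
    return loss

# Criterion (1): the same job with the same seed yields identical results.
assert train_run(seed=42) == train_run(seed=42)
# Different seeds diverge, much as CPU-vs-GPU float ordering can.
assert train_run(seed=42) != train_run(seed=43)
```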

This article presents I/O-efficient algorithms for topologically sorting a directed acyclic graph and for the more general problem of identifying and topologically sorting the strongly connected components of a directed graph G = (V, E). Both algorithms are randomized and have I/O costs O(sort(E) · poly(log V)), with high probability, where sort(E) = O((E/B) log_{M/B}(E/B)) is the I/O cost of sorting an E-element array on a machine with size-B blocks and size-M cache/internal memory. These are the first algorithms for these problems that do not incur at least one I/O per vertex, and as such they are the first I/O-efficient algorithms for sparse graphs. By applying the technique of time-forward processing, these algorithms also imply I/O-efficient algorithms for most problems on directed acyclic graphs, such as shortest paths, as well as for the single-source reachability problem on arbitrary directed graphs.
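For contrast with the external-memory bounds above, here is the classical in-memory baseline, Kahn's algorithm. It is O(V + E) in RAM but performs roughly one unpredictable memory access per vertex, which is exactly the per-vertex cost the I/O-efficient algorithms avoid. This sketch is only the baseline, not the article's algorithm.

```python
from collections import deque

def topological_sort(n, edges):
    """Kahn's algorithm over vertices 0..n-1; raises on cycles."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(i for i in range(n) if indeg[i] == 0)  # sources first
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:          # one scattered access per edge endpoint
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph has a cycle")
    return order
```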

While there has been significant work on parallel graph processing, there has been surprisingly little work on high-performance hypergraph processing. This paper presents a collection of efficient parallel algorithms for hypergraph processing, including algorithms for betweenness centrality, maximal independent set, k-core decomposition, hypertrees, hyperpaths, connected components, PageRank, and single-source shortest paths. For these problems, we either provide new parallel algorithms or more efficient implementations than prior work. Furthermore, our algorithms are theoretically efficient in terms of work and depth. To implement our algorithms, we extend the Ligra graph processing framework to support hypergraphs, and our implementations benefit from graph optimizations including switching between sparse and dense traversals based on the frontier size, edge-aware parallelization, using buckets to prioritize processing of vertices, and compression. Our experiments on a 72-core machine show that our algorithms obtain excellent parallel speedups and are significantly faster than algorithms in existing hypergraph processing frameworks.
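A common way to process a hypergraph, and the representation Ligra-style frameworks generalize, is the bipartite incidence graph: vertices on one side, hyperedges on the other. The serial sketch below computes connected components by alternating a frontier between vertices and hyperedges; it illustrates the representation only, not the paper's parallel implementation.

```python
from collections import deque

def hypergraph_components(num_vertices, hyperedges):
    """Label connected components; two vertices are connected if some
    chain of hyperedges links them. Returns a component id per vertex."""
    incident = [[] for _ in range(num_vertices)]  # vertex -> hyperedge ids
    for eid, edge in enumerate(hyperedges):
        for v in edge:
            incident[v].append(eid)
    comp = [-1] * num_vertices
    edge_seen = [False] * len(hyperedges)
    label = 0
    for s in range(num_vertices):
        if comp[s] != -1:
            continue
        comp[s] = label
        frontier = deque([s])
        while frontier:
            u = frontier.popleft()
            for eid in incident[u]:        # vertex -> hyperedge step
                if edge_seen[eid]:
                    continue
                edge_seen[eid] = True
                for v in hyperedges[eid]:  # hyperedge -> vertex step
                    if comp[v] == -1:
                        comp[v] = label
                        frontier.append(v)
        label += 1
    return comp
```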

Parallel radix sorting has been extensively studied in the literature for decades. However, the most efficient implementations require auxiliary memory proportional to the input size, which can be prohibitive for large inputs. The standard serial in-place radix sorting algorithm is based on swapping each key to its correct place in the array on each level of recursion. However, it is not straightforward to parallelize this algorithm due to dependencies among the swaps. This paper presents Regions Sort, a new parallel in-place radix sorting algorithm that is efficient both in theory and in practice. Our algorithm uses a graph structure to model dependencies across elements that need to be swapped, and generates independent tasks from this graph that can be executed in parallel. For sorting $n$ integers from a range $r$, and a parameter $K$, Regions Sort requires only $O(K\log r\log n)$ auxiliary memory. Our algorithm requires $O(n\log r)$ work and $O((n/K+\log K)\log r)$ span, making it work-efficient and highly parallel. In addition, we present several optimizations that significantly improve the empirical performance of our algorithm. We compare the performance of Regions Sort to existing parallel in-place and out-of-place sorting algorithms on a variety of input distributions and show that Regions…
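The serial baseline the abstract refers to can be sketched as follows: an in-place MSD radix sort (American-flag style) that, on each recursion level, swaps every key into its bucket's region. The swap chains here are the cross-element dependencies that make naive parallelization hard; this sketch is the sequential starting point, not Regions Sort itself.

```python
def inplace_radix_sort(a, bits=8, msb=24):
    """In-place MSD radix sort of non-negative ints < 2**(msb+bits),
    using O(2**bits) extra space per recursion level."""
    radix = 1 << bits
    mask = radix - 1

    def sort_range(lo, hi, shift):
        if hi - lo <= 1 or shift < 0:
            return
        count = [0] * radix
        for i in range(lo, hi):                 # histogram of current digit
            count[(a[i] >> shift) & mask] += 1
        start = [lo] * radix                    # bucket region boundaries
        for d in range(1, radix):
            start[d] = start[d - 1] + count[d - 1]
        end = [start[d] + count[d] for d in range(radix)]
        next_free = start[:]                    # next unfilled slot per bucket
        for d in range(radix):
            while next_free[d] < end[d]:
                digit = (a[next_free[d]] >> shift) & mask
                if digit == d:                  # already in its own region
                    next_free[d] += 1
                else:                           # swap toward the right region:
                    j = next_free[digit]        # each swap depends on the last
                    a[next_free[d]], a[j] = a[j], a[next_free[d]]
                    next_free[digit] += 1
        for d in range(radix):                  # recurse on the next digit
            sort_range(start[d], start[d] + count[d], shift - bits)

    sort_range(0, len(a), msb)
    return a
```

Each swap permanently places one key, so a level does O(n) work, but the chains of swaps are inherently sequential; Regions Sort's contribution is extracting independent tasks from exactly this dependency structure.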