Parallelism in Randomized Incremental Algorithms
In this article, we show that many sequential randomized incremental algorithms are in fact parallel. We consider algorithms for several problems, including Delaunay triangulation, linear programming, closest pair, smallest enclosing disk, least-element lists, and strongly connected components. We analyze the dependencies between iterations in an algorithm and show either that the dependence structure is shallow with high probability, or that, by violating some dependencies, the structure becomes shallow without significantly increasing the work. We identify three types of algorithms based on their dependencies and present a framework for analyzing each type. Using the framework gives work-efficient polylogarithmic-depth parallel algorithms for most of the problems that we study. This article presents the first incremental Delaunay triangulation algorithm with optimal work and polylogarithmic depth. This result is important, since most implementations of parallel Delaunay triangulation use the incremental approach. Our results also improve bounds on strongly connected components and least-element lists and significantly simplify parallel algorithms for several problems.
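To make the incremental pattern concrete, here is a minimal sequential sketch of one of the problems named above: Welzl-style randomized incremental smallest enclosing disk. This is not the article's parallel algorithm; it only illustrates the structure being analyzed. Points are inserted in a random order, and an iteration that changes the current disk depends on earlier iterations, while an iteration whose point already lies inside the disk is cheap and independent.

```python
import random

def circle_from(points):
    """Exact circle through 0, 1, 2, or 3 boundary points (non-collinear for 3)."""
    if not points:
        return ((0.0, 0.0), 0.0)
    if len(points) == 1:
        return (points[0], 0.0)
    if len(points) == 2:
        (ax, ay), (bx, by) = points
        cx, cy = (ax + bx) / 2, (ay + by) / 2
        return ((cx, cy), ((ax - cx) ** 2 + (ay - cy) ** 2) ** 0.5)
    (ax, ay), (bx, by), (cx, cy) = points
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))  # circumcenter formula
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ((ux, uy), ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5)

def in_circle(p, circle, eps=1e-9):
    (cx, cy), r = circle
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= (r + eps) ** 2

def smallest_enclosing_disk(points):
    pts = points[:]
    random.shuffle(pts)            # the random insertion order drives the analysis
    disk = circle_from([])
    for i, p in enumerate(pts):
        if not in_circle(p, disk):  # p outside: this iteration depends on the past
            disk = circle_from([p])
            for j, q in enumerate(pts[:i]):
                if not in_circle(q, disk):
                    disk = circle_from([p, q])
                    for s in pts[:j]:
                        if not in_circle(s, disk):
                            disk = circle_from([p, q, s])
        # otherwise the iteration leaves the disk unchanged (no dependency)
    return disk
```

With high probability only O(log n) insertions actually change the disk, which is the kind of shallow dependence structure the framework exploits.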
 NSF-PAR ID: 10222586
 Journal Name: Journal of the ACM
 Volume: 67
 Issue: 5
 Page Range or eLocation-ID: 1 to 27
 ISSN: 0004-5411
 Sponsoring Org: National Science Foundation
More Like this


Obeid, I.; Selesnick, I.; Picone, J. (Ed.) The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer-grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment: performance metrics such as error rates should be identical, and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) the same job run on the same processor should produce the same results each time it is run; (2) a job run on a CPU and GPU should produce…
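The replicability criteria above hinge on controlled random seeding. The toy sketch below, using only the Python standard library, illustrates criterion (1): a "training job" driven by an explicitly seeded private RNG reproduces its result bit-for-bit. The function name and loss model are hypothetical stand-ins; a real TensorFlow setup would additionally seed the framework (e.g. via tf.random.set_seed).

```python
import random

def train_run(seed, steps=1000):
    """Hypothetical stand-in for a training job: accumulates pseudo-random
    'gradient' noise. A real job would also seed NumPy/TensorFlow."""
    rng = random.Random(seed)  # private RNG avoids interference between jobs
    loss = 1.0
    for _ in range(steps):
        loss *= (1.0 - 0.001 * rng.random())
    return loss

# Criterion (1): the same job with the same seed yields identical results.
assert train_run(seed=42) == train_run(seed=42)
# Different seeds diverge, much as CPU-vs-GPU float ordering can.
assert train_run(seed=42) != train_run(seed=43)
```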

This article presents I/O-efficient algorithms for topologically sorting a directed acyclic graph and for the more general problem of identifying and topologically sorting the strongly connected components of a directed graph G = (V, E). Both algorithms are randomized and have I/O costs O(sort(E) · poly(log V)), with high probability, where sort(E) = O((E/B) log_{M/B}(E/B)) is the I/O cost of sorting an E-element array on a machine with size-B blocks and size-M cache/internal memory. These are the first algorithms for these problems that do not incur at least one I/O per vertex, and as such they are the first I/O-efficient algorithms for sparse graphs. By applying the technique of time-forward processing, these algorithms also imply I/O-efficient algorithms for most problems on directed acyclic graphs, such as shortest paths, as well as for the single-source reachability problem on arbitrary directed graphs.
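For contrast with the external-memory bounds above, here is the classical in-memory baseline, Kahn's algorithm. It is O(V + E) in RAM but performs roughly one unpredictable memory access per vertex, which is exactly the per-vertex cost the I/O-efficient algorithms avoid. This sketch is only the baseline, not the article's algorithm.

```python
from collections import deque

def topological_sort(n, edges):
    """Kahn's algorithm over vertices 0..n-1; raises on cycles."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(i for i in range(n) if indeg[i] == 0)  # sources first
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:          # one scattered access per edge endpoint
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph has a cycle")
    return order
```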

While there has been significant work on parallel graph processing, there has been surprisingly little work on high-performance hypergraph processing. This paper presents a collection of efficient parallel algorithms for hypergraph processing, including algorithms for betweenness centrality, maximal independent set, k-core decomposition, hypertrees, hyperpaths, connected components, PageRank, and single-source shortest paths. For these problems, we either provide new parallel algorithms or more efficient implementations than prior work. Furthermore, our algorithms are theoretically efficient in terms of work and depth. To implement our algorithms, we extend the Ligra graph processing framework to support hypergraphs, and our implementations benefit from graph optimizations including switching between sparse and dense traversals based on the frontier size, edge-aware parallelization, using buckets to prioritize processing of vertices, and compression. Our experiments on a 72-core machine show that our algorithms obtain excellent parallel speedups and are significantly faster than algorithms in existing hypergraph processing frameworks.
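A common way to process a hypergraph, and the representation Ligra-style frameworks generalize, is the bipartite incidence graph: vertices on one side, hyperedges on the other. The serial sketch below computes connected components by alternating a frontier between vertices and hyperedges; it illustrates the representation only, not the paper's parallel implementation.

```python
from collections import deque

def hypergraph_components(num_vertices, hyperedges):
    """Label connected components; two vertices are connected if some
    chain of hyperedges links them. Returns a component id per vertex."""
    incident = [[] for _ in range(num_vertices)]  # vertex -> hyperedge ids
    for eid, edge in enumerate(hyperedges):
        for v in edge:
            incident[v].append(eid)
    comp = [-1] * num_vertices
    edge_seen = [False] * len(hyperedges)
    label = 0
    for s in range(num_vertices):
        if comp[s] != -1:
            continue
        comp[s] = label
        frontier = deque([s])
        while frontier:
            u = frontier.popleft()
            for eid in incident[u]:        # vertex -> hyperedge step
                if edge_seen[eid]:
                    continue
                edge_seen[eid] = True
                for v in hyperedges[eid]:  # hyperedge -> vertex step
                    if comp[v] == -1:
                        comp[v] = label
                        frontier.append(v)
        label += 1
    return comp
```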

Parallel radix sorting has been extensively studied in the literature for decades. However, the most efficient implementations require auxiliary memory proportional to the input size, which can be prohibitive for large inputs. The standard serial in-place radix sorting algorithm is based on swapping each key to its correct place in the array on each level of recursion. However, it is not straightforward to parallelize this algorithm due to dependencies among the swaps. This paper presents Regions Sort, a new parallel in-place radix sorting algorithm that is efficient both in theory and in practice. Our algorithm uses a graph structure to model dependencies across elements that need to be swapped, and generates independent tasks from this graph that can be executed in parallel. For sorting $n$ integers from a range $r$, and a parameter $K$, Regions Sort requires only $O(K\log r\log n)$ auxiliary memory. Our algorithm requires $O(n\log r)$ work and $O((n/K+\log K)\log r)$ span, making it work-efficient and highly parallel. In addition, we present several optimizations that significantly improve the empirical performance of our algorithm. We compare the performance of Regions Sort to existing parallel in-place and out-of-place sorting algorithms on a variety of input distributions and show that Regions…
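The serial baseline the abstract refers to can be sketched as follows: an in-place MSD radix sort (American-flag style) that, on each recursion level, swaps every key into its bucket's region. The swap chains here are the cross-element dependencies that make naive parallelization hard; this sketch is the sequential starting point, not Regions Sort itself.

```python
def inplace_radix_sort(a, bits=8, msb=24):
    """In-place MSD radix sort of non-negative ints < 2**(msb+bits),
    using O(2**bits) extra space per recursion level."""
    radix = 1 << bits
    mask = radix - 1

    def sort_range(lo, hi, shift):
        if hi - lo <= 1 or shift < 0:
            return
        count = [0] * radix
        for i in range(lo, hi):                 # histogram of current digit
            count[(a[i] >> shift) & mask] += 1
        start = [lo] * radix                    # bucket region boundaries
        for d in range(1, radix):
            start[d] = start[d - 1] + count[d - 1]
        end = [start[d] + count[d] for d in range(radix)]
        next_free = start[:]                    # next unfilled slot per bucket
        for d in range(radix):
            while next_free[d] < end[d]:
                digit = (a[next_free[d]] >> shift) & mask
                if digit == d:                  # already in its own region
                    next_free[d] += 1
                else:                           # swap toward the right region:
                    j = next_free[digit]        # each swap depends on the last
                    a[next_free[d]], a[j] = a[j], a[next_free[d]]
                    next_free[digit] += 1
        for d in range(radix):                  # recurse on the next digit
            sort_range(start[d], start[d] + count[d], shift - bits)

    sort_range(0, len(a), msb)
    return a
```

Each swap permanently places one key, so a level does O(n) work, but the chains of swaps are inherently sequential; Regions Sort's contribution is extracting independent tasks from exactly this dependency structure.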