NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Parallel Batch Queries on Dynamic Trees: Algorithms and Experiments

https://doi.org/10.1145/3694906.3743305

Ikram, Humza; Brady, Andrew; Anderson, Daniel; Blelloch, Guy E (July 2025, ACM)

Free, publicly-accessible full text available July 16, 2026
Big Atomics and Fast Hash Tables

https://doi.org/10.1145/3710848.3710874

Anderson, Daniel; Blelloch, Guy E; Jayanti, Siddhartha V (February 2025, ACM)

Free, publicly-accessible full text available February 28, 2026
Deterministic and Low-Span Work-Efficient Parallel Batch-Dynamic Trees

https://doi.org/10.1145/3626183.3659976

Anderson, Daniel; Blelloch, Guy E (June 2024, ACM Symposium on Parallelism in Algorithms and Architectures (SPAA))

Dynamic trees are a well-studied and fundamental building block of dynamic graph algorithms dating back to the seminal work of Sleator and Tarjan [STOC'81, (1981), pp. 114-122]. The problem is to maintain a tree subject to online edge insertions and deletions while answering queries about the tree, such as the heaviest weight on a path, etc. In the parallel batch-dynamic setting, the goal is to process batches of edge updates work efficiently in low (polylog n) span. Two work-efficient algorithms are known: batch-parallel Euler Tour Trees by Tseng et al. [ALENEX'19, (2019), pp. 92--106] and parallel Rake-Compress (RC) Trees by Acar et al. [ESA'20, (2020), pp. 2:1--2:23]. Both however are randomized and work efficient in expectation. Several downstream results that use these data structures (and indeed to the best of our knowledge, all known work-efficient parallel batch-dynamic graph algorithms) are therefore also randomized. In this work, we give the first deterministic work-efficient solution to the problem. Our algorithm maintains a parallel RC-Tree on n vertices subject to batches of k edge updates deterministically in worst-case O(k log(1 + n/k)) work and O(log n loglog k) span on the Common-CRCW PRAM. We also show how to improve the span of the randomized algorithm from O(log n log* n) to O(log n). Lastly, as a result of our new deterministic algorithm, we also derandomize several downstream results that make use of parallel batch-dynamic dynamic trees, previously for which the only efficient solutions were randomized.
more » « less
Full Text Available
Parallel Minimum Cuts in O ( m log ² n ) Work and Low Depth

https://doi.org/10.1145/3565557

Anderson, Daniel; Blelloch, Guy E (December 2023, ACM Transactions on Parallel Computing)

We present a randomizedO(mlog²n) work,O(polylogn) depth parallel algorithm for minimum cut. This algorithm matches the work bounds of a recent sequential algorithm by Gawrychowski, Mozes, and Weimann [ICALP’20], and improves on the previously best parallel algorithm by Geissmann and Gianinazzi [SPAA’18], which performsO(mlog⁴n) work inO(polylogn) depth. Our algorithm makes use of three components that might be of independent interest. First, we design a parallel data structure that efficiently supports batched mixed queries and updates on trees. It generalizes and improves the work bounds of a previous data structure of Geissmann and Gianinazzi and is work efficient with respect to the best sequential algorithm. Second, we design a parallel algorithm for approximate minimum cut that improves on previous results by Karger and Motwani. We use this algorithm to give a work-efficient procedure to produce a tree packing, as in Karger’s sequential algorithm for minimum cuts. Last, we design an efficient parallel algorithm for solving the minimum 2-respecting cut problem.
more » « less
Full Text Available
Smarter Atomic Smart Pointers: Safe and Efficient Concurrent Memory Management (Abstract)

https://doi.org/10.1145/3597635.3598027

Anderson, Daniel; Blelloch, Guy E.; Wei, Yuanhao (July 2023, ACM)

Full Text Available
OH, HO ₂ , and RO ₂ radical chemistry in a rural forest environment: measurements, model comparisons, and evidence of a missing radical sink

https://doi.org/10.5194/acp-23-10287-2023

Bottorff, Brandon; Lew, Michelle M.; Woo, Youngjun; Rickly, Pamela; Rollings, Matthew D.; Deming, Benjamin; Anderson, Daniel C.; Wood, Ezra; Alwe, Hariprasad D.; Millet, Dylan B.; et al (September 2024, Atmospheric Chemistry and Physics)

Abstract. The hydroxyl (OH), hydroperoxy (HO2), and organic peroxy (RO2)radicals play important roles in atmospheric chemistry. In the presence ofnitrogen oxides (NOx), reactions between OH and volatile organiccompounds (VOCs) can initiate a radical propagation cycle that leads to theproduction of ozone and secondary organic aerosols. Previous measurements ofthese radicals under low-NOx conditions in forested environmentscharacterized by emissions of biogenic VOCs, including isoprene andmonoterpenes, have shown discrepancies with modeled concentrations. During the summer of 2016, OH, HO2, and RO2 radical concentrationswere measured as part of the Program for Research on Oxidants:Photochemistry, Emissions, and Transport – Atmospheric Measurements ofOxidants in Summer (PROPHET-AMOS) campaign in a midlatitude deciduousbroadleaf forest. Measurements of OH and HO2 were made by laser-inducedfluorescence–fluorescence assay by gas expansion (LIF-FAGE) techniques,and total peroxy radical (XO2) mixing ratios were measured by the Ethane CHemical AMPlifier (ECHAMP) instrument. Supporting measurements ofphotolysis frequencies, VOCs, NOx, O3, and meteorological datawere used to constrain a zero-dimensional box model utilizing either theRegional Atmospheric Chemical Mechanism (RACM2) or the Master ChemicalMechanism (MCM). Model simulations tested the influence of HOxregeneration reactions within the isoprene oxidation scheme from the LeuvenIsoprene Mechanism (LIM1). On average, the LIM1 models overestimated daytimemaximum measurements by approximately 40 % for OH, 65 % for HO2,and more than a factor of 2 for XO2. Modeled XO2 mixing ratioswere also significantly higher than measured at night. Addition of RO2 + RO2 accretion reactions for terpene-derived RO2 radicals tothe model can partially explain the discrepancy between measurements andmodeled peroxy radical concentrations at night but cannot explain thedaytime discrepancies when OH reactivity is dominated by isoprene. Themodels also overestimated measured concentrations of isoprene-derivedhydroxyhydroperoxides (ISOPOOH) by a factor of 10 during the daytime,consistent with the model overestimation of peroxy radical concentrations.Constraining the model to the measured concentration of peroxy radicalsimproves the agreement with the measured ISOPOOH concentrations, suggestingthat the measured radical concentrations are more consistent with themeasured ISOPOOH concentrations. These results suggest that the models maybe missing an important daytime radical sink and could be overestimating therate of ozone and secondary product formation in this forest.
more » « less
Full Text Available
Turning manual concurrent memory reclamation into automatic reference counting

https://doi.org/10.1145/3519939.3523730

Anderson, Daniel; Blelloch, Guy E.; Wei, Yuanhao (June 2022, Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation)

Safe memory reclamation (SMR) schemes are an essential tool for lock-free data structures and concurrent programming. However, manual SMR schemes are notoriously difficult to apply correctly, and automatic schemes, such as reference counting, have been argued for over a decade to be too slow for practical purposes. A recent wave of work has disproved this long-held notion and shown that reference counting can be as scalable as hazard pointers, one of the most common manual techniques. Despite these tremendous improvements, there remains a gap of up to 2x or more in performance between these schemes and faster manual techniques such as epoch-based reclamation (EBR). In this work, we first advance these ideas and show that in many cases, automatic reference counting can in fact be as fast as the fastest manual SMR techniques.We generalize our previous algorithm called Concurrent Deferred Reference Counting (CDRC) to obtain a method for converting any standard manual SMR technique into an automatic reference counting technique with a similar performance profile. Our second contribution is extending this framework to support weak pointers, which are reference-counted pointers that automatically break pointer cycles by not contributing to the reference count, thus addressing a common weakness in reference-counted garbage collection. Our experiments with a C++-library implementation show that our automatic techniques perform in line with their manual counterparts, and that our weak pointer implementation outperforms the best known atomic weak pointer library by up to an order of magnitude on high thread counts. All together, we show that the ease of use of automatic memory management can be achieved without significant cost to practical performance or general applicability.
more » « less
Parallel block-delayed sequences

https://doi.org/10.1145/3503221.3508434

Westrick, Sam; Rainey, Mike; Anderson, Daniel; Blelloch, Guy E. (March 2022, ACM Symposium on Principles and Practice of Parallel Programming)
Jaejin Lee, Kunal Agrawal (Ed.)
Programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are naturally parallel. However, if implemented naively they lead to the generation of temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion" to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces. In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique which parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: Parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it is running on.
more » « less
Full Text Available
Poster: The Problem-Based Benchmark Suite (PBBS), V2

https://doi.org/10.1145/3503221.3508422

Anderson, Daniel; Blelloch, Guy E.; Dhulipala, Laxman; Dobson, Magdalen; Sun, Yihan (March 2022, Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

The Problem-Based Benchmark Suite (PBBS) is a set of benchmark problems designed for comparing algorithms, implementations and platforms. For each problem, the suite defines the problem in terms of the input-output relationship, and supplies a set of input instances along with input generators, a default implementation, code for checking correctness or accuracy, and a timing harness. The suite makes it possible to compare different algorithms, platforms (e.g. GPU vs CPU), and implementations using different programming languages or libraries. The purpose is to better understand how well a wide variety of problems parallelize, and what techniques/algorithms are most effective. The suite was first announced in 2012 with 14 benchmark problems. Here we describe some significant updates. In particular, we have added nine new benchmarks from a mix of problems in text processing, computational geometry and machine learning. We have further optimized the default implementations; several are the fastest available for multicore CPUs, often achieving near perfect speedup on the 72 core machine we test them on. The suite now also supplies significantly larger default test instances, as well as a broader variety, with many derived from real-world data.
more » « less
In vivo bone marrow microenvironment siRNA delivery using lipid–polymer nanoparticles for multiple myeloma therapy

https://doi.org/10.1073/pnas.2215711120

Guimarães, Pedro_P G; Figueroa-Espada, Christian G; Riley, Rachel S; Gong, Ningqiang; Xue, Lulu; Sewastianik, Tomasz; Dennis, Peter S; Loebel, Claudia; Chung, Amanda; Shepherd, Sarah J; et al (June 2023, Proceedings of the National Academy of Sciences)

Multiple myeloma (MM), a hematologic malignancy that preferentially colonizes the bone marrow, remains incurable with a survival rate of 3 to 6 mo for those with advanced disease despite great efforts to develop effective therapies. Thus, there is an urgent clinical need for innovative and more effective MM therapeutics. Insights suggest that endothelial cells within the bone marrow microenvironment play a critical role. Specifically, cyclophilin A (CyPA), a homing factor secreted by bone marrow endothelial cells (BMECs), is critical to MM homing, progression, survival, and chemotherapeutic resistance. Thus, inhibition of CyPA provides a potential strategy to simultaneously inhibit MM progression and sensitize MM to chemotherapeutics, improving therapeutic response. However, inhibiting factors from the bone marrow endothelium remains challenging due to delivery barriers. Here, we utilize both RNA interference (RNAi) and lipid–polymer nanoparticles to engineer a potential MM therapy, which targets CyPA within blood vessels of the bone marrow. We used combinatorial chemistry and high-throughput in vivo screening methods to engineer a nanoparticle platform for small interfering RNA (siRNA) delivery to bone marrow endothelium. We demonstrate that our strategy inhibits CyPA in BMECs, preventing MM cell extravasation in vitro. Finally, we show that siRNA-based silencing of CyPA in a murine xenograft model of MM, either alone or in combination with the Food and Drug Administration (FDA)-approved MM therapeutic bortezomib, reduces tumor burden and extends survival. This nanoparticle platform may provide a broadly enabling technology to deliver nucleic acid therapeutics to other malignancies that home to bone marrow.
more » « less
Full Text Available

« Prev Next »

Search for: All records