This content will become publicly available on June 30, 2025

Title: Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals
Many applications are designed to perform traversals on tree-like data structures. Fusing and parallelizing these traversals enhances application performance: fusing multiple traversals improves the locality of the application, and extracting parallelism and utilizing multi-threading can significantly reduce its runtime. Prior frameworks have tried to fuse and parallelize tree traversals using coarse-grained approaches, missing fine-grained opportunities for improving performance. Other frameworks have successfully supported fine-grained fusion on heterogeneous tree types but fall short regarding parallelization. We introduce a new framework, Orchard, built on top of Grafter. Orchard’s novelty lies in allowing the programmer to transform tree traversal applications by automatically applying fine-grained fusion and extracting heterogeneous parallelism. Orchard allows the programmer to write general tree traversal applications in a simple and elegant embedded Domain-Specific Language (eDSL). We show that the combination of fine-grained fusion and heterogeneous parallelism performs better than either one alone when the conditions are met.
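As a minimal illustration of the two ideas in this abstract, the Python sketch below fuses two hand-written traversals over a binary tree and then walks disjoint subtrees concurrently. It is not Orchard's eDSL or Grafter's input language; the `Node` type and all function names are hypothetical. One pass computes subtree sums bottom-up and the other records depths top-down; the fused version does both in a single walk, so each node is visited once. This hand fusion is legal only because neither pass reads a field the other writes; a framework such as Orchard must establish such conditions automatically and at a much finer grain.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical binary tree node used only for this illustration.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    total: int = 0  # written by the "sum" traversal
    depth: int = 0  # written by the "depth" traversal


def sum_pass(n: Optional[Node]) -> int:
    # Traversal 1 (post-order): store the subtree sum at each node.
    if n is None:
        return 0
    n.total = n.value + sum_pass(n.left) + sum_pass(n.right)
    return n.total


def depth_pass(n: Optional[Node], d: int = 0) -> None:
    # Traversal 2 (pre-order): store each node's depth.
    if n is None:
        return
    n.depth = d
    depth_pass(n.left, d + 1)
    depth_pass(n.right, d + 1)


def fused_pass(n: Optional[Node], d: int = 0) -> int:
    # Both traversals in one walk: each node's fields are updated while it
    # is being visited, instead of walking the whole tree twice.
    if n is None:
        return 0
    n.depth = d
    n.total = n.value + fused_pass(n.left, d + 1) + fused_pass(n.right, d + 1)
    return n.total


def parallel_fused_pass(root: Node) -> int:
    # The two child subtrees share no nodes, so the fused walk over each
    # can run in its own thread; the root combines the results afterwards.
    root.depth = 0
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(fused_pass, root.left, 1)
        right = pool.submit(fused_pass, root.right, 1)
        root.total = root.value + left.result() + right.result()
    return root.total
```

All three variants leave the same `total` and `depth` values in the tree. In CPython the global interpreter lock limits real speedup from threads, so the parallel variant only shows the structure of splitting work across independent subtrees.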
Award ID(s): 2216978, 1908504, 1919197
PAR ID: 10577819
Author(s) / Creator(s): ; ; ; ;
Publisher / Repository: ACM
Date Published:
Journal Name: ACM Transactions on Architecture and Code Optimization
Volume: 21
Issue: 2
ISSN: 1544-3566
Page Range / eLocation ID: 1 to 25
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Traversals are commonly seen in tree data structures, and performance-enhancing transformations between tree traversals are critical for many applications. Existing approaches to reasoning about tree traversals and their transformations are ad hoc, with various limitations on the classes of traversals they can handle, the granularity of dependence analysis, and the types of possible transformations. We propose Retreet, a framework in which one can describe general recursive tree traversals, precisely represent iterations, schedules and dependences, and automatically check data-race-freeness and transformation correctness. The crux of the framework is a stack-based representation for iterations and an encoding to Monadic Second-Order (MSO) logic over trees. Experiments show that Retreet can automatically verify optimizations for complex traversals on real-world data structures, such as CSS and cycletrees, which was not possible before. Our framework is also integrated with other MSO-based analysis techniques to verify even more challenging program transformations.
  2. We present DRYADdec, a decidable logic that allows reasoning about tree data structures with measurements. This logic supports user-defined recursive measure functions based on Max or Sum, and recursive predicates based on these measure functions, such as AVL trees or red-black trees. We prove that the logic’s satisfiability is decidable. The crux of the decidability proof is a small model property, which allows us to reduce the satisfiability of DRYADdec to quantifier-free linear arithmetic, a theory that SMT solvers can decide efficiently. We also show that DRYADdec can encode a variety of verification and synthesis problems, including natural proof verification conditions for functional correctness of recursive tree-manipulating programs, legality conditions for fusing tree traversals, and synthesis conditions for conditional linear-integer arithmetic functions. We developed the decision procedure and successfully solved 220+ DRYADdec formulae arising from these application scenarios, including verifying functional correctness of programs manipulating AVL trees, red-black trees and treaps, checking the fusibility of height-based mutually recursive tree traversals, and counterexample-guided synthesis from linear integer arithmetic specifications. To our knowledge, DRYADdec is the first decidable logic that can solve such a wide variety of problems requiring a flexible combination of measure-related, data-related and shape-related properties for trees.
  3. Recent advancements in 3D-stacked DRAM, such as the hybrid memory cube (HMC) and high-bandwidth memory (HBM), promise higher bandwidth and lower power consumption compared to traditional DDR-based DRAM. However, taking advantage of this additional bandwidth to improve the performance of real-world applications requires carefully laying out data in memory, which incurs significant programmer effort. To alleviate this burden, we investigate application-specific address mapping to improve performance while minimizing manual effort. Our approach is guided by the following insights: (i) the toggling activity of address bits can help determine strategies to improve parallelism within memory, but this metric underestimates conflicts; and (ii) modern memory controllers reorder address requests, so any toggling activity measured from an address trace is non-deterministic. Furthermore, our position is that analyzing individual address bits yields poor estimates of actual conflicts and exploited parallelism, and that entropy needs to be calculated for groups of address bits. Therefore, we calculate window-based probabilistic entropy for groups of address bits to determine a near-optimal address mapping. We present simulation results for ten applications showing that our proposed approach improves performance by up to 25% over fixed address mapping and up to 8% over previous application-specific address mapping.
  4. Code summarization is the task of creating short, natural language descriptions of source code. It is an important part of code comprehension and a powerful method of documentation. Previous work has made progress in identifying where programmers focus in code as they write their own summaries (i.e., Writing). However, there is currently a gap in studying programmers’ attention as they read code with pre-written summaries (i.e., Reading). As a result, it is unknown how these two forms of code comprehension, Reading and Writing, compare. There is also limited understanding of programmer attention with respect to program semantics. We address these shortcomings with a human eye-tracking study (n = 27) comparing Reading and Writing. We examined programmers’ attention with respect to fine-grained program semantics, including their attention sequences (i.e., scan paths). We find distinctions in programmer attention across the comprehension tasks, similarities in reading patterns between them, and differences mediated by demographic factors. These findings can help guide code comprehension in both computer science education and automated code summarization. Furthermore, we mapped programmers’ gaze data onto the Abstract Syntax Tree to explore another representation of human attention. We find that visual behavior on this structure is not always consistent with that on source code.
  5. OpenVX is a recently ratified standard that was expressly proposed to facilitate the design of computer-vision (CV) applications used in real-time embedded systems. Despite its real-time focus, OpenVX presents several challenges when validating real-time constraints. Many of these challenges are rooted in the fact that OpenVX only implicitly defines any notion of a schedulable entity. Under OpenVX, CV applications are specified in the form of processing graphs that are inherently considered to execute monolithically end-to-end. This monolithic execution hinders parallelism and can lead to significant processing-capacity loss. Prior work partially addressed this problem by treating graph nodes as schedulable entities, but under OpenVX, these nodes represent rather coarse-grained CV functions, so the available parallelism that can be obtained in this way is quite limited. In this paper, a much more fine-grained approach for scheduling OpenVX graphs is proposed. This approach was designed to enable additional parallelism and to eliminate schedulability-related processing-capacity loss that arises when programs execute on both CPUs and graphics processing units (GPUs). Response-time analysis for this new approach is presented and its efficacy is evaluated via a case study involving an actual CV application. 
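Item 2 above (DRYADdec) mentions recursive measure functions based on Max or Sum and recursive predicates such as AVL trees built on those measures. The plain Python sketch below evaluates such definitions on concrete trees to show what they mean; it is not DRYADdec syntax, and DRYADdec reasons about these definitions symbolically via SMT rather than by running them. The `T` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class T:
    # Hypothetical tree node; `key` is the stored data value.
    key: int
    left: Optional["T"] = None
    right: Optional["T"] = None


def height(t: Optional[T]) -> int:
    # Max-based measure: 1 + max of the children's heights; 0 for empty.
    if t is None:
        return 0
    return 1 + max(height(t.left), height(t.right))


def size(t: Optional[T]) -> int:
    # Sum-based measure: 1 + sum of the children's sizes; 0 for empty.
    if t is None:
        return 0
    return 1 + size(t.left) + size(t.right)


def is_avl(t: Optional[T], lo: Optional[int] = None, hi: Optional[int] = None) -> bool:
    # Recursive predicate combining data (BST ordering within the open
    # interval (lo, hi)) with a measure (child heights differ by at most 1).
    if t is None:
        return True
    if (lo is not None and t.key <= lo) or (hi is not None and t.key >= hi):
        return False  # BST ordering violated
    if abs(height(t.left) - height(t.right)) > 1:
        return False  # height balance violated
    return is_avl(t.left, lo, t.key) and is_avl(t.right, t.key, hi)
```

For example, `T(2, T(1), T(3))` has height 2 and size 3 and satisfies `is_avl`, while the right-leaning chain `T(1, None, T(2, None, T(3)))` does not, because its root's subtrees differ in height by 2.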
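Item 3 above computes window-based probabilistic entropy for groups of address bits. The abstract does not give the exact definition, so the Python sketch below rests on assumptions: it splits an address trace into non-overlapping windows of a fixed number of requests, computes the Shannon entropy of the values a chosen bit group takes in each window, and averages across windows. The function names, the windowing scheme, and the averaging are all hypothetical choices for illustration. Intuitively, a bit group whose values are spread evenly (high entropy) is a natural candidate for spreading requests across memory banks or channels.

```python
import math
from collections import Counter
from typing import Sequence


def group_value(addr: int, bits: Sequence[int]) -> int:
    # Pack the selected address bit positions into one small integer.
    v = 0
    for i, b in enumerate(bits):
        v |= ((addr >> b) & 1) << i
    return v


def shannon_entropy(values: Sequence[int]) -> float:
    # H = sum over observed values of -p * log2(p), in bits.
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def windowed_group_entropy(trace: Sequence[int], bits: Sequence[int], window: int) -> float:
    # Assumed scheme: non-overlapping windows of `window` requests,
    # entropy of the bit group per window, averaged over all windows.
    vals = [group_value(a, bits) for a in trace]
    windows = [vals[i:i + window] for i in range(0, len(vals), window)]
    if not windows:
        return 0.0
    return sum(shannon_entropy(w) for w in windows) / len(windows)
```

For the stride-4 trace `[0, 4, 8, 12]`, bits 0 and 1 never change (entropy 0.0), while bits 2 and 3 take all four values (entropy 2.0 bits), so under this sketch the latter group would be the better choice for index bits.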