skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Static prediction of parallel computation graphs
Many algorithms for analyzing parallel programs, for example to detect deadlocks or data races or to calculate the execution cost, are based on a model variously known as a cost graph, computation graph or dependency graph, which captures the parallel structure of threads in a program. In modern parallel programs, computation graphs are highly dynamic and depend greatly on the program inputs and execution details. As such, most analyses that use these graphs are either dynamic analyses or are specialized static analyses that gather a subset of dependency information for a specific purpose. This paper introduces graph types, which compactly represent all of the graphs that could arise from program execution. Graph types are inferred from a parallel program using a graph type system and inference algorithm, which we present drawing on ideas from Hindley-Milner type inference, affine logic and region type systems. We have implemented the inference algorithm over a subset of OCaml, extended with parallelism primitives, and we demonstrate how graph types can be used to accelerate the development of new graph-based static analyses by presenting proof-of-concept analyses for deadlock detection and cost analysis.  more » « less
Award ID(s):
2107289 2007784
PAR ID:
10325787
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
6
Issue:
POPL
ISSN:
2475-1421
Page Range / eLocation ID:
1 to 31
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deadlocks, in which threads wait on each other in a cyclic fashion and can't make progress, have plagued parallel programs for decades. In recent years, as the parallel programming mechanism known as futures has gained popularity, interest in preventing deadlocks in programs with futures has increased as well. Various static and dynamic algorithms exist to detect and prevent deadlock in programs with futures, generally by constructing some approximation of the dependency graph of the program but, as far as we are aware, all are specialized to a particular programming language. A recent paper introduced graph types, by which one can statically approximate the dependency graphs of a program in a language-independent fashion. By analyzing the graph type directly instead of the source code, a graph-based program analysis, such as one to detect deadlock, can be made language-independent. Indeed, the paper that proposed graph types also proposed a deadlock detection algorithm. Unfortunately, the algorithm was based on an unproven conjecture which we show to be false. In this paper, we present, and prove sound, a type system for finding possible deadlocks in programs that operates over graph types and can therefore be applied to many different languages. As a proof of concept, we have implemented the algorithm over a subset of the OCaml language extended with built-in futures. 
    more » « less
  2. Parallel programs are frequently modeled asdependencyorcostgraphs, which can be used to detect various bugs, or simply to visualize the parallel structure of the code. However, such graphs reflect just one particular execution and are typically constructed in apost-hocmanner.Graph types, which were introduced recently to mitigate this problem, can be assigned statically to a program by a type system and compactly represent the family of all graphs that could result from the program. Unfortunately, prior work is restricted in its treatment offutures, an increasingly common and especially dynamic form of parallelism. In short, each instance of a future must be statically paired with a vertex name. Previously, this led to the restriction that futures could not be placed in collections or be used to construct data structures. Doing so is not a niche exercise: such structures form the basis of numerous algorithms that use forms of pipelining to achieve performance not attainable without futures. All but the most limited of these examples are out of reach of prior graph type systems. In this paper, we propose a graph type system that allows for almost arbitrary combinations of futures and recursive data types. We do so by indexing datatypes with a type-levelvertex structure, a codata structure that supplies unique vertex names to the futures in a data structure. We prove the soundness of the system in a parallel core calculus annotated with vertex structures and associated operations. Although the calculus is annotated, this is merely for convenience in defining the type system. We prove that it is possible to annotate arbitrary recursive types with vertex structures, and show using a prototype inference engine that these annotations can be inferred from OCaml-like source code for several complex parallel algorithms. 
    more » « less
  3. Call graph or caller-callee relationships have been used for various kinds of static program analysis, performance analysis and profiling, and for program safety or security analysis such as detecting anomalies of program execution or code injection attacks. However, different tools generate call graphs in different formats, which prevents efficient reuse of call graph results. In this paper, we present an approach of using ontology and resource description framework (RDF) to create knowledge graphs for specifying call graphs to facilitate the construction of full-fledged and complex call graphs of computer programs, realizing more interoperable and scalable program analyses than conventional approaches. We create a formal ontology-based specification of call graph information to capture concepts and properties of both static and dynamic call graphs so different tools can collaboratively contribute to more comprehensive analysis results. Our experiments show that ontology enables merging of call graphs generated from different tools and flexible queries using a standard query interface. 
    more » « less
  4. Probabilistic programming languages (PPLs) provide language support for expressing flexible probabilistic models and solving Bayesian inference problems. PPLs withprogrammable inferencemake it possible for users to obtain improved results by customizing inference engines usingguideprograms that are tailored to a correspondingmodelprogram. However, errors in guide programs can compromise the statistical soundness of the inference. This article introduces a novel coroutine-based framework for verifying the correctness of user-written guide programs for a broad class of Markov chain Monte Carlo (MCMC) inference algorithms. Our approach rests on a novel type system for describing communication protocols between a model program and a sequence of guides that each update only a subset of random variables. We prove that, by translating guide types to context-free processes with finite norms, it is possible to check structural type equality between models and guides in polynomial time. This connection gives rise to an efficienttype-inference algorithmfor probabilistic programs with flexible constructs such as general recursion and branching. We also contribute acoverage-checking algorithmthat verifies the support of sequentially composed guide programs agrees with that of the model program, which is a key soundness condition for MCMC inference with multiple guides. Evaluations on diverse benchmarks show that our type-inference and coverage-checking algorithms efficiently infer types and detect sound and unsound guides for programs that existing static analyses cannot handle. 
    more » « less
  5. Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully exposed before the real execution. Without an understanding of the program execution, statically pre-trained models fail to comprehensively capture the dynamic code properties, such as the branch coverage and the runtime variable values, and they are consequently less effective at code understanding tasks, such as retrieving semantic clones and detecting software vulnerabilities. To close the gap between the static nature of language models and the dynamic characteristics of programs, we introduce TRACED, an execution-aware pre-training strategy for source code. Specifically, we pre-train code language models with a combination of source code, executable inputs, and corresponding execution traces. Our goal is to teach code models the complicated execution logic during the pre-training, enabling the model to statically estimate the dynamic code properties without repeatedly executing code during task-specific fine-tuning. To illustrate the effectiveness of our proposed approach, we fine-tune and evaluate TRACED on three downstream tasks: static execution estimation, clone retrieval, and vulnerability detection. The empirical results show that TRACED relatively improves the statically pre-trained code models by 12.4% for complete execution path prediction and by 25.2% for runtime variable value predictions. TRACED also significantly outperforms statically pre-trained models in clone retrieval and vulnerability detection across four public benchmarks. 
    more » « less