NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Decidable Subtyping of Existential Types for Julia

https://doi.org/10.1145/3656421

Belyakova, Julia; Chung, Benjamin; Tate, Ross; Vitek, Jan (June 2024, Proceedings of the ACM on Programming Languages)

Julia is a modern scientific-computing language that relies on multiple dispatch to implement generic libraries. While the language does not have a static type system, method declarations are decorated with expressive type annotations to determine when they are applicable. To find applicable methods, the implementation uses subtyping at run-time. We show that Julia’s subtyping is undecidable, and we propose a restriction on types to recover decidability by stratifying types into method signatures over value types—where the former can freely use bounded existential types but the latter are restricted to use-site variance. A corpus analysis suggests that nearly all Julia programs written in practice already conform to this restriction.
more » « less
The Fault in Our Stars: Designing Reproducible Large-scale Code Analysis Experiments

https://doi.org/10.4230/LIPIcs.ECOOP.2024.27

Maj, Petr; Muroya, Stefanie; Siek, Konrad; Di_Grazia, Luca; Vitek, Jan (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Aldrich, Jonathan; Salvaneschi, Guido (Ed.)
Large-scale software repositories are a source of insights for software engineering. They offer an unmatched window into the software development process at scale. Their sheer number and size holds the promise of broadly applicable results. At the same time, that very size presents practical challenges for scaling tools and algorithms to millions of projects. A reasonable approach is to limit studies to representative samples of the population of interest. Broadly applicable conclusions can then be obtained by generalizing to the entire population. The contribution of this paper is a standardized experimental design methodology for choosing the inputs of studies working with large-scale repositories. We advocate for a methodology that clearly lays out what the population of interest is, how to sample it, and that fosters reproducibility. Along the way, we discourage researchers from using extrinsic attributes of projects such as stars, that measure some unclear notion of popularity.
more » « less
Full Text Available
Reusing Just-in-Time Compiled Code

https://doi.org/10.1145/3622839

Mehta, Meetesh Kalpesh; Krynski, Sebastián; Gualandi, Hugo Musso; Thakur, Manas; Vitek, Jan (October 2023, Proceedings of the ACM on Programming Languages)

Most code is executed more than once. If not entire programs then libraries remain unchanged from one run to the next. Just-in-time compilers expend considerable effort gathering insights about code they compiled many times, and often end up generating the same binary over and over again. We explore how to reuse compiled code across runs of different programs to reduce warm-up costs of dynamic languages. We propose to usespeculative contextual dispatchto select versions of functions from anoff-line curated code repository. That repository is a persistent database of previously compiled functions indexed by the context under which they were compiled. The repository is curated to remove redundant code and to optimize dispatch. We assess practicality by extending Ř, a compiler for the R language, and evaluating its performance. Our results suggest that the approach improves warmup times while preserving peak performance.
more » « less
Full Text Available
signatr: A Data-Driven Fuzzing Tool for R

https://doi.org/10.1145/3567512.3567530

Turcotte, Alexi; Donat-Bouillud, Pierre; Křikava, Filip; Vitek, Jan (November 2022, SLE)

The fast-and-loose, permissive semantics of dynamic programming languages limit the power of static analyses. For that reason, soundness is often traded for precision through dynamic program analysis. Dynamic analysis is only as good as the available runnable code, and relying solely on test suites is fraught as they do not cover the full gamut of possible behaviors. Fuzzing is an approach for automatically exercising code, and could be used to obtain more runnable code. However, the shape of user-defined data in dynamic languages is difficult to intuit, limiting a fuzzer's reach. We propose a feedback-driven blackbox fuzzing approach which draws inputs from a database of values recorded from existing code. We implement this approach in a tool called signatr for the R language. We present the insights of its design and implementation, and assess signatr's ability to uncover new behaviors by fuzzing 4,829 R functions from 100 R packages, revealing 1,195,184 new signatures.
more » « less
Full Text Available
First-class environments in R

https://doi.org/10.1145/3486602.3486768

Goel, Aviral; Vitek, Jan (October 2021, ACM SIGPLAN International Symposium on Dynamic Languages)

The R programming language is widely used for statistical computing. To enable interactive data exploration and rapid prototyping, R encourages a dynamic programming style. This programming style is supported by features such as first-class environments. Amongst widely used languages, R has the richest interface for programmatically manipulating environments. With the flexibility afforded by reflective operations on first-class environments, come significant challenges for reasoning and optimizing user-defined code. This paper documents the reflective interface used to operate over first-class environment. We explain the rationale behind its design and conduct a large-scale study of how the interface is used in popular libraries.
more » « less
Full Text Available
Promises are made to be broken: migrating R to strict semantics

https://doi.org/10.1145/3485478

Goel, Aviral; Ječmen, Jan; Krynski, Sebastián; Flückiger, Olivier; Vitek, Jan (October 2021, Proceedings of the ACM on Programming Languages)

Function calls in the R language do not evaluate their arguments, these are passed to the callee as suspended computations and evaluated if needed. After 25 years of experience with the language, there are very few cases where programmers leverage delayed evaluation intentionally and laziness comes at a price in performance and complexity. This paper explores how to evolve the semantics of a lazy language towards strictness-by-default and laziness-on-demand. To provide a migration path, it is necessary to provide tooling for developers to migrate libraries without introducing errors. This paper reports on a dynamic analysis that infers strictness signatures for functions to capture both intentional and accidental laziness. Over 99% of the inferred signatures were correct when tested against clients of the libraries.
more » « less
What we eval in the shadows: a large-scale study of eval in R programs

https://doi.org/10.1145/3485502

Goel, Aviral; Donat-Bouillud, Pierre; Křikava, Filip; Kirsch, Christoph_M; Vitek, Jan (October 2021, Proceedings of the ACM on Programming Languages)

Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R’s eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.
more » « less
Formally verified speculation and deoptimization in a JIT compiler

https://doi.org/10.1145/3434327

Barrière, Aurèle; Blazy, Sandrine; Flückiger, Olivier; Pichardie, David; Vitek, Jan (January 2021, Proceedings of the ACM on Programming Languages)

Just-in-time compilers for dynamic languages routinely generate code under assumptions that may be invalidated at run-time, this allows for specialization of program code to the common case in order to avoid unnecessary overheads due to uncommon cases. This form of software speculation requires support for deoptimization when some of the assumptions fail to hold. This paper presents a model just-in-time compiler with an intermediate representation that explicits the synchronization points used for deoptimization and the assumptions made by the compiler's speculation. We also present several common compiler optimizations that can leverage speculation to generate improved code. The optimizations are proved correct with the help of a proof assistant. While our work stops short of proving native code generation, we demonstrate how one could use the verified optimization to obtain significant speed ups in an end-to-end setting.
more » « less
Designing types for R, empirically

https://doi.org/10.1145/3428249

Turcotte, Alexi; Goel, Aviral; Křikava, Filip; Vitek, Jan (November 2020, Proceedings of the ACM on Programming Languages)
null (Ed.)
Full Text Available
World age in Julia: optimizing method dispatch in the presence of eval

https://doi.org/10.1145/3428275

Belyakova, Julia; Chung, Benjamin; Gelinas, Jack; Nash, Jameson; Tate, Ross; Vitek, Jan (November 2020, Proceedings of the ACM on Programming Languages)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records