NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

MSstatsPTM: Statistical Relative Quantification of Posttranslational Modifications in Bottom-Up Mass Spectrometry-Based Proteomics

https://doi.org/10.1016/j.mcpro.2022.100477

Kohler, Devon; Tsai, Tsung-Heng; Verschueren, Erik; Huang, Ting; Hinkle, Trent; Phu, Lilian; Choi, Meena; Vitek, Olga (January 2023, Molecular & Cellular Proteomics)

Full Text Available
Do-calculus enables estimation of causal effects in partially observed biomolecular pathways

https://doi.org/10.1093/bioinformatics/btac251

Mohammad-Taheri, Sara; Zucker, Jeremy; Hoyt, Charles Tapley; Sachs, Karen; Tewari, Vartika; Ness, Robert; Vitek, Olga (June 2022, Bioinformatics)

Abstract Motivation Estimating causal queries, such as changes in protein abundance in response to a perturbation, is a fundamental task in the analysis of biomolecular pathways. The estimation requires experimental measurements on the pathway components. However, in practice many pathway components are left unobserved (latent) because they are either unknown, or difficult to measure. Latent variable models (LVMs) are well-suited for such estimation. Unfortunately, LVM-based estimation of causal queries can be inaccurate when parameters of the latent variables are not uniquely identified, or when the number of latent variables is misspecified. This has limited the use of LVMs for causal inference in biomolecular pathways. Results In this article, we propose a general and practical approach for LVM-based estimation of causal queries. We prove that, despite the challenges above, LVM-based estimators of causal queries are accurate if the queries are identifiable according to Pearl’s do-calculus and describe an algorithm for its estimation. We illustrate the breadth and the practical utility of this approach for estimating causal queries in four synthetic and two experimental case studies, where structures of biomolecular pathways challenge the existing methods for causal query estimation. Availability and implementation The code and the data documenting all the case studies are available at https://github.com/srtaheri/LVMwithDoCalculus. Supplementary information Supplementary data are available at Bioinformatics online.
more » « less
Full Text Available
Deoptless: speculation with dispatched on-stack replacement and specialized continuations

https://doi.org/10.1145/3519939.3523729

Flückiger, Olivier; Ječmen, Jan; Krynski, Sebastián; Vitek, Jan (June 2022, ACM SIGPLAN International Conference on Programming Language Design and Implementation)

Just-in-time compilation provides significant performance improvements for programs written in dynamic languages. These benefits come from the ability of the compiler to spec- ulate about likely cases and generate optimized code for these. Unavoidably, speculations sometimes fail and the opti- mizations must be reverted. In some pathological cases, this can leave the program stuck with suboptimal code. In this paper we propose deoptless, a technique that replaces deopti- mization points with dispatched specialized continuations. The goal of deoptless is to take a step towards providing users with a more transparent performance model in which mysterious slowdowns are less frequent and grave.
more » « less
Full Text Available
First-class environments in R

https://doi.org/10.1145/3486602.3486768

Goel, Aviral; Vitek, Jan (October 2021, ACM SIGPLAN International Symposium on Dynamic Languages)

The R programming language is widely used for statistical computing. To enable interactive data exploration and rapid prototyping, R encourages a dynamic programming style. This programming style is supported by features such as first-class environments. Amongst widely used languages, R has the richest interface for programmatically manipulating environments. With the flexibility afforded by reflective operations on first-class environments, come significant challenges for reasoning and optimizing user-defined code. This paper documents the reflective interface used to operate over first-class environment. We explain the rationale behind its design and conduct a large-scale study of how the interface is used in popular libraries.
more » « less
Full Text Available
Type stability in Julia: avoiding performance pathologies in JIT compilation

https://doi.org/10.1145/3485527

Pelenitsyn, Artem; Belyakova, Julia; Chung, Benjamin; Tate, Ross; Vitek, Jan (October 2021, Proceedings of the ACM on Programming Languages)

As a scientific programming language, Julia strives for performance but also provides high-level productivity features. To avoid performance pathologies, Julia users are expected to adhere to a coding discipline that enables so-called type stability. Informally, a function is type stable if the type of the output depends only on the types of the inputs, not their values. This paper provides a formal definition of type stability as well as a stronger property of type groundedness, shows that groundedness enables compiler optimizations, and proves the compiler correct. We also perform a corpus analysis to uncover how these type-related properties manifest in practice.
more » « less
Promises are made to be broken: migrating R to strict semantics

https://doi.org/10.1145/3485478

Goel, Aviral; Ječmen, Jan; Krynski, Sebastián; Flückiger, Olivier; Vitek, Jan (October 2021, Proceedings of the ACM on Programming Languages)

Function calls in the R language do not evaluate their arguments, these are passed to the callee as suspended computations and evaluated if needed. After 25 years of experience with the language, there are very few cases where programmers leverage delayed evaluation intentionally and laziness comes at a price in performance and complexity. This paper explores how to evolve the semantics of a lazy language towards strictness-by-default and laziness-on-demand. To provide a migration path, it is necessary to provide tooling for developers to migrate libraries without introducing errors. This paper reports on a dynamic analysis that infers strictness signatures for functions to capture both intentional and accidental laziness. Over 99% of the inferred signatures were correct when tested against clients of the libraries.
more » « less
What we eval in the shadows: a large-scale study of eval in R programs

https://doi.org/10.1145/3485502

Goel, Aviral; Donat-Bouillud, Pierre; Křikava, Filip; Kirsch, Christoph_M; Vitek, Jan (October 2021, Proceedings of the ACM on Programming Languages)

Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R’s eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.
more » « less
Sampling optimized code for type feedback

https://doi.org/10.1145/3426422.3426984

Flückiger, Olivier; Wälchli, Andreas; Krynski, Sebastián; Vitek, Jan (November 2020, Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages)

To efficiently execute dynamically typed languages, many language implementations have adopted a two-tier architecture. The first tier aims for low-latency startup times and collects dynamic profiles, such as the dynamic types of variables. The second tier provides high-throughput using an optimizing compiler that specializes code to the recorded type information. If the program behavior changes to the point that not previously seen types occur in specialized code, that specialized code becomes invalid, it is deoptimized, and control is transferred back to the first tier execution engine which will start specializing anew. However, if the program behavior becomes more specific, for instance, if a polymorphic variable becomes monomorphic, nothing changes. Once the program is running optimized code, there are no means to notice that an opportunity for optimization has been missed. We propose to employ a sampling-based profiler to monitor native code without any instrumentation. The absence of instrumentation means that when the profiler is not active, no overhead is incurred. We present an implementation is in the context of the Ř just-in-time, optimizing compiler for the R language. Based on the sampled profiles, we are able to detect when the native code produced by Ř is specialized for stale type feedback and recompile it to more type-specific code. We show that sampling adds an overhead of less than 3% in most cases and up to 9% in few cases and that it reliably detects stale type feedback within milliseconds.
more » « less
Full Text Available
Designing types for R, empirically

https://doi.org/10.1145/3428249

Turcotte, Alexi; Goel, Aviral; Křikava, Filip; Vitek, Jan (November 2020, Proceedings of the ACM on Programming Languages)
null (Ed.)
Full Text Available
World age in Julia: optimizing method dispatch in the presence of eval

https://doi.org/10.1145/3428275

Belyakova, Julia; Chung, Benjamin; Gelinas, Jack; Nash, Jameson; Tate, Ross; Vitek, Jan (November 2020, Proceedings of the ACM on Programming Languages)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records