NSF PAR Search | NSF Public Access Repository

Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Learning Programs to Graph Execution

Khatchadourian, Raffi; Vélez, Tatiana Castro; Bagherzadeh, Mehdi; Jia, Nan; Raja, Anita (May 2025, Forschungsberichte aus dem Institut für Sozialwissenschaftliche Forschung eV ISF München)
Boronat, Artur; Fraser, Gordon (Ed.)
Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code, supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. Though hybrid approaches aim for the “best of both worlds,” using them effectively requires subtle considerations to make code amenable to safe, accurate, and efficient graph execution, avoiding performance bottlenecks and semantically inequivalent results. We discuss the engineering aspects of a refactoring tool that automatically determines when it is safe and potentially advantageous to migrate imperative DL code to graph execution and vice-versa.
Free, publicly-accessible full text available May 1, 2026
Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

https://doi.org/10.5281/zenodo.13748907

Khatchadourian, Raffi; Castro-Vélez, Tatiana; Bagherzadeh, Mehdi; Jia, Nan; Raja, Anita (January 2025, Zenodo)

Efficiency is essential to support ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code—supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations. Our key insight is that, while DL programs typically execute sequentially, hybridizing imperative DL code resembles parallelizing sequential code in traditional systems. Inspired by this, we present an automated refactoring approach that assists developers in determining which otherwise eagerly-executed imperative DL functions could be effectively and efficiently executed as graphs. The approach features novel static imperative tensor and side-effect analyses for Python. Due to its inherent dynamism, analyzing Python may be unsound; however, the conservative approach leverages a speculative (keyword-based) analysis for resolving difficult cases that informs developers of any assumptions made. The approach is: (i) implemented as a plug-in to the PyDev Eclipse IDE that integrates the WALA Ariadne analysis framework and (ii) evaluated on nineteen DL projects consisting of 132 KLOC. The results show that 326 of 766 candidate functions (42.56%) were refactorable, and an average relative speedup of 2.16 on performance tests was observed with negligible differences in model accuracy. The results indicate that the approach is useful in optimizing imperative DL code to its full potential.
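As a rough illustration of the idea in the abstract above, the following sketch flags functions that might be candidates for graph execution. It is a toy built on Python's standard `ast` module, not the authors' WALA Ariadne-based analysis. The keyword list `TENSOR_HINTS`, the helper names, and the side-effect rules are all assumptions made for this example.

```python
import ast

# Hypothetical keyword list for the speculative, name-based tensor guess.
TENSOR_HINTS = ("tensor", "input", "image", "batch", "logits")


def is_hybridized(func):
    """True if the function already carries a tf.function-style decorator."""
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if ast.unparse(target) in ("tf.function", "function"):
            return True
    return False


def likely_tensor_param(func):
    """Speculative check: does any parameter name contain a tensor-like keyword?"""
    return any(
        hint in arg.arg.lower() for arg in func.args.args for hint in TENSOR_HINTS
    )


def has_obvious_side_effect(func):
    """Very conservative toy check: print calls, global/nonlocal, or
    assignments to attributes or subscripts count as side effects."""
    for node in ast.walk(func):
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            return True
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            return True
        if isinstance(node, (ast.Assign, ast.AugAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            if any(isinstance(t, (ast.Attribute, ast.Subscript)) for t in targets):
                return True
    return False


def candidates(source):
    """Names of functions that pass all three toy checks, sorted."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.FunctionDef)
            and not is_hybridized(node)
            and likely_tensor_param(node)
            and not has_obvious_side_effect(node)
        ):
            found.append(node.name)
    return sorted(found)


# The source is only parsed, never executed, so TensorFlow is not required.
EXAMPLE = '''
import tensorflow as tf

def dense(input_batch, w):
    return tf.matmul(input_batch, w)

def log_step(input_batch):
    print(input_batch)
    return input_batch

@tf.function
def already_graph(images):
    return images * 2

def helper(n):
    return n + 1
'''

print(candidates(EXAMPLE))  # ['dense']
```

Only `dense` is reported: `log_step` prints, `already_graph` is already decorated, and `helper` has no tensor-like parameter name. A real tool would also need to report the assumptions it made, as the abstract notes.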
ponder-lab/Hybridize-Functions-Refactoring: v1.4.0

https://doi.org/10.5281/zenodo.15045769

Khatchadourian, Raffi; Castro-Vélez, Tatiana (January 2025, Zenodo)

Full Changelog: https://github.com/ponder-lab/Hybridize-Functions-Refactoring/compare/v1.3.0...v1.4.0
ponder-lab/Common-Eclipse-Refactoring-Framework: v5.1.0

https://doi.org/10.5281/zenodo.13873498

Khatchadourian, Raffi; WLiD; Oren (January 2025, Zenodo)

Full Changelog: https://github.com/ponder-lab/Common-Eclipse-Refactoring-Framework/compare/v5.0.0...v5.1.0
ReLESS: A framework for assessing safety in Deep Learning systems

Jia, Nan; Raja, Anita; Khatchadourian, Raffi (August 2024, CEUR-WS)

Traditionally, software refactoring helps to improve a system's internal structure and enhance its non-functional features, such as reliability and run-time performance, while preserving external behavior including original program semantics. However, in the context of learning-enabled software systems (LESS), e.g., Machine Learning (ML) systems, it is unclear which portions of a software's semantics require preservation at the development phase. This is mainly because (a) the behavior of the LESS is not defined until run-time; and (b) the inherently iterative and non-deterministic nature of ML algorithms. Consequently, there is a knowledge gap in what refactoring truly means in the context of LESS as such systems have no guarantee of a predetermined correct answer. We thus conjecture that to construct robust and safe LESS, it is imperative to understand the flexibility of refactoring LESS compared to traditional software and to measure it. In this paper, we introduce a novel conceptual framework named ReLESS for evaluating refactorings for supervised learning by (i) exploring the transformation methodologies taken by state-of-the-art LESS refactorings that focus on singular metrics, (ii) reviewing informal notions of semantics preservation and the level at which they occur (source code vs. trained model), and (iii) empirically comparing and contrasting existing LESS refactorings in the context of image classification problems. This framework will set the foundation to not only formalize a standard definition of semantics preservation in LESS but also combine four metrics: accuracy, run-time performance, robustness, and interpretability as a multi-objective optimization function, instead of a single-objective function used in existing works, to assess LESS refactorings. In the future, our work could seek reliable LESS refactorings that generalize over diverse systems.
Full Text Available
ponder-lab/ML: 0.34.0

https://doi.org/10.5281/zenodo.7992107

Khatchadourian, Raffi; Dolby, Julian; Vélez, Tatiana Castro; shinnar; Lagouvardos, Sifis; Sridharan, Manu (January 2024, Zenodo)

Full Changelog: https://github.com/ponder-lab/ML/compare/0.33.0...0.34.0
𝜇Akka: Mutation Testing for Actor Concurrency in Akka using Real-World Bugs

https://doi.org/10.1145/3611643.3616362

Moradi Moghadam, Mohsen; Bagherzadeh, Mehdi; Khatchadourian, Raffi; Bagheri, Hamid (November 2023, Proceedings of the ACM SIGSOFT International Symposium on the Foundations of Software Engineering)

Actor concurrency is becoming increasingly important in the real world and mission-critical software. This requires these applications to be free from actor bugs, that occur in the real world, and have tests that are effective in finding these bugs. Mutation testing is a well-established technique that transforms an application to induce its likely bugs and evaluate the effectiveness of its tests in finding these bugs. Mutation testing is available for a broad spectrum of applications and their bugs, ranging from web to mobile to machine learning, and is used at scale in companies like Google and Facebook. However, there still is no mutation testing for actor concurrency that uses real-world actor bugs. In this paper, we propose 𝜇Akka, a framework for mutation testing of Akka actor concurrency using real actor bugs. Akka is a popular industrial-strength implementation of actor concurrency. To design, implement, and evaluate 𝜇Akka, we take the following major steps: (1) manually analyze a recent set of 186 real Akka bugs from Stack Overflow and GitHub to understand their causes; (2) design a set of 32 mutation operators, with 138 source code changes in Akka API, to emulate these causes and induce their bugs; (3) implement these operators in an Eclipse plugin for Java Akka; (4) use the plugin to generate 11.7k mutants of 10 real GitHub applications, with 446.4k lines of code and 7.9k tests; (5) run these tests on these mutants to measure the quality of mutants and effectiveness of tests; (6) use PIT to generate 26.2k mutants to compare 𝜇Akka and PIT mutant quality and test effectiveness. PIT is a popular mutation testing tool with traditional operators; (7) manually analyze the bug coverage and overlap of 𝜇Akka, PIT, and actor operators in a previous work; and (8) discuss a few implications of our findings. Among others, we find that 𝜇Akka mutants are higher quality, cover more bugs, and tests are less effective in detecting them.
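The general mechanics of mutation testing described above (mutate the program, run the tests, count killed mutants) can be sketched as follows. This is a generic toy using a single arithmetic operator swap on a Python function. It does not reproduce any of the 𝜇Akka operators, which target the Java Akka API. All names here (`ReplaceAdd`, `make_mutant`, `mutation_score`, the two tests) are hypothetical.

```python
import ast

ORIGINAL = "def add(a, b):\n    return a + b\n"


class ReplaceAdd(ast.NodeTransformer):
    """Toy mutation operator: replace every '+' with another operator."""

    def __init__(self, new_op):
        self.new_op = new_op

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = self.new_op
        return node


def make_mutant(source, new_op):
    """Apply the operator to the source and return the mutated 'add' function."""
    tree = ReplaceAdd(new_op).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["add"]


def weak_test(add):
    # Passes for a + b, a - b, and a * b alike, so it kills no mutant.
    return add(0, 0) == 0


def strong_test(add):
    return add(2, 3) == 5


def mutation_score(tests, mutants):
    """Fraction of mutants for which at least one test fails (is 'killed')."""
    killed = sum(1 for m in mutants if not all(t(m) for t in tests))
    return killed / len(mutants)


mutants = [make_mutant(ORIGINAL, ast.Sub()), make_mutant(ORIGINAL, ast.Mult())]
print(mutation_score([weak_test], mutants))               # 0.0
print(mutation_score([weak_test, strong_test], mutants))  # 1.0
```

A low score for a test suite means the mutants survive, which is the sense in which the abstract says tests are "less effective" at detecting them.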
Full Text Available
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

https://doi.org/10.1109/ASE56229.2023.00187

Khatchadourian, Raffi; Castro-Vélez, Tatiana; Bagherzadeh, Mehdi; Jia, Nan; Raja, Anita (September 2023, IEEE/ACM International Conference on Automated Software Engineering)

Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code, supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance. Though hybrid approaches aim for the “best of both worlds,” using them effectively requires subtle considerations to make code amenable to safe, accurate, and efficient graph execution. We present our ongoing work on automated refactoring that assists developers in specifying whether and how their otherwise eagerly-executed imperative DL code could be reliably and efficiently executed as graphs while preserving semantics. The approach, based on a novel imperative tensor analysis, will automatically determine when it is safe and potentially advantageous to migrate imperative DL code to graph execution and modify decorator parameters or eagerly executing code already running as graphs. The approach is being implemented as a PyDev Eclipse IDE plug-in and uses the WALA Ariadne analysis framework. We discuss our ongoing work towards optimizing imperative DL code to its full potential.
Full Text Available
How many mutex bugs can a simple analysis find in Go programs?

Takeuchi, Fumi; Masuhara, Hidehiko; Khatchadourian, Raffi; Cong, Youyou (September 2022, Annual Conference of the Japanese Society for Software Science and Technology)

In open source software, it is known that there are many concurrency bugs. A previous study of Go revealed, through a manual program investigation, that a considerable number of such bugs are simple (for example, 9% of the bugs are cases of forgetting to unlock a mutex). This paper tries to detect such bugs by applying a simple analysis in order to see how closely such a tool can match the manual analysis. We built a simple intraprocedural control flow analysis for Go and evaluated its performance on the open source programs with concurrency bugs reported in the previous study. As for quality, the recall is good at 88% and the precision is poor at 60%; as for analysis time, it finishes within a practical amount of time (for example, 1 second per 5000 LoC).
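To make the notion of a simple intraprocedural check for a forgotten unlock concrete, here is a minimal sketch over a hand-written control flow graph. It is an assumption-laden toy, not the analysis from the paper: the CFG encoding, the single-mutex model, and the function name `may_exit_locked` are all hypothetical.

```python
def may_exit_locked(cfg, entry, exits):
    """Return True if some path from entry reaches an exit block
    while the (single, modeled) mutex is still held.

    cfg maps a block name to (ops, successors), where ops is a list
    of "lock" / "unlock" events executed in that block.
    """
    seen = set()
    work = [(entry, False)]  # (block, mutex held on entry to block)
    while work:
        block, locked = work.pop()
        if (block, locked) in seen:
            continue
        seen.add((block, locked))
        ops, successors = cfg[block]
        for op in ops:
            if op == "lock":
                locked = True
            elif op == "unlock":
                locked = False
        if block in exits and locked:
            return True
        for succ in successors:
            work.append((succ, locked))
    return False


# Hypothetical function shape: lock, then a branch with an early return
# that forgets to unlock.
BUGGY = {
    "entry": (["lock"], ["check"]),
    "check": ([], ["early_return", "body"]),
    "early_return": ([], []),
    "body": (["unlock"], ["exit"]),
    "exit": ([], []),
}

# Same shape, but the early return releases the mutex first.
FIXED = dict(BUGGY, early_return=(["unlock"], []))

EXITS = {"early_return", "exit"}
print(may_exit_locked(BUGGY, "entry", EXITS))  # True
print(may_exit_locked(FIXED, "entry", EXITS))  # False
```

Because the check looks at one function at a time and tracks only a boolean, it is cheap, but it can miss unlocks performed in callees or via `defer`, which is one plausible source of false positives in an analysis of this kind.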
Full Text Available
Challenges in migrating imperative deep learning programs to graph execution: an empirical study

https://doi.org/10.1145/3524842.3528455

Castro-Vélez, Tatiana; Khatchadourian, Raffi; Bagherzadeh, Mehdi; Raja, Anita (May 2022, International Conference on Mining Software Repositories)

Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code that supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce DL code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. While hybrid approaches aim for the "best of both worlds," the challenges in applying them in the real world are largely unknown. We conduct a data-driven analysis of challenges (and resultant bugs) involved in writing reliable yet performant imperative DL code by studying 250 open-source projects, consisting of 19.7 MLOC, along with 470 and 446 manually examined code patches and bug reports, respectively. The results indicate that hybridization: (i) is prone to API misuse, (ii) can result in performance degradation, the opposite of its intention, and (iii) has limited application due to execution mode incompatibility. We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code, potentially benefiting DL practitioners, API designers, tool developers, and educators.
Full Text Available
