Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control

Huang, Yicong; Wang, Zuozhi; Li, Chen

doi:10.1145/3626712

Many big data systems are written in languages such as C, C++, Java, and Scala to process large amounts of data efficiently, while data analysts often use Python to conduct data wrangling, statistical analysis, and machine learning. User-defined functions (UDFs) are commonly used in these systems to bridge the gap between the two ecosystems. In this paper, we propose Udon, a novel debugger to support fine-grained debugging of UDFs. Udon encapsulates the modern line-by-line debugging primitives, such as the ability to set breakpoints, perform code inspections, and make code modifications while executing a UDF on a single tuple. It includes a novel debug-aware UDF execution model to ensure the responsiveness of the operator during debugging. It utilizes advanced state-transfer techniques to satisfy breakpoint conditions that span across multiple UDFs. It incorporates various optimization techniques to reduce the runtime overhead. We conduct experiments with multiple UDF workloads on various datasets and show its high efficiency and scalability.
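The line-by-line primitives mentioned in the abstract (a conditional breakpoint and inspection of local state while a UDF processes a single tuple) can be illustrated with a minimal Python sketch. This is not Udon's implementation or API; it only uses the standard `sys.settrace` hook, and the names `clean_price` and `run_with_breakpoint` are hypothetical, chosen for illustration.

```python
import sys


def clean_price(row):
    # Hypothetical UDF: normalizes one tuple (a dict) at a time.
    price = float(row["price"])
    if price < 0:
        price = 0.0
    return {**row, "price": price}


def run_with_breakpoint(udf, row, condition):
    """Run udf(row) and record every line of the UDF at which
    condition(local variables) holds, with a snapshot of those variables."""
    hits = []
    code = udf.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames other than the UDF itself
        if event == "line":
            local_vars = dict(frame.f_locals)  # snapshot before the line runs
            if condition(local_vars):
                # Store the line offset relative to the UDF's "def" line.
                hits.append((frame.f_lineno - code.co_firstlineno, local_vars))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = udf(row)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return result, hits


# Conditional breakpoint: "pause" wherever the local variable price is negative.
result, hits = run_with_breakpoint(
    clean_price,
    {"id": 7, "price": "-3.5"},
    condition=lambda local_vars: local_vars.get("price", 0) < 0,
)
for offset, snapshot in hits:
    print(f"condition held at UDF line offset {offset}: {snapshot}")
```

A real debugger would suspend execution and accept user commands at each hit rather than just recording a snapshot; the sketch only shows where such a hook sits relative to the per-tuple UDF call.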

Award ID(s): 2200274

PAR ID: 10541803

Author(s) / Creator(s): Huang, Yicong; Wang, Zuozhi; Li, Chen

Publisher / Repository: ACM Digital Library

Date Published: 2023-12-08

Journal Name: Proceedings of the ACM on Management of Data

Volume: 1

Issue: 4

ISSN: 2836-6573

Page Range / eLocation ID: 1 to 26

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1145/3626712
