BigSift: automated debugging of big data analytics in data-intensive scalable computing

Gulzar, Muhammad Ali; Wang, Siman; Kim, Miryung

doi:10.1145/3236024.3264586

Developing big data analytics often involves trial-and-error debugging, due to the unclean nature of datasets or wrong assumptions made about the data. When errors (e.g., a program crash or outlier results) arise, developers are often interested in pinpointing their root cause. To address this problem, BigSift takes an Apache Spark program, a user-defined test oracle function, and a dataset as input, and outputs a minimum set of input records that reproduces the same test failure by combining insights from delta debugging with data provenance. The technical contribution of BigSift is the design of systems optimizations that bring automated debugging closer to reality for data-intensive scalable computing. BigSift exposes an interactive web interface where a user can monitor a big data analytics job running remotely on the cloud, write a user-defined test oracle function, and then trigger the automated debugging process. BigSift also provides a set of predefined test oracle functions, which can be used for explaining common types of anomalies in big data analytics, for example, finding the origin of an output value that is more than k standard deviations away from the median. The demonstration video is available at https://youtu.be/jdBsCd61a1Q.
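The interplay between a test oracle and delta debugging described above can be illustrated with a minimal sketch. It is not BigSift's API or implementation: plain Python lists stand in for a Spark dataset, the data provenance step and the systems optimizations are omitted, and every name (`total_per_key`, `make_outlier_oracle`, `ddmin`) is hypothetical. The oracle mimics the predefined "more than k standard deviations away from the median" check, and `ddmin` is a simplified variant of the classic delta debugging loop.

```python
from collections import defaultdict
from statistics import median, pstdev


def total_per_key(records):
    """Hypothetical stand-in for a Spark job: sum the amounts for each key."""
    totals = defaultdict(int)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


def make_outlier_oracle(job, full_input, k):
    """Build a test oracle (assumed interface: records -> bool).

    The oracle returns True (the test fails) when any job output lies more
    than k standard deviations from the median of the full-run outputs.
    """
    baseline = list(job(full_input).values())
    center, spread = median(baseline), pstdev(baseline)

    def fails(records):
        return any(abs(v - center) > k * spread for v in job(records).values())

    return fails


def ddmin(records, fails):
    """Simplified delta debugging: shrink `records` while `fails` stays True.

    The result is 1-minimal (removing any single tested chunk makes the
    failure disappear); it is not guaranteed to be a global minimum.
    """
    assert fails(records), "the full input must reproduce the failure"
    n = 2  # current granularity: number of chunks to split into
    while len(records) >= 2:
        chunk = max(1, len(records) // n)
        subsets = [records[i:i + chunk] for i in range(0, len(records), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            if fails(subset):
                # A single chunk reproduces the failure: keep only that chunk.
                records, n, reduced = subset, 2, True
                break
            complement = [r for j, s in enumerate(subsets) if j != i for r in s]
            if len(subsets) > 2 and fails(complement):
                # Dropping this chunk keeps the failure: continue without it.
                records, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(records):
                break  # already at single-record granularity
            n = min(len(records), n * 2)
    return records


# Usage: one corrupted record inflates the total for key "b".
records = [("a", 10), ("b", 12), ("c", 11), ("d", 9), ("b", 5000), ("e", 10)]
fails = make_outlier_oracle(total_per_key, records, k=2)
print(ddmin(records, fails))  # -> [('b', 5000)]
```

Each call to `fails` reruns the whole job on a subset of the input, which is why plain delta debugging is expensive at scale; the abstract's point is that provenance and systems optimizations are needed to make this loop practical on large datasets.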

Award ID(s): 1764077

PAR ID: 10101001

Author(s) / Creator(s): Gulzar, Muhammad Ali; Wang, Siman; Kim, Miryung

Date Published: 2018-11-01

Journal Name: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Page Range / eLocation ID: 863 to 866

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3236024.3264586
