
Title: White-box testing of big data analytics with complex user-defined functions
Data-intensive scalable computing (DISC) systems such as Google’s MapReduce, Apache Hadoop, and Apache Spark are being leveraged to process massive quantities of data in the cloud. Modern DISC applications pose new challenges for exhaustive, automatic testing because, unlike SQL queries, they combine dataflow operators with complex user-defined functions (UDFs). We design a new white-box testing approach, called BigTest, that reasons about the internal semantics of UDFs in tandem with the equivalence classes created by each dataflow and relational operator. Our evaluation shows that, despite the ultra-large scale of the input data, real-world datasets used to test DISC applications are often significantly skewed and inadequate in terms of test coverage, leaving 34% of Joint Dataflow and UDF (JDU) paths untested. BigTest shows the potential to minimize the data size for local testing by 10^5 to 10^8 orders of magnitude while revealing 2X more manually injected faults than the previous approach. Our experiments show that only a few data records (on the order of tens) are required to achieve the same JDU coverage as the entire production data. The reduction in test data also yields an average CPU time saving of 194X, demonstrating that interactive and fast local testing is feasible for big data analytics and obviating the need to test applications on huge production data.
Authors:
Award ID(s):
1764077 1723773
Publication Date:
NSF-PAR ID:
10173703
Journal Name:
ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Page Range or eLocation-ID:
290 to 301
Sponsoring Org:
National Science Foundation
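To make the abstract's notion of Joint Dataflow and UDF (JDU) paths concrete, here is a minimal, hypothetical Python sketch: a filter operator contributes two equivalence classes (record kept or dropped), a UDF contributes its own branch conditions, and every combination is one JDU path for which a single small record can serve as a test input. The operators, predicates, and records are invented for illustration and are not BigTest's implementation.

# Hypothetical sketch of the "joint dataflow and UDF (JDU) path" idea:
# each dataflow operator contributes equivalence classes (e.g. a filter
# keeps or drops a record) and each UDF contributes branch conditions;
# a JDU path is one choice from every stage. Not BigTest's actual code.
from itertools import product

# Equivalence classes induced by a filter operator: record passes or is dropped.
filter_classes = [
    ("filter: age >= 18", lambda r: r["age"] >= 18),
    ("filter: age < 18",  lambda r: r["age"] < 18),
]

# Branch conditions inside a map-side UDF that normalizes a score.
udf_classes = [
    ("udf: score > 100 (clamped)", lambda r: r["score"] > 100),
    ("udf: score <= 100 (kept)",   lambda r: r["score"] <= 100),
]

# Every combination of operator class and UDF branch is one JDU path.
jdu_paths = list(product(filter_classes, udf_classes))

# A handful of candidate records covers all four paths here, illustrating
# why tens of records can match the coverage of a huge production dataset.
candidates = [
    {"age": 25, "score": 120}, {"age": 25, "score": 80},
    {"age": 12, "score": 150}, {"age": 12, "score": 40},
]

for (f_name, f_pred), (u_name, u_pred) in jdu_paths:
    witness = next((r for r in candidates if f_pred(r) and u_pred(r)), None)
    print(f"{f_name:28s} & {u_name:28s} -> test record: {witness}")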
More Like this
  1. As big data analytics become increasingly popular, data-intensive scalable computing (DISC) systems help address the scalability issue of handling large data. However, automated testing for such data-centric applications is challenging, because data is often incomplete, continuously evolving, and hard to know a priori. Fuzz testing has proven highly effective in other domains such as security; however, it is nontrivial to apply traditional fuzzing to big data analytics directly, for three reasons: (1) the long latency of DISC systems prohibits the applicability of fuzzing: naïve fuzzing would spend 98% of the time setting up a test environment; (2) conventional branch coverage is unlikely to scale to DISC applications because most binary code comes from the framework implementation, such as Apache Spark; and (3) random bit- or byte-level mutations can hardly generate meaningful data, and thus fail to reveal real-world application bugs. We propose a novel coverage-guided fuzz testing tool for big data analytics, called BigFuzz. The key essence of our approach is that (a) we focus on exercising application logic, as opposed to increasing framework code coverage, by abstracting the DISC framework using specifications; BigFuzz performs automated source-to-source transformations to construct an equivalent DISC application suitable for fast test generation; and (b) we design schema-aware data mutation operators based on our in-depth study of DISC application error types. BigFuzz speeds up fuzzing by 78X to 1477X compared to random fuzzing, improves application code coverage by 20% to 271%, and achieves a 33% to 157% improvement in detecting application errors. When compared to the state of the art that uses symbolic execution to test big data analytics, BigFuzz is applicable to twice as many programs and finds 81% more bugs.
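As a rough illustration of the schema-aware mutation idea described above (as opposed to random bit or byte flips), the following Python sketch mutates one field of a CSV-like record in type-aware ways. The schema, field names, and mutation operators are invented placeholders, not BigFuzz's actual operators.

# Hypothetical sketch of schema-aware mutation for a CSV-like record, in the
# spirit of mutating inputs according to the schema rather than flipping
# random bits; the concrete operators here are illustrative placeholders.
import random

SCHEMA = [("zipcode", int), ("state", str), ("price", float)]

def mutate(record, rng=random):
    """Return a mutated copy of `record`, applying one schema-aware operator."""
    fields = record.split(",")
    i = rng.randrange(len(SCHEMA))
    name, typ = SCHEMA[i]
    op = rng.choice(["wrong_type", "empty_field", "extreme_value", "drop_column"])
    if op == "wrong_type":          # e.g. a string where an int is expected
        fields[i] = "not_a_" + name
    elif op == "empty_field":       # missing value, a common real-world defect
        fields[i] = ""
    elif op == "extreme_value":     # very large number or overly long string
        fields[i] = "999999999" if typ in (int, float) else fields[i] * 50
    elif op == "drop_column":       # malformed row with too few columns
        del fields[i]
    return ",".join(fields)

seed = "90095,CA,725000.0"
random.seed(7)
for _ in range(5):
    print(mutate(seed))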
  2. Abstract

    Droplet-level interactions in clouds are often parameterized by a modified gamma distribution fitted to a “global” droplet size distribution. Do “local” droplet size distributions of relevance to microphysical processes look like these average distributions? This paper describes an algorithm to search for and classify characteristic size distributions within a cloud. The approach combines hypothesis testing, specifically the Kolmogorov–Smirnov (KS) test, with a widely used class of machine learning algorithms for identifying clusters of samples with similar properties: density-based spatial clustering of applications with noise (DBSCAN) is used as the specific example for illustration. The two-sample KS test does not presume any specific distribution, is parameter free, and avoids biases from binning. Importantly, the number of clusters is not an input parameter of DBSCAN-type algorithms but is determined independently in an unsupervised fashion. As implemented, the clustering works on an abstract space built from the KS test results, and hence spatial correlation is not required for a cluster. The method is explored using data obtained from the Holographic Detector for Clouds (HOLODEC) deployed during the Aerosol and Cloud Experiments in the Eastern North Atlantic (ACE-ENA) field campaign. The algorithm identifies evidence of the existence of clusters of nearly identical local size distributions. It is found that cloud segments have as few as one and as many as seven characteristic size distributions. To validate the algorithm’s robustness, it is tested on a synthetic dataset and successfully identifies the predefined distributions at plausible noise levels. The algorithm is general and is expected to be useful in other applications, such as remote sensing of cloud and rain properties. A minimal sketch of this KS-plus-DBSCAN pipeline appears after the significance statement below.

    Significance Statement

    A typical cloud can have billions of drops spread over tens or hundreds of kilometers in space. Keeping track of the sizes, positions, and interactions of all of these droplets is impractical, and, as such, information about the relative abundance of large and small drops is typically quantified with a “size distribution.” Droplets in a cloud interact locally, however, so this work is motivated by the question of whether the cloud droplet size distribution is different in different parts of a cloud. A new method, based on hypothesis testing and machine learning, determines how many different size distributions are contained in a given cloud. This is important because the size distribution describes processes such as cloud droplet growth and light transmission through clouds.

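The following Python sketch mirrors the pipeline described in the abstract, using scipy's two-sample KS test and scikit-learn's DBSCAN on a precomputed distance matrix. The synthetic samples and the eps and min_samples values are placeholder choices for illustration, not the paper's actual configuration.

# Minimal sketch: pairwise two-sample Kolmogorov-Smirnov distances between
# local droplet-size samples, then DBSCAN clustering on that distance matrix.
# Sample generation, eps, and min_samples are placeholders, not the paper's
# configuration.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic "local" size-distribution samples drawn from two gamma-like populations.
samples = [rng.gamma(shape=4.0, scale=3.0, size=200) for _ in range(10)] + \
          [rng.gamma(shape=8.0, scale=2.0, size=200) for _ in range(10)]

n = len(samples)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        stat = ks_2samp(samples[i], samples[j]).statistic  # KS distance in [0, 1]
        dist[i, j] = dist[j, i] = stat

# DBSCAN on the precomputed KS-distance matrix; the number of clusters is not
# specified in advance, matching the unsupervised spirit of the method.
labels = DBSCAN(eps=0.15, min_samples=3, metric="precomputed").fit_predict(dist)
print("cluster labels:", labels)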
  3. Abstract
    Excessive phosphorus (P) applications to croplands can contribute to eutrophication of surface waters through surface runoff and subsurface (leaching) losses. We analyzed leaching losses of total dissolved P (TDP) from no-till corn, hybrid poplar (Populus nigra × P. maximowiczii), switchgrass (Panicum virgatum), miscanthus (Miscanthus giganteus), native grasses, and restored prairie, all planted in 2008 on former cropland in Michigan, USA. All crops except corn (13 kg P ha⁻¹ year⁻¹) were grown without P fertilization. Biomass was harvested at the end of each growing season except for poplar. Soil water at 1.2 m depth was sampled weekly to biweekly for TDP determination during March–November 2009–2016 using tension lysimeters. Soil test P (STP; 0–25 cm depth) was measured every autumn. Soil water TDP concentrations were usually below levels where eutrophication of surface waters is frequently observed (> 0.02 mg L⁻¹) but often higher than in deep groundwater or nearby streams and lakes. Rates of P leaching, estimated from measured concentrations and modeled drainage, did not differ statistically among cropping systems across years; 7-year cropping system means ranged from 0.035 to 0.072 kg P ha⁻¹ year⁻¹ with large interannual variation. Leached P was positively related to STP, which decreased over the 7 years in all systems. These results indicate that both P-fertilized and unfertilized cropping systems may …
  4. Large-scale real-time analytics services continuously collect and analyze data from end-user applications and devices distributed around the globe. Such analytics requires data to be transferred over the wide-area network (WAN) to data centers (DCs) capable of processing the data. Since WAN bandwidth is expensive and scarce, it is beneficial to reduce WAN traffic by partially aggregating the data closer to end-users. We propose aggregation networks for performing aggregation on a geo-distributed edge-cloud infrastructure consisting of edge servers, transit and destination DCs. We identify a rich set of research questions aimed at reducing the traffic costs in an aggregation network. We present an optimization formulation for solving these questions in a principled manner, and use insights from the optimization solutions to propose an efficient, near-optimal practical heuristic. We implement the heuristic in AggNet, built on top of Apache Flink. We evaluate our approach using a geo-distributed deployment on Amazon EC2 as well as a WAN-emulated local testbed. Our evaluation using real-world traces from Twitter and Akamai shows that our approach is able to achieve 47% to 83% reduction in traffic cost over existing baselines without any compromise in timeliness.
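As a toy illustration of why partially aggregating data closer to end-users reduces WAN traffic, the following Python sketch pre-aggregates per-key counts at each edge server before shipping them to the destination data center. The edge names and events are invented, and this shows only the general partial-aggregation idea, not AggNet's optimization formulation or its Flink implementation.

# Toy illustration of partial aggregation at the edge: each edge server
# pre-aggregates per-key counts before sending to the destination data
# center, so WAN traffic scales with the number of distinct keys rather
# than the number of raw events. Names and data are invented; this is not
# AggNet's heuristic or Flink code.
from collections import Counter

edge_events = {
    "edge-us-west": ["#worldcup", "#worldcup", "#music", "#worldcup"],
    "edge-eu":      ["#music", "#worldcup", "#music"],
    "edge-asia":    ["#worldcup", "#news", "#news", "#music", "#worldcup"],
}

# Without aggregation: every raw event crosses the WAN.
raw_records_sent = sum(len(evts) for evts in edge_events.values())

# With edge aggregation: each edge ships one (key, count) pair per distinct key.
partials = {edge: Counter(evts) for edge, evts in edge_events.items()}
agg_records_sent = sum(len(c) for c in partials.values())

# Destination DC merges the partial aggregates into the global result.
global_counts = Counter()
for c in partials.values():
    global_counts.update(c)

print("raw records over WAN:", raw_records_sent)         # 12
print("aggregated records over WAN:", agg_records_sent)  # 7
print("global counts:", dict(global_counts))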