Replay without recording of production bugs for service oriented applications

Arora, Nipun; Bell, Jonathan; Ivančić, Franjo; Kaiser, Gail; Ray, Baishakhi

doi:10.1145/3238147.3238186

Citation Details

Replay without recording of production bugs for service oriented applications

Short time-to-localize and time-to-fix for production bugs is extremely important for any 24x7 service-oriented application (SOA). Debugging buggy behavior in deployed applications is hard, as it requires careful reproduction of a similar environment and workload. Prior approaches for automatically reproducing production failures do not scale to large SOA systems. Our key insight is that for many failures in SOA systems (e.g., many semantic and performance bugs), a failure can automatically be reproduced solely by relaying network packets to replicas of suspect services, an insight that we validated through a manual study of 16 real bugs across five different systems. This paper presents Parikshan, an application monitoring framework that leverages user-space virtualization and network proxy technologies to provide a sandbox “debug” environment. In this “debug” environment, developers are free to attach debuggers and analysis tools without impacting performance or correctness of the production environment. In comparison to existing monitoring solutions that can slow down production applications, Parikshan allows application monitoring at significantly lower overhead. more »

Award ID(s):: 1563555

PAR ID:: 10110095

Author(s) / Creator(s):: Arora, Nipun; Bell, Jonathan; Ivančić, Franjo; Kaiser, Gail; Ray, Baishakhi

Date Published:: 2018-09-03

Journal Name:: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

Page Range / eLocation ID:: 452 to 463

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1145/3238147.3238186

More Like this