Automating instrumentation choices for performance problems in distributed applications with VAIF

Toslali, Mert; Ates, Emre; Ellis, Alex; Zhang, Zhaoqi; Huye, Darby; Liu, Lan; Puterman, Samantha; Coskun, Ayse K.; Sambasivan, Raja R.

doi:10.1145/3472883.3487000

Developers use logs to diagnose performance problems in distributed applications. However, it is difficult to know a priori where logs are needed and what information in them is needed to help diagnose problems that may occur in the future. We present the Variance-driven Automated Instrumentation Framework (VAIF), which runs alongside distributed applications. In response to newly-observed performance problems, VAIF automatically searches the space of possible instrumentation choices to enable the logs needed to help diagnose them. To work, VAIF combines distributed tracing (an enhanced form of logging) with insights about how response-time variance can be decomposed on the critical-path portions of requests’ traces. We evaluate VAIF by using it to localize performance problems in OpenStack and HDFS. We show that VAIF can localize problems related to slow code paths, resource contention, and problematic third-party code while enabling only 3-34% of the total tracing instrumentation.
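The idea of decomposing response-time variance along a critical path can be illustrated with a small sketch. This is not VAIF's algorithm, which the abstract does not spell out; it only shows the standard identity that, when a request's total latency is the sum of its critical-path segment latencies, each segment's covariance with the total adds up to the variance of the total. The trace format, the segment names, and the helper functions below are all hypothetical.

```python
from statistics import mean


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def variance_contributions(traces):
    """Map each segment name to Cov(segment latency, total latency).

    `traces` is a list of dicts {segment_name: latency_ms}. Every trace is
    assumed to contain the same segments (a simplified, hypothetical format).
    The contributions sum to the population variance of the total latency.
    """
    segments = sorted(traces[0])
    totals = [sum(t[s] for s in segments) for t in traces]
    return {s: covariance([t[s] for t in traces], totals) for s in segments}


def most_variable_segment(traces):
    """Return the segment that contributes most to total-latency variance."""
    contributions = variance_contributions(traces)
    return max(contributions, key=contributions.get)


# Made-up critical-path latencies (ms) for four requests of the same type.
traces = [
    {"auth": 10, "db_query": 40, "render": 5},
    {"auth": 11, "db_query": 90, "render": 5},
    {"auth": 10, "db_query": 20, "render": 6},
    {"auth": 9, "db_query": 70, "render": 5},
]
print(most_variable_segment(traces))
```

With this made-up data the sketch points at `db_query`, the segment where additional instrumentation would be most worth enabling under the assumption that high variance marks the region hiding the problem.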

Award ID(s): 2016178

PAR ID: 10395746

Author(s) / Creator(s): Toslali, Mert; Ates, Emre; Ellis, Alex; Zhang, Zhaoqi; Huye, Darby; Liu, Lan; Puterman, Samantha; Coskun, Ayse K.; Sambasivan, Raja R.

Date Published: 2021-11-01

Journal Name: Proceedings of the ACM Symposium on Cloud Computing

Page Range / eLocation ID: 61 to 75

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3472883.3487000
