Developers use logs to diagnose performance problems in distributed applications. However, it is difficult to know a priori where logs are needed and what information they must contain to help diagnose problems that may occur in the future. We summarize our work on the Variance-driven Automated Instrumentation Framework (VAIF), which runs alongside distributed applications. In response to newly observed performance problems, VAIF automatically searches the space of possible instrumentation choices to enable the logs needed to help diagnose them. To work, VAIF combines distributed tracing (an enhanced form of logging) with insights about how response-time variance can be decomposed on the critical-path portions of requests' traces.
Automating instrumentation choices for performance problems in distributed applications with VAIF
Developers use logs to diagnose performance problems in distributed applications. However, it is difficult to know a priori where logs are needed and what information in them is needed to help diagnose problems that may occur in the future. We present the Variance-driven Automated Instrumentation Framework (VAIF), which runs alongside distributed applications. In response to newly-observed performance problems, VAIF automatically searches the space of possible instrumentation choices to enable the logs needed to help diagnose them. To work, VAIF combines distributed tracing (an enhanced form of logging) with insights about how response-time variance can be decomposed on the critical-path portions of requests’ traces. We evaluate VAIF by using it to localize performance problems in OpenStack and HDFS. We show that VAIF can localize problems related to slow code paths, resource contention, and problematic third-party code while enabling only 3-34% of the total tracing instrumentation.
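The idea of decomposing response-time variance along critical paths can be illustrated with a minimal sketch. This is not VAIF's actual algorithm or API. It assumes, for simplicity, that every request in a group passes the same sequence of already-enabled tracepoints on its critical path, and it simply reports the pair of adjacent tracepoints whose latency varies most across requests, which is where additional instrumentation would plausibly be enabled next. The function name, data layout, and tracepoint names are all hypothetical.

```python
from statistics import pvariance


def highest_variance_edge(critical_paths):
    """Return the adjacent tracepoint pair with the most latency variance.

    critical_paths: one list per request, each a list of
    (tracepoint_name, timestamp_seconds) tuples in critical-path order.
    Assumption (hypothetical simplification): all requests share the
    same tracepoint sequence.
    """
    names = [name for name, _ in critical_paths[0]]
    edge_variances = {}
    for i in range(len(names) - 1):
        # Latency between tracepoint i and i+1 for every request.
        latencies = [path[i + 1][1] - path[i][1] for path in critical_paths]
        edge_variances[(names[i], names[i + 1])] = pvariance(latencies)
    # The edge contributing the most variance is the best candidate
    # region for enabling more tracepoints.
    edge = max(edge_variances, key=edge_variances.get)
    return edge, edge_variances


# Hypothetical traces for three requests expected to perform similarly.
paths = [
    [("start", 0.0), ("auth", 1.0), ("db", 2.0), ("end", 3.0)],
    [("start", 0.0), ("auth", 1.0), ("db", 6.0), ("end", 7.0)],
    [("start", 0.0), ("auth", 1.0), ("db", 4.0), ("end", 5.0)],
]
edge, variances = highest_variance_edge(paths)
print(edge)
```

In this toy input only the latency between the hypothetical `auth` and `db` tracepoints differs across requests, so that edge is selected. A real framework would additionally have to handle differing critical paths, concurrency, and the cost of enabling instrumentation.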
- Award ID(s):
- 2016178
- PAR ID:
- 10395746
- Date Published:
- Journal Name:
- Proceedings of the ACM Symposium on Cloud Computing
- Page Range / eLocation ID:
- 61 to 75
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Diagnosing performance problems in distributed applications is extremely challenging. A significant reason is that it is hard to know where to place instrumentation a priori to help diagnose problems that may occur in the future. We present the vision of an automated instrumentation framework, Pythia, that runs alongside deployed distributed applications. In response to a newly-observed performance problem, Pythia searches the space of possible instrumentation choices to enable the instrumentation needed to help diagnose it. Our vision for Pythia builds on workflow-centric tracing, which records the order and timing of how requests are processed within and among a distributed application's nodes (i.e., records their workflows). It uses the key insight that localizing the sources of high performance variation within the workflows of requests that are expected to perform similarly gives insight into where additional instrumentation is needed.
-
Roll, I; McNamara, D; Sosnovsky, S; Luckin, R; Dimitrova, V. (Ed.) Knowledge tracing refers to a family of methods that estimate each student’s knowledge component/skill mastery level from their past responses to questions. One key limitation of most existing knowledge tracing methods is that they can only estimate an overall knowledge level of a student per knowledge component/skill, since they analyze only the (usually binary-valued) correctness of student responses. Therefore, it is hard to use them to diagnose specific student errors. In this paper, we extend existing knowledge tracing methods beyond correctness prediction to the task of predicting the exact option students select in multiple choice questions. We quantitatively evaluate the performance of our option tracing methods on two large-scale student response datasets. We also qualitatively evaluate their ability to identify common student errors in the form of clusters of incorrect options across different questions that correspond to the same error.
-
We present 3MileBeach, a tracing and fault injection platform designed for microservice-based architectures. 3MileBeach interposes on the message serialization libraries that are ubiquitous in this environment, avoiding the application code instrumentation that tracing and fault injection infrastructures typically require. 3MileBeach provides message-level distributed tracing at less than 50% of the overhead of the state-of-the-art tracing frameworks, and fault injection that allows higher precision experiments than existing solutions. We measure the overhead of 3MileBeach as a tracer and its efficacy as a fault injector. We qualitatively measure its promise as a platform for tuning and debugging by sharing concrete use cases in the context of bottleneck identification, performance tuning, and bug finding. Finally, we use 3MileBeach to perform a novel type of fault injection - Temporal Fault Injection (TFI), which more precisely controls individual inter-service message flow with temporal prerequisites, and makes it possible to catch an entirely new class of fault tolerance bugs.
-
Explain in Plain English (EiPE) questions evaluate whether students can understand and explain the high-level purpose of code. We conducted a qualitative think-aloud study of introductory programming students solving EiPE questions. In this paper, we focus on how students use tracing (mental execution) to understand code in order to explain it. We found that, in some cases, tracing can be an effective strategy for novices to understand and explain code. Furthermore, we observed three problems that prevented tracing from being helpful: 1) not employing tracing when it could be helpful (some struggling students explained correctly after the interviewer suggested tracing the code), 2) tracing incorrectly due to misunderstandings of the programming language, and 3) tracing with a set of inputs that did not sufficiently expose the code’s behavior (when the interviewer suggested inputs, students explained correctly). These results suggest that we should teach students to use tracing as a method for understanding code and teach them how to select appropriate inputs to trace.

