This content will become publicly available on November 1, 2024

Title: A BlackBox Approach to Profile Runtime Execution Dependencies in Microservices
Loosely coupled, lightweight microservices running in containers are likely to form complex execution dependencies inside the system. An execution dependency arises when two execution paths partially share component microservices, creating potential runtime performance interference. In this paper, we present a blackbox approach that uses legitimate HTTP requests to accurately profile the internal pairwise dependencies of all supported execution paths in a target microservices application. Concretely, we profile the pairwise dependency of two execution paths through performance interference analysis, sending bursts of the two corresponding request types simultaneously. By characterizing and grouping all execution paths based on their pairwise dependencies, the blackbox approach derives a clear dependency graph of the entire backend of the microservices application. We validate the effectiveness of the approach through experiments with open-source microservices benchmark applications running on real clouds (e.g., EC2, Azure).
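The burst-based interference test at the core of the abstract can be sketched as a latency-ratio check: measure a path's latency alone, then again while a burst of the other request type runs concurrently, and flag a dependency when the slowdown crosses a threshold. The function name, sample values, and the 1.5x threshold below are illustrative assumptions, not the paper's actual procedure.

```python
import statistics

def infer_dependency(solo_ms, burst_ms, threshold=1.5):
    """Flag two execution paths as dependent when the median latency of
    one path, measured while a burst of the other request type runs
    concurrently, exceeds its standalone median by a chosen factor.
    The 1.5x threshold is an illustrative assumption."""
    ratio = statistics.median(burst_ms) / statistics.median(solo_ms)
    return ratio >= threshold, ratio

# Paths that share a component microservice slow down under a joint burst:
dependent, ratio = infer_dependency(
    solo_ms=[10, 11, 10, 12],   # path A alone (ms)
    burst_ms=[22, 25, 21, 24],  # path A while path B is bursted (ms)
)
```

Repeating this check over all path pairs and grouping paths whose pairwise tests come back positive would yield the dependency graph(s) the abstract describes.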
Award ID(s):
2000681
NSF-PAR ID:
10499482
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE 9th International Conference on Collaboration and Internet Computing (CIC)
ISBN:
979-8-3503-3912-3
Page Range / eLocation ID:
116 to 120
Format(s):
Medium: X
Location:
Atlanta, GA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Early run-time prediction of co-running independent applications, prior to application integration, is challenging on multi-core processors. One of the most notable causes is interference at the main-memory subsystem, which significantly degrades application performance and response time compared to standalone execution. Currently available techniques for run-time prediction, such as traditional cycle-accurate simulations, are slow, while analytical models are inaccurate and time-consuming to build. By contrast, existing machine-learning-based approaches for run-time prediction simply do not account for interference. In this paper, we use a machine-learning-based approach to train a model that correlates performance data (instructions and hardware performance counters) for a set of benchmark applications between the standalone and interference scenarios. The trained model is then used to predict the run-time of co-running applications in interference scenarios. In general, there is no straightforward one-to-one correspondence between samples obtained from the standalone and interference scenarios, due to the different run-times, i.e., execution speeds. To address this, we developed a simple yet effective sample alignment algorithm, which is a key component in transforming interference prediction into a machine learning problem. In addition, we systematically identify the subset of features that have the highest positive impact on model performance. Our approach is demonstrated to be effective, with average run-time prediction errors as low as 0.3% and 0.1% for two co-running applications.
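The sample-alignment idea can be illustrated with a toy nearest-neighbour pairing keyed on cumulative instruction counts, so samples are matched by program progress rather than wall-clock time. The data layout and function name are assumptions for illustration, not the paper's algorithm.

```python
import bisect

def align_samples(solo, interf):
    """Pair each interference-run sample with the standalone sample at the
    nearest cumulative instruction count (a toy stand-in for the paper's
    sample alignment algorithm)."""
    keys = [s["insts"] for s in solo]
    pairs = []
    for t in interf:
        i = bisect.bisect_left(keys, t["insts"])
        # step back when the left neighbour is at least as close
        if i == len(keys) or (i > 0 and t["insts"] - keys[i - 1] <= keys[i] - t["insts"]):
            i -= 1
        pairs.append((solo[i], t))
    return pairs

# Counter samples at different execution speeds (fields are hypothetical):
solo = [{"insts": 100, "llc_misses": 5}, {"insts": 200, "llc_misses": 9}]
interf = [{"insts": 90, "llc_misses": 12}, {"insts": 210, "llc_misses": 30}]
pairs = align_samples(solo, interf)
```

Each aligned pair then provides one training example correlating standalone and interference-scenario counters.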
  2. Slow builds remain a plague for software developers. The frequency with which code can be built (compiled, tested, and packaged) directly impacts developer productivity: longer build times mean a longer wait before determining whether a change to the application was successful. We have discovered that for some languages, such as Java, the majority of build time is spent running tests, where dependencies between individual tests are complicated to discover, making many existing test acceleration techniques unsound to deploy in practice. Without knowledge of which tests depend on others, we cannot safely parallelize test execution, nor can we perform incremental testing (i.e., execute only a subset of an application's tests for each build). Previous techniques for detecting these dependencies did not scale to large test suites: given a test suite that normally ran in two hours, the best case for the previous tool would have taken over 422 CPU days to find dependencies between all test methods (and would not soundly find all dependencies), while on the same project the exhaustive technique (to find all dependencies) would have taken over 10^300 years. We present a novel approach to detecting all dependencies between test cases in large projects that enables safe exploitation of parallelism and test selection at a modest analysis cost.
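Once pairwise test dependencies are known, safe parallelism falls out of a topological grouping: every test in a round depends only on tests from earlier rounds. This sketch assumes the dependency map has already been produced by a detection pass; the test names and map layout are hypothetical.

```python
from collections import defaultdict, deque

def parallel_schedule(tests, deps):
    """Group tests into rounds; tests within a round can run in parallel.
    `deps` maps a test to the tests it must run after (assumed already
    discovered by a dependency-detection pass)."""
    indeg = {t: 0 for t in tests}
    out = defaultdict(list)
    for test, must_run_after in deps.items():
        for earlier in must_run_after:
            out[earlier].append(test)
            indeg[test] += 1
    ready = deque(t for t in tests if indeg[t] == 0)
    rounds = []
    while ready:
        batch = sorted(ready)  # every test here has all prerequisites done
        ready.clear()
        rounds.append(batch)
        for t in batch:
            for nxt in out[t]:
                indeg[nxt] -= 1
                if indeg[nxt] == 0:
                    ready.append(nxt)
    return rounds

tests = ["t_login", "t_post", "t_feed"]
deps = {"t_feed": ["t_login", "t_post"]}  # t_feed reads state the others write
rounds = parallel_schedule(tests, deps)
```

The same dependency map also supports incremental testing: a changed test forces re-execution only of the tests downstream of it.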
  3. Cloud applications are increasingly shifting to interactive and loosely coupled microservices. Despite their advantages, microservices complicate resource management due to inter-tier dependencies. We present Sinan and PuppetMaster, two cluster managers for interactive microservices that leverage easily obtainable tracing data, instead of empirical decisions, to infer the impact of a resource allocation on end-to-end performance and allocate appropriate resources to each tier. In a preliminary evaluation of the system with an end-to-end social network built with microservices, we show that the cluster managers' data-driven approach allows the service to always meet its QoS target without sacrificing resource efficiency.
  4.
    Python has become a widely used programming language for research, not only for small one-off analyses but also for complex application pipelines running at supercomputer scale. Modern parallel programming frameworks for Python present users with a more granular unit of management than traditional Unix processes and batch submissions: the Python function. We review the challenges involved in running native Python functions at scale and present techniques for dynamically determining a minimal set of dependencies and for assembling a lightweight function monitor (LFM) that captures the software environment and manages resources at the granularity of single functions. We evaluate these techniques in a range of environments, from campus cluster to supercomputer, and show that our advanced dependency-management planning and dynamic resource-management methods provide superior performance and utilization relative to coarser-grained management approaches, achieving a several-fold decrease in execution time for several large Python applications.
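A minimal flavour of per-function dependency determination: inspect which global names a function's compiled code references and keep those that resolve to modules. This is a lightweight stand-in under stated assumptions (the function and helper names are hypothetical), not the paper's planner; it misses dynamic imports and transitive dependencies.

```python
import json
import math
import types

def handler(x):
    return json.dumps({"root": math.sqrt(x)})

def module_deps(fn):
    """Approximate a function's minimal module set by checking which
    global names referenced by its compiled code resolve to modules
    (a toy stand-in for dependency-planning; dynamic imports and
    transitive dependencies are not covered)."""
    deps = set()
    for name in fn.__code__.co_names:
        obj = fn.__globals__.get(name)
        if isinstance(obj, types.ModuleType):
            deps.add(obj.__name__)
    return deps

deps = module_deps(handler)
```

A function monitor could use such a set to ship only the environment a single function actually needs, rather than the whole interpreter state.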
  5. The paradigm shift of deploying applications to the cloud has introduced both opportunities and challenges. Although clouds use elasticity to scale resource usage at runtime to help meet an application's performance requirements, developers are still challenged by unpredictable performance, little control over the execution environment, and differences among cloud service providers, all while being charged for their cloud usage. Application performance stability is particularly affected by multi-tenancy, in which hardware is shared among varying applications and virtual machines. Developers porting their applications need to meet performance requirements, but testing on the cloud under the effects of performance uncertainty is difficult and expensive due to high cloud usage costs. This paper presents a first approach to testing how an application's performance on typical inputs will be affected by performance uncertainty, without incurring the undue costs of brute-force testing in the cloud. We specify cloud uncertainty testing criteria, design a test-based strategy to characterize the blackbox cloud's performance distributions using these criteria, and support execution of tests to characterize the resource usage and cloud baseline performance of the application to be deployed. Importantly, we developed a smart test oracle that estimates the application's performance with certain confidence levels using the above characterization test results and determines whether it will meet its performance requirements. We evaluated our testing approach on both the Chameleon cloud and Amazon Web Services; results indicate that this strategy shows promise as a cost-effective approach to testing for the performance effects of cloud uncertainty when porting an application to the cloud.
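A confidence-based accept/reject decision of the kind the oracle makes can be sketched with a one-sided upper confidence bound on mean latency: pass only if, at the chosen confidence level, the mean stays under the requirement. The z value, normal approximation, and function name are assumptions for illustration, not the paper's actual oracle.

```python
import math
import statistics

def meets_requirement(latency_ms, limit_ms, z=1.645):
    """Oracle sketch: accept only when the one-sided ~95% upper confidence
    bound on mean latency (normal approximation, z = 1.645) stays under
    the stated performance requirement."""
    mean = statistics.fmean(latency_ms)
    half_width = z * statistics.stdev(latency_ms) / math.sqrt(len(latency_ms))
    return mean + half_width <= limit_ms

# Characterization-test latencies measured under cloud uncertainty (ms):
samples = [100, 102, 98, 101, 99]
```

A loose requirement such as 105 ms would pass here, while a 100 ms requirement would fail: the sample mean alone meets it, but not with the required confidence.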