

Title: Snicket: Query-Driven Distributed Tracing
Increasing application complexity has led to applications being refactored into smaller components, known as microservices, that communicate with each other using RPCs. Distributed tracing has emerged as an important debugging tool for such microservice-based applications. Distributed tracing follows the journey of a user request from its starting point at the application's front-end, through the RPCs the front-end makes to other microservices (and, recursively, the RPCs those microservices make), until a response is constructed and sent back to the user. To reduce storage costs, distributed tracing systems sample traces before collecting them for subsequent querying, which affects the accuracy of queries on the collected traces. We propose an alternative system, Snicket, that tightly integrates the querying and collection of traces. Snicket takes as input a database-style streaming query that expresses the analysis the developer wants to perform on the trace data. This query is compiled into a distributed collection of microservice extensions that run as "bumps-in-the-wire," intercepting RPC requests and responses as they flow into and out of microservices. Together, these extensions implement the query, performing early filtering and computation on the traces to reduce the amount of stored data in a query-specific manner. We show that Snicket supports an expressive set of queries and can update queries fast enough for interactive use.
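The abstract describes compiling a streaming query into per-microservice extensions that intercept RPC requests and responses and filter or aggregate trace data in place, so that only query-relevant data is stored. The following is a minimal Python sketch of that idea, not Snicket's query language or implementation: `Span`, `Query`, and `BumpInTheWire` are hypothetical names, and the "query" is reduced to a predicate plus a group-by-count that each extension evaluates locally as responses pass through.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Span:
    """One RPC observed by an extension (hypothetical fields, for illustration)."""
    trace_id: str
    service: str
    start_us: int
    end_us: int


@dataclass
class Query:
    """Toy stand-in for a compiled streaming query: filter, then group and count."""
    predicate: Callable[[Span], bool]
    group_key: Callable[[Span], str]


@dataclass
class BumpInTheWire:
    """Hypothetical per-microservice extension sitting on the RPC path."""
    service: str
    query: Query
    pending: Dict[str, int] = field(default_factory=dict)  # trace_id -> request arrival time
    counts: Dict[str, int] = field(default_factory=dict)   # query-specific aggregate that is kept

    def on_request(self, trace_id: str, now_us: int) -> None:
        # Intercept the inbound RPC request and remember when it arrived.
        self.pending[trace_id] = now_us

    def on_response(self, trace_id: str, now_us: int) -> Optional[Span]:
        # Intercept the outbound response, build the span, and filter it early.
        start = self.pending.pop(trace_id, None)
        if start is None:
            return None  # response without a matching request: nothing to record
        span = Span(trace_id, self.service, start, now_us)
        if self.query.predicate(span):
            key = self.query.group_key(span)
            self.counts[key] = self.counts.get(key, 0) + 1
            return span
        return None  # dropped here, so it is never stored


if __name__ == "__main__":
    # Example query: count RPCs slower than 1 ms, grouped by service.
    slow = Query(predicate=lambda s: s.end_us - s.start_us > 1_000,
                 group_key=lambda s: s.service)
    ext = BumpInTheWire("cart", slow)
    ext.on_request("t1", now_us=0)
    ext.on_response("t1", now_us=500)     # 0.5 ms: filtered out
    ext.on_request("t2", now_us=100)
    ext.on_response("t2", now_us=2_600)   # 2.5 ms: kept and counted
    print(ext.counts)                     # {'cart': 1}
```

Because only the spans that pass the predicate and the per-group counts leave the extension, storage in this sketch grows with what the query asks for rather than with total traffic. A real system would also need to ship these partial results to a central store and combine them across services, which the sketch omits.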
Award ID(s):
2008048
PAR ID:
10316153
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
HotNets '21: Proceedings of the Twentieth ACM Workshop on Hot Topics in Networks
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reducing tail latency has become a crucial issue for optimizing the performance of online cloud services and distributed applications. In distributed applications, there are many causes of high end-to-end tail latency, including operating system delays, request reordering due to fan-out/fan-in, and network congestion. Although recent research has focused on reducing tail latency for individual application components, such as by replicating requests and scheduling, in this paper we argue for a holistic approach for reducing the end-to-end tail latency across application components. We propose TailClipper, a distributed scheduler that tags each arriving request with an arrival timestamp and propagates it across the microservices' call chain. TailClipper then uses arrival timestamps to implement an oldest-request-first scheduler that combines global first-come, first-served scheduling with a limited form of processor sharing to reduce end-to-end tail latency. In doing so, TailClipper can counter the performance degradation caused by request reordering in multi-tiered and microservices-based applications. We implement TailClipper as a userspace Linux scheduler and evaluate it using cloud workload traces and a real-world microservices application. Our experiments show that, compared to state-of-the-art schedulers, TailClipper improves the 99th percentile response time by up to 81%, while also improving the mean response time and the system throughput by up to 54% and 29%, respectively, under high loads. 
  2. Test coverage is a critical aspect of the software development process and underpins overall confidence in the product. In cloud-native systems, testing becomes complex because it must deal with multiple distributed microservices that are developed by different teams and may change quite rapidly. In such a dynamic environment, it is important to track test coverage. This is especially relevant for end-to-end (E2E) and API testing, as these tests might be developed by teams distinct from the microservice developers. Moreover, E2E testing adds indirection: testers may see the user interface but not know how comprehensive the test suites are. To ensure confidence in health checks in the system, mechanisms and instruments are needed to indicate the test coverage level. Unfortunately, such mechanisms are lacking for cloud-native systems. This manuscript introduces test coverage metrics for evaluating the extent of E2E and API test suite coverage for microservice endpoints. It describes how to automate the calculation of these metrics given access to microservice codebases and system testing traces, details the process, and presents feedback visually, emphasizing test coverage across microservices. To demonstrate the viability of the proposed approach, we implement a proof-of-concept tool and perform a case study on a well-established system benchmark, assessing existing E2E and API test suites with regard to test coverage using the proposed endpoint metrics. The endpoint coverage results reflect the diverse perspectives of the two testing approaches. API testing achieved 91.98% coverage in the benchmark, whereas E2E testing achieved 45.42%. Combining both coverage results yielded a slight increase to approximately 92.36%, attributable to a few endpoints tested exclusively through one testing approach and not covered by the other. 
  3. Microservice Architecture (MSA) is becoming the predominant direction of new cloud-based applications. There are many advantages to using microservices, but also downsides to using a more complex architecture than a typical monolithic enterprise application. Beyond the normal poor coding practices and code smells of a typical application, microservice-specific code smells are difficult to discover within a distributed application setup. There are many static code analysis tools for monolithic applications, but tools that offer code-smell detection for microservice-based applications are lacking. This paper proposes a new approach to detect code smells in distributed applications based on microservices. We develop the MSANose tool to detect up to eleven different microservice-specific code smells and share it as open source. We demonstrate our tool through a case study on two robust benchmark microservice applications and verify its accuracy. Our results show that it is possible to detect code smells within microservice applications using bytecode and/or source code analysis throughout the development process or even before deployment to production. 
  4. Cloud applications are increasingly relying on hundreds of loosely-coupled microservices to complete user requests that meet an application’s end-to-end QoS requirements. Communication time between services accounts for a large fraction of the end-to-end latency and can introduce performance unpredictability and QoS violations. This work presents our early work on Dagger, a hardware acceleration platform for networking, designed specifically with the unique qualities of microservices in mind. The Dagger architecture relies on an FPGA-based NIC, closely coupled with the processor over a configurable memory interconnect, designed to offload and accelerate RPC stacks. Unlike the traditional cloud systems that use PCIe links as the NIC I/O interface, we leverage memory-interconnected FPGAs as networking devices to provide the efficiency, transparency, and programmability needed for fine-grained microservices. We show that this considerably improves CPU utilization and performance for cloud RPCs. 
  5. The microservice architecture is a popular software engineering approach for building flexible, large-scale online services. Serverless functions, or function-as-a-service (FaaS), provide a simple programming model of stateless functions, which are a natural substrate for implementing the stateless RPC handlers of microservices, as an alternative to containerized RPC servers. However, current serverless platforms have millisecond-scale runtime overheads, making them unable to meet the strict sub-millisecond latency targets required by existing interactive microservices. We present Nightcore, a serverless function runtime with microsecond-scale overheads that provides container-based isolation between functions. Nightcore’s design carefully considers various factors having microsecond-scale overheads, including scheduling of function requests, communication primitives, threading models for I/O, and concurrent function executions. Nightcore currently supports serverless functions written in C/C++, Go, Node.js, and Python. Our evaluation shows that when running latency-sensitive interactive microservices, Nightcore achieves 1.36×–2.93× higher throughput and up to 69% reduction in tail latency. 
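For the TailClipper entry (item 1 above), the central mechanism is serving requests by a timestamp assigned once at arrival and carried across the call chain. A priority queue can illustrate it. This is a minimal single-process Python sketch, not TailClipper's userspace Linux scheduler: `Request`, `OldestRequestFirstQueue`, and `forward` are hypothetical names, and the sketch omits the limited processor sharing the abstract mentions.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Request:
    # Heap ordering compares (arrival_ts, seq) only; request_id is excluded.
    arrival_ts: float                       # stamped once at the front end, carried downstream
    seq: int                                # tie-breaker for equal timestamps
    request_id: str = field(compare=False)


class OldestRequestFirstQueue:
    """Hypothetical per-service queue that serves the request with the oldest
    end-to-end arrival timestamp first, regardless of when it reached this hop."""

    def __init__(self) -> None:
        self._heap: List[Request] = []
        self._seq = itertools.count()

    def enqueue(self, request_id: str, arrival_ts: float) -> None:
        heapq.heappush(self._heap, Request(arrival_ts, next(self._seq), request_id))

    def next_request(self) -> Optional[Request]:
        return heapq.heappop(self._heap) if self._heap else None


def forward(req: Request, downstream: OldestRequestFirstQueue) -> None:
    # Propagate the original front-end timestamp instead of re-stamping at this hop.
    downstream.enqueue(req.request_id, req.arrival_ts)


if __name__ == "__main__":
    frontend = OldestRequestFirstQueue()
    backend = OldestRequestFirstQueue()
    frontend.enqueue("A", arrival_ts=1.0)
    frontend.enqueue("B", arrival_ts=2.0)
    a = frontend.next_request()
    b = frontend.next_request()
    # Simulate reordering: B reaches the backend before A.
    forward(b, backend)
    forward(a, backend)
    print(backend.next_request().request_id)  # A
```

Each hop orders work by the front-end timestamp rather than by its local arrival order. A request delayed upstream therefore keeps its place downstream, which is the reordering effect the abstract targets.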