Optimizing request routing in large microservice-based applications is difficult, especially when applications span multiple geo-distributed clusters. In this paper, inspired by ideas from network traffic engineering, we propose Service Layer Traffic Engineering (SLATE), a new framework for request routing in microservices that span multiple clusters. SLATE leverages global knowledge of cluster states and multi-hop application graphs to centrally control the flow of requests in order to optimize end-to-end application latency and cost. Realizing such a system requires tackling several technical challenges unique to the service layer, such as accounting for different request traffic classes, multi-hop call trees, and application latency profiles. We identify these challenges and build a preliminary prototype that addresses some of them. Preliminary evaluations of our prototype show that SLATE outperforms the state-of-the-art global load balancing approach (used by Meta’s Service Router and Google’s Traffic Director) by up to 3.5× in average latency and reduces egress bandwidth cost by up to 11.6×.
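The abstract does not describe SLATE's optimizer, so the following is only a toy illustration of the latency/cost trade-off a centralized controller faces. It assumes two clusters, models each as an M/M/1 queue, and grid-searches the fraction of one cluster's demand to reroute remotely, charging a network round trip plus an egress cost per rerouted request. The functions `mm1_latency` and `best_split`, and every parameter (`rtt`, `egress_cost_per_req`, `cost_weight`, and so on), are hypothetical.

```python
import math


def mm1_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system of an M/M/1 queue; infinite when the cluster is saturated."""
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def best_split(demand: float, local_mu: float, remote_mu: float,
               remote_base_load: float, rtt: float,
               egress_cost_per_req: float, cost_weight: float,
               steps: int = 100) -> tuple[float, float]:
    """Grid-search the fraction f of `demand` sent to the remote cluster.

    Objective = mean latency seen by `demand` + cost_weight * egress cost.
    Hypothetical toy model: rates in requests/s, latencies in seconds.
    """
    best_f, best_obj = 0.0, math.inf
    for i in range(steps + 1):
        f = i / steps
        local_lat = mm1_latency(demand * (1 - f), local_mu)
        remote_lat = mm1_latency(remote_base_load + demand * f, remote_mu) + rtt
        # Weight a cluster's latency only if it receives traffic (avoids 0 * inf = nan).
        avg_lat = 0.0
        if f < 1:
            avg_lat += (1 - f) * local_lat
        if f > 0:
            avg_lat += f * remote_lat
        objective = avg_lat + cost_weight * egress_cost_per_req * demand * f
        if objective < best_obj:
            best_f, best_obj = f, objective
    return best_f, best_obj
```

For example, `best_split(150, 100, 200, 0, 0.001, 1.0, 0.0)` must reroute more than a third of the traffic because the local cluster alone would be saturated, whereas a large `cost_weight` with an unsaturated local cluster keeps all traffic local. A real controller would optimize over many clusters, traffic classes, and multi-hop call trees rather than a single split.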
Improving Service Performance through Multilayer Routing and Service Intelligence in a Network Service Mesh
Network service mesh architectures, by interconnecting cloud clusters, provide access to services across distributed infrastructures. Typically, services are replicated across clusters to ensure resilience. However, end-to-end service performance depends largely on the service load experienced by each cluster. A key challenge is therefore to optimize end-to-end service performance by routing service requests to the clusters with the lowest service processing/response times. We present a two-phase approach that combines an optimized multi-layer optical routing system with service mesh performance costs to improve end-to-end service performance. Our experiments show that leveraging a multi-layer architecture together with service performance information improves end-to-end performance. We evaluate our approach on a service mesh overlay built on a modified continental United States (CONUS) network topology.
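A minimal sketch of the two-phase idea, assuming the multi-layer network can be collapsed into a single weighted graph whose link costs are delays: phase 1 computes shortest-path network costs from the client site to every cluster with Dijkstra's algorithm, and phase 2 adds each cluster's measured service response time and selects the minimum. The node names, link costs, and service times below are hypothetical and are not the paper's CONUS topology; the optical-layer optimization itself is not modeled.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def add_link(graph: Graph, a: str, b: str, cost: float) -> None:
    """Add an undirected link with a path cost (e.g., one-way delay in ms)."""
    graph.setdefault(a, []).append((b, cost))
    graph.setdefault(b, []).append((a, cost))


def network_costs(graph: Graph, source: str) -> Dict[str, float]:
    """Phase 1: Dijkstra shortest-path cost from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def select_cluster(graph: Graph, source: str,
                   service_times: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Phase 2: add each cluster's service response time and pick the minimum."""
    net = network_costs(graph, source)
    best, best_cost = None, float("inf")
    for cluster, svc_time in service_times.items():
        if cluster in net and net[cluster] + svc_time < best_cost:
            best, best_cost = cluster, net[cluster] + svc_time
    return best, best_cost


g: Graph = {}
add_link(g, "NYC", "CHI", 10.0)
add_link(g, "CHI", "DEN", 12.0)
add_link(g, "NYC", "DEN", 30.0)
add_link(g, "DEN", "LA", 13.0)
# Hypothetical per-cluster service response times (ms).
print(select_cluster(g, "NYC", {"CHI": 40.0, "DEN": 25.0, "LA": 5.0}))  # ('LA', 40.0)
```

In this example the nearest cluster (CHI, 10 ms away) is not chosen: its heavy service load makes the farther but lightly loaded LA cluster the better end-to-end choice, which is the effect the abstract describes.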
- Award ID(s):
- 1817105
- PAR ID:
- 10464591
- Date Published:
- Journal Name:
- 2021 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS)
- Page Range / eLocation ID:
- 420 to 425
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Next-generation optical metro-access networks are expected to support end-to-end virtual network slices for critical 5G services. However, disasters affecting the physical infrastructure onto which network slices are mapped can cause significant disruption to these services. Operators can deploy recovery units or trucks to restore services based on slice requirements. In this study, we investigate the problem of slice-aware service restoration in metro-access networks, using specialized recovery trucks to restore services after a disaster-induced failure. We model the problem on the classical vehicle-routing problem to find optimal routes for recovery trucks to failure sites, where they provide temporary backup service until the network components are repaired. Our proposed slice-aware service-restoration approach is formulated as a mixed integer linear program with the objective of minimizing the service-disruption penalty across different network slices. We compare our slice-aware approach with a slice-unaware approach and show that our proposed approach can achieve a significant reduction in the service-disruption penalty.
-
The proliferation of innovative mobile services such as augmented reality, networked gaming, and autonomous driving has spurred a growing need for low-latency access to computing resources that cannot be met solely by existing centralized cloud systems. Mobile Edge Computing (MEC) is expected to be an effective solution to meet the demand for low-latency services by enabling the execution of computing tasks at the network periphery, in proximity to end users. While a number of recent studies have addressed the problem of determining the execution of service tasks and the routing of user requests to corresponding edge servers, the focus has primarily been on the efficient utilization of computing resources, neglecting the fact that non-trivial amounts of data need to be stored to enable service execution, and that many emerging services exhibit asymmetric bandwidth requirements. To fill this gap, we study the joint optimization of service placement and request routing in MEC-enabled multi-cell networks with multidimensional (storage-computation-communication) constraints. We show that this problem generalizes several problems in the literature and propose an algorithm that achieves close-to-optimal performance using randomized rounding. Evaluation results demonstrate that our approach can effectively utilize the available resources to maximize the number of requests served by low-latency edge cloud servers.
-
A majority of today's cloud services are independently operated by individual cloud service providers. In this approach, the locations of cloud resources are strictly constrained by the distribution of cloud service providers' sites. As the popularity and scale of cloud services increase, we believe this traditional paradigm is about to shift toward federated services (a.k.a. multi-cloud), driven by improved performance, the reduced cost of compute, storage, and network resources, and increased user demand. In this paper, we present COMET, a lightweight, distributed storage system for managing metadata across large-scale federated cloud infrastructure providers, end users, and their applications (e.g., an HTCondor cluster or a Hadoop cluster). We showcase use cases from the NSF Chameleon, ExoGENI, and JetStream research cloud testbeds to show the effectiveness of COMET's design and deployment.
-
Microservices are a dominant cloud computing architecture because they enable applications to be built as collections of loosely coupled services. To provide greater observability into and control over the resulting distributed system, microservices often use an overlay proxy network called a service mesh. A key advantage of service meshes is their ability to implement zero-trust networking by encrypting microservice traffic with mutually authenticated TLS. However, the service mesh control plane, particularly its local certificate authority, becomes a critical point of trust. If it is compromised, an attacker can issue unauthorized certificates and redirect traffic to impersonating services. In this paper, we introduce our initial work on Mazu, a system designed to eliminate trust in the service mesh control plane by replacing its certificate authority with an unprivileged principal. Mazu leverages recent advances in registration-based encryption and integrates seamlessly with Istio, a widely used service mesh. Our preliminary evaluation, using Fortio macro-benchmarks and Prometheus-assisted micro-benchmarks, shows that Mazu significantly reduces the service mesh’s attack surface while adding just 0.17 ms to request latency compared to mTLS-enabled Istio.
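The MEC service placement abstract above mentions an algorithm based on randomized rounding but does not spell out the scheme. The sketch below shows the generic technique under stated assumptions: a fractional placement `frac[(service, server)]` in [0, 1] is taken as given (the LP relaxation that would produce it is not shown), each candidate is kept independently with probability equal to its fractional value, and a simple greedy repair skips any placement that would exceed a server's storage capacity. Only the storage dimension is modeled, and all names are hypothetical.

```python
import random
from typing import Dict, Set, Tuple


def round_placement(frac: Dict[Tuple[str, str], float],
                    sizes: Dict[str, float],
                    capacity: Dict[str, float],
                    rng: random.Random) -> Set[Tuple[str, str]]:
    """Round a fractional placement x[(service, server)] in [0, 1] to a 0/1 placement.

    Each candidate is kept with probability x (independent randomized rounding);
    a kept candidate is skipped if it would exceed the server's storage capacity.
    """
    used = {server: 0.0 for server in capacity}
    placed: Set[Tuple[str, str]] = set()
    items = list(frac.items())
    rng.shuffle(items)  # random visiting order so the repair step favors no service
    for (svc, server), x in items:
        if rng.random() < x and used[server] + sizes[svc] <= capacity[server]:
            placed.add((svc, server))
            used[server] += sizes[svc]
    return placed
```

Because `random.Random.random()` returns values in [0.0, 1.0), a candidate with x = 1 is always kept (capacity permitting) and one with x = 0 never is. Passing a seeded `random.Random` makes a run reproducible; in practice one would repeat the rounding several times and keep the placement that serves the most requests.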

