The main premise of this work is that since large cloud providers can and do manipulate probe packets that traverse their privately owned and operated backbones, standard traceroute-based measurement techniques are no longer a reliable means for assessing network connectivity in large cloud provider infrastructures. In response to these developments, we present a new empirical approach for elucidating private connectivity in today's Internet. Our approach relies on using only "lightweight" (i.e., simple, easily interpretable, and readily available) measurements, but requires applying a "heavyweight" or advanced mathematical analysis. In particular, we describe a new method for assessing the characteristics of network path connectivity that is based on concepts from Riemannian geometry (i.e., Ricci curvature) and also relies on an array of carefully crafted visualizations (e.g., a novel manifold view of a network's delay space). We demonstrate our method by utilizing latency measurements from RIPE Atlas anchors and virtual machines running in data centers of three large cloud providers to (i) study different aspects of connectivity in their private backbones and (ii) show how our manifold-based view enables us to expose and visualize critical aspects of this connectivity over different geographic scales.
A Manifold View of Connectivity in the Private Backbone Networks of Hyperscalers
As hyperscalers such as Google, Microsoft, and Amazon play an increasingly important role in today's Internet, they are also capable of manipulating probe packets that traverse their privately owned and operated backbones. As a result, standard traceroute-based measurement techniques are no longer a reliable means for assessing network connectivity in these global-scale cloud provider infrastructures. In response to these developments, we present a new empirical approach for elucidating connectivity in these private backbone networks. Our approach relies on using only lightweight (i.e., simple, easily interpretable, and readily available) measurements, but requires applying heavyweight mathematical techniques for analyzing these measurements. In particular, we describe a new method that uses network latency measurements and relies on concepts from Riemannian geometry (i.e., Ricci curvature) to assess the characteristics of the connectivity fabric of a given network infrastructure. We complement this method with a visualization tool that generates a novel manifold view of a network's delay space. We demonstrate our approach by utilizing latency measurements from available vantage points and virtual machines running in datacenters of three large cloud providers to study different aspects of connectivity in their private backbones and show how our generated manifold views enable us to expose and visualize critical aspects of this connectivity.
- Award ID(s): 2039146
- PAR ID: 10496493
- Publisher / Repository: Association for Computing Machinery
- Date Published:
- Journal Name: Communications of the ACM
- Volume: 66
- Issue: 8
- ISSN: 0001-0782
- Page Range / eLocation ID: 95 to 103
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Edge computing is an emerging paradigm whose goal is to boost the computational capability of otherwise weak devices with cloud resources available at the edge. This paradigm is attractive mainly because it reduces user-perceived latency. A central mechanism in edge computing is cyber-foraging, i.e., the search for capable edge cloud processes and the delegation to them, as a form of edge function, of tasks that are too complex, time-consuming, or resource-intensive to run on user devices, or too latency-sensitive to run remotely. An edge function is any network or device-specific process that may be run on an edge process instead. Despite the recent interest in this technology from industry and academia, cyber-foraging techniques and protocols have yet to be standardized. In this paper, we leverage decomposition theory to propose an architecture providing insights into the design and implementation of protocols for cyber-foraging of multiple edge functions. In contrast with several existing solutions, we argue that the (distributed) cyber-foraging orchestration should be policy-based and not an ad hoc solution, i.e., either a pure edge cloud burden or a device decision. To this end, via simulations, we show how our approach can be used by edge computing providers and application programmers to compare and evaluate alternative cyber-foraging solutions. Our decomposition-based approach has general applicability to other network utility maximization problems, even outside the edge computing domain.
-
The growing demand for industrial, automotive, and service robots presents a challenge to the centralized Cloud Robotics model in terms of privacy, security, latency, bandwidth, and reliability. In this paper, we present a ‘Fog Robotics’ approach to deep robot learning that distributes compute, storage, and networking resources between the Cloud and the Edge in a federated manner. Deep models are trained on non-private (public) synthetic images in the Cloud; the models are then adapted to the private real images of the environment at the Edge within a trusted network and subsequently deployed as a service for low-latency and secure inference/prediction for other robots in the network. We apply this approach to surface decluttering, where a mobile robot picks and sorts objects from a cluttered floor by learning a deep object recognition model and a grasp planning model. Experiments suggest that Fog Robotics can improve performance by sim-to-real domain adaptation in comparison to exclusively using Cloud or Edge resources, while reducing the inference cycle time by 4× to successfully declutter 86% of objects over 213 attempts.
-
Function-as-a-Service (FaaS) is becoming an increasingly popular cloud-deployment paradigm for serverless computing that frees application developers from managing the infrastructure. At the same time, it allows cloud providers to assert control in workload consolidation, i.e., co-locating multiple containers on the same server, thereby achieving higher server utilization, often at the cost of higher end-to-end function request latency. Interestingly, a key aspect of serverless latency management has not been well studied: the trade-off between application developers' latency goals and the FaaS providers' utilization goals. This paper presents a multi-faceted, measurement-driven study of latency variation in serverless platforms that elucidates this trade-off space. We obtained production measurements by executing FaaS benchmarks on IBM Cloud and a private cloud to study the impact of workload consolidation, queuing delay, and cold starts on the end-to-end function request latency. We draw several conclusions from the characterization results. For example, increasing a container's allocated memory limit from 128 MB to 256 MB reduces the tail latency by 2× but has 1.75× higher power consumption and 59% lower CPU utilization.
-
Given its demonstrated ability in analyzing and revealing patterns underlying data, Deep Learning (DL) has been increasingly investigated to complement physics-based models in various aspects of smart manufacturing, such as machine condition monitoring and fault diagnosis, complex manufacturing process modeling, and quality inspection. However, successful implementation of DL techniques relies greatly on the amount, variety, and veracity of data for robust network training. Also, the distributions of data used for network training and application should be identical to avoid the internal covariate shift problem, which degrades network performance and limits its applicability. As a promising solution to address these challenges, Transfer Learning (TL) enables DL networks trained on a source domain and task to be applied to a separate target domain and task. This paper presents a domain adversarial TL approach, based upon the concepts of generative adversarial networks. In this method, the optimizer seeks to minimize the loss (i.e., regression or classification error) across the labeled training examples from the source domain while maximizing the loss of the domain classifier across the source and target data sets (i.e., maximizing the similarity of source and target features). The developed domain adversarial TL method has been implemented on a 1-D CNN backbone network and evaluated for prediction of tool wear propagation, using NASA's milling dataset. Performance has been compared to other TL techniques, and the results indicate that domain adversarial TL can successfully allow DL models trained on certain scenarios to be applied to new target tasks.