Title: Resilience Enhancement of Optical Network-Cloud Ecosystems with Dataspace Framework and Multi-entity Cooperation (Invited)
To enhance the resilience of network-cloud ecosystems, we establish a data governance framework for sharing optical testbed data across organizations and fostering machine learning research on optical networks. We further introduce multi-entity cooperation for efficient network-cloud recovery with open and policy-based information sharing among entities.
Award ID(s):
2210384
PAR ID:
10646458
Publisher / Repository:
IEEE
ISSN:
2376-8614
ISBN:
979-8-3315-0903-3
Format(s):
Medium: X
Location:
Berlin, Germany
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent advances in cyber-infrastructure have enabled digital data sharing and ubiquitous network connectivity between scientific instruments and cloud-based storage infrastructure for uploading, storing, curating, and correlating large amounts of materials and semiconductor fabrication data and metadata. However, a significant number of scientific instruments still run on old operating systems and are taken offline, unable to connect to the cloud infrastructure because of security and network performance concerns. In this paper, we propose BRACELET, an edge-cloud infrastructure that augments the existing cloud-based infrastructure with edge devices and helps tackle the unique performance and security challenges that scientific instruments face when they are connected to the cloud through a public network. With BRACELET, we put a networked edge device, called a cloudlet, between the scientific instruments and the cloud as the middle tier of a three-tier hierarchy. The cloudlet shapes and protects the data traffic from scientific instruments to the cloud, plays a foundational role in keeping each instrument connected throughout its lifetime, and continuously provides the otherwise missing performance and security features as the instrument's operating system ages.
  2. We investigate the problem of future disaster-resilient optical network-cloud ecosystems. We introduce our solutions considering openness/disaggregation and cooperation for single- and multi-entity network-cloud ecosystems, respectively. 
  3. The scalability and flexibility of modern cloud applications can be attributed mainly to virtual machines (VMs) and containers: virtual machines are isolated operating systems that run on a hypervisor, while containers are lightweight isolated processes that share the host OS kernel. To achieve the scalability and flexibility required for modern cloud applications, each bare-metal server in the data center often houses multiple virtual machines, each of which runs multiple containers and containerized applications that often share the same set of libraries and code, commonly referred to as images. However, while container frameworks are optimized for sharing images within a single VM, sharing images across multiple VMs, even VMs on the same bare-metal server, is nearly non-existent because of VM isolation. This leads to repeated downloads, which add redundant network traffic and latency. This work aims to resolve the problem by using SmartNICs, specialized network hardware that provides hardware acceleration and offload capabilities for networking tasks, to optimize image retrieval and sharing between containers across multiple VMs on the same server. The proposed method shows promise in cutting container cold start time by up to 92% and reducing network traffic by 99.9%. Furthermore, the performance benefit is directly proportional to the number of VMs in a server that concurrently seek the same image, so efficiency increases as bare-metal machine specifications improve.
  4. Edge clouds can provide very responsive services for end-user devices that require more compute capability than they have. But edge cloud resources, such as CPUs and accelerators such as GPUs, are limited and must be shared across multiple concurrently running clients. Multiplexing GPUs across applications is challenging. Further, edge servers are likely to need to process considerable amounts of streaming data, and getting that data from the network stream to the GPU can be a bottleneck that limits the amount of work the GPU does. Finally, the lack of prompt notification of job completion from the GPU also results in ineffective GPU utilization. We propose a framework that addresses these challenges in the following manner. We use spatial sharing to multiplex the GPU more efficiently. While spatial sharing can increase GPU utilization, the uncontrolled spatial sharing currently available in state-of-the-art systems such as CUDA-MPS can cause interference between applications, resulting in unpredictable latency. Our framework uses controlled spatial sharing of the GPU, which limits interference across applications. It uses the GPU DMA engine to offload data transfer to the GPU, preventing the CPU from becoming a bottleneck while transferring data from the network to the GPU. It uses the CUDA event library to provide timely, low-overhead GPU notifications. Preliminary experiments show that we can achieve low DNN inference latency and improve DNN inference throughput by a factor of ∼1.4.
  5. In network-cloud ecosystems, large-scale failures affecting network carrier and datacenter (DC) infrastructures can severely disrupt cloud services. Post-disaster cloud service restoration requires cooperation among carriers and DC providers (DCPs) to minimize downtime. Such cooperation is challenging due to proprietary and regulatory policies, which limit access to confidential information (detailed topology, resource availability, etc.). Accordingly, we introduce a third-party entity, a provider-neutral exchange, which enables cooperation by sharing abstracted information. We formulate an optimization problem for DCP–carrier cooperation to maximize service restoration while minimizing restoration time and cost. We propose a scalable heuristic, demonstrating significant improvement in restoration efficiency with different topologies and failure scenarios. 