In the post-pandemic era, global working patterns have been reshaped, and the demand for online network services has increased significantly. As a result, cross-data-center content migration has become an important problem, drawing greater attention to data backup and recovery planning. Beyond traditional pre-disaster content redundancy approaches, this work focuses on the challenge of rapid post-disaster content evacuation under the threat of cascading failures. Because data centers (DCs), inter-DC optical networks, and power grid networks are interdependent, disasters may have a domino effect on these infrastructures, with their impact gradually expanding over time and space. In this paper, we propose two trajectory models that capture the dynamic evolution of cascading failures, together with a trajectory-based content evacuation (TCE) strategy that accounts for this spatiotemporal evolution to minimize content loss. Numerical results show that, when each DC needs to evacuate about 200 TB of content, TCE can reduce content loss by up to 25% compared to baseline strategies.
Content Evacuation in Inter-DC Optical Networks under Post-Disaster Cascading Failures
In the post-pandemic era, global work patterns have been reshaped, and the demand for cloud migration by enterprises and governments has increased. As a result, cloud data disaster backup and recovery technology has been gaining attention. Moving beyond the traditional focus on pre-disaster content backup, our study addresses the challenge of rapidly evacuating content during cascading failures in post-disaster scenarios. Due to the interdependence of i) data centers (DCs), ii) inter-DC optical networks, and iii) power grid networks, disasters can have a domino effect on these infrastructures, with their impact gradually expanding over time and space. In this work, we propose two trajectory models that construct the spatiotemporal features of the inter-DC optical network under cascading failures, and a trajectory-based content evacuation (TCE) strategy. Numerical results show that TCE can reduce content loss by up to 25% compared to baseline content evacuation strategies.
- Award ID(s): 2210384
- PAR ID: 10646461
- Publisher / Repository: IEEE
- Date Published:
- ISSN: 2995-0686
- ISBN: 978-3-903176-62-1
- Page Range / eLocation ID: 1 to 5
- Format(s): Medium: X
- Location: Madrid, Spain
- Sponsoring Org: National Science Foundation
More Like this
- This Rapid Response Research (RAPID) project collects ephemeral data to better understand the compounding impacts of the Maui wildfires and Hurricane Dora and to reveal residents' behavioral responses as affected by infrastructure failures. It examines the sources of warning information, protective action decision-making, and evacuation logistics at the individual level. In addition, the project captures the operational state of disaster warning operations in Maui under the loss of cell and electric power services. Failures in each system are documented, as well as the cascading effects among interconnected infrastructure systems. The research outcomes expand the existing body of scientific knowledge on warning and evacuation while advancing the understanding of informal networks and decision-making in the absence of official guidance.
- Network connectivity, i.e., the reachability of any network node from all other nodes, is often considered the default network survivability metric against failures. However, when a large-scale disaster disconnects multiple network components, network connectivity may not be achievable. On the other hand, with the service paradigm shifting towards the cloud in today's networks, most services can still be provided as long as at least one content replica is available in each disconnected network partition. As a result, the concept of content connectivity has been introduced as a new network survivability metric under large-scale disasters. Content connectivity is defined as the reachability of content from every node in a network under a specific failure scenario. In this work, we investigate how to ensure content connectivity in optical metro networks. We derive necessary and sufficient conditions and develop what we believe to be a novel mathematical formulation to map a virtual network over a physical network such that content connectivity for the virtual network is ensured against multiple link failures in the physical network. In our numerical results, obtained under various network settings, we compare the performance of mapping with content connectivity and with network connectivity, and show that mapping with content connectivity can guarantee higher survivability, lower network bandwidth utilization, and a significant improvement in service availability.
- In network-cloud ecosystems, large-scale failures affecting network carrier and datacenter (DC) infrastructures can severely disrupt cloud services. Post-disaster cloud service restoration requires cooperation among carriers and DC providers (DCPs) to minimize downtime. Such cooperation is challenging due to proprietary and regulatory policies, which limit access to confidential information (detailed topology, resource availability, etc.). Accordingly, we introduce a third-party entity, a provider-neutral exchange, which enables cooperation by sharing abstracted information. We formulate an optimization problem for DCP–carrier cooperation to maximize service restoration while minimizing restoration time and cost. We also propose a scalable heuristic and demonstrate significant improvement in restoration efficiency across different topologies and failure scenarios.
- Security concerns have been raised about cascading failure risks in evolving power grids. This paper reveals, for the first time, that the risk of cascading failures can increase at low network demand levels when security-constrained generation dispatch is considered. This occurs because critical transmission corridors become heavily loaded under centralized generation dispatch, e.g., large thermal plants far from demand centers. This increased cascading risk is revealed by incorporating security-constrained generation dispatch into the risk assessment and mitigation of cascading failures. A security-constrained AC optimal power flow, which considers economic functions and security constraints (e.g., network constraints, N-1 security, and generation margin), is used to provide a representative day-ahead operational plan. Cascading failures are simulated using two simulators (a quasi-steady-state DC power flow model and a dynamic model incorporating all frequency-related dynamics) to allow result comparison and verification. The risk assessment procedure is illustrated using synthetic networks of 200 and 2,000 buses. Further, a novel preventive mitigation measure is proposed that first identifies critical lines, whose failures are likely to trigger cascading failures, and then limits power flow through these critical lines during dispatch. Results show that shifting power equivalent to 1% of total demand from critical lines to other lines can reduce cascading risk by up to 80%.
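The content-connectivity definition in the optical metro network abstract above (content must be reachable from every node under a given failure scenario) can be checked directly for a single scenario. The sketch below is a minimal illustration, not the authors' formulation: it assumes an undirected physical topology, a failure scenario given as a set of failed links, and a hypothetical function name `has_content_connectivity`. It does not model the virtual-network mapping or the necessary and sufficient conditions described in the abstract.

```python
from collections import deque


def has_content_connectivity(nodes, links, failed_links, replica_nodes):
    """Hypothetical check: can every node still reach at least one replica?

    nodes: iterable of node ids
    links: iterable of (u, v) undirected physical links
    failed_links: set of (u, v) links removed by the failure scenario
    replica_nodes: set of nodes hosting a replica of the content
    """
    failed = {frozenset(link) for link in failed_links}
    adj = {n: [] for n in nodes}
    for u, v in links:
        if frozenset((u, v)) not in failed:  # keep only surviving links
            adj[u].append(v)
            adj[v].append(u)

    # Multi-source BFS from all replicas: every node it reaches can reach a replica.
    reached = {r for r in replica_nodes if r in adj}
    queue = deque(reached)
    while queue:
        n = queue.popleft()
        for m in adj[n]:
            if m not in reached:
                reached.add(m)
                queue.append(m)
    return len(reached) == len(adj)


if __name__ == "__main__":
    # Six-node ring; two link failures split it into partitions {0, 1, 5} and {2, 3, 4}.
    ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    cut = {(1, 2), (4, 5)}
    print(has_content_connectivity(range(6), ring, cut, {0, 3}))  # True
    print(has_content_connectivity(range(6), ring, cut, {0}))     # False
```

In the usage example the network is no longer connected, so network connectivity fails in both cases, yet placing one replica in each partition (nodes 0 and 3) still satisfies content connectivity. A multi-source BFS is used so the check runs once per scenario rather than once per node.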

