TY - GEN
T1 - Cloud-network disaster recovery against cascading failures
AU - Colman-Meixner, Carlos
AU - Tornatore, Massimo
AU - Mukherjee, Biswanath
PY - 2015
Y1 - 2015
N2 - Cloud computing uses cloud networks (CNs) that integrate and virtualize computing servers and communication networks. In a CN, virtual machines (VMs) are interconnected through virtual networks (VNs) provisioned over a physical optical network. A disaster event is a serious threat to cloud computing infrastructure, not only because of CN disconnections caused by multiple infrastructure failures, but also because of subsequent and unpredictable CN disconnections induced by cascading failures. Studies on disaster protection for CNs suggest large pre-provisioning of additional capacity before a possible disaster, with limited protection against later cascading failures. In this work, we propose an adaptive and cascading-failure-aware CN disaster recovery scheme that (re-)acts after the disaster and uses risk modeling to reduce the capacity required for recovery and minimize the post-disaster disconnection of CNs. Major power grid outages could cause cascading failures in cloud infrastructure operation. Thus, in this study, propagation patterns of power grid failures are used to estimate the location of cascading failures. Simulation results based on human-made disasters, e.g., weapon of mass destruction (WMD) attacks, show that our approach can significantly reduce the risk of CN disconnections due to cascading failures, while reducing the capacity re-provisioning required for recovery by up to 50%.
AB - Cloud computing uses cloud networks (CNs) that integrate and virtualize computing servers and communication networks. In a CN, virtual machines (VMs) are interconnected through virtual networks (VNs) provisioned over a physical optical network. A disaster event is a serious threat to cloud computing infrastructure, not only because of CN disconnections caused by multiple infrastructure failures, but also because of subsequent and unpredictable CN disconnections induced by cascading failures. Studies on disaster protection for CNs suggest large pre-provisioning of additional capacity before a possible disaster, with limited protection against later cascading failures. In this work, we propose an adaptive and cascading-failure-aware CN disaster recovery scheme that (re-)acts after the disaster and uses risk modeling to reduce the capacity required for recovery and minimize the post-disaster disconnection of CNs. Major power grid outages could cause cascading failures in cloud infrastructure operation. Thus, in this study, propagation patterns of power grid failures are used to estimate the location of cascading failures. Simulation results based on human-made disasters, e.g., weapon of mass destruction (WMD) attacks, show that our approach can significantly reduce the risk of CN disconnections due to cascading failures, while reducing the capacity re-provisioning required for recovery by up to 50%.
KW - Cascading failures
KW - Cloud computing
KW - Disaster resiliency
KW - Optical network
KW - Post-disaster survivability
KW - Virtual machine migration
KW - Virtual-network recovery
UR - http://www.scopus.com/inward/record.url?scp=84964812022&partnerID=8YFLogxK
U2 - 10.1109/GLOCOM.2014.7417558
DO - 10.1109/GLOCOM.2014.7417558
M3 - Conference Contribution (Conference Proceeding)
AN - SCOPUS:84964812022
T3 - 2015 IEEE Global Communications Conference, GLOBECOM 2015
BT - 2015 IEEE Global Communications Conference, GLOBECOM 2015
PB - Institute of Electrical and Electronics Engineers (IEEE)
T2 - 58th IEEE Global Communications Conference, GLOBECOM 2015
Y2 - 6 December 2015 through 10 December 2015
ER -