Modelling Resilience in Cloud-Scale Data Centres

John Cartlidge, Ilango L Sriram

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)



The trend for cloud computing has initiated a race towards data centres (DC) of an ever-increasing size. The largest DCs now contain many hundreds of thousands of virtual machine (VM) services. Given the finite lifespan of hardware, such large DCs are subject to frequent hardware failure events that can lead to disruption of service. To counter this, multiple redundant copies of task threads may be distributed around a DC to ensure that individual hardware failures do not cause entire jobs to fail. Here, we present results demonstrating the resilience of different job scheduling algorithms in a simulated DC with hardware failure. We use a simple model of jobs distributed across a hardware network to demonstrate the relationship between resilience and additional communication costs of different scheduling methods.
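The trade-off described above, between resilience to hardware failure and the extra communication that redundancy requires, can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation model. The data-centre layout (identical racks of servers), the two placement policies `pack` and `spread`, the independent server and rack failure probabilities, and the communication-cost measure (the number of replica pairs sitting on different racks) are all hypothetical choices made here for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class DataCentre:
    """Hypothetical DC layout: identical racks, each holding the same number of servers."""
    racks: int
    servers_per_rack: int


def place_task(dc, replicas, policy, rng):
    """Return (rack, server) positions for one task's redundant copies.

    'pack' (hypothetical) puts all copies on distinct servers in a single rack;
    'spread' (hypothetical) puts each copy in a different rack.
    """
    if policy == "pack":
        rack = rng.randrange(dc.racks)
        servers = rng.sample(range(dc.servers_per_rack), replicas)
        return [(rack, s) for s in servers]
    if policy == "spread":
        racks = rng.sample(range(dc.racks), replicas)
        return [(r, rng.randrange(dc.servers_per_rack)) for r in racks]
    raise ValueError(f"unknown policy: {policy}")


def comm_cost(placement):
    """Hypothetical cost: number of replica pairs that sit on different racks."""
    racks = [r for r, _ in placement]
    return sum(
        1
        for i in range(len(racks))
        for j in range(i + 1, len(racks))
        if racks[i] != racks[j]
    )


def simulate(dc, tasks, replicas, policy, p_server, p_rack, trials, seed=0):
    """Estimate the job survival rate and mean communication cost per job.

    Each trial fails every rack independently with probability p_rack and every
    server independently with probability p_server (an assumed failure model).
    A job survives only if every one of its tasks keeps at least one live copy.
    """
    rng = random.Random(seed)
    survived = 0
    total_cost = 0
    for _ in range(trials):
        failed_racks = {r for r in range(dc.racks) if rng.random() < p_rack}
        failed_servers = {
            (r, s)
            for r in range(dc.racks)
            for s in range(dc.servers_per_rack)
            if rng.random() < p_server
        }
        job_ok = True
        for _ in range(tasks):
            placement = place_task(dc, replicas, policy, rng)
            total_cost += comm_cost(placement)
            if not any(
                r not in failed_racks and (r, s) not in failed_servers
                for r, s in placement
            ):
                job_ok = False  # every copy of this task was lost
        survived += job_ok
    return survived / trials, total_cost / trials


if __name__ == "__main__":
    dc = DataCentre(racks=20, servers_per_rack=40)
    for policy in ("pack", "spread"):
        rate, cost = simulate(dc, tasks=50, replicas=3, policy=policy,
                              p_server=0.01, p_rack=0.02, trials=500)
        print(f"{policy}: survival={rate:.3f}, comm cost per job={cost:.1f}")
```

With rack-level failures in play, spreading copies across racks should raise the fraction of jobs that survive. Meanwhile the hypothetical cost measure rises from zero (all copies share a rack) to three cross-rack pairs per task when three replicas are used. The actual figures depend entirely on the assumed parameters.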
Original language: English
Title of host publication: 23rd European Modeling and Simulation Symposium (EMSS 2011)
Subtitle of host publication: Proceedings of a meeting held 12-14 September 2011, Rome, Italy. Held at the International Mediterranean and Latin American Modeling Multiconference
Editors: Agostino Bruzzone, Miquel Piera, Francesco Longo, Priscilla Elfrey, Michael Affenzeller, Osman Balci
Publisher: University of Genoa Press
Number of pages: 9
ISBN (Print): 9788890372445
Publication status: Published - Mar 2014
Event: 23rd European Modeling & Simulation Symposium (EMSS-2011) - Rome, Italy
Duration: 12 Sep 2011 to 14 Sep 2011


Conference: 23rd European Modeling & Simulation Symposium (EMSS-2011)


  • cloud computing
  • cloud middleware
  • network topology
  • resilience
  • simulation


