Energy efficient lifetime reliability-aware checkpointing for real-time system

Mohamad Imran Bin Bandan*, Subhasis Bhattacharjee, Dhiraj K. Pradhan, Jimson Matthew

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal)

2 Citations (Scopus)

Abstract

Due to continued technology scaling, the reliability of today's integrated circuits (ICs) is an emerging design challenge, especially across a wide range of operating environments. The lifetime reliability of modern systems is severely limited by increased wear-out and stress effects. Checkpointing has been extensively used as an effective method in fault-tolerant system design. Traditionally, it is used to tolerate the impact of transient faults by saving intermediate results at predefined times and rolling back to an appropriate previously saved state whenever needed. In this paper, we propose a new checkpointing mechanism for a duplex real-time system that achieves fault tolerance against transient and permanent faults, and also provides a fault avoidance mechanism by migrating tasks from an unhealthy (perhaps near-to-die) host to a spare host. We develop a mathematical model for evaluating the performance of the proposed methodology in the presence of various faults and task migration. The combination of checkpointing and task migration enhances the lifetime reliability of the system by tolerating faults and wear-out. Since checkpointing imposes additional overhead, energy consumption and the ability to meet the task deadline are crucial for any real-time system. The Expected-Execution-Time (EET) of a task is an important performance metric with respect to task completion. Similarly, the Average-Energy-Consumption (AEC) reflects the energy usage of a checkpointing mechanism under various faults. Under a probabilistic distribution of various faults, we evaluate EET and AEC for our proposed checkpointing mechanism. We also investigate deadline estimation for our proposed algorithm. We find that the proposed algorithm is able to meet the deadline even when the fault rate is as high as 10^-3. Our simulation results show that the proposed checkpointing mechanism can meet the task deadline with only 12.57% time overhead.

Original language: English
Pages (from-to): 401-416
Number of pages: 16
Journal: Journal of Low Power Electronics
Volume: 10
Issue number: 3
Publication status: Published - 1 Sep 2014

Keywords

  • Checkpointing
  • Energy-aware
  • Fault tolerance
  • Lifetime reliability
  • Microprocessors
  • Task deadline
