Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis: 2013 International Symposium on Electronic System Design

Dhiraj K Pradhan, Jimson Mathew

Research output: Chapter in Book/Report/Conference proceeding; Conference Contribution (Conference Proceeding)

6 Citations (Scopus)

Abstract

Checkpointing mechanisms are used to tolerate the impact of transient faults through a roll-back operation to a previously saved system state. In this paper, we propose a novel checkpointing mechanism that considers fault tolerance in a duplex system in the presence of both transient and permanent faults. The main objective of our proposed mechanism is to extend the lifetime reliability of the duplex system by avoiding or even tolerating permanent faults in microprocessors. In addition, we propose to migrate tasks from a 'near-to-die' processor to a spare processor when the current Mean-Time-To-Failure (MTTF) value is less than or equal to a pre-determined threshold MTTF value. We validate our proposed mechanism and perform overhead analysis using various case studies. We then compare it with one of the most popular existing checkpointing mechanisms, namely the roll-forward checkpointing scheme [9]. We show that, unlike roll-back or roll-forward mechanisms, our proposed mechanism gives significantly higher lifetime reliability with reasonable system overheads.
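
The migration condition stated in the abstract (move tasks to a spare processor once the current MTTF is less than or equal to a threshold MTTF) can be illustrated with a minimal sketch. The names `Processor`, `should_migrate`, and `select_processor`, and the use of hours as the MTTF unit, are assumptions made purely for illustration; the paper's actual MTTF estimation and migration procedure are not described in this record.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical processor record; not taken from the paper."""
    name: str
    mttf_hours: float  # current estimated MTTF (unit is an assumption)


def should_migrate(current_mttf: float, threshold_mttf: float) -> bool:
    """Return True when the current MTTF is less than or equal to the threshold."""
    return current_mttf <= threshold_mttf


def select_processor(active: Processor, spare: Processor,
                     threshold_mttf: float) -> Processor:
    """Pick the processor that should run the tasks next.

    If the active processor is 'near-to-die' (its MTTF has reached the
    threshold), tasks move to the spare; otherwise they stay where they are.
    """
    if should_migrate(active.mttf_hours, threshold_mttf):
        return spare
    return active


if __name__ == "__main__":
    active = Processor("P0", mttf_hours=900.0)
    spare = Processor("P_spare", mttf_hours=50_000.0)
    chosen = select_processor(active, spare, threshold_mttf=1_000.0)
    print(f"Tasks run on: {chosen.name}")
```

Under these assumed values the active processor's MTTF (900 h) is below the threshold (1000 h), so the sketch selects the spare. How the threshold is chosen and how the MTTF is tracked over time are left open here.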

Original language: English
Title of host publication: Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis
Publication status: Published - 2013
