Fault-tolerant average execution time optimization for general-purpose multi-processor system-on-chips
2009 (English). In: Proceedings - Design, Automation and Test in Europe, DATE, 2009, 484-489 p. Conference paper (Refereed)
Owing to developments in semiconductor technology, fault tolerance is important not only for safety-critical systems but also for general-purpose (non-safety-critical) systems. For general-purpose systems, however, the goal is to minimize the average execution time (AET) while ensuring fault tolerance, rather than to guarantee that deadlines are always met. For a given job and a soft (transient) error probability, we define mathematical formulas for AET that include bus communication overhead for both voting (active replication) and rollback-recovery with checkpointing (RRC). For a given multi-processor system-on-chip (MPSoC), we also define integer linear programming (ILP) models that minimize AET, including bus communication overhead, when: (1) selecting the number of checkpoints when using RRC, (2) finding the number of processors and the job-to-processor assignment when using voting, and (3) defining the fault-tolerance scheme (voting or RRC) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.
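The trade-off behind item (1), the number of checkpoints under RRC, can be illustrated with a minimal sketch. This is not the paper's AET formula, which the abstract does not give. It assumes a simplified geometric-retry model in which the job is split into equal segments, soft errors are independent per time unit, a faulty segment is re-executed from its last checkpoint, and each attempt pays a constant checkpoint overhead. The names `aet_rrc` and `best_checkpoints` and the parameters `T`, `n`, `q`, `c`, and `n_max` are hypothetical.

```python
def aet_rrc(T, n, q, c):
    """Expected execution time under an ASSUMED geometric-retry model.

    T: fault-free job length in time units
    n: number of checkpoints (the job is split into n equal segments)
    q: soft-error probability per time unit (assumed independent)
    c: overhead per checkpoint, bus communication included (assumed constant)
    """
    segment = T / n
    # Probability that one segment runs to its checkpoint without any error.
    p_ok = (1.0 - q) ** segment
    # Mean number of attempts until a segment succeeds (geometric distribution).
    attempts = 1.0 / p_ok
    # Every attempt, failed or not, costs the segment time plus the overhead.
    return n * (segment + c) * attempts


def best_checkpoints(T, q, c, n_max):
    """Exhaustively pick the checkpoint count in 1..n_max with the lowest AET."""
    return min(range(1, n_max + 1), key=lambda n: aet_rrc(T, n, q, c))


if __name__ == "__main__":
    n = best_checkpoints(T=1000, q=0.01, c=1, n_max=1000)
    print(n, aet_rrc(1000, n, 0.01, 1))
```

Under these assumptions, too few checkpoints make each re-execution expensive, while too many let the checkpoint overhead dominate, so the minimum lies in between. An ILP formulation, as used in the paper, would replace the exhaustive search here.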
Place, publisher, year, edition, pages
2009. 484-489 p.
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-52992; DOI: 10.1109/DATE.2009.5090713; ISI: 000273246700086; ISBN: 978-1-4244-3781-8; OAI: oai:DiVA.org:liu-52992; DiVA: diva2:286358
2009 Design, Automation and Test in Europe Conference and Exhibition, DATE '09; Nice; France