Optimizing Fault Tolerance for Real-Time Systems
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, The Institute of Technology. (ESLAB)
2012 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

For the vast majority of computer systems, correct operation is defined as producing the correct result within a time constraint (deadline). We refer to such computer systems as real-time systems (RTSs). RTSs manufactured in recent semiconductor technologies are increasingly susceptible to soft errors, which necessitates fault tolerance to detect and recover from errors that may occur. However, fault tolerance usually introduces a time overhead, which may cause an RTS to violate its time constraints. Depending on the consequences of violating the deadlines, RTSs are divided into hard RTSs, where the consequences are severe, and soft RTSs, where they are not. Traditionally, worst-case execution time (WCET) analyses are used for hard RTSs to ensure that the deadlines are not violated, and average execution time (AET) analyses are used for soft RTSs. However, at design time the designer of an RTS faces the challenging task of deciding whether the system should be a hard or a soft RTS. In such a case, focusing only on WCET analyses may result in an over-designed system, while focusing only on AET analyses may result in a system that permits deadline violations. To overcome this problem, we introduce Level of Confidence (LoC) as a metric to evaluate to what extent a deadline is met in the presence of soft errors. The advantage is that the same metric can be used for both soft and hard RTSs, so a system designer can precisely specify to what extent a deadline is to be met. In this thesis, we address the optimization of Roll-back Recovery with Checkpointing (RRC), a representative fault-tolerance technique: it enables detection of and recovery from soft errors at the cost of a time overhead that affects the execution time of tasks. The time overhead depends on the number of checkpoints that are used.
Therefore, we provide mathematical expressions for finding the optimal number of checkpoints, which leads to 1) minimal AET and 2) maximal LoC. To obtain these expressions, we assume that the error probability is given. However, the error probability is not known in advance and can even vary during runtime. Therefore, we propose two error probability estimation techniques, Periodic Probability Estimation and Aperiodic Probability Estimation, which estimate the error probability during runtime and adjust the RRC scheme with the goal of reducing the AET. Through experiments, we show that both techniques provide near-optimal performance of RRC.
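
The trade-off described in the abstract (more checkpoints shorten each re-execution but add overhead) can be illustrated with a small sketch. This is not the thesis's model or its expressions. It assumes a hypothetical simplified scheme: a job with fault-free processing time `T` is split into `n` equal segments, each segment attempt costs `T/n + tau` (where `tau` is an assumed per-checkpoint overhead), errors occur independently with an assumed constant probability `p` per time unit, and every error is detected at the next checkpoint and triggers re-execution of that segment. All function and parameter names are illustrative.

```python
from math import comb, floor


def segment_success(T, n, p):
    """Probability that one segment of length T/n runs without error
    (assumption: independent errors with probability p per time unit)."""
    return (1.0 - p) ** (T / n)


def average_execution_time(T, n, tau, p):
    """Expected completion time under the simplified model.

    Each segment is retried until it succeeds, so the number of attempts
    per segment is geometric with mean 1/s, and each attempt costs T/n + tau.
    """
    s = segment_success(T, n, p)
    return (T + n * tau) / s


def level_of_confidence(T, n, tau, p, D):
    """Probability that all n segments succeed before deadline D.

    k segment attempts fit into D; the job meets the deadline if the
    n-th success occurs within those k attempts (negative binomial sum).
    """
    s = segment_success(T, n, p)
    slot = T / n + tau          # cost of one segment attempt
    k = floor(D / slot)         # attempts that fit before the deadline
    if k < n:
        return 0.0
    return sum(comb(n + j - 1, j) * s**n * (1.0 - s) ** j
               for j in range(k - n + 1))


def best_checkpoint_count(score, candidates, maximize=False):
    """Exhaustively pick the candidate n with the best score."""
    pick = max if maximize else min
    return pick(candidates, key=score)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    T, tau, p, D = 1000.0, 5.0, 0.001, 1300.0
    candidates = range(1, 101)

    n_aet = best_checkpoint_count(
        lambda n: average_execution_time(T, n, tau, p), candidates)
    n_loc = best_checkpoint_count(
        lambda n: level_of_confidence(T, n, tau, p, D), candidates,
        maximize=True)

    print("n minimizing AET:", n_aet)
    print("n maximizing LoC:", n_loc)
```

Under these assumptions, the count that minimizes AET and the count that maximizes LoC for a given deadline need not coincide, which is why the two objectives are treated separately. An exhaustive search is used here only for simplicity; the thesis states that it derives closed-form expressions instead.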

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2012. 97 p.
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1558
National Category
Computer Science
URN: urn:nbn:se:liu:diva-84865
Local ID: LiU–Tek–Lic–2012:43
ISBN: 978-91-7519-735-7
OAI: diva2:565040
2012-12-19, Alan Turing, Hus E, Campus Valla, Linköping University, Linköping, 13:15 (English)
EU, FP7, Seventh Framework Programme; Swedish Research Council
Available from: 2012-11-20 Created: 2012-10-25 Last updated: 2012-11-20 Bibliographically approved

