Nikolov, Dimitar
Publications (7 of 7)
Petersen, K., Nikolov, D., Ingelsson, U., Carlsson, G. & Larsson, E. (2012). An MPSoCs Demonstrator for Fault Injection and Fault Handling in an IEEE P1687 Environment. In: IEEE 17th European Test Symposium (ETS 2012), Annecy, France, May 28-June 1, 2012. Paper presented at ETS12.
2012 (English). In: IEEE 17th European Test Symposium (ETS 2012), Annecy, France, May 28-June 1, 2012. Conference paper, Oral presentation only (Refereed)
Abstract [en]

As fault handling in multi-processor systems-on-chip (MPSoCs) is a major challenge, we have developed an MPSoC demonstrator that enables experimentation on fault injection and fault handling. Our MPSoC demonstrator consists of (1) an MPSoC model with a set of components (devices), each equipped with fault detection features, so-called instruments, (2) an Instrument Access Infrastructure (IAI) based on IEEE P1687 that connects the instruments, (3) a Fault Indication and Propagation Infrastructure (FIPI) that propagates fault indications to the system level, (4) a Resource Manager (RM) that schedules jobs based on fault statuses, (5) an Instrument Manager (IM) connecting the IAI and the RM, and (6) a Fault Injection Manager (FIM) that inserts faults. The main goal of the demonstrator is to enable experimentation on different fault handling solutions. The novelty of this particular demonstrator is that it uses existing test features, i.e. the IEEE P1687 infrastructure, to assist fault handling. The demonstrator is implemented and a case study is performed.

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-80008 (URN)
Conference
ETS12
Available from: 2012-08-17 Created: 2012-08-17 Last updated: 2018-01-12
Nikolov, D., Ingelsson, U., Singh, V. & Larsson, E. (2011). Level of Confidence Evaluation and Its Usage for Roll-back Recovery with Checkpointing Optimization. In: 5th Workshop on Dependable and Secure Nanocomputing (WSDN 2011), Hong Kong, June 27, 2011. Paper presented at WSDN11. IEEE
2011 (English). In: 5th Workshop on Dependable and Secure Nanocomputing (WSDN 2011), Hong Kong, June 27, 2011. IEEE, 2011. Conference paper, Published paper (Refereed)
Abstract [en]

Increasing soft error rates for semiconductor devices manufactured in later technologies necessitate the use of fault-tolerant techniques such as Roll-back Recovery with Checkpointing (RRC). However, RRC introduces time overhead that increases the completion (execution) time. For non-real-time systems, research has focused on optimizing RRC and has shown that it is possible to find the optimal number of checkpoints such that the average execution time is minimal. While minimal average execution time is important, for real-time systems it is important to provide a high probability that deadlines are met. Hence, there is a need for probabilistic guarantees that jobs employing RRC complete before a given deadline. First, we present a mathematical framework for the evaluation of the level of confidence, the probability that a given deadline is met, when RRC is employed. Second, we present an optimization method for RRC that finds the number of checkpoints that yields the minimal completion time that satisfies a given level of confidence requirement. Third, we use the proposed framework to evaluate probabilistic guarantees for RRC optimization in non-real-time systems.
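
The level of confidence mentioned in this abstract can be illustrated with a deliberately simplified model. The model is an assumption made for illustration and is not the framework from the paper: a job is split into n execution segments, each attempt at a segment fails independently with probability p and is then re-executed from the last checkpoint, and the deadline leaves room for at most k re-executions in total. Under these assumptions the level of confidence is the probability that no more than k failed attempts occur before the n-th successful segment. The function and parameter names are hypothetical.

```python
from math import comb


def level_of_confidence(n_segments: int, p_fail: float, max_reexecutions: int) -> float:
    """Probability that a job finishes within its re-execution budget.

    Simplified model (an assumption, not the paper's framework): every attempt
    at a segment fails independently with probability p_fail.
    """
    p_ok = 1.0 - p_fail
    # Sum over j = number of failed attempts before the last segment succeeds.
    # The j failures can be interleaved with the first n_segments - 1 successes
    # in comb(n_segments - 1 + j, j) ways (negative binomial distribution).
    return sum(
        comb(n_segments - 1 + j, j) * p_ok**n_segments * p_fail**j
        for j in range(max_reexecutions + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 4 segments, 10% failure chance per attempt.
    for k in range(4):
        print(k, round(level_of_confidence(4, 0.1, k), 4))
```

In this toy model a larger re-execution budget always raises the level of confidence. It ignores checkpointing overhead, which a real optimization of the number of checkpoints would have to account for.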

Place, publisher, year, edition, pages
IEEE, 2011
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-80007 (URN); 10.1109/DSNW.2011.5958836 (DOI); 978-1-4577-0374-4 (ISBN)
Conference
WSDN11
Available from: 2012-08-17 Created: 2012-08-17 Last updated: 2018-01-12
Nikolov, D., Ingelsson, U., Singh, V. & Larsson, E. (2011). Study on the Level of Confidence for Roll-back Recovery with Checkpointing. In: 1st Intl. Workshop on Dependability Issues in Deep-submicron Technologies (DDT 2011), Trondheim, Norway, May 26-27, 2011. Paper presented at DDT11.
2011 (English). In: 1st Intl. Workshop on Dependability Issues in Deep-submicron Technologies (DDT 2011), Trondheim, Norway, May 26-27, 2011. Conference paper, Oral presentation only (Refereed)
Abstract [en]

Increasing soft error rates for semiconductor devices manufactured in later technologies necessitate the use of fault-tolerant techniques such as Roll-back Recovery with Checkpointing (RRC). However, RRC introduces time overhead that increases the completion (execution) time. For non-real-time systems, research has focused on optimizing RRC and has shown that it is possible to find the optimal number of checkpoints such that the average execution time is minimal. While minimal average execution time is important, for real-time systems it is important to provide a high probability of meeting given deadlines. Hence, there is a need for probabilistic guarantees that jobs employing RRC complete before a given deadline. Therefore, in this paper we present a mathematical framework for the evaluation of the level of confidence, the probability that a given deadline is met, when RRC is employed.

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:liu:diva-80006 (URN)
Conference
DDT11
Available from: 2012-08-17 Created: 2012-08-17 Last updated: 2018-01-12
Nikolov, D., Ingelsson, U., Singh, V. & Larsson, E. (2010). Estimating Error-Probability and Its Application for Optimizing Roll-back Recovery with Checkpointing. In: Proceedings - 5th IEEE International Symposium on Electronic Design, Test and Applications, DELTA 2010. Paper presented at 5th IEEE International Symposium on Electronic Design, Test and Applications, DELTA 2010; Ho Chi Minh City; Viet Nam (pp. 281-285). IEEE
2010 (English). In: Proceedings - 5th IEEE International Symposium on Electronic Design, Test and Applications, DELTA 2010. IEEE, 2010, p. 281-285. Conference paper, Published paper (Refereed)
Abstract [en]

The probability of errors occurring in electronic systems is not known in advance; it depends on many factors, including the influence of the environment in which the system operates. In this paper, it is demonstrated that inaccurate estimates of the error probability lead to loss of performance in a well-known fault tolerance technique, Roll-back Recovery with Checkpointing (RRC). To regain the lost performance, a method for estimating the error probability and an adjustment technique are proposed. Using a simulator tool that has been developed to enable experimentation, the proposed method is evaluated, and the results show that it provides useful estimates of the error probability, leading to near-optimal performance of the RRC fault-tolerant technique.

Place, publisher, year, edition, pages
IEEE, 2010
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-59611 (URN); 10.1109/DELTA.2010.25 (DOI); 978-1-4244-6026-7 (ISBN); 978-0-7695-3978-2 (ISBN)
Conference
5th IEEE International Symposium on Electronic Design, Test and Applications, DELTA 2010; Ho Chi Minh City; Viet Nam
Note

©2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Dimitar Nikolov, Urban Ingelsson, Virendra Singh and Erik Larsson, Estimating Error-Probability and Its Application for Optimizing Roll-back Recovery with Checkpointing, 2010, 5th IEEE Intl. Symposium on Electronic Design, Test & Applications (DELTA 2010), Ho Chi Minh City, Vietnam, January 13-15, 2010, 281-285.

Available from: 2010-09-29 Created: 2010-09-21 Last updated: 2014-10-02. Bibliographically approved
Nikolov, D., Karlsson, E., Ingelsson, U., Singh, V. & Larsson, E. (2010). Mapping and Scheduling of Jobs in Homogeneous NoC-based MPSoC. In: Swedish SoC Conference 2010, Kolmården, Sweden, May 3-4, 2010 (not reviewed, not printed).
2010 (English). In: Swedish SoC Conference 2010, Kolmården, Sweden, May 3-4, 2010 (not reviewed, not printed). Conference paper, Published paper (Other academic)
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-59603 (URN)
Available from: 2010-09-21 Created: 2010-09-21 Last updated: 2010-10-29
Nikolov, D., Ingelsson, U., Singh, V. & Larsson, E. (2010). On-line Techniques to Adjust and Optimize Checkpointing Frequency. In: IEEE International Workshop on Reliability Aware System Design and Test (RASDAT 2010), Bangalore, India, January 7-8, 2010 (pp. 29-33).
2010 (English). In: IEEE International Workshop on Reliability Aware System Design and Test (RASDAT 2010), Bangalore, India, January 7-8, 2010, p. 29-33. Conference paper, Published paper (Refereed)
Abstract [en]

Due to increased susceptibility to soft errors in recent semiconductor technologies, techniques for detecting and recovering from errors are required. Roll-back Recovery with Checkpointing (RRC) is one well-known technique that copes with soft errors by taking and storing checkpoints during execution of a job. Employing this technique increases the average execution time (AET), i.e. the expected time for a job to complete, and thus impacts performance. To minimize the AET, the checkpointing frequency must be optimized. However, it has been shown that the optimal checkpointing frequency depends highly on the error probability. Since the error probability cannot be known in advance and can change over time, the optimal checkpointing frequency cannot be known at design time. In this paper we present techniques that adjust the checkpointing frequency on-line (during operation) with the goal of reducing the AET of a job. A set of experiments has been performed to demonstrate the benefits of the proposed techniques. The results show that these techniques adjust the checkpointing frequency so well that the resulting AET is close to the theoretical optimum.

National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-59610 (URN)
Note
©2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Dimitar Nikolov, Urban Ingelsson, Virendra Singh and Erik Larsson, On-line Techniques to Adjust and Optimize Checkpointing Frequency, 2010, IEEE International Workshop on Reliability Aware System Design and Test (RASDAT 2010), Bangalore, India, January 7-8, 2010, 29-33.

Available from: 2010-09-29 Created: 2010-09-21 Last updated: 2010-09-29. Bibliographically approved
Nikolov, D., Väyrynen, M., Ingelsson, U., Larsson, E. & Singh, V. (2010). Optimizing Fault Tolerance for Multi-Processor System-on-Chip. In: Raimund Ubar, Jaan Raik, Heinrich Theodor Vierhaus (Ed.), Design and Test Technology for Dependable Systems-on-chip (pp. 578). Information Science Publishing
2010 (English). In: Design and Test Technology for Dependable Systems-on-chip / [ed] Raimund Ubar, Jaan Raik, Heinrich Theodor Vierhaus. Information Science Publishing, 2010, p. 578-. Chapter in book (Other academic)
Abstract [en]

Designing reliable and dependable embedded systems has become increasingly important as the failure of these systems in an automotive, aerospace or nuclear application can have serious consequences.

Design and Test Technology for Dependable Systems-on-Chip covers aspects of system design and efficient modelling, and also introduces various fault models and fault mechanisms associated with digital circuits integrated into a System-on-Chip (SoC), Multi-Processor System-on-Chip (MPSoC) or Network-on-Chip (NoC). This book provides insight into refined "classical" design and test topics and solutions for IC test technology and fault-tolerant systems.

Place, publisher, year, edition, pages
Information Science Publishing, 2010
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-63306 (URN); 160-9602-12-9 (ISBN); 978-1609-6021-2-3 (ISBN)
Available from: 2010-12-15 Created: 2010-12-15 Last updated: 2013-04-22. Bibliographically approved