FlexRay will very likely become the de-facto standard for in-vehicle communications. Its main advantage is the combination of high-speed static and dynamic transmission of messages. In our previous work we have shown that not only the static but also the dynamic segment can be used for hard real-time communication in a deterministic manner. In this paper, we propose techniques for optimising the FlexRay bus access mechanism of a distributed system, so that the hard real-time deadlines are met for all the tasks and messages in the system. We have evaluated the proposed techniques using extensive experiments.
We present an approach to the analysis and optimization of heterogeneous distributed embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource, they are organized in a hierarchy. In this paper, we address design problems that are characteristic of such hierarchically scheduled systems: assignment of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We present algorithms for solving these problems. Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilization of the system. The developed algorithms are evaluated using extensive experiments and a real-life example.
FlexRay is a communication protocol heavily promoted on the market by a large group of car manufacturers and automotive electronics suppliers. However, before it can be successfully used for safety-critical applications that require predictability, timing analysis techniques are necessary for providing bounds for the message communication times. In this paper, we propose techniques for determining the timing properties of messages transmitted in both the static and the dynamic segments of a FlexRay communication cycle. The analysis techniques for messages are integrated in the context of a holistic schedulability analysis that computes the worst-case response times of all the tasks and messages in the system. We have evaluated the proposed analysis techniques using extensive experiments. We also present and evaluate three optimisation algorithms that can be used to improve the schedulability of a system that uses FlexRay. © 2007 Springer Science+Business Media, LLC.
FlexRay will very likely become the de-facto standard for in-vehicle communications. However, before it can be successfully used for safety-critical applications that require predictability, timing analysis techniques are necessary for providing bounds for the message communication times. In this paper, we propose techniques for determining the timing properties of messages transmitted in both the static (ST) and the dynamic (DYN) segments of a FlexRay communication cycle. The analysis techniques for messages are integrated in the context of a holistic schedulability analysis that computes the worst-case response times of all the tasks and messages in the system. We have evaluated the proposed analysis techniques using extensive experiments.
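To illustrate why bounding communication times in the DYN segment needs dedicated analysis, the following is a minimal sketch of one dynamic segment under a deliberately simplified model. It assumes the segment is a fixed number of minislots, that frame identifiers are visited in increasing order, that an identifier with nothing to send consumes one minislot, and that a frame is started only if it fits in the remaining minislots. The function name, the dictionary layout, and the fit rule are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict

def simulate_dyn_cycle(ready: Dict[int, int], n_minislots: int) -> Dict[int, int]:
    """Simulate one simplified DYN segment.

    ready maps a frame identifier to the frame length in minislots.
    Returns a map from each transmitted frame identifier to the minislot
    in which its transmission starts (minislots are numbered from 1).
    """
    sent: Dict[int, int] = {}
    minislot = 1
    frame_id = 1
    max_id = max(ready, default=0)
    while minislot <= n_minislots and frame_id <= max_id:
        length = ready.get(frame_id)
        if length is not None and minislot + length - 1 <= n_minislots:
            # The frame fits: it occupies `length` consecutive minislots.
            sent[frame_id] = minislot
            minislot += length
        else:
            # Nothing to send (or no room left): one empty minislot elapses.
            minislot += 1
        frame_id += 1
    return sent

# Frames 1 and 3 fill the segment, so frame 4 has to wait for a later cycle.
result = simulate_dyn_cycle({1: 4, 3: 5, 4: 3}, n_minislots=10)
print(result)
```

In this toy run the frame with identifier 4 is ready but not transmitted, because frames with lower identifiers consume the segment first. The delay of a DYN message therefore depends on which lower-identifier messages are ready in the same cycle, which is the kind of interference a worst-case analysis has to bound.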
In this paper, we propose a SOC (system-on-chip) test scheduling technique that minimizes the test application time while considering test power limitations and test conflicts. The test power consumption is important to consider since exceeding the system's power limit might damage the system. Our technique also takes into account test conflicts that are due to cross-core testing (testing of interconnections), unit testing with multiple test sets, hierarchical SOCs where cores are embedded in cores, and the sharing of the test access mechanism (TAM). Our technique handles these conflicts as well as precedence constraints, that is, the order in which the tests have to be applied. We have implemented our algorithm and performed experiments, which show the efficiency of our approach.
Test application time and core accessibility are two major issues in System-On-Chip (SOC) testing. The test application time must be minimised, and a test access mechanism (TAM) must be developed to transport test data to and from the cores. In this paper we present an approach to design a test interface (wrapper) at core level taking into account the P1500 restrictions, and to design a TAM architecture and its associated test schedule using a fast and efficient heuristic. A useful and new feature of our approach is that it also supports the testing of interconnections while considering power dissipation, test conflicts and precedence constraints. Another feature of our approach is that the TAM is designed with a central bus architecture, which is a generalisation of the TestBus architecture. The advantages and drawbacks of our approach are discussed, and the proposed architecture and heuristic are validated with experiments.
We present a constraint logic programming (CLP) approach for synthesis of fault-tolerant hard real-time applications on distributed heterogeneous architectures. We address time-triggered systems, where processes and messages are statically scheduled based on schedule tables. We use process re-execution for recovering from multiple transient faults. We propose three scheduling approaches, which each present a trade-off between schedule simplicity and performance, (i) full transparency, (ii) slack sharing and (iii) conditional, and provide various degrees of transparency. We have developed a CLP framework that produces the fault-tolerant schedules, guaranteeing schedulability in the presence of transient faults. We show how the framework can be used to tackle design optimization problems. The proposed approach has been evaluated using extensive experiments.
This paper presents a design optimisation tool for distributed embedded real-time systems that 1) decides mapping, fault-tolerance policy and generates a fault-tolerant schedule, 2) is targeted for hard real-time, 3) has hard reliability goal, 4) generates static schedule for processes and messages, 5) provides fault-tolerance for k transient/soft faults, 6) optimises for minimal energy consumption, while considering impact of lowering voltages on the probability of faults, 7) uses constraint logic programming (CLP) based implementation.
Today's embedded systems are typically exposed to varying load, due to e.g. changing number of tasks and variable task execution times. At the same time, many of the most frequent real-life applications are not characterized by hard real-time constraints and their design goal is not to satisfy certain hard deadlines in the worst case. Moreover, from the user's perspective, achieving a high level of processor utilization is also not a primary goal. What the user needs is to exploit the available resources (in our case processor time) such that a high level of quality of service (QoS) is delivered. In this paper we propose efficient run-time approaches, able to distribute the processor bandwidth such that the global QoS produced by a set of applications is maximized, in the context in which the processor demand from individual tasks is continuously varying. Extensive experiments demonstrate the efficiency of the proposed approaches.
Today's embedded systems are exposed to variations in load demand due to complex software applications, hardware platforms, and impact of the run-time environments. When these variations are large, and efficiency is required, on-line resource managers may be deployed on the system to help it control its resource usage. An often neglected problem is whether these resource managers are stable, meaning that the resource usage is controlled under all possible scenarios. In this paper we develop mathematical models for the real-time embedded system and we derive conditions which, if satisfied, lead to stable systems. For the developed system models, we also determine bounds on the worst case response times of tasks.
Today’s embedded systems are exposed to variations in load demand due to complex software applications, dynamic hardware platforms, and the impact of the run-time environment. When these variations are large, and efficiency is required, adaptive on-line resource managers may be deployed on the system to control its resource usage. An often neglected problem is whether these resource managers are stable, meaning that the resource usage is controlled under all possible scenarios. In this paper we develop mathematical models for real-time embedded systems and we derive conditions which, if satisfied, lead to stable systems. For the developed system models, we also determine bounds on the worst case response times of tasks. We also give an intuition of what stability means in a real-time context and we show how it can be applied for several resource managers. We also discuss how our results can be extended in various ways.
Being predictable with respect to time is, by definition, a fundamental requirement for any real-time system. Modern multiprocessor systems impose a challenge in this context, due to resource sharing conflicts causing memory transfers to become unpredictable. In this thesis, we present a framework for achieving predictability for real-time applications running on multiprocessor system-on-chip platforms. Using a TDMA bus, worst-case execution time analysis and scheduling are done simultaneously. Since the worst-case execution times are directly dependent on the bus schedule, bus access design is of special importance. Therefore, we provide an efficient algorithm for generating bus schedules, resulting in a minimized worst-case global delay.
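The dependence of the worst-case delay on the bus schedule can be seen in a minimal sketch. It assumes a TDMA round given as a list of (owner, length) slots, a transfer that must fit entirely inside one slot owned by the requesting processor, and a request that may arrive at any cycle of the round. The function name and this transfer model are assumptions for illustration only and do not reproduce the algorithm from the thesis.

```python
from typing import List, Tuple

Slot = Tuple[int, int]  # (owner processor id, slot length in cycles)

def worst_case_delay(schedule: List[Slot], cpu: int, transfer: int) -> int:
    """Worst-case cycles from request to completion of one bus transfer,
    taken over every possible request time within one TDMA round."""
    period = sum(length for _, length in schedule)
    # Collect the windows owned by `cpu` that are long enough for the
    # transfer, unrolled over two rounds so late requests can wrap around.
    windows = []
    start = 0
    for _ in range(2):
        for owner, length in schedule:
            if owner == cpu and length >= transfer:
                windows.append((start, start + length))
            start += length
    if not windows:
        raise ValueError("no slot of this processor can hold the transfer")
    worst = 0
    for t in range(period):
        # Earliest start at or after t that still finishes inside a window.
        begin = min(max(t, a) for a, b in windows if max(t, a) + transfer <= b)
        worst = max(worst, begin + transfer - t)
    return worst

# Same bandwidth share for processor 0, two different slot layouts.
print(worst_case_delay([(0, 4), (1, 4)], cpu=0, transfer=2))
print(worst_case_delay([(0, 2), (1, 2), (0, 2), (1, 2)], cpu=0, transfer=2))
```

Both layouts give processor 0 half of the bus, yet the interleaved one yields a smaller worst-case delay in this toy model. A bus schedule generator therefore has real freedom to reduce worst-case execution times without changing bandwidth shares.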
We also present a new approach considering the average-case execution time in a predictable context. Optimization techniques for improving the average-case execution time of tasks, for which predictability with respect to time is not required, have been investigated for a long time in many different contexts. However, this has traditionally been done without paying attention to the worst-case execution time. For predictable real-time applications, on the other hand, the focus has been solely on worst-case execution time optimization, ignoring how this affects the execution time in the average case. In this thesis, we show that having a good average-case global delay can be important also for real-time applications, for which predictability is required. Furthermore, for real-time applications running on multiprocessor systems-on-chip, we present a technique for optimizing for the average case and the worst case simultaneously, allowing for a good average case execution time while still keeping the worst case as small as possible. The proposed solutions in this thesis have been validated by extensive experiments. The results demonstrate the efficiency and importance of the presented techniques.
In multiprocessor systems, the traffic on the bus does not solely originate from data transfers due to data dependencies between tasks, but is also affected by memory transfers as a result of cache misses. This has a huge impact on worst-case execution time (WCET) analysis and, in general, on the predictability of real-time applications implemented on such systems. As opposed to the WCET analysis performed for a single processor system, where the cache miss penalty is considered constant, in a multiprocessor system each cache miss has a variable penalty, depending on the bus contention. This affects the tasks' WCET which, however, is needed in order to perform system scheduling. At the same time, the WCET depends on the system schedule due to the bus interference. In this paper we present an approach to worst-case execution time analysis and system scheduling for real-time applications implemented on multiprocessor SoC architectures. The emphasis of this paper is on the bus scheduling policy and its optimization, which is of huge importance for the performance of such a predictable multiprocessor application.
Worst-case execution time analysis is the foundation of real-time system design, and is therefore an area which has been subject to great scientific interest for a long time. However, traditional worst-case execution time analysis techniques assume that the underlying hardware is a monoprocessor system, and this class of hardware platforms is getting less suitable for modern embedded applications, which demand more and more in terms of computational power. For multiprocessor systems, traditional worst-case analysis tools do not produce correct results and can consequently not be used. To solve this problem, we have previously proposed a technique for achieving predictability on multiprocessor systems-on-chip using a shared TDMA bus. One of the main benefits of our approach is that existing, traditional worst-case execution time analysis techniques can, after some small modifications, be applied. In this paper, we describe the nature of these modifications and how to handle different types of multiprocessor architectures.
Optimization techniques for improving the average-case execution time of an application, for which predictability with respect to time is not required, have been investigated for a long time in many different contexts. However, this has traditionally been done without paying attention to the worst-case execution time. For predictable real-time applications, on the other hand, the focus has been solely on worst-case execution time optimization, ignoring how this affects the execution time in the average case. In this paper, we show that having a good average-case delay can be important also for real-time applications for which predictability is required. Furthermore, for real-time applications running on multiprocessor systems-on-chip, we present a technique for optimizing the average case and the worst case simultaneously, allowing for a good average-case execution time while still keeping the worst case as small as possible.
This paper proposes a novel approach to solve the allocation and scheduling problems for variable voltage/frequency multiprocessor systems-on-chip, which minimizes overall system energy dissipation. The optimality of derived system configurations is guaranteed, while the computation efficiency of the optimizer allows for solving problem instances that were traditionally considered beyond reach for exact solvers (optimality gap). Furthermore, this paper illustrates the development- and run-time software infrastructures that assist the user in developing applications and implementing optimizer solutions. The proposed approach guarantees a high level of power, performance, and constraint satisfaction predictability as from validation on the target platform, thus bridging the abstraction gap.
Most problems addressed by the software optimization flow for multi-processor systems-on-chip (MPSoCs) are NP-complete, and have been traditionally tackled by means of heuristics and high-level approximations. Complete approaches have been effectively deployed only under unrealistic simplifying assumptions. We propose a novel methodology to formulate and solve to optimality the allocation, scheduling and discrete voltage selection problem for variable voltage/frequency MPSoCs, minimizing the system energy dissipation and the overhead for frequency switching. We integrate the optimization and validation steps to increase the accuracy of cost models and the confidence in quality of results. Two demonstrators are used to show the viability of the proposed methodology.
In this paper, we propose a design framework for distributed embedded control systems that ensures reliable execution and high quality of control even if some computation nodes fail. When a node fails, the configuration of the underlying distributed system changes and the system must adapt to this new situation by activating tasks at operational nodes. The task mapping as well as schedules and control laws that are customized for the new configuration influence the control quality and must, therefore, be optimized. The number of possible configurations due to faults is exponential in the number of nodes in the system. This design-space complexity leads to unaffordable design time and large memory requirements to store information related to mappings, schedules, and controllers. We demonstrate that it is sufficient to synthesize solutions for a small number of base and minimal configurations to achieve fault tolerance with an inherent minimum level of control quality. We also propose an algorithm to further improve control quality with a priority-based search of the set of configurations and trade-offs between task migration and replication.
Many embedded control systems comprise several control loops that are closed over a network of computation nodes. In such systems, complex timing behavior and communication lead to delay and jitter, which both degrade the performance of each control loop and must be considered during the controller synthesis. Also, the control performance should be taken into account during system scheduling. The contribution of this paper is a control-scheduling co-design method that integrates controller design with both static and priority-based scheduling of the tasks and messages, and in which the overall control performance is optimized.
FlexRay is a popular communication protocol in modern automotive systems with several computation nodes and communication units. The complex temporal behavior of such systems depends highly on the FlexRay configuration and influences the performance of running control applications. In our previous work, we presented a design framework for integrated scheduling and design of embedded control applications, where control quality is the optimization objective. This paper presents our extension to the design framework to handle FlexRay-based embedded control systems. Our contribution is a method for the decision of FlexRay parameters and optimization of control quality.
At runtime, an embedded control system can switch between alternative functional modes. In each mode, the system operates by using a schedule and controllers that exploit the available computation and communication resources to optimize the control performance in the running mode. The number of modes is usually exponential in the number of control loops, which means that all controllers and schedules cannot be produced in affordable design-time and stored in memory. This paper addresses synthesis of multi-mode embedded control systems. Our contribution is a method that trades control quality with optimization time, and that efficiently selects the schedules and controllers to be synthesized and stored in memory.
Time-triggered periodic control implementations are over provisioned for many execution scenarios in which the states of the controlled plants are close to equilibrium. To address this inefficient use of computation resources, researchers have proposed self-triggered control approaches in which the control task computes its execution deadline at runtime based on the state and dynamical properties of the controlled plant. The potential advantages of this control approach cannot, however, be achieved without adequate online resource-management policies. This paper addresses scheduling of multiple self-triggered control tasks that execute on a uniprocessor platform, where the optimization objective is to find tradeoffs between the control performance and CPU usage of all control tasks. Our experimental results show that efficiency in terms of control performance and reduced CPU usage can be achieved with the heuristic proposed in this paper.
Concurrent testing of the cores in a modular core-based System-on-Chip reduces the test application time but increases the test power consumption. Power models and scheduling algorithms have been proposed to schedule the tests as concurrently as possible while respecting the power budget. The commonly used global peak power model, with a single value capturing the power dissipated by a core when tested, is pessimistic but simple for a scheduling algorithm to handle. In this paper, we propose a cycle-accurate power model with a power value per clock cycle and a corresponding scheduling algorithm. The model takes into account the switching activity in the scan chains caused by both the test stimuli and the test responses during scan-in, launch-and-capture, and scan-out. Further, we allow a unique power model per wrapper chain configuration as the activity in a core will be different depending on the number of wrapper chains at a core. Extensive experiments on ITC'02 benchmarks and an industrial design show that the testing time can be substantially reduced (on average 16.5% reduction) by using the proposed cycle-accurate test power model.
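The difference between the global peak power model and a per-cycle model can be sketched as follows. The two power profiles, the budget, and the function names are invented for illustration; the check only decides whether a given set of start offsets respects the budget and is not the scheduling algorithm proposed in the paper.

```python
from typing import List, Sequence

def fits_global_peak(profiles: Sequence[Sequence[int]], budget: int) -> bool:
    """Global peak model: each test is charged its maximum power for its
    whole duration, so concurrent tests must have peaks summing to <= budget."""
    return sum(max(p) for p in profiles) <= budget

def fits_cycle_accurate(profiles: Sequence[Sequence[int]],
                        offsets: Sequence[int], budget: int) -> bool:
    """Cycle-accurate model: add the per-cycle power values of all tests,
    shifted by their start offsets, and compare every cycle to the budget."""
    horizon = max(off + len(p) for p, off in zip(profiles, offsets))
    total: List[int] = [0] * horizon
    for p, off in zip(profiles, offsets):
        for i, value in enumerate(p):
            total[off + i] += value
    return max(total) <= budget

core_a = [5, 1, 1, 5]   # hypothetical power per clock cycle
core_b = [1, 5, 5, 1]
budget = 7

print(fits_global_peak([core_a, core_b], budget))             # False
print(fits_cycle_accurate([core_a, core_b], [0, 0], budget))  # True
```

Under the global peak model the two tests cannot overlap (5 + 5 exceeds 7), whereas the per-cycle sums never exceed 6, so they can run concurrently. This is the pessimism that a cycle-accurate model removes.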
In this paper, we propose a simulation-based methodology for worst-case response time estimation of distributed real-time systems. Schedulability analysis produces pessimistic upper bounds on process response times. Consequently, such an analysis can lead to overdesigned systems resulting in unnecessarily increased costs. Simulations, if well conducted, can lead to tight lower bounds on worst-case response times, which can be an essential input at design time. Moreover, such a simulation methodology is very important in situations when the running application or the underlying platform is such that no formal timing analysis is available. Another important application of the proposed simulation environment is the validation of formal analysis approaches, by estimating their degree of pessimism. We have performed such an estimation of pessimism for two response-time analysis approaches for distributed embedded systems based on two of the most important automotive communication protocols: CAN and FlexRay.
Concurrent testing of the cores in a core-based system-on-chip reduces the test application time but increases the test power consumption. Power models, test architecture design, and scheduling algorithms have been proposed to schedule the tests as concurrently as possible while respecting the power budget. The commonly used global peak power model, with a single value capturing the power dissipated by a core when tested, is simple for a scheduling algorithm to handle but is pessimistic. In this paper, we propose a cycle-accurate power model with a power value per clock cycle and a corresponding test architecture design and scheduling algorithm. The power model takes into account the switching activity in the scan chains caused by both the test stimuli and the expected test responses during scan-in, launch-and-capture, and scan-out. Furthermore, we allow a unique power model per wrapper-chain configuration as the activity in a core will be different depending on the number of wrapper chains at a core. Through circuit simulations on ISCAS'89 benchmarks, we demonstrate a high correlation between the real test power dissipation and our cycle-accurate test power model. Extensive experiments on ITC'02 benchmarks and an industrial design show that the testing time can be reduced substantially by using the proposed cycle-accurate test power model.
FlexRay is an automotive communication protocol that combines the comprehensive time-triggered paradigm with an adaptive phase that is more suitable for event-based communication. We study optimization of average response times by assigning priorities and frame identifiers to tasks and messages. Our optimization approach is based on immune genetic algorithms, where in addition to the crossover and mutation operators, we use a vaccination operator that results in considerable improvements in optimization time and quality.
Multi-mode systems are characterised by a set of interacting operational modes to support different functionalities and standards. In this paper, we present a co-design methodology for multi-mode embedded systems that produces energy-efficient implementations. Based on the key observation that operational modes are executed with different probabilities, i.e., the system spends uneven amounts of time in the different modes, we develop a novel co-design technique that exploits this property to significantly reduce energy dissipation. We conduct several experiments, including a smart phone real-life example, that demonstrate the effectiveness of our approach. Reductions in power consumption of up to 64% are reported.
In this paper, we present an efficient two-step iterative synthesis approach for distributed embedded systems containing dynamic voltage scalable processing elements (DVS-PEs), based on genetic algorithms. The approach partitions, schedules, and voltage scales multi-rate specifications given as task graphs with multiple deadlines. A distinguishing feature of the proposed synthesis is the utilisation of a generalised DVS method. In contrast to previous techniques, which simply exploit available slack time, this generalised technique additionally considers the PE power profile during a refined voltage selection to further increase the energy savings. Extensive experiments are conducted to demonstrate the efficiency of the proposed approach. We report up to 43.2% higher energy reductions compared to previous DVS scheduling approaches based on constructive techniques and total energy savings of up to 82.9% for mapping and scheduling optimised DVS systems.
We present an iterative schedule optimisation for multi-rate system specifications, mapped onto heterogeneous distributed architectures containing dynamic voltage scalable processing elements (DVS-PEs). To achieve a high degree of energy reduction, we formulate a generalised DVS problem, taking into account the power variations among the executing tasks. An efficient heuristic is presented that identifies optimised supply voltages by not only "simply" exploiting slack time, but also considering the power profiles. This algorithm thereby effectively minimises the energy dissipation of heterogeneous architectures, including power-managed processing elements. Further, we address the simultaneous schedule optimisation towards timing behaviour and DVS utilisation by integrating the proposed DVS heuristic into a genetic list scheduling approach. We investigate and analyse the possible energy reduction at both steps of the co-synthesis (voltage scaling and scheduling), including the effects of power variations. Extensive experiments indicate that the presented work produces high-quality solutions.
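Why the power profile matters when distributing slack can be shown with a small sketch under a strongly simplified model. It assumes that stretching a task from its nominal time to an allocated time lowers the supply voltage proportionally, so that energy scales with the square of the ratio between nominal and allocated time; threshold voltage, discrete voltage levels, and switching overheads are ignored. The task values and function names are assumptions for illustration and this is not the heuristic from the paper.

```python
from typing import Tuple

Task = Tuple[float, float]  # (nominal power, nominal execution time)

def energy(power: float, t_nom: float, t_alloc: float) -> float:
    """Energy of a task stretched from t_nom to t_alloc under the assumed
    model: voltage ~ 1/stretch and energy ~ voltage squared."""
    return power * t_nom * (t_nom / t_alloc) ** 2

def best_split(first: Task, second: Task, deadline: float,
               steps: int = 100) -> Tuple[float, float]:
    """Grid search over how much of the shared slack goes to the first of
    two tasks executed back to back. Returns (energy, slack for first task)."""
    (p1, t1), (p2, t2) = first, second
    slack = deadline - t1 - t2
    best = None
    for k in range(steps + 1):
        s1 = slack * k / steps
        e = energy(p1, t1, t1 + s1) + energy(p2, t2, t2 + slack - s1)
        if best is None or e < best[0]:
            best = (e, s1)
    return best

hungry: Task = (10.0, 2.0)   # high-power task (hypothetical values)
frugal: Task = (1.0, 2.0)    # low-power task with the same nominal time
deadline = 6.0               # leaves 2.0 time units of slack

equal = energy(10.0, 2.0, 3.0) + energy(1.0, 2.0, 3.0)
aware = best_split(hungry, frugal, deadline)
print(round(equal, 3), aware)
```

In this example, splitting the slack evenly costs roughly 9.78 energy units, while the search assigns all of the slack to the high-power task and reaches 7.0. The point of the sketch is only that slack is worth more on tasks that dissipate more power.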
In this paper, we introduce the LOPOCOS (Low Power Co-synthesis) system, a prototype CAD tool for system level co-design. LOPOCOS targets the design of energy-efficient embedded systems implemented as heterogeneous distributed architectures. In particular, it is designed to solve the specific problems involved in architectures that include dynamic voltage scalable (DVS) processors. The aim of this paper is to demonstrate how LOPOCOS can support the system designer in identifying energy-efficient hardware/software implementations for the desired embedded systems, highlighting the necessary optimization steps during design space exploration for DVS-enabled architectures. The optimization steps carried out in LOPOCOS involve component allocation and task/communication mapping as well as scheduling and dynamic voltage scaling. LOPOCOS has the following key features, which contribute to this energy efficiency. During voltage scaling, valuable power profile information of task execution is taken into account; hence, the accuracy of the energy estimation is improved. A combined optimization for scheduling and communication mapping, based on a genetic algorithm, simultaneously optimizes execution order and communication mapping towards the utilization of the DVS processors and timing behaviour. Furthermore, a separation of task and communication mapping allows a more effective implementation of both task and communication mapping optimization steps. Extensive experiments are conducted to demonstrate the efficiency of LOPOCOS. We report up to 38% higher energy reductions compared to previous co-synthesis techniques for DVS systems. The investigations include a real-life example of an optical flow detection algorithm.
By shrinking feature sizes, deep-submicron technology is enabling the design of systems with increased complexity on a single chip, but it is also introducing a productivity design gap. Additionally, system designers have to cope with an ever-increasing application complexity and shrinking time-to-market windows. Design re-use and system-level co-synthesis are two approaches that are being employed to bridge the design gap and to aid system designers. Power consumption has become one of the main barriers in embedded computing systems design and therefore, methodologies and techniques that provide power-aware hardware/software co-design are necessary. System-Level Design Techniques for Energy-Efficient Embedded Systems addresses the development and validation of co-synthesis techniques that allow an effective design of embedded systems with low energy dissipation. The book provides an overview of a system-level co-design flow, illustrating through examples how system performance is influenced at various steps of the flow including allocation, mapping, and scheduling. The book places special emphasis upon system-level co-synthesis techniques for architectures that contain voltage scalable processors, which can dynamically trade off between computational performance and power consumption. Throughout the book, the introduced co-synthesis techniques, which target both single-mode systems and emerging multi-mode applications, are applied to numerous benchmarks and real-life examples including a realistic smart phone. System-Level Design Techniques for Energy-Efficient Embedded Systems will be of interest to advanced undergraduates, graduate students, and designers who are interested in energy-efficient embedded systems design.
In this paper, we present a novel co-design methodology for the synthesis of energy-efficient embedded systems. In particular, we concentrate on distributed embedded systems that accommodate several different applications within a single device, i.e., multimode embedded systems. Based on the key observation that operational modes are executed with different probabilities, that is, the system spends uneven amounts of time in the different modes, we develop a new co-design technique that exploits this property to significantly reduce energy dissipation. Energy and cost savings are achieved through a suitable synthesis process that yields better hardware-resource-sharing opportunities. We conduct several experiments, including a realistic smart phone example, that demonstrate the effectiveness of our approach. Reductions in power consumption of up to 64% are reported.
This paper describes an environment for internetbased collaboration in the field of design and test of digital systems. Automatic Test Pattern Generation (ATPG) and fault simulation tools at behavioral, logical and hierarchical levels available at geographically different places running under the virtual environment using the MOSCITO system are presented. The interfaces between the integrated tools and also commercial design tools were developed. The tools can be used separately, or in multiple applications in different design and test flows. The functionality of the integrated design and test system was verified in several collaborative experiments over internet by partners locating in different geographical sites.
The paper describes the results of the COPERNICUS europroject JEP-97-7133 VILAB (Virtual LABoratory) obtained in a Internet-based joint activities of high-level design and hierarchical test generation of digital systems. Different CAD tools at geographically different places running under virtual environment were used for joint research purposes. The interfaces and convertors between the integrated tools were developed during the project. The tools can be used separately over Internet, or in multiple applications in different complex flows. The functionality of the virtual laboratory in a collaborative research on HW/SW codesign, high-level synthesis and test generation was tested and is described in the paper.
The design process for automotive electronics is an iterative process, where new components and distributed applications are added over several design cycles incrementally. Hence, at each design iteration an existing communication schedule is extended by new messages that have to be scheduled appropriately. In this paper, the goal has been to synthesize schedules under real-time constraints for the dynamic segment of FlexRay with respect to the 64-cycle protocol specification. We propose a flexible scheduling framework to generate all feasible schedules for a set of messages satisfying real-time and protocol constraints. Further, we present an optimization procedure to retain schedules according to suitable design metrics. Even though the size of the possible design space is exponential in the number of messages, our proposed method keeps down the schedule synthesis time to practically acceptable values as shown in the experiments.
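As a minimal illustration of the kind of protocol constraint that the 64-cycle specification imposes, the sketch below checks whether two messages can share an identifier without ever being sent in the same cycle. It assumes that each message is described by a base cycle and a power-of-two repetition, and that sharing is allowed only when the active cycles are disjoint. These modelling choices, and the names `active_cycles` and `can_share_frame_id`, are hypothetical and are not taken from the abstract.

```python
CYCLES = 64  # the FlexRay cycle counter runs from 0 to 63

# Assumption: repetitions are restricted to powers of two up to 64.
VALID_REPETITIONS = (1, 2, 4, 8, 16, 32, 64)


def active_cycles(base, repetition):
    """Return the set of cycles in which a (base, repetition) message is sent."""
    if repetition not in VALID_REPETITIONS:
        raise ValueError("repetition must be a power of two between 1 and 64")
    if not 0 <= base < repetition:
        raise ValueError("base cycle must lie in the range [0, repetition)")
    return set(range(base, CYCLES, repetition))


def can_share_frame_id(msg_a, msg_b):
    """True if the two (base, repetition) messages never use the same cycle."""
    return active_cycles(*msg_a).isdisjoint(active_cycles(*msg_b))


# Two messages sent every second cycle, one on even and one on odd cycles.
print(can_share_frame_id((0, 2), (1, 2)))
# A message on every even cycle collides with one sent in cycles 2, 6, 10, ...
print(can_share_frame_id((0, 2), (2, 4)))
```

A scheduler could use such a check as one feasibility test among several. Real-time constraints such as deadlines are deliberately left out of this sketch.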
FlexRay has emerged as the de facto next-generation in-vehicle communication protocol. Messages are scheduled incrementally on FlexRay according to the automotive design paradigm where new applications are added iteratively. On this account, the schedules must be (i) sustainable, i.e., when messages are added in later iterations, they must preserve deadline guarantees of existing messages and (ii) extensible, i.e., they must accommodate future messages without changes to existing schedules. Unfortunately, traditionally used metrics of sustainability and extensibility for timing and schedulability analysis are generic and cannot be trivially adapted to FlexRay schedules. This is because of platform-specific properties of FlexRay, such as its hybrid paradigm, where both time-triggered and event-triggered segments are used for communication. In this paper, we first introduce new notions of sustainability and extensibility for FlexRay that capture protocol-specific properties and then present novel metrics to quantify sustainable and extensible schedules. We demonstrate the applicability of our results with industrial-size case studies and show that our proposed metrics may be visually represented, allowing easy interpretation by system designers in the automotive industry.
[No abstract available]
This paper addresses Test Application Time (TAT) reduction for core-based 3D Stacked ICs (SICs). Applying traditional test scheduling methods used for non-stacked chip testing where the same test schedule is applied both at wafer test and at final test to SICs, leads to unnecessarily high TAT. This is because the final test of 3D-SICs includes the testing of all the stacked chips. A key challenge in 3D-SIC testing is to reduce TAT by co-optimizing the wafer test and the final test while meeting power constraints. We consider a system of chips with cores equipped with dedicated Built-In-Self-Test (BIST)-engines and propose a test scheduling approach to reduce TAT while meeting the power constraints. Depending on the test schedule, the control lines that are required for BIST can be shared among several BIST engines. This is taken into account in the test scheduling approach and experiments show significant savings in TAT.
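To make the notion of power-constrained test scheduling concrete, the following sketch packs core tests into sessions so that the summed power of the tests in a session never exceeds a limit. It is a generic greedy heuristic (longest test first, first fitting session), not the scheduling approach proposed in the paper. The test times, power values and the names `schedule_sessions` and `total_time` are hypothetical.

```python
def schedule_sessions(tests, power_limit):
    """Greedily pack tests into sessions under a power limit.

    tests maps a core name to a (test_time, test_power) pair. Tests in the
    same session run concurrently, so a session lasts as long as its
    longest test and consumes the sum of its tests' power.
    """
    for name, (_, power) in tests.items():
        if power > power_limit:
            raise ValueError(f"test {name} alone exceeds the power limit")

    sessions = []
    # Consider the longest tests first so that short tests fill the gaps.
    for name, (time, power) in sorted(tests.items(), key=lambda kv: -kv[1][0]):
        for session in sessions:
            if session["power"] + power <= power_limit:
                session["tests"].append(name)
                session["power"] += power
                session["time"] = max(session["time"], time)
                break
        else:
            # No existing session has enough power headroom: open a new one.
            sessions.append({"tests": [name], "power": power, "time": time})
    return sessions


def total_time(sessions):
    """Sessions run one after another, so their durations add up."""
    return sum(session["time"] for session in sessions)


# Hypothetical core tests: name -> (time, power).
example = {"c1": (10, 4), "c2": (8, 3), "c3": (6, 5), "c4": (3, 2)}
plan = schedule_sessions(example, power_limit=8)
print(plan, total_time(plan))
```

With these made-up numbers, running all four tests one after another would take 27 time units, whereas the greedy plan needs two sessions with a combined duration of 16. The sketch ignores the sharing of BIST control lines and the co-optimization of wafer test and final test that the abstract describes.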
This paper addresses Test Application Time (TAT) reduction under power constraints for core-based 3D Stacked ICs (SICs) connected by Through Silicon Vias (TSVs). Unlike non-stacked chips, where the test flow is well defined by applying the same test schedule both at wafer sort and at package test, the test flow for 3D TSV-SICs is not yet defined. In this paper we present a cost model to find the optimal test flow. For the optimal test flow, we propose test scheduling algorithms that take the particulars of 3D TSV-SICs into account. A key challenge in testing 3D TSV-SICs is to reduce the TAT by co-optimizing the wafer sort and the package test while meeting power constraints. We consider a system of chips with cores that are accessed through an on-chip JTAG infrastructure and propose a test scheduling approach to reduce TAT while considering resource conflicts and meeting the power constraints. Depending on the test schedule, the JTAG interconnect lines that are required can be shared to test several cores. This is taken into account in experiments with an implementation of the proposed scheduling approach. The results show significant savings in TAT.
In this paper we propose a test cost model for core-based 3D Stacked ICs (SICs) connected by Through Silicon Vias (TSVs). Unlike in the case of non-stacked chips, where the test flow is well defined by applying the same test schedule both at wafer sort and at package test, the most cost-efficient test flow for 3D TSV-SICs is not yet defined. Therefore, analysing the various test flow alternatives, we present a cost model and the optimal test flow. In the test flow alternatives, we analyse the effect of all possible moments of testing for a 3D TSV-SIC, viz., wafer sort, intermediate test and package test. For the optimal test flow, we have performed experiments with varying yield and test time parameters, which further support our claim.
Test planning for core-based 3D stacked ICs with through-silicon vias (3D TSV-SIC) is different from test planning for non-stacked ICs as the same test schedule cannot be applied both at wafer sort and package test. In this paper, we assume a test flow where each chip is tested individually at wafer sort and jointly at package test. We define cost functions and test planning optimization algorithms for non-stacked ICs and 3D TSV-SICs with two chips in the stack. We have implemented our techniques and experiments show significant reduction of test cost.
Test planning for core-based 3D stacked ICs under a power constraint is different from test planning for non-stacked ICs as the same test schedule cannot be applied both at wafer sort and package test. In this paper, we assume a test flow where each chip is tested individually at wafer sort and jointly at package test. We define cost functions and test planning optimization algorithms for non-stacked ICs, 3D SICs with two chips and 3D SICs with an arbitrary number of chips. We motivate the problem by demonstrating the trade-off between test time and hardware, within a power constraint, while arriving at the minimal cost.
Test planning for core-based 3D stacked ICs with through-silicon vias (3D TSV-SIC) is different from test planning for non-stacked ICs as the same test schedule cannot be applied both at wafer sort and package test. In this paper, we assume a test flow where each chip is tested individually at wafer sort and jointly at package test. We define cost functions and test planning optimization algorithms for non-stacked ICs, 3D TSV-SICs with two chips and 3D TSV-SICs with an arbitrary number of chips. We have implemented our techniques and experiments show significant reduction of test cost.