DEPARTMENT OF ELECTRICAL ENGINEERING

UNDERSTANDING SUB-THRESHOLD SOURCE COUPLED LOGIC FOR ULTRA-LOW POWER APPLICATION

THESIS CARRIED OUT IN ELECTRONICS SYSTEMS
AT LINKÖPING INSTITUTE OF TECHNOLOGY

BY

SAJIB ROY AND MD. MURAD KABIR NIPUN

LiTH-ISY-EX--11/4465--SE

Linköping, May 24, 2011

Supervisor: J Jacob Wikner
ISY, Linköping University

Examiner: J Jacob Wikner
ISY, Linköping University

Abstract

This thesis work primarily focuses on the applicability of sub-threshold source coupled logic (STSCL) for building digital circuits and systems that run at very low voltage and promise to provide desirable performance with excellent energy savings. Sectors like bio-engineering and smart sensors require the energy consumption to be effectively very low for long battery life. Alongside meeting the ultra-low power specification, the system must also be reliable, robust, and perform well under harsh conditions. In this thesis work, logic gates are designed and analyzed, using STSCL. These gates are further used for implementation of digital subsystems in small-sized smart dust sensors which would operate at very low supply voltages and consume extremely low power.

To understand the performance of STSCL with respect to ultra-low power and energy, a seven-stage ring oscillator, a 4-by-4 array multiplier, a fifth-order FIR filter and finally a fifty-fifth-order FIR filter were designed. The subcircuits and systems have been simulated for different supply voltages, scaling down to 0.2 V, at different temperature values (−20°C and 70°C) in both 45 nm and 65 nm process technologies. The chosen architectures for the FIR filters and array multiplier were conventional and essentially taken from traditional CMOS-based designs.

The simulated results are studied, analyzed and compared with those of the same digital circuits implemented in CMOS. The results show the advantage of STSCL-based digital systems over CMOS. The simulations give an energy consumption of 1.1388 nJ for a fifty-fifth-order FIR filter at low temperature (−20°C) using STSCL, which is less than that of the corresponding CMOS implementation.

Keywords

STSCL, CMOS-CVL, PDP, FIR, DFF, NCL, slew rate.
Authors' information

Author 1: Sajib Roy
Sajib Roy received his Bachelor's degree in Electrical and Electronic Engineering from East West University, Bangladesh, in 2009. He then worked for six months as a lecturer at Primeasia University, Bangladesh, before taking a leave of absence to complete his Master's degree at Linköping University, Sweden.

Author 2: Md. Murad Kabir Nipun
Md. Murad Kabir Nipun received his Bachelor's degree in Electrical and Electronic Engineering from American International University-Bangladesh in 2008. Since then he has worked as a project engineer at a reputable telecommunications company in Bangladesh. He is currently completing his Master's degree at Linköping University, Sweden.

Acknowledgment
The authors would like to thank their supervisor, Dr. J Jacob Wikner, for his contribution to the completion of this work. The authors would also like to thank their parents and friends for their constant support throughout the thesis work. Last but not least, the authors would like to thank the whole mixed signal group.
# Table of Contents

1. Introduction  
   1.1. Motivation for sub-threshold operation  
   1.2. Overview  
   1.3. Thesis Organization  

2. Low power electronics at nano-meter process technology  
   2.1. Sources and consideration of power consumption in CMOS logic  
      2.1.1. Dynamic power consumption  
      2.1.2. Static power consumption  
   2.2. Power-energy reduction techniques in CMOS logic  

3. Wireless sensor networks design and development  
   3.1. Design concerns in sensor module development  
   3.2. Smart dust sensors: A look at current architecture and methodology  
      3.2.1. Example of smart dust architectures  
      3.2.2. Communication methodologies  
      3.2.3. Applications of smart dust  

4. Leakage in CMOS technology  
   4.1. Gate leakage  
   4.2. Gate leakage reduction scheme  
   4.3. Sub-threshold leakage  
   4.4. Sub-threshold leakage reduction schemes  
   4.5. Leakage model  

5. Source coupled logic  
   5.1. Fundamentals of source coupled logic (SCL)  
   5.2. Sub-threshold source coupled logic (STSCL)  
   5.3. STSCL for leakage reduction  
   5.4. Ring oscillator operation  

6. Sub-threshold source coupled logic  
   6.1. STSCL logic gates  
   6.2. Responses for the STSCL logic gates  
   6.3. Implemented digital circuits  
      6.3.1. Full adder  
      6.3.2. Digital filter  
   6.4. Performance measurement parameters  
      6.4.1. Power consumption  
      6.4.2. Power delay product (PDP)  
      6.4.3. Slew rate (SR)  
   6.5. STSCL digital circuits  
   6.6. Simulation and analysis of the systems  

7. Conclusions  

8. Future work  

9. References  
# List of Tables

Table 1: Power consumption comparison for seven-stage ring oscillators for 65 nm process  
Table 2: Performance comparison of an STSCL XOR gate at different temperatures  
Table 3: Performance comparison of an STSCL AND gate at different temperatures  
Table 4: Performance comparison of an STSCL OR gate at different temperatures  
Table 5: Performance comparison of an STSCL D-flip-flop at different temperatures  
Table 6: Full adder truth table  
Table 7: Output specification for fifth-order FIR filter  
Table 8: Output specification for fifth-order FIR filter  
Table 9: Power consumption and PDP comparison for fifth-order FIR filter  
Table 10: Power consumption and PDP comparison for 4-by-4 array multiplier  
Table 11: Comparison of STSCL with other logic gates  
# List of Figures

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Schematic of an STSCL inverter.</td>
</tr>
<tr>
<td>2</td>
<td>DC response of the STSCL inverter for varying NMOS width at $V_{dd} = 0.2$ V.</td>
</tr>
<tr>
<td>3</td>
<td>Voltage-transfer characteristics of the STSCL inverter for 65 nm process technology at $V_{dd} = 0.5$ V.</td>
</tr>
<tr>
<td>4</td>
<td>Voltage-transfer characteristics of the STSCL inverter for 45 nm process technology at $V_{dd} = 0.5$ V.</td>
</tr>
<tr>
<td>5</td>
<td>Traditional design flow where hardware and software are implemented independently.</td>
</tr>
<tr>
<td>6</td>
<td>Example of hardware-software co-design.</td>
</tr>
<tr>
<td>7</td>
<td>Utilizing parallelism, if voltage is reduced for the top block diagram then the path will produce output in seconds, with duplicating the modules with lower supply, outputs will be produced parallel in every seconds which are equivalent to every seconds, similar to the initial value before reducing the supply voltage [18].</td>
</tr>
<tr>
<td>8</td>
<td>Data path equalization [18].</td>
</tr>
<tr>
<td>9</td>
<td>A basic architecture of a Smart Dust sensor node.</td>
</tr>
<tr>
<td>10</td>
<td>Smart Dust mote (SMART DUST autonomous sensing and communication in a cubic millimeter: PI: Kris Pister Co-investigators: Joe Kahn, Bernhard Boser. Subcontract: Steve Morris, MLB Co. Supported by the DARPA/MTO MEMS program)</td>
</tr>
<tr>
<td>11</td>
<td>Components of Smart Dust.</td>
</tr>
<tr>
<td>12</td>
<td>RF communication technique of Smart Dust.</td>
</tr>
<tr>
<td>13</td>
<td>Optical communication technique of Smart Dust.</td>
</tr>
<tr>
<td>14</td>
<td>NMOS section indicating the gate leakage components.</td>
</tr>
<tr>
<td>15</td>
<td>Gate leakage current for high threshold NMOS device, 45 nm (left) and 65 nm (right).</td>
</tr>
<tr>
<td>16</td>
<td>Gate leakage current for low threshold NMOS device, 45 nm (left) and 65 nm (right).</td>
</tr>
<tr>
<td>17</td>
<td>Sub-threshold leakage current for low (top portion) and high (bottom portion) threshold NMOS device, 45 nm (left), and 65 nm (right).</td>
</tr>
<tr>
<td>18</td>
<td>CMOS logic with CVL feature (one PMOS switch diode).</td>
</tr>
<tr>
<td>19</td>
<td>(a) vs. for CMOS, (b) vs. for CMOS with CVL (three diodes), (c) vs. for CMOS with CVL (five diodes), and (d) vs. for CMOS with CVL (three diodes with 180 nm NMOS gating).</td>
</tr>
<tr>
<td>20</td>
<td>Power gating architecture with one sleep NMOS transistor [24].</td>
</tr>
<tr>
<td>21</td>
<td>Leakage model for an NMOS transistor.</td>
</tr>
<tr>
<td>22</td>
<td>Drain current (left) and gate current (right) variation over input gate voltage and supply voltage.</td>
</tr>
<tr>
<td>23</td>
<td>Drain current (left) and gate current (right) variation over gate input voltage and supply voltage at temperature.</td>
</tr>
<tr>
<td>24</td>
<td>Schematic of an SCL inverter.</td>
</tr>
<tr>
<td>25</td>
<td>Voltage-transfer characteristics of an SCL inverter for 45 nm and 65 nm process technology.</td>
</tr>
<tr>
<td>26</td>
<td>Cross section of PMOS load device showing the parasitic components.</td>
</tr>
<tr>
<td>27</td>
<td>Conventional (left) and the STSCL load device (right).</td>
</tr>
<tr>
<td>28</td>
<td>Simple bias circuit for STSCL gates.</td>
</tr>
<tr>
<td>29</td>
<td>An STSCL inverter input-output transient response.</td>
</tr>
<tr>
<td>30</td>
<td>Variation of STSCL inverter DC response respect to the gain amplifier.</td>
</tr>
<tr>
<td>31</td>
<td>Variation of node of the bias circuit respect to at 0.4 V supply.</td>
</tr>
<tr>
<td>32</td>
<td>Variation of node of the bias circuit respect to gain at 0.4 V supply.</td>
</tr>
<tr>
<td>33</td>
<td>An STSCL inverter DC response variation with respect to NMOS widths of the NMOS differential network at 0.4 V supply.</td>
</tr>
<tr>
<td>34</td>
<td>An STSCL inverter DC response variation with respect to PMOS width of the PMOS load device at 0.4 V supply.</td>
</tr>
<tr>
<td>35</td>
<td>Gate leakage current vs. supply voltage for a CMOS inverter with CVL (five diodes).</td>
</tr>
<tr>
<td>36</td>
<td>Gate leakage current vs. supply voltage for an STSCL inverter.</td>
</tr>
<tr>
<td>37</td>
<td>Seven-stage STSCL ring oscillator block diagram.</td>
</tr>
<tr>
<td>38</td>
<td>Seven-stage STSCL ring oscillator output at 0.5 V supply.</td>
</tr>
<tr>
<td>39</td>
<td>Oscillation frequency range and respective power consumption over different supply voltage for an STSCL oscillator.</td>
</tr>
<tr>
<td>40</td>
<td>Schematic of an STSCL XOR gate.</td>
</tr>
<tr>
<td>41</td>
<td>Schematic of an STSCL AND gate.</td>
</tr>
<tr>
<td>42</td>
<td>Schematic of an STSCL OR gate.</td>
</tr>
<tr>
<td>43</td>
<td>Schematic of an STSCL D-latch.</td>
</tr>
<tr>
<td>44</td>
<td>An STSCL master-slave D-flip-flop.</td>
</tr>
<tr>
<td>45</td>
<td>Output response for an STSCL XOR gate.</td>
</tr>
<tr>
<td>46</td>
<td>Output response for an STSCL AND gate.</td>
</tr>
<tr>
<td>47</td>
<td>Output response for an STSCL OR gate.</td>
</tr>
<tr>
<td>48</td>
<td>Output response for an STSCL DFF (negative edge triggered) gate.</td>
</tr>
<tr>
<td>49</td>
<td>Gate level diagram of a full adder.</td>
</tr>
<tr>
<td>50</td>
<td>A direct form FIR filter. No closed loops, also can be called a non-recursive structure.</td>
</tr>
<tr>
<td>51</td>
<td>A block diagram of IIR filter. The 'D' block is a unit delay. The coefficients and number of feedback paths are implementation-dependent.</td>
</tr>
<tr>
<td>52</td>
<td>Block diagram of a 4-by-4 array multiplier.</td>
</tr>
<tr>
<td>53</td>
<td>Fifth-order FIR filter.</td>
</tr>
<tr>
<td>54</td>
<td>Five-bit serial-parallel multiplier [13].</td>
</tr>
<tr>
<td>55</td>
<td>Fifty-fifth-order FIR filter.</td>
</tr>
<tr>
<td>56</td>
<td>Test bench of a 4-by-4 array multiplier.</td>
</tr>
<tr>
<td>57</td>
<td>Test bench for a fifth-order filter.</td>
</tr>
<tr>
<td>58</td>
<td>Test bench for a fifty-fifth-order filter.</td>
</tr>
<tr>
<td>59</td>
<td>Magnitude response of a fifth-order FIR filter.</td>
</tr>
<tr>
<td>60</td>
<td>Magnitude response of a fifty-fifth-order FIR filter.</td>
</tr>
<tr>
<td>61</td>
<td>Propagation delay vs. bias current.</td>
</tr>
</tbody>
</table>
# List of Acronyms

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Meaning</th>
<th>Where discussed</th>
</tr>
</thead>
<tbody>
<tr>
<td>SCL:</td>
<td>Source Coupled Logic.</td>
<td>Discussed in chapter 5</td>
</tr>
<tr>
<td>STSCL:</td>
<td>Sub-threshold Source Coupled Logic.</td>
<td>Discussed in chapters 5 and 6</td>
</tr>
<tr>
<td>CMOS-CVL:</td>
<td>Complementary Metal Oxide Semiconductor-Controlled Voltage Leveling.</td>
<td>Discussed in chapter 2</td>
</tr>
<tr>
<td>MOSFET:</td>
<td>Metal Oxide Semiconductor Field Effect Transistor.</td>
<td>Discussed in chapter 6</td>
</tr>
<tr>
<td>PDP:</td>
<td>Power Delay Product.</td>
<td>Discussed in chapter 6</td>
</tr>
<tr>
<td>DFF:</td>
<td>D-Flip-Flop.</td>
<td>Discussed in chapter 6</td>
</tr>
<tr>
<td>ADC:</td>
<td>Analog to Digital Converter.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>DAC:</td>
<td>Digital to Analog Converter.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>MEMS:</td>
<td>Micro-Electro-Mechanical Systems.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>NCL:</td>
<td>Null Convention Logic.</td>
<td>Discussed in chapter 7</td>
</tr>
<tr>
<td>TDM:</td>
<td>Time Division Multiplexing.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>FDM:</td>
<td>Frequency Division Multiplexing.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>CDM:</td>
<td>Code Division Multiplexing.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>BST:</td>
<td>Base Station.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>FPGA:</td>
<td>Field Programmable Gate Array.</td>
<td>Discussed in chapter 2</td>
</tr>
<tr>
<td>DSP:</td>
<td>Digital Signal Processing.</td>
<td>Discussed in chapter 3</td>
</tr>
<tr>
<td>FIR:</td>
<td>Finite Impulse Response.</td>
<td>Discussed in chapter 6</td>
</tr>
<tr>
<td>IIR:</td>
<td>Infinite Impulse Response.</td>
<td>Discussed in chapter 6</td>
</tr>
<tr>
<td>SR:</td>
<td>Slew Rate.</td>
<td>Discussed in chapter 6</td>
</tr>
<tr>
<td>MOS:</td>
<td>Metal Oxide Semiconductor.</td>
<td>Discussed in chapters 1, 2, 4, and 6</td>
</tr>
<tr>
<td>FA:</td>
<td>Full Adder.</td>
<td>Discussed in chapter 6</td>
</tr>
<tr>
<td>TFF:</td>
<td>T-Flip-Flop.</td>
<td>Discussed in chapter 8</td>
</tr>
<tr>
<td>IC:</td>
<td>Integrated Circuit.</td>
<td>Discussed in chapter 2</td>
</tr>
<tr>
<td>CCR:</td>
<td>Corner Cube Retro-Reflector.</td>
<td>Discussed in chapter 3</td>
</tr>
</tbody>
</table>
1. INTRODUCTION

Ultra-low power requirements have become mandatory in most aspects of digital circuits and systems. The demands are especially high in application fields such as bio-engineering and smart sensors. Achieving low power dissipation includes minimizing the overall leakage current. For a CMOS logic gate, the primary concerns with respect to power are static dissipation (due to gate-to-channel tunneling and sub-threshold off-state leakage) and dynamic dissipation. For process technologies above 65 nm, the dissipation due to static leakage is comparatively negligible with respect to dynamic power, but as technology scales down to 45 nm and below, static leakage has a larger impact on the overall power consumption of CMOS logic [5]. A more detailed study of the components contributing to static power dissipation is found in later chapters.

1.1. Motivation for sub-threshold operation

The concept of an ultra-low power specification depends solely on the application and the purpose that the ultra-low energy usage will serve. This work addresses the use of sub-threshold circuits for so-called Smart Dust sensors. The architecture and applications of Smart Dust are described in more detail in chapter 3. The most important requirements for deploying such sensors are a small size and operation at a very low supply voltage. This is mandatory when the sensors, for example, harvest energy from the environment in which they are deployed. Energy harvesting techniques are particularly noteworthy for Smart Dust sensors because they eliminate the need for expensive, long-lasting batteries. Using the energy harvester economically while consistently maintaining the performance of the driven system at very low supply voltages calls for digital systems that use sub-threshold circuit techniques.

The initial concerns when addressing Smart Dust applications mainly include operation under harsh conditions with scarce power availability. Sub-threshold operation addresses the need to minimize power usage, but it comes with some drawbacks: degradation of the system throughput, variation of the system stability and functionality with process and temperature and, most importantly, increased design area.

Finally, running all or part of a system in the sub-threshold region requires an understanding of circuit design techniques that achieve low-voltage, low-power operation while maintaining a desirable performance level. Thus there is a need for sub-threshold logic styles other than CMOS within the field of ultra-low power applications. Even though a sub-threshold logic style normally requires quite a large design area, some sacrifice in design area can be tolerated, considering that the sub-blocks within the system operate at frequencies in the kilohertz range and that extremely low power is the target.

A key point has to be mentioned regarding the focus of this work: the purpose is not to design the whole baseband circuitry of a sensor mote, but instead to investigate the applicability of a different logic style (i.e., not CMOS) for implementing sensor motes that can run under harsh and extreme conditions at a very low supply voltage and still perform the necessary operations. At the same time, the gates driving the system should consume as little energy as possible. CMOS logic with enhancements and other low power techniques (described in chapter 2.2) has always been the obvious choice at the implementation level. Some low power trends also involve designing the whole system in a clock-less, in other words asynchronous, manner in the hope of achieving ultra-low energy consumption. Such designs are still difficult to implement: at schematic or layout level they are very complex and take a long time to develop. Even after the design is complete, testing remains critical, as it is very difficult to test an asynchronous system.

Thus this work studies source coupled logic (SCL) and aims at using it in place of CMOS to implement conventional digital systems, in order to identify any advantages of this differential logic style over CMOS. This work thereby tries to find whether a better low power configuration can be achieved for conventional systems without any enhancements, just by running source coupled logic in the sub-threshold region.

1.2. Overview

Source coupled logic (SCL), operating in the sub-threshold region, is studied and analyzed in this work along with CMOS logic to observe the advantage of lower static (leakage) and dynamic dissipation for 65 nm and 45 nm MOS. SCL is a dual-rail differential logic style. The dual rails add routing complexity but provide noise immunity and twice the output signal swing compared to CMOS logic. Operating SCL in the sub-threshold region makes it possible to run the gates at a low supply voltage, down to 0.2 V, since the current density of MOS devices in sub-threshold operation is very small. This keeps the dynamic power dissipation very low, as the dynamic power is proportional to the square of the supply voltage according to the well-known formula $P_{\text{dynamic}} = \alpha \cdot f \cdot C \cdot V^2$.
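The quadratic dependence on the supply voltage can be illustrated with a short numerical sketch. The activity factor, clock frequency and switched capacitance below are hypothetical values chosen only for illustration; they are not taken from the simulations in this work.

```python
def dynamic_power(alpha: float, f_hz: float, c_farad: float, v_volt: float) -> float:
    """Return P_dynamic = alpha * f * C * V^2 in watts."""
    return alpha * f_hz * c_farad * v_volt ** 2


# Hypothetical example values (assumptions, not results from the thesis).
ALPHA = 0.1        # switching activity factor
F_HZ = 10e3        # 10 kHz operating frequency
C_FARAD = 1e-12    # 1 pF total switched capacitance


def scaling_table(voltages):
    """Return (V, P_dynamic) pairs for each supply voltage in `voltages`."""
    return [(v, dynamic_power(ALPHA, F_HZ, C_FARAD, v)) for v in voltages]


if __name__ == "__main__":
    for v, p in scaling_table([1.0, 0.5, 0.2]):
        print(f"V = {v:.1f} V -> P_dynamic = {p:.3e} W")
```

With all other parameters fixed, lowering the supply from 1.0 V to 0.2 V reduces the dynamic power by a factor of 25, since only the $V^2$ term changes.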

A few things have to be kept in mind. In CMOS or SCL circuits, reducing the supply voltage increases the gate delay, which in turn increases the operation time and results in logic swing degradation and poor performance. For SCL circuits operating in the sub-threshold region, however, the power dissipation and operation time are controllable [6]. This allows better logic swing and performance even at low supply voltage.
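One way to see why the operation time is controllable is a first-order RC model of a source coupled gate. This model is an assumption added here for illustration and is not taken from this section. It treats the load device as a resistance $R_L = V_{SW}/I_{bias}$ that charges a load capacitance $C_L$, which gives a delay of roughly $\ln 2 \cdot R_L C_L$. The swing and capacitance values are hypothetical.

```python
import math


def stscl_gate_delay(v_swing: float, c_load: float, i_bias: float) -> float:
    """First-order delay t_d = ln(2) * (V_SW / I_bias) * C_L, in seconds.

    Assumed simple RC model: the load behaves as R_L = V_SW / I_bias.
    """
    if i_bias <= 0:
        raise ValueError("bias current must be positive")
    r_load = v_swing / i_bias  # equivalent load resistance in ohms
    return math.log(2) * r_load * c_load


# Hypothetical values for illustration only (not simulation results).
V_SWING = 0.2     # output voltage swing in volts
C_LOAD = 10e-15   # load capacitance in farads

if __name__ == "__main__":
    for i_bias in (250e-12, 1e-9, 10e-9):
        t_d = stscl_gate_delay(V_SWING, C_LOAD, i_bias)
        print(f"I_bias = {i_bias:.2e} A -> t_d = {t_d:.3e} s")
```

In this simplified model the delay is set by the bias current and does not depend directly on the supply voltage, so speed can be traded against power by adjusting the bias.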

![Diagram of STSCL inverter](image)

Figure 1: Schematic of an STSCL inverter.

Figure 1 shows a sub-threshold source coupled logic (STSCL) inverter/buffer studied in this work (since the circuit is differential, the outputs can be swapped to obtain an inversion), utilizing MOS transistors operating in their weak inversion region [9], [12]. The two differential NMOS transistors are source coupled to the source follower NMOS, which is biased by the bias circuit. The PMOS load devices emulate high-ohmic resistances, which are necessary to generate a desirable, high output swing.

The two NMOS branches act as a pull-up and a pull-down network, respectively, thus creating an inverter output and response very similar to that of a CMOS gate. Theoretically, a minimum voltage of 0.15 V [1] is required to completely switch the output of the next logic gates. In practice, however, a minimum supply voltage of more than 0.2 V must be maintained to obtain the desired performance. This is necessary because the minimum allowed voltage deviation for the STSCL gates designed and discussed in this work equals 40 mV, so any supply voltage below 0.2 V would exceed the allowable range of the deviation. This value can indeed be improved by varying the NMOS width and simultaneously increasing the PMOS device resistance, but these changes would result in an area increase. It has to be noted that STSCL by default requires a larger design area, so further increasing the size of the devices adds to the already large structure of an STSCL gate. The NMOS width chosen for the circuit implementation is 675 nm.

Figure 2 shows the variation of the input vs. output DC transfer characteristics (and the overall swing variation) of the inverter/buffer with the width of the source coupled NMOS, at a supply voltage of 0.2 V. Three waveforms are shown for NMOS widths equal to 0.27 μm, 0.675 μm and 1.35 μm, respectively. The simulation results, shown later on, will validate this statement. One major drawback of STSCL is area, as designing a system in STSCL takes more device area (approximately four times more) than in CMOS logic. STSCL also requires an additional bias circuit, which determines the total amount of current, $I_{bias}$, required to operate the whole system properly. This bias current can be scaled down to as low as 250 pA for satisfactory output performance under all operating conditions. At this current the supply voltage can still be scaled down to 0.2 V.
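As a rough sanity check of what these numbers imply, the static power of a constant-current gate is simply the supply voltage times the current it draws. The sketch below assumes, purely for illustration, that the quoted 250 pA is the tail current of a single gate; the gate count is hypothetical and only shows the linear scaling with the number of gates.

```python
def stscl_static_power(v_dd: float, i_bias: float, n_gates: int = 1) -> float:
    """Static power of n_gates constant-current gates: P = n * V_dd * I_bias (watts)."""
    if n_gates < 0:
        raise ValueError("number of gates must be non-negative")
    return n_gates * v_dd * i_bias


if __name__ == "__main__":
    v_dd = 0.2          # supply voltage in volts (lowest value quoted in the text)
    i_bias = 250e-12    # bias current in amperes (lowest value quoted in the text)

    # Assumption: each gate draws one tail current of i_bias.
    p_gate = stscl_static_power(v_dd, i_bias)
    print(f"per-gate power: {p_gate * 1e12:.1f} pW")

    # Hypothetical gate count, chosen only to show the linear scaling.
    p_block = stscl_static_power(v_dd, i_bias, n_gates=1000)
    print(f"1000-gate block: {p_block * 1e9:.1f} nW")
```

Under this assumption a single gate dissipates about 50 pW at 0.2 V and 250 pA, independent of switching activity.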

![DC Response](image)

Figure 2: DC response of the STSCL inverter for varying NMOS width at $V_{dd} = 0.2$ V.

Figures 3 and 4 show the DC characteristics of an STSCL inverter for two process nodes (65 nm and 45 nm), with temperature variation, at a supply voltage of 0.5 V. A noticeable concern emerging from the waveforms in these figures is the inability of the DC responses to reach zero at different temperatures. Also, the lower the temperature, the closer the response comes to zero. The reason is that the NMOS devices used have a larger width, which increases the capacitance of the MOS device, so that when a logic transition from high to low takes place, the capacitance does not fully discharge. This phenomenon is temperature dependent: with increased temperature, the time required to fully discharge increases. The transfer characteristics for 65 nm and 45 nm look very similar but differ in the transition zone, where the response for the 45 nm process has a steeper slope. This is mainly because the level transition from high to low, or vice versa, occurs faster for the 45 nm process than for 65 nm. Thus, the initial specification for designing a system is important: for applications where size and area are of lower priority, the STSCL style is a better choice than CMOS circuits due to its low power consumption.

![DC Response](image)

Figure 3: Voltage-transfer characteristics of the STSCL inverter for 65 nm process technology at $V_{dd} = 0.5$ V.

1.3. Thesis Organization

The task for this project mainly involved finding the minimum operating specification of STSCL logic with minimum transistor sizing. The target has been set to utilize the additional leakage present in 45 nm technologies and to run the STSCL gates satisfactorily in sub-threshold operation. Gates such as XOR/XNOR, OR/NOR, AND/NAND and DFF were then implemented at schematic level using the Cadence Virtuoso 6.1.4 framework and simulated to check their performance under harsh conditions.

Several conventional digital circuits have been implemented, including a 4-by-4 array multiplier, a fifth-order FIR filter and a fifty-fifth-order FIR filter. The fifty-fifth-order filter has been designed to test the performance of the STSCL gates in larger designs. All simulations have been carried out over a temperature range from $-20^\circ C$ to $70^\circ C$, with $-20^\circ C$ being considered the most critical operating temperature.

The results have been compared to those of CMOS for the same structure, with the CMOS circuits run at a supply voltage of 0.5 V. This was the lowest supply voltage achievable for the CMOS devices. Even though CMOS can also be run in the sub-threshold region (as shown in, e.g., [21]), this has not been considered in this work, as we did not want to modify the existing CMOS standard cell library. The conventional CMOS circuits used did not allow operation in the sub-threshold region.

The initial simulation work included designing the STSCL-based logic gates and finding their corner limitations. In the simulations, the functionality, the power consumption for different input frequencies, the maximum frequency of operation, the minimum supply voltage and the obtained output swing were considered as the standard parameters for comparison with standard CMOS logic gates.

The final goal of the work was to study, understand, design and analyze the advantages of digital circuits or systems implemented with an STSCL standard cell library over digital systems implemented using other conventional or sub-threshold logic styles, such that the digital sections of the Smart Dust sensor motes can be run at very low supply voltages and consume as little energy as possible.

![DC Response](image)

Figure 4: Voltage-transfer characteristics of the STSCL inverter for 45 nm process technology at $V_{dd} = 0.5$ V.
2. LOW POWER ELECTRONICS AT NANO-METER PROCESS TECHNOLOGY

CMOS technology, as a whole, has advanced significantly in the last five decades. The total number of transistors on a chip has increased from millions to billions, with CMOS processes being scaled down to very small nano-scale dimensions (currently, the Intel i7 microprocessor uses a 32 nm CMOS process technology). Along with this increase in the number of transistors, the speed or frequency of operation of the devices has also increased immensely. This is also due to the performance-boosting techniques (parallel architectures, pipelining, interleaving, etc.) used at the system level of a design. These techniques are used in almost all digital systems, mainly to improve the throughput of the overall system and make the system work faster. But they have a drawback with respect to power consumption. The dynamic power dissipation, which depends on the frequency of operation, increases significantly. In addition to the dynamic dissipation, the power consumption due to leakage in nano-scale CMOS processes causes an overall increase in total power consumption. The most widely used way to counteract this increase in power consumption is voltage scaling. In most current and previous systems this has been the prime way of reducing total power dissipation. But one has to remember that the rate or factor of scaling of the voltage levels of a system is comparatively lower than the scaling of the CMOS processes. Normally, to get fast performance from the CMOS logic style, a standardized high supply voltage level is required; otherwise the switching of the CMOS gates slows down and eventually harms the performance of the system. Thus relying only on voltage scaling as the prime scheme for reducing the power dissipation of digital systems will not be fruitful. Hence new techniques, schemes and logic styles must be sought and developed to achieve better systems with lower energy consumption.

There are several other factors (besides the operating frequency) that contribute to the rate at which power consumption increases. These include the leakage components of MOS transistors, data computation of complex functionality, temperature, and slow progress in, for example, battery development. In MOS devices, gate leakage, junction leakage, and sub-threshold leakage contribute to the overall leakage current. For large-scale devices the gate and junction leakage are negligible compared to the sub-threshold leakage, but at nano-scale dimensions the gate leakage becomes comparable.

The data computation rate is an application-driven phenomenon. Most applications are related to communication or network systems that involve a high rate of data computation and perform complex computations, dissipating a large amount of power. The operating temperature also contributes to an increase in the energy consumption of digital systems. A system starts to consume significantly more power when its operating temperature rises to a very high value or when the ambient temperature in which the system is used takes an extremely low value. Such temperatures also jeopardize the functionality of the system. It is relatively hard to counteract this kind of problem, as most applications like sensor networks and bio-engineering systems do not have any built-in coolers or state-of-the-art cooling systems that can control the overheating of the ICs or the batteries. On top of all these factors, the most important one is the slow progress in the development of batteries with longer life-times. The life-time of a battery is a very important factor when the longevity of a system is considered. Modern day sensors, routers, pacemakers, etc. are just a few examples where long-lasting batteries are essential. Since batteries with longer life-times are still out of reach, new kinds of low power design techniques are the best possible way to obtain low energy consumption and better battery life-time without compromising the performance of the system.
In the next sections of this chapter, the focus will be on the main sources of power consumption in digital systems and the current trends or strategies of low power design techniques that are used at different hierarchical levels of a digital system.

2.1. Sources and consideration of power consumption in CMOS logic

There are mainly two sources of power dissipation when CMOS logic is considered: first, dynamic power consumption, which occurs during switching, i.e., transitions of logic levels, and second, static power consumption, which occurs due to a direct path for leakage current to flow between the supply voltage and the reference ground of a system.

2.1.1. Dynamic power consumption

The dynamic power dissipation is in some sense mandatory for a logic gate in a system to perform; in other words, it is the power required by a gate to operate. In the case of a CMOS inverter, it is the power required to completely switch its output. Hence, dynamic power is only consumed when there is a transition at the input. To better understand the dynamic power consumption, a CMOS inverter can be taken as an example. A simple CMOS inverter is composed of a PMOS and an NMOS transistor. During an input transition from 1 to 0, the PMOS transistor turns ON and the NMOS switches OFF, causing the output capacitance to charge up to the supply voltage. During an input transition from 0 to 1, the NMOS transistor switches ON and the PMOS turns OFF, causing the output capacitance to discharge to ground. The dynamic power can be represented by the following equation,

\[ P_{\text{dynamic}} = C_{\text{load}} \cdot V_{\text{dd}}^2 \cdot f \cdot \alpha, \tag{1} \]

where \( V_{\text{dd}} \) is the supply voltage, \( f \) is the operating frequency, \( C_{\text{load}} \) is the load capacitance, and \( \alpha \) is the switching activity of the output capacitance for each clock cycle. From (1), it is clear that reducing the supply voltage is the most effective way to reduce the dynamic power consumption, since the dependence is quadratic, but it comes at the cost of longer charging and discharging times for the output load capacitance.
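The quadratic dependence on the supply voltage in (1) can be checked numerically. The following is a minimal sketch only; the function name and all component values are hypothetical and are not taken from the simulations in this thesis.

```python
def dynamic_power(c_load, v_dd, freq, alpha):
    """Dynamic power in watts: P = C_load * V_dd^2 * f * alpha."""
    return c_load * v_dd ** 2 * freq * alpha


# Hypothetical example values, chosen only for illustration.
C_LOAD = 10e-15  # load capacitance: 10 fF
FREQ = 1e6       # operating frequency: 1 MHz
ALPHA = 0.5      # switching activity per clock cycle

p_nominal = dynamic_power(C_LOAD, 1.0, FREQ, ALPHA)  # V_dd = 1.0 V
p_scaled = dynamic_power(C_LOAD, 0.5, FREQ, ALPHA)   # V_dd = 0.5 V
ratio = p_scaled / p_nominal

print(f"P at 1.0 V: {p_nominal:.3e} W")
print(f"P at 0.5 V: {p_scaled:.3e} W")
print(f"ratio     : {ratio:.2f}")
```

Halving the supply voltage reduces the dynamic power to a quarter in this sketch, whereas halving the frequency or the switching activity would only halve it.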

2.1.2. Static power consumption

The static power consumption occurs due to the flow of leakage current from supply to ground when no input is applied to a logic gate. In the ideal case this power consumption should be zero, but in reality it is not, as the drain current through CMOS does not become zero for zero input. The leakage current flows through the channel of the MOS devices and is mainly known as sub-threshold leakage (discussed in chapter 4). The power consumption due to this current is smaller than the dynamic consumption when the MOS channel lengths are long, but for modern nano-scale transistors with shorter channels its impact on the total power consumption is comparable to that of the dynamic power consumption.

2.2. Power-energy reduction techniques in CMOS logic

Power reduction, or minimization, in any system is nowadays a necessary requirement for designers before moving on to the implementation phase. Before applying a power reduction scheme, a few considerations have to be kept in mind regarding the parameters upon which the power consumption of CMOS logic gates depends. These include the supply voltage, the threshold voltage of the MOS devices, the physical capacitance, and the switching frequency. Most importantly, a designer has to understand the difference between energy consumption and power consumption. Energy is the power consumed by a system integrated over a period of time, whereas power is the rate at which energy is consumed. The energy consumption of a system or circuit can be expressed by the following equation,

\[ E = P \cdot N_s \cdot \tau, \tag{2} \]

where \( P \) is the average power consumed, \( N_s \) is the number of clock cycles for one whole computation, and \( \tau \) is the period of each clock cycle. Thus, when power reduction is considered together with improving battery life-time, designers have to take the energy consumption into account: one might reduce the power by a certain factor but need more cycles than before to finish one computation, which unintentionally increases the energy consumption, so that reducing the power consumption eventually does not help. To minimize energy consumption, the average power needs to be reduced, which can be done by decreasing the average current flow through the system or circuit. So the reduction techniques applied at any phase of a system must comply with the idea of reducing the average current flow in order to decrease energy consumption. In digital system design, this consideration is very important, as considering only power minimization in the digital domain causes performance degradation of the system. The power optimization has to be at a level where the optimization itself does not affect the performance. Thus, when applying optimization or reduction techniques to digital systems, the power-delay product (PDP) [1] is taken as the accepted optimization metric for measuring the performance of the system.
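The distinction between power and energy can be made concrete with a small numerical sketch of the energy expression above. The two design points below are hypothetical and only illustrate how a design with lower average power can still consume more energy per computation if it needs more clock cycles.

```python
def energy(p_avg, n_cycles, t_clk):
    """Energy in joules for one computation: E = P * N_s * tau."""
    return p_avg * n_cycles * t_clk


# Hypothetical design A: higher average power, fewer cycles.
e_a = energy(p_avg=10e-6, n_cycles=100, t_clk=1e-6)

# Hypothetical design B: half the power, but three times as many cycles.
e_b = energy(p_avg=5e-6, n_cycles=300, t_clk=1e-6)

print(f"Design A: {e_a:.3e} J per computation")
print(f"Design B: {e_b:.3e} J per computation")
print("B consumes more energy than A:", e_b > e_a)
```

In this sketch, design B draws less power at any instant but drains the battery faster per completed computation, which is why energy, not power alone, has to be the target when battery life-time matters.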

Power-Energy minimization techniques can be performed at several design stages of a digital system. These stages include: system level, architectural level, gate level, circuit level and device level. These are discussed below.

a) System-level power reduction techniques: The main techniques at the system level are instruction-level optimization, which involves new types of instruction encoding schemes for fetching less power-hungry instructions, and hardware-software co-design (Figures 5 and 6), which involves DSP structures, FPGAs, memory, and buses all integrated on one single chip. Other techniques include dynamic power management and multiple-voltage supply schemes to reduce system-level power consumption.

b) Architecture-level power reduction techniques: At the architectural level of abstraction, power reduction is achieved by techniques such as clock gating, which causes logic blocks to shut down during inactive states, and parallel processing (shown in Figure 7) and pipelining, which prevent the system performance from degrading when the supply voltage is reduced.

c) Gate-level power reduction techniques: The gate level includes strategies such as data path equalization (shown in Figure 8), lowering the supply voltage, and sizing of the gates to reduce gate capacitance, which eventually saves power. Path equalization helps in reducing glitches, which are unwanted signal transitions that occur at the input-output interconnects or buses and increase the switching activity.

d) Circuit-level power reduction techniques: At the circuit level only local optimization is feasible; in other words, this level involves strategies like creating new cell library designs, sizing of transistors, and modifying the design style of circuits.

e) Device-level power reduction techniques: At the device level, the approach is mainly technology based. Any changes or variations can only be made during the fabrication stage. Applying reduction schemes at this level is rather rigid for a designer, as flexibility is very low. But there are options like using multi-threshold or variable-threshold devices for implementing the circuits and gates. Applying a variable threshold allows the threshold to be reduced, which instantly minimizes the power consumption.
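The parallelism trade-off of item b) and Figure 7 can be sketched with the dynamic power expression of Section 2.1.1: each duplicated module runs at a fraction of the original rate and at a lower supply voltage, while the combined throughput stays the same. The sketch below is an illustration under stated assumptions; the reduced supply voltage of 0.6 V, the 10 % overhead for the extra multiplexing logic and all other values are hypothetical, and which supply voltage actually doubles the delay depends on the technology.

```python
def dynamic_power(c_load, v_dd, freq, alpha=1.0):
    """Dynamic power in watts: P = C_load * V_dd^2 * f * alpha."""
    return c_load * v_dd ** 2 * freq * alpha


def parallel_power(c_load, v_dd_low, freq, n_copies, alpha=1.0, overhead=0.0):
    """Power of n_copies identical modules, each clocked at freq / n_copies.

    'overhead' is a hypothetical fractional penalty for the additional
    multiplexing and routing needed to merge the parallel outputs.
    """
    per_copy = dynamic_power(c_load, v_dd_low, freq / n_copies, alpha)
    return n_copies * per_copy * (1.0 + overhead)


# Hypothetical reference design: one module at 1.0 V and 100 MHz.
p_ref = dynamic_power(c_load=1e-12, v_dd=1.0, freq=100e6)

# Two copies at an assumed 0.6 V, each at 50 MHz, same overall throughput.
p_par = parallel_power(c_load=1e-12, v_dd_low=0.6, freq=100e6,
                       n_copies=2, overhead=0.1)

print(f"reference: {p_ref:.3e} W")
print(f"parallel : {p_par:.3e} W")
print(f"saving   : {100.0 * (1.0 - p_par / p_ref):.1f} %")
```

The saving comes entirely from the quadratic voltage term; the price is roughly twice the area plus the overhead logic, which this sketch only models as a single factor.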

So far this chapter has looked at the main sources of power consumption and at the techniques, at different levels of abstraction of the system, that are used to reduce the power consumption without diminishing the performance of the system. Most of these techniques are developed and used with CMOS logic gates in mind. In this work, the concentration has been on gate-level and circuit-level power reduction techniques, and we have tried to understand how a new logic style, rather than conventional CMOS, would work better under harsh conditions, and how a system behaves when it is implemented in the new logic style. Optimization at the system and architecture level has not been considered for the simulations that are demonstrated in chapter 6.

Figure 5: Traditional design flow where hardware and software are implemented independently.

Figure 6: Example of hardware-software co-design.

Figure 7: Utilizing parallelism. If the supply voltage is reduced for the top block diagram, the path produces an output every $2T$ seconds. By duplicating the modules at the lower supply voltage, outputs are produced in parallel every $2T$ seconds, which is equivalent to one output every $T$ seconds, the same rate as before the supply voltage was reduced [18].
Figure 8: Data path equalization [18].
3. WIRELESS SENSOR NETWORKS DESIGN AND DEVELOPMENT

Modern day wireless sensor networks are mainly composed of very small-scale distributed sensors or nodes, with each node communicating with the others via wireless media and the whole network of nodes communicating globally with a central base station. The control signals are normally assigned to the nodes by the base station. The applications of such sensors are vast, ranging from military purposes like surveillance of border areas to monitoring and measuring the level of impurities or toxicity in chemical plants.

In Figure 9, a very basic architecture of a sensor node is provided. The components include mainly six parts – supply, digital signal processing (DSP) section, analog-to-digital converter (ADC), digital-to-analog converter (DAC), sensor module, and transceiver. The transceiver is required for sending and receiving data about the surrounding environment in which the sensors are deployed. The supply uses a rechargeable battery, which is charged using, for example, solar power or other energy harvesting techniques. A DSP section is required for processing the data gathered by the node, mainly consisting of tasks like filtering, noise elimination, equalization, etc. The ADC is needed for analog-to-digital conversion of the signals generated by the sensor interface. The DAC is required for digital-to-analog conversion of the processed digital signal to be transmitted by the transceiver.

The configurations of the network created between the sensors and the main base or control station are normally of two types – (i) direct communication of all nodes in a cluster with its gateway and (ii) inter-node communication, or chaining, towards the gateway. The second configuration requires less energy for transmission, but causes an increase in the energy spent on processing and computation of signal data within each sensor node.
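The trade-off between the two configurations can be illustrated with a simple first-order radio energy model. This model is an assumption introduced here for illustration and is not part of this work: the radiated energy per bit is taken to grow with the square of the distance, and the electronics of every transmitter and every relay receiver consume a fixed energy per bit. All function names and parameter values are hypothetical.

```python
def direct_energy(bits, hops, hop_dist, e_elec, e_amp):
    """Energy for one node sending directly to the gateway.

    The node is 'hops * hop_dist' metres away from the gateway.
    Returns (radiated energy, electronics energy) in joules.
    """
    distance = hops * hop_dist
    radiated = e_amp * bits * distance ** 2
    electronics = e_elec * bits  # a single transmitter is active
    return radiated, electronics


def chained_energy(bits, hops, hop_dist, e_elec, e_amp):
    """Energy for the same packet relayed over 'hops' short links.

    Returns (radiated energy, electronics energy) in joules.
    """
    radiated = hops * e_amp * bits * hop_dist ** 2
    # 'hops' transmitters plus 'hops - 1' relay receivers are active.
    electronics = (2 * hops - 1) * e_elec * bits
    return radiated, electronics


# Hypothetical parameters: 1000-bit packet, 4 hops of 10 m each.
ARGS = dict(bits=1000, hops=4, hop_dist=10.0, e_elec=50e-9, e_amp=100e-12)

rad_direct, proc_direct = direct_energy(**ARGS)
rad_chain, proc_chain = chained_energy(**ARGS)

print(f"direct : radiated {rad_direct:.2e} J, electronics {proc_direct:.2e} J")
print(f"chained: radiated {rad_chain:.2e} J, electronics {proc_chain:.2e} J")
```

Under these assumptions chaining lowers the radiated energy but raises the energy spent in the node electronics, which is the qualitative behaviour described above; which configuration wins overall depends on the actual distances and radio parameters.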

As far as the operation of the nodes is concerned, it is also done in two different modes - (i) active mode, where the sensor nodes gather data from the surrounding environment at a continuous rate while communication between the nodes and the control station takes place, and (ii) standby mode, where the nodes remain in a power-down state and perform no computation or data processing. In this mode the nodes await commands from the control station; when a node receives one, it powers up again to perform data collection and computation tasks.
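The effect of the two operating modes on battery life-time can be estimated with a duty-cycle weighted average of the active and standby power. The sketch below is illustrative only; the power levels, duty cycle and battery capacity are hypothetical and do not come from the simulations in this thesis.

```python
def average_power(p_active, p_standby, duty_cycle):
    """Average power of a node that is active for a fraction 'duty_cycle'."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * p_active + (1.0 - duty_cycle) * p_standby


def lifetime_hours(battery_energy_j, p_avg):
    """Idealized life-time in hours for a given stored energy and average power."""
    return battery_energy_j / p_avg / 3600.0


# Hypothetical node: 100 uW when active, 1 uW in standby, active 1 % of the time.
p_avg = average_power(p_active=100e-6, p_standby=1e-6, duty_cycle=0.01)

# Hypothetical battery storing 10 J of usable energy.
hours = lifetime_hours(battery_energy_j=10.0, p_avg=p_avg)

print(f"average power: {p_avg:.3e} W")
print(f"life-time    : {hours:.0f} h")
```

With a low duty cycle the standby power contributes about as much as the active power in this example, which is why leakage in the power-down state matters for long-lived sensor nodes.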

In this work, the main focus is on designing a digital filter required in the DSP section of the sensor node. The filter was designed using the new logic style (STSCL) and simulated to observe its system-level performance. The operating conditions for these sensors can vary significantly depending on the application and environment in which they operate. Hence, how the filter performs with the new logic style is our point of interest. The filter was designed in both CMOS and STSCL styles to make the comparison clearer.

3.1. Design concerns in sensor module development

The applications for which the sensors are to be used require them to be very small in size, to consume very little energy and power to enable a long life-time, to be reliable, and to be cheap to deploy and operate. Thus the design is challenging for a designer with respect to scalability, power, cost, and reliability [3]. We elaborate on these aspects further:

Scalability – The number of sensors deployed for a given application can vary from a hundred to many more. At a given time, some sensors might be overused or no longer function. In such cases new nodes should be introduced, or operation should continue with whatever nodes are left; hence the network itself must be scalable when the protocols are set up.

Power – Power, or energy, consumption is the most important issue. As nodes are very small, saving power during operation and reducing the power dissipation due to unwanted leakage have become major concerns for designers in the development of long-lasting sensors. In chapter 2, power reduction techniques at different levels of abstraction in a system were described. These techniques can also be used in designing the nodes, which may allow a lower level of energy consumption without sacrificing performance. In this thesis work, the main focus is on reducing the energy consumption of Smart Dust sensor [2] nodes at the gate and circuit level by utilizing the benefits of a new logic style.

Cost – In the application zones the sensors are deployed and required in large quantities, and most of the sensors are disposed of as soon as they run out of power or get damaged by environmental or technical issues. Thus the cost of designing and making these sensors has to be as low as possible.

Reliability – To keep the maintenance cost down, the sensor nodes must function and operate properly. Before deployment, proper testing of the nodes and of the network itself has to be performed; in other words, the whole system has to be reliable so that the information provided by the nodes is correct and valid.

3.2. Smart dust sensors: A look at current architecture and methodology

Smart dust is an emerging technology made up of tiny wireless sensors or ‘motes’, as shown in the magnified diagram in Figure 10. Eventually, these devices will be smart enough to talk with other sensors, yet small enough to fit on the head of a pin. Each mote is a tiny computer with a power supply, one or more sensors, and a communication system (Hsu, Kahn, and Pister 1998, p. 1). It is a very small, dust-sized device with extensive capabilities. It can also contain components such as a micro electro-mechanical sensor (MEMS). The mote can sense movement, perform computations, and communicate wirelessly, and it has a dedicated, ultra-lightweight power supply. As a result, when it is ‘thrown’ into nature, it behaves like a dust particle: even the air can move it and change its direction. It is so small that it is very difficult to detect with the human eye.

3.2.1. Example of smart dust architectures

The above mentioned smart dust system (Hsu, Kahn, and Pister 1998, p. 1) consists of smart dust particles that contain:
• A semiconductor laser-diode and MEMS beam steering mirror, for active optical transmission.

• A MEMS corner cube retro-reflector (CCR) for passive optical transmission. It is a kind of device that can be used as a transmitter in wireless communication technologies.

• An optical receiver. It is a device that transforms light into an electrical signal. It consists of a semiconductor photodiode and an amplifier.

• A signal processing and control circuitry.

• A power source based on thick-film batteries and solar cells.

3.2.2. Communication methodologies

Figure 13 shows how the corner cube retro-reflector (CCR), described above, works for optical communication. A CCR consists of three mutually perpendicular mirrors of gold-coated poly-silicon. It has the property that any incident ray of light is reflected back to the source, provided that it is incident within a certain range of angles centered about the cube's body diagonal. It has an electrostatic actuator that can deflect one of the mirrors at kilohertz rates. As a result, information can be sent back in the form of a modulated signal.

However, there are several different communication methodologies that can be used in the network and below we have listed a few.

1. Radio frequency transmission using T/F/CDM multiplexing techniques. This requires modulation, demodulation, and filtering. The disadvantage of this technique is that it requires a large antenna. This transmission technique is suitable for this smart dust project.

2. Optical transmission techniques, which include (a) passive laser-based communication, (b) active laser-based communication, or (c) fiber optic communication. For passive laser communication, the communication from the BST (base station) to the mote is done using a modulated laser beam. Dust-to-BST communication is done by sending an un-modulated laser beam to a node, which reflects the beam back to the BST. The advantage of this technique is that it requires only simple baseband circuitry, i.e., there is no need for modulators or filters. The problem is that, as it is a single-hop network, the dust nodes cannot communicate with each other. The active laser-based communication consists of a semiconductor laser, a lens, and a beam steering micro-mirror. The advantage is that it uses multi-hop networks; as a result, sensors can communicate with each other. The disadvantage is that it consumes high power. Finally, the fiber optic communication also includes a semiconductor laser, together with a fiber cable and a diode receiver, to generate, transfer, and detect the optical signals. The advantage is that the communication between sensors and base station is assured. The problem is that the mobility of the dust motes is restricted by the fiber cables.

![Smart Dust Components](image)

Figure 11: Components of Smart Dust.
Figure 12: RF communication technique of Smart Dust.

Figure 13: Optical communication technique of Smart Dust.
3.2.3. Applications of smart dust

There are wide-ranging applications for Smart Dust. Three prospective applications are industrial plants, environmental protection, and light and power suppliers. These applications are described below:

a) Industrial plant: Smart Dust can be used to reduce an industrial plant's downtime and increase its safety. For example, if a chemical plant uses pipes to transport hazardous chemicals or liquids, the pipes become weaker and thinner day by day due to corrosion. To prevent accidents, operators check those pipes manually on a routine basis, which is time consuming. To mitigate this problem and be more efficient and protective, heavily insulated pipes are used today, but in the near future corrosion-detecting sensors will be applied along the entire length of the pipes. As a result there will no longer be a need for a supervisor to inspect those pipes, which will save money as well as time.

b) Environmental protection: Environmental organizations like forest services, or farmers, can use Smart Dust for many purposes. For example, farmers can throw these sensors along the boundary of their fields to sense whether cattle leave the field or other animals enter it. As another example, the sensors can be dropped in a forest to monitor the temperature. As the motes form their own network, they can communicate with each other at any time. If any of the particles senses an abnormal temperature, it communicates with the others, and in this way the central network obtains the location and area concerned. Fire fighters can then take preventive action immediately, which avoids huge losses.

c) Light and power suppliers: Nowadays there are thousands of pieces of electronic equipment in the streets. Manual inspections are carried out every day to check for equipment failures, which costs a lot of money and time. For example, if any of a million street lights becomes faulty, a manual search is needed to find the faulty lamp, or the supplier has to wait for complaints. With dust sensors in place, it is a matter of seconds to locate the faulty lamp and carry out the necessary repair, which saves money and manpower.

Smart Dust can also be used in military applications, such as monitoring activity in forbidden areas or alerting soldiers in enemy territory when there are dangerous or poisonous gases or substances in the air. It can also be used for indoor/outdoor monitoring, for business purposes, for security and vehicle tracking, for monitoring health from inside the human body, and for monitoring traffic by sensing vibration.

Sensor networks are growing rapidly in modern science, but they will not be fruitful unless the sensors are small. Scientists are therefore concerned with the size of the sensors, which also gives rise to the name Smart Dust, as it is a powerful, tiny sensor.

Smart Dust is a wireless sensor network technology whose sensors are so small that they are hardly visible to the eye. It will be helpful to all kinds of people, from scientists to businessmen. It is a very new technology; although a lot of work has been done in this field, work is still going on to make the sensors smaller. It will be cost-effective by reducing manpower and will save a lot of money that can be invested in other sectors to help mankind.
4. LEAKAGE IN CMOS TECHNOLOGY

This chapter looks at the different leakage mechanisms that constrain the achievable power consumption in modern logic gates with scaled-down process technologies, and at the minimization techniques that are currently used in circuit- and device-level design and analysis.

As mentioned before: for CMOS logic, the power consumption can be classified in two groups – static power dissipation and dynamic power dissipation. Static dissipation has mainly two components; gate leakage and sub-threshold leakage. We discuss these in the following sections.

4.1. Gate leakage

The NMOS device in Figure 14 shows the prime components that contribute to the gate leakage of a transistor. The components are the gate-to-bulk leakage $I_{gb}$, the gate-to-source and gate-to-drain leakage $I_{gs}$ and $I_{gd}$, and the gate-to-channel leakage $I_{gc}$. The significance of each component depends on the device's operation mode [5]. The two operation modes that are mainly taken into account for understanding the impact of each of the above mentioned gate leakage components are off-state and on-state operation. In each mode of operation the $I_{gc}$ component has a more significant impact than the other components [7].

![Figure 14: NMOS section indicating the gate leakage components.](image)

Simulations have been conducted on simple NMOS and PMOS devices for 45 nm and 65 nm process technologies to analyze the behavior of gate leakage current at room temperature. The analysis involved DC sweeping over the supply voltage and plotting the gate current flow. Figure 15 shows the gate current $I_g$ vs. voltage plot for an NMOS device with high threshold voltage and Figure 16 shows the gate current $I_g$ vs. voltage for an NMOS with low threshold voltage. Figure 17 shows similar plots but indicates the sub-threshold leakage for both the devices at off and on-state of operation with varying input voltage levels.
From these figures, it can clearly be seen that sub-threshold leakage becomes less dominant for the 45 nm process compared to 65 nm, whereas gate leakage has increased significantly for the 45 nm process. According to the simulations, the gate leakage increases roughly by a factor of ten when moving from 65 nm to 45 nm, whereas the sub-threshold leakage increases only by a factor of three. Thus new logic styles are essential for compensating and reducing the overall leakage through the MOS devices. It is to be noted that the tunneling current for the NMOS is higher in magnitude than for the PMOS [6]; hence the NMOS has mostly been used for the above simulations.

![Figure 17: Sub-threshold leakage current for low (top portion) and high (bottom portion) threshold NMOS device, 45 nm (left), and 65 nm (right).](image-url)
4.2. Gate leakage reduction scheme

With gate oxides getting thinner as processes are scaled down, new schemes and methods are highly necessary for lowering the tunneling current. From Figures 16 and 17, it is evident that the sub-threshold leakage will be less substantial for nanometer devices; hence circuit designs with additional transistors acting as cut-off transistors [17], used to minimize sub-threshold leakage, must be modified and redesigned more robustly to suppress the gate leakage too. This can be done by giving those cut-off transistors thick gate oxides and adding multiple PMOS diode switches, which provide a controllable voltage level (CVL) [16], in parallel with the cut-off transistors. The cut-off transistors remove the direct path for the leakage current flowing from supply to ground. Strategies with such transistors in current applications mainly target the sub-threshold leakage, but with the addition of switch diodes, a significant reduction in the gate leakage can also be observed. Figure 18 shows a CMOS logic with the CVL feature.

The overall leakage current vs. supply voltage plots at the 45 nm process for CMOS logic with the CVL feature are shown in Figure 19, using three and five switched diodes separately. These results point out the advantage of the CVL feature in reducing the overall leakage current. Adding a finite number of PMOS diodes allows the virtual ground rail, $V_{gnd}$, to be raised to a higher voltage depending on the operation mode (stand-by or active), resulting in a smaller $V_{gs}$, which minimizes both the gate and the sub-threshold leakage currents (according to (3) and (4)). The expression for the gate leakage, $J_g$, i.e., the tunneling current density, is:

\[
J_g = A \left( \frac{V_{gs}}{t_{ox}} \right)^2 \cdot e^{-t_{ox}/B} \cdot B \cdot K , \tag{3}
\]

where $A$, $B$, and $K$ are constants, $V_{gs}$ is the gate-to-source voltage, $t_{ox}$ is the gate oxide thickness, and $V_{aux}$ is an auxiliary function that approximates the density of tunneling carriers.

The plot in Figure 19 (d) shows that results similar to those in (c), which used five diodes, can be achieved with three diodes by using longer-channel NMOS devices. In this case the channel length is 180 nm.

Figure 18: CMOS logic with CVL feature (one PMOS switch diode).
Figure 19: (a) $I_d$ vs. $V_{dd}$ for CMOS, (b) $I_d$ vs. $V_{dd}$ for CMOS with CVL (three diodes), (c) $I_d$ vs. $V_{dd}$ for CMOS with CVL (five diodes), and (d) $I_d$ vs. $V_{dd}$ for CMOS with CVL (three diodes with 180 nm NMOS gating).
4.3. Sub-threshold leakage

Understanding sub-threshold leakage requires us to focus on its expression, given below,

\[ I_{ds} = I_{dso} \cdot e^{(V_{gs} - V_t)/(nV_T)} \cdot \left( 1 - e^{-V_{ds}/V_T} \right), \tag{4} \]

where \( I_{dso} \) is the saturation current, \( V_t \) is the threshold voltage, \( n \) is the sub-threshold slope factor, and \( V_T \) is the thermal voltage. Equation (4) shows that the drain current of an NMOS transistor varies exponentially with the gate-source voltage \( V_{gs} \). As voltage scaling is nowadays a prime method for reducing power dissipation and increasing the longevity of MOS devices, it must be kept in mind that, as processes are scaled down, the MOS threshold voltage also has to be scaled to maintain a sustainable gate delay. Reducing \( V_t \) will, however, result in an increase in the sub-threshold leakage current, as seen in (4). Thus the gate-source voltage, \( V_{gs} \), should be reduced to counteract the impact of the threshold voltage reduction.
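The exponential sensitivity to the threshold voltage can be explored numerically. The following is a minimal sketch assuming the standard form of the sub-threshold current expression, with the thermal voltage computed as $kT/q$; the slope factor $n = 1.5$, the current $I_{dso}$ and all bias values are hypothetical and are not extracted from the 45 nm or 65 nm device models used in this work.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_c):
    """Thermal voltage V_T = k*T/q in volts for a temperature in Celsius."""
    return BOLTZMANN * (temp_c + 273.15) / ELEMENTARY_CHARGE


def subthreshold_current(i_ds0, v_gs, v_t, v_ds, n=1.5, temp_c=27.0):
    """Sub-threshold drain current in the assumed standard form:

    I_ds = I_ds0 * exp((V_gs - V_t) / (n * V_T)) * (1 - exp(-V_ds / V_T))
    """
    vt = thermal_voltage(temp_c)
    return (i_ds0
            * math.exp((v_gs - v_t) / (n * vt))
            * (1.0 - math.exp(-v_ds / vt)))


# Hypothetical device: compare two threshold voltages at the same bias.
i_high_vt = subthreshold_current(i_ds0=1e-7, v_gs=0.2, v_t=0.4, v_ds=0.5)
i_low_vt = subthreshold_current(i_ds0=1e-7, v_gs=0.2, v_t=0.3, v_ds=0.5)

print(f"V_t = 0.4 V: I_ds = {i_high_vt:.3e} A")
print(f"V_t = 0.3 V: I_ds = {i_low_vt:.3e} A")
print(f"increase factor: {i_low_vt / i_high_vt:.1f}")
```

In this sketch, lowering the threshold voltage by 100 mV raises the current by more than an order of magnitude at room temperature, and the same factor can be recovered by lowering $V_{gs}$ by the same amount, which is the compensation argued for above.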

4.4. Sub-threshold leakage reduction schemes

There are several techniques to reduce leakage current. A more recent and nowadays common way is the use of power gating architectures [24]. A conventional power gating topology with a sleep NMOS transistor is shown in Figure 20. It minimizes the static power consumption due to the sub-threshold leakage current. This is done by temporarily shutting down the logic blocks in a system that are currently in stand-by or inactive state. When a block is ready to operate again, the gating topology allows the logic block to switch back to the active state.

Figure 20: Power gating architecture with one sleep NMOS transistor [24].

The drawback of this technique is the time required to switch from stand-by to active state, in other words, the wake-up latency [17]. Also, the gating transistors used in this architecture have to be designed differently from the transistors in the logic blocks, since they must contribute less leakage compared to the other devices.
4.5. Leakage model

The leakage model shown in Figure 21 has been designed to replicate, at schematic level, the impact of gate and sub-threshold leakage for an NMOS transistor in a 45 nm process technology. The current source connected from gate to source reflects the contribution of gate leakage on the transistor. The value of this current source varies with the applied gate input voltage and the oxide thickness. The oxide thickness is mostly fixed, and any variation that may occur comes from fabrication. Thereby the only varying parameter is the gate-to-source voltage. For the simulations performed using the leakage model, only NMOS transistors are considered, as the SCL uses differentially coupled NMOS devices to create the gates. On an abstract level, this model provides a better understanding of the amount of leakage contributed by an NMOS transistor during its operation and of how much impact the components of leakage can have on a sub-threshold logic style.

![Figure 21: Leakage model for an NMOS transistor.](image)

The waveforms in Figure 22 reflect the gate and drain currents flowing through the NMOS transistor model in Figure 21. The supply and the gate input voltages are varied between 0.2 V and 0.5 V at a high temperature, 70°C. This range is chosen because the focus is on sub-threshold operation and the target application will run at very low voltage. Simulation results for supply voltages above 0.5 V are neglected. The waveforms show the peak values of both the drain and gate currents (denoted by M0, M1, M2, and M3) for the NMOS. The gate current remains within the femtoampere range for both gate input and supply (V_DD), but the drain current, as expected, varies significantly when scaling the gate input voltage and supply voltage. For 0.2 V the drain current equals 93 nA and for 0.5 V, the value equals 93 μA which is approximately 200 times higher. It has to be remembered that this drain current is not the sub-threshold leakage. Figure 17 shows the waveforms for sub-threshold leakage. The NMOS transistor width used for the simulation of the model is 0.675 μm, deliberately chosen to be consistent with the SCL gates, where the NMOS width is of the same value.

Waveforms similar to those in Figure 22 are also observed when the model is simulated at −20°C, but significant changes are observed in the drain current, whereas the gate current has not changed much. From Figure 23 it can be seen that the drain current at 0.2 V has decreased by more than fifteen times compared to the waveforms in Figure 22, whereas the gate leakage remains almost the same. This suggests that temperature variation does not have much impact on the gate current of a transistor but has a significant impact on its drain current.

These simulation results of the leakage model give an idea of which parameters to tune and worry about when designing the SCL logic to operate in the sub-threshold region, and of what role external parameters like temperature play for the performance in terms of leakage.
Figure 22: Drain current (left) and gate current (right) variation over input gate voltage and supply voltage.

Figure 23: Drain current (left) and gate current (right) variation over gate input voltage and supply voltage at −20°C temperature.
5. SOURCE COUPLED LOGIC

This chapter looks at the characteristics of source coupled logic (SCL) and sub-threshold source coupled logic (STSCL) [1], [4], and why they provide better flexibility and advantages in leakage current minimization over CMOS logic. Before going into the fundamentals of the SCL or STSCL gates, a brief overview is given of the parameters on which the functionality and performance of SCL/STSCL gates depend.

Output voltage swing – The output voltage swing, by definition, is the maximum voltage that an output can attain before it gets clipped. For differential logic gates, with differential input-output topology, the output voltage swing is twice that of normal CMOS logic. This parameter is important for the SCL and STSCL gates. The output swing must be kept sufficiently high for cascaded SCL/STSCL gates to perform as desired. For an SCL inverter, the minimum output swing for complete switching of the next gate's input must equal \( \sqrt{2} \cdot n \cdot V_{DSSat} \), where \( n \) is the sub-threshold slope factor and \( V_{DSSat} \) is the drain-to-source overdrive voltage [10] applied at the NMOS when it is operating in the strong inversion or saturation region.

Sub-threshold or weak inversion MOS operating region – The sub-threshold region of operation for an NMOS transistor occurs when the applied input voltage is below the threshold voltage (\( V_{GS} < V_{th} \)). In this region there is theoretically no conduction between the drain and source terminals, but in practice, due to the Boltzmann distribution of electron energies, electrons flow from source to drain and cause a very small drain current. This region is also known as the weak inversion region, and the drain current is called the sub-threshold leakage current, which is exponential in nature:

\[
I_D = I_{D0} \cdot e^{\left(\frac{V_{GS} - V_{th}}{n \cdot V_T}\right)} \left(1 - e^{-\frac{V_{DS}}{V_T}}\right),
\]

(5)

where \( I_{D0} \) is the equivalent current at the point where the input \( V_{gs} \) equals the threshold voltage, represented by

\[
I_{D0} = 2 \cdot \mu_n \cdot C_{ox} \cdot n \cdot V_T^2 \cdot e^{-\frac{V_{th}}{V_T}}.
\]

(6)

Approximate values for \( I_{D0} \) lie within the range of \( 10^{-15} \) A to \( 10^{-12} \) A. \( V_T \) is the thermal voltage, which equals \( KT/q \) (\( K \) is the Boltzmann constant, \( T \) is the operating temperature, and \( q \) is the electron charge), \( V_{th} \) is the threshold voltage (approximately 0.2 V for a 65 nm NMOS transistor), and \( n \) is the sub-threshold slope factor, which is represented by

\[
n = 1 + \frac{C_D}{C_{ox}},
\]

(7)

where \( C_D \) is the depletion layer capacitance and \( C_{ox} \) is the oxide layer capacitance. The sub-threshold current in (5) depends on two parameters, the threshold voltage and the slope factor. Both parameters vary with process variations. The threshold voltage of an NMOS transistor will fluctuate when fabrication variations in, for example, oxide thickness and doping concentration occur. This in turn causes an exponential variation of the sub-threshold current. The other parameter, the slope factor, varies with the doping concentration. Normally, if the doping concentration is increased by ten times, the sub-threshold slope factor will rise by some 11% \[14\].
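
As a small illustration of how the slope factor enters the sub-threshold behavior, the sketch below computes \( n \) from a capacitance ratio and converts it into a sub-threshold swing in mV per decade of current, using \( S = n \cdot V_T \cdot \ln 10 \). The swing formula is the standard textbook definition and is not stated in this thesis; the capacitance ratio of 0.3 is an assumed value chosen only for illustration.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_celsius):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * (temp_celsius + 273.15) / ELECTRON_CHARGE


def slope_factor(c_depletion, c_oxide):
    """Sub-threshold slope factor n = 1 + C_D / C_ox."""
    return 1.0 + c_depletion / c_oxide


def subthreshold_swing(n, temp_celsius):
    """Gate voltage (in volts) needed to change the current by one decade."""
    return n * thermal_voltage(temp_celsius) * math.log(10.0)


# Assumed capacitance ratio C_D / C_ox = 0.3 (illustrative only).
n_example = slope_factor(0.3, 1.0)
swing_mv = 1e3 * subthreshold_swing(n_example, 27.0)

print(f"n = {n_example:.2f}, swing = {swing_mv:.1f} mV/decade at 27 C")
```

A larger \( n \), for instance due to higher doping, stretches the swing proportionally, so more gate voltage is needed for the same change in current.
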

From (5) it should be noticed that if \( V_{DS} \) is greater than \( 4 \cdot V_T \) (approximately 0.1 V at room temperature), the exponential term containing \( V_{DS} \) can be ignored and (5) can be simplified to approximately

\[ I_D \approx I_{D0} \cdot e^{\left(\frac{V_{GS} - V_{th}}{n \cdot V_T}\right)} \]

(8)

It is important to remember that the value for \( V_{DS} \) does not depend on the input gate voltage. This can be beneficial in achieving low power design at sub-threshold region.
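
The size of the error introduced by dropping the \( V_{DS} \) term can be checked numerically. The following sketch evaluates the factor \( 1 - e^{-V_{DS}/V_T} \) for a few multiples of the thermal voltage; the room-temperature value of 25.9 mV for \( V_T \) is an assumed rounded number.

```python
import math


def vds_factor(v_ds, v_thermal):
    """Drain-voltage factor (1 - exp(-V_DS / V_T)) of the sub-threshold current."""
    return 1.0 - math.exp(-v_ds / v_thermal)


V_THERMAL = 0.0259  # V, assumed rounded room-temperature value

for multiple in (1, 2, 4, 8):
    factor = vds_factor(multiple * V_THERMAL, V_THERMAL)
    print(f"V_DS = {multiple} * V_T -> factor = {factor:.4f}")

# Relative error made by setting the factor to 1 at V_DS = 4 * V_T.
error_at_4vt = 1.0 - vds_factor(4 * V_THERMAL, V_THERMAL)
```

At \( V_{DS} = 4 \cdot V_T \) the neglected term is \( e^{-4} \), which is below 2 %, so the simplified form is a reasonable approximation from that point on.
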

**Linear or moderate inversion MOS operating region** – The linear region of operation occurs when an NMOS transistor channel has been formed by an applied input voltage slightly higher than the threshold voltage level (\( V_{GS} = V_{th} - 0.1 \)) of the NMOS transistor. The channel current flows from the drain terminal to the source of the transistor and can be represented by the NMOS drain current equation for the linear region

\[ I_D = \mu_n \cdot C_{ox} \cdot \frac{W}{L} \cdot \left( \left( V_{GS} - V_{th} \right) \cdot V_{DS} - \frac{V_{DS}^2}{2} \right) \]

(9)

where \( \mu_n \) is the effective mobility, and \( W \) and \( L \) are the width and length of the NMOS transistor, respectively. The drain current dependency on the input voltage is linear, as can be seen from (9), compared to the sub-threshold region, where it varies exponentially. Thus input variations will not cause sharp fluctuations of the drain current in the moderate inversion region.

**Saturation or strong inversion region** – The saturation region occurs when the input gate voltage is significantly higher than the threshold voltage (\( V_{GS} = V_{th} + 0.1 \)) and the drain-to-source voltage is greater than the input voltage minus the threshold voltage (\( V_{DS} > V_{GS} - V_{th} \)). The drain current equation for this region is

\[ I_D = \frac{\mu_n \cdot C_{ox} \cdot W}{2 \cdot L} \cdot \left( V_{GS} - V_{th} \right)^2 \cdot \left( 1 + \lambda \cdot \left( V_{DS} - V_{DSat} \right) \right) \]

(10)

The \( \lambda \) term is the channel-length modulation coefficient. Equation (10) cannot be used accurately for describing MOS devices with shorter channel lengths. This is due to the impact that short-channel devices have on the output current. Other problems include, for example, the drain-induced barrier lowering \[10\] phenomenon, which causes the threshold voltage to change. Equation (10) also shows the quadratic dependency of the drain current on the applied input voltage.

**Sub-threshold slope factor** – The sub-threshold slope factor contributes during weak inversion operation of MOS devices. This parameter is important in the design of STSCL gates, as it sets the minimum voltage swing required to switch a subsequent STSCL gate. The minimum voltage swing equals \( 4 \cdot n \cdot V_T \), where \( V_T \) fluctuates with the region of operation and the environment. This term can be adjusted externally, whereas the sub-threshold slope factor cannot, as it is fabrication dependent. Thus, once any mismatch occurs during the fabrication process, the sub-threshold slope factor will vary and cannot be altered later on. As previously mentioned, the sub-threshold slope factor [12] depends on the doping concentration of the device.
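
To give a feeling for the magnitude of the minimum swing \( 4 \cdot n \cdot V_T \), the sketch below evaluates it at the two temperature corners used in this work and at room temperature. The slope factor of 1.3 is an assumed typical value and not an extracted process parameter.

```python
BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_celsius):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * (temp_celsius + 273.15) / ELECTRON_CHARGE


def minimum_swing(n, temp_celsius):
    """Minimum STSCL voltage swing 4 * n * V_T in volts."""
    return 4.0 * n * thermal_voltage(temp_celsius)


N_ASSUMED = 1.3  # assumed slope factor, for illustration only

for temp in (-20.0, 27.0, 70.0):
    swing_mv = 1e3 * minimum_swing(N_ASSUMED, temp)
    print(f"T = {temp:6.1f} C -> minimum swing = {swing_mv:.1f} mV")
```

Under this assumption the required swing stays in the range of roughly 110 mV to 155 mV over the temperature range, which is compatible with supply voltages of a few hundred millivolts.
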

5.1. Fundamentals of source coupled logic (SCL)

A basic SCL inverter circuit diagram can be seen in Figure 24. Gates composed of such topologies are differential in nature. Differential logic is chosen primarily for its immunity to switching noise. However, wiring differential logic is a complex task, and implementing it in CMOS puts a barrier on scaling the supply voltage to a minimum level, which is a drawback in current nanometer-scale technologies.

Figure 24: Schematic of an SCL inverter.

Figure 25 describes the input-output voltage characteristics of an SCL inverter. The differential inputs, as in Figure 24, control the overall current flow through the circuit. When the input voltage of NMOS N1 exceeds that of NMOS N2, the output voltage, \( V_{out2} \), starts to drop and settles to a steady state. During this time \( V_{out1} \) is charged up to the supply voltage level via PMOS P1. The output swing, \( V_{sw} \), achieved can be defined by (11),

\[
V_{sw} = V_{out1} - V_{out2} = 2 \cdot R_d \cdot I_{ss} = 2 \cdot V_{del},
\]

(11)

where \( I_{ss} \) is the bias current, \( R_d \) is the equivalent load resistance for the PMOS devices, and \( V_{del} \) is the voltage drop across the PMOS P2. This provides an additional advantage of using differential logic by having an output swing two times the actual drop across the load. The optimal performance in such a topology can be achieved, with the total current flowing through the circuit being equal to \( I_{ss} \) and having a smaller load resistance for a smaller delay.
From the inverter circuit it can be seen that the two PMOS transistors convert the current-level biasing in SCL to a voltage level. Both PMOS transistors have their source and body connected to the supply voltage and have a small source-to-drain voltage. Thus these two PMOS transistors operate in the linear region and can be portrayed as load resistors with equal values. If that value is taken to be \( R_d \), then the differential output voltage of the SCL inverter equals

\[
V_{\text{out}} = V_{d1} - V_{d2} = -R_d (i_{d1} - i_{d2})
\]

(12)

The minimum differential input (\( V_i \)) needed for complete switching of the output is then

\[
V_i = \sqrt{\frac{2 \cdot I_{ss}}{\mu_n \cdot C_{ox} \cdot \frac{W}{L}}}
\]

(13)

where \( I_{ss} \) is the tail bias current supplied by the current source. This provides the necessary equations for the transfer characteristics in the different modes of operation of the SCL. The transfer characteristic equation is

\[
V_{\text{out}} = \begin{cases} 
R_d \cdot I_{ss}, & V_i < -\sqrt{\frac{2 \cdot I_{ss}}{\mu_n \cdot C_{ox} \cdot \frac{W}{L}}} \\
-R_d \cdot I_{ss}, & V_i > \sqrt{\frac{2 \cdot I_{ss}}{\mu_n \cdot C_{ox} \cdot \frac{W}{L}}}
\end{cases}
\]

(14)

This equation corresponds to the waveforms in Figure 25 and thus also represents the input-output transfer function of the SCL inverter. The plot provides the transfer function for both 45 nm and 65 nm. For 45 nm the saturation region output voltage is lower than for 65 nm because of the higher \( \frac{W}{L} \) ratio for 45 nm. The widths for both processes were kept at the same value; hence the smaller \( L \) will decrease \( V_i \) according to (13), and by (14) the output response will be smaller for 45 nm.
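
The transfer characteristic can be sketched as a small function. The two saturated branches follow the piecewise expression above; the branch in between uses the standard square-law differential-pair expression, which is an assumption added here for completeness and is not given in the text. The lumped parameter `beta` stands for \( \mu_n \cdot C_{ox} \cdot W/L \), and all numerical values are illustrative.

```python
import math


def scl_vout(v_in, r_d, i_ss, beta):
    """Differential output of an SCL inverter for a differential input v_in.

    beta = mu_n * C_ox * W / L (A/V^2). The middle branch is the textbook
    square-law result and is an assumption, not taken from the thesis.
    """
    v_switch = math.sqrt(2.0 * i_ss / beta)  # minimum input for full switching
    if v_in <= -v_switch:
        return r_d * i_ss
    if v_in >= v_switch:
        return -r_d * i_ss
    return -r_d * 0.5 * beta * v_in * math.sqrt(4.0 * i_ss / beta - v_in ** 2)


# Illustrative values only.
R_D = 10e3      # ohm
I_SS = 100e-6   # A
BETA = 2e-3     # A/V^2

for v in (-0.5, -0.2, 0.0, 0.2, 0.5):
    print(f"V_i = {v:+.1f} V -> V_out = {scl_vout(v, R_D, I_SS, BETA):+.3f} V")
```

Increasing `beta`, for example through a larger \( W/L \), reduces the input level at which the output saturates, in line with the comparison between the two processes above.
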

5.2. **Sub-threshold source coupled logic (STSCL)**

The SCL topology discussed previously showed that, to use differential logic styles in place of CMOS logic and achieve optimum performance, the output swing has to be strong enough for one differential gate to completely drive the next. From an NMOS operating region point of view, if the NMOS device of the next stage is to be operated in the saturation region, the \( V_{sw} \) of the first stage needs to be larger than \( 4 \cdot n \cdot v_T \) [8], where \( v_T \) is the thermal voltage and \( n \) is the sub-threshold slope factor (which varies with the operating temperature and process). For sub-threshold or linear region operation, however, this swing equals \( 4 \cdot n \cdot v_T \). To obtain sub-threshold or linear region operation, the PMOS load device has to be modified to provide a high equivalent resistance. This also removes the dependency of \( V_{sw} \) on the slope factor; hence process variations will not have a decisive impact on logic operating in the sub-threshold region. Operating the SCL logic in the linear region thereby enables further supply voltage scaling, which previously was not possible. Most importantly, running in the sub-threshold region makes the sub-threshold leakage negligible compared to CMOS logic, and the gate leakage (explained in more detail in the later chapters) also gets smaller due to the differential style of sub-threshold SCL.

Another crucial consideration in the performance analysis is the operating time, or speed, of the SCL gate running in the sub-threshold or linear region. According to (15),

\[
  t_d = \frac{V_{swing} \cdot C_{load}}{I_{bias}},
\]

(15)

the delay of the sub-threshold SCL is inversely proportional to the bias current. The trade-off in this case lies in the choice of bias current: better performance requires more \( I_{bias} \), while the power consumption is directly proportional to \( I_{bias} \).
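
The trade-off between speed and power can be illustrated with a short sketch. The delay follows the expression above; the static power of a single gate is modeled as \( V_{DD} \cdot I_{bias} \), which is an assumption based on the constant tail current of a source coupled gate. The swing, load capacitance, and supply values are illustrative only.

```python
def stscl_delay(v_swing, c_load, i_bias):
    """Gate delay t_d = V_swing * C_load / I_bias in seconds."""
    return v_swing * c_load / i_bias


def stscl_static_power(v_dd, i_bias):
    """Assumed single-gate power model: constant tail current from V_DD."""
    return v_dd * i_bias


# Illustrative values only.
V_DD = 0.4        # V
V_SWING = 0.2     # V
C_LOAD = 10e-15   # F

for i_bias in (1e-9, 10e-9, 100e-9):
    delay = stscl_delay(V_SWING, C_LOAD, i_bias)
    power = stscl_static_power(V_DD, i_bias)
    print(f"I_bias = {i_bias:.0e} A: t_d = {delay:.2e} s, "
          f"P = {power:.2e} W, P * t_d = {power * delay:.2e} J")
```

In this simple model the power-delay product equals \( V_{DD} \cdot V_{swing} \cdot C_{load} \) and does not depend on \( I_{bias} \), so the bias current trades speed against power without changing the energy per transition.
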

Figure 1 and 3 show the schematic diagram of the sub-threshold source coupled logic (STSCL) inverter and its voltage-transfer characteristics, respectively. Figure 1 includes the PMOS load device [8] that has been used for further studying the behavior of STSCL gates and their operation in the sub-threshold region. The PMOS load is implemented by connecting the bulk of a PMOS to its drain, which forms a reverse-biased diode; Figure 26 shows a cross-section view of the reverse-biased diode formed between the drain and the substrate of the PMOS, giving a high resistance. This high resistance helps the SCL logic to operate in the sub-threshold region at a very small applied bias current.

From the previous discussion of both SCL and STSCL, it is clear that in order to run a normal SCL gate in the sub-threshold region, the bias current applied at the tail of the SCL gate must be very small, in the range of a few nanoamperes or even less. The normal PMOS load used in the SCL inverter does not have a resistance high enough to allow the bias current to be lowered while maintaining a controlled output swing. This requirement has led to the use of a new kind of PMOS load, which provides a higher resistance than a normal PMOS and allows the output swing to be fixed at a reasonable and controlled value even at very small bias currents. The normal PMOS, with its body connected to the supply, operates in the linear region; hence it will still not provide a large resistance, as the channel length is small. For the modified PMOS load device, with its drain connected to the body of the MOS, the device starts to provide a high resistance when the applied \( V_{\text{source} - \text{drain}} \) voltage becomes greater than zero.
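
The need for a high-resistance load follows directly from the swing relation \( V_{sw} = 2 \cdot R_d \cdot I_{ss} \) in (11). The sketch below solves it for \( R_d \) at a fixed swing; the swing of 0.2 V and the listed bias currents are assumed values for illustration.

```python
def required_load_resistance(v_swing, i_ss):
    """Load resistance R_d that gives the swing V_sw = 2 * R_d * I_ss."""
    return v_swing / (2.0 * i_ss)


V_SWING = 0.2  # V, assumed target swing

for i_ss in (100e-6, 1e-6, 1e-9):
    r_d = required_load_resistance(V_SWING, i_ss)
    print(f"I_ss = {i_ss:.0e} A -> R_d = {r_d:.2e} ohm")
```

For a tail current of 1 nA the load has to be in the order of 100 MΩ, which a short-channel PMOS in the linear region cannot provide within a reasonable area.
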


The cross-section diagram of Figure 26 shows the tying of the n-well with the p-substrate to form the reverse-biased diode at the output node. Parasitic capacitance might come into consideration, but with MOS device sizes of 65 nm and below, the parasitic impact will be very small and can be ignored. The parasitic components of the diodes present in the load device are shown in Figure 26. The equivalent resistance of this new PMOS load is given by,

\[
R_{\text{dnew}} = \frac{n_p \cdot U_T}{I_{\text{SD}}} \cdot \frac{e^{V_{\text{SD}}/U_T} - 1}{\left( n_p - 1 \right) \cdot e^{V_{\text{SD}}/U_T} + 1}
\] (16)

From the equation, \( R_{\text{dnew}} \) can be controlled by the source-to-drain current, which for an STSCL circuit is equivalent to the bias current. The resistance is also exponentially dependent on the source-to-gate voltage, which allows the resistive value to be tuned over a large range. This tunability allows the STSCL gate to be operated in different regions and conditions without modifying much of the device's internal parameters, such as size.

The most important requirement for an STSCL gate to run properly and consume less power is control of the amount of \( I_{\text{bias}} \) flowing through the logic. Theoretically, for the STSCL, \( I_{\text{bias}} \) can be reduced to a value close to the sub-threshold leakage through the circuit. As mentioned above, a voltage swing equal to \( 4 \cdot n \cdot V_T \) is enough to drive an STSCL gate in the sub-threshold region, but maintaining controlled operation in that region requires stable and controlled biasing. This is achieved by using a biasing circuit that keeps the output swing in check and at a desired value. An example of a simple bias circuit used for the STSCL inverter gate is shown in Figure 28.

Figure 27 shows the transistor view of conventional load devices (left) that are commonly used and the STSCL load device (right) required for sub-threshold operation, respectively.
The bias circuit in Figure 28 should be attached to the STSCL gates for proper operation. The current mirror section of the bias circuit must function properly; any small deviation in the bias current generated by the bias circuit results in a large deviation across the PMOS load devices, so this circuit is more sensitive to mismatch due to process deviations. When designing the bias circuit, it is important that the NMOS devices in the bias circuit are chosen with a higher threshold voltage and contribute low leakage; otherwise the output swing will deviate considerably and the power consumption might also increase. To remedy this problem, the amplifier's gain can be tuned in such a way that a swing higher than \( 4 \cdot n \cdot v_T \) can be achieved without disrupting the required performance.

Further on, the amplifier should have a small offset voltage. For simplicity, an ideal amplifier has been used in our simulations. Figure 30 through 32 show, respectively, the deviation of the DC response of the STSCL inverter with respect to gain, and the deviations of the \( V_{bn} \) and \( V_{pn} \) nodes of the bias circuit due to variation of \( I_{bias} \) and of the amplifier's gain, at a supply voltage of 0.4 V. The plot in Figure 32 gives a clear indication that a higher amplifier gain will cause a voltage significantly above the supply voltage; hence a small-gain amplifier must be chosen for system implementation using the STSCL gates.

Figure 30 shows how the DC characteristic varies with the gain of the amplifier in the bias circuit. It is evident from the plot that a higher gain causes the DC characteristic of an STSCL inverter to change completely. At a gain of ten, a zero DC response for the STSCL inverter can be observed. Thus an amplifier with a smaller gain is required for designing the complete STSCL logic gate.

Adding this bias circuit makes the overall area of an STSCL gate much larger than that of CMOS, but if saving power and energy is the prime concern, the area overhead can be accepted more readily. One also has to remember that running CMOS at a low supply voltage in an attempt to save power requires designers to choose significantly larger MOS devices.
Figure 31: Variation of $V_{bn}$ node of the bias circuit respect to $I_{bias}$ at 0.4 V supply.

Figure 32: Variation of $V_{pm}$ node of the bias circuit respect to gain at 0.4 V supply.

Figure 33 and 34 show the variation in the DC response of the STSCL inverter as a function of the widths of the NMOS (differential network) and PMOS (load device), respectively. From the figures it is clear that the NMOS network of the differential circuit needs to be wider for better performance and output swing. The PMOS width, on the other hand, as discussed earlier, needs to be small to get a high resistance for better control of the output swing. Another way to get a higher resistance for the load device is to use a long-channel PMOS device, but that means introducing additional process variability into the design. Thus, to avoid this variability, the same channel length has been used for the MOS devices. The plot in Figure 31 shows how the transient output response of the $V_{bn}$ node from the mirror circuit varies with the bias current. The notable point here is that the supply was kept constant at 0.4 V, and to obtain a voltage close to 0.4 V at the node $V_{bn}$ (which allows the source follower NMOS to switch on and conduct a current equivalent to the applied bias current), a bias current of at least 40 nA is needed. This value is high, but it can be pulled down to a much lower amount by resizing the differential NMOS network of the STSCL gate. The drawback is an increased area overhead, which again is considered a lower priority, as lowering the power and energy consumption is the main focus of this work.

In the next section, it is discussed how the STSCL allows a reduction of the leakage current. The concept of sub-threshold circuits normally allows better resistance against the major leakage components that current nanometer MOS technology faces. Later on, we present how much power can be saved by applying such logic styles for both 45 nm and 65 nm processes.

![Figure 33: An STSCL inverter DC response variation with respect to NMOS widths of the NMOS differential network at 0.4 V supply.](image)

### 5.3. STSCL for leakage reduction

In the previous section the discussion was about the leakage currents that impact modern nano-scale systems, where the gate leakage is getting significantly higher for CMOS geometries below the 65 nm level. It was also noted that there are reduction schemes that help without harming much of the system's integrity and performance. Our next focus is on the sub-threshold source coupled logic (STSCL) and its advantage over other schemes in handling leakage reduction.

The main advantage of using STSCL is the possibility of running the gates in the sub-threshold region. This makes the impact of any sub-threshold leakage current negligible. The only leakage that might have any significance is the gate leakage. The plots in Figure 36 show the gate current flowing for the two inputs of the STSCL at the 45 nm and 65 nm process technologies. Considering the two plots for the 45 nm process, the average gate current that flows at 0.4 V input (this is equal to the supply voltage level for all the STSCL designs in the following chapters) equals (349.6 fA – 156.9 fA) = 192.7 fA. This is a very low value, considering that it is the only significant leakage component. Figure 35 below shows the gate leakage contribution of the CMOS with the CVL feature (five PMOS diodes). At 0.4 V input voltage the gate leakage for the 45 nm process is lower, 112.6 fA, than for the STSCL, but the CMOS logic with the CVL feature [16] does not operate properly at 0.4 V. The minimum input voltage required is above 0.4 V. From the graph it can be seen that at 0.5 V input the gate leakage is 249.9 fA. One also has to keep in mind that sub-threshold leakage is present in the CMOS-CVL logic as well, whereas in an STSCL it is absent and not a concern. One thing not mentioned so far is the dynamic power consumption, which for an STSCL is lower at low supply voltage. One side effect that may arise is a probable rise in propagation delay due to the lower supply voltage when a whole system is designed with STSCL gates running at 0.4 V supply. In chapter 3, digital circuits are implemented with the STSCL gates, and their performance and power consumption are compared to those of CMOS.

![Figure 34: An STSCL inverter DC response variation with respect to PMOS width of the PMOS load device at 0.4 V supply.](image)


Figure 35: Gate leakage current vs. supply voltage for a CMOS inverter with CVL (five diodes).

Figure 36: Gate leakage current vs. supply voltage for an STSCL inverter.
5.4. Ring oscillator operation

To understand and apply the concepts of the STSCL, a seven-stage ring oscillator circuit, as shown in Figure 37, was created and its output was analyzed for different supply voltages. The oscillation frequency range and the corresponding power dissipation were also observed, to find the circuit limitations and the maximum frequency that could be achieved in an STSCL-based ring oscillator. Figure 38 provides the output response of the ring oscillator at 0.5 V supply. Figure 37 shows the differential input-output wiring and the driving of all seven stages by a single bias circuit.

![Figure 37: Seven-stage STSCL ring oscillator block diagram.](image)

The plot in Figure 39 shows the range of the output oscillation frequency that can be attained by varying the supply voltage for an SCL-based ring oscillator. The simulation results agree with the basic concept of a ring oscillator, where the frequency of oscillation increases with increasing supply voltage.
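
A first-order estimate of the oscillation frequency can be obtained from the gate delay. The sketch below uses the common textbook relation \( f = 1/(2 \cdot N \cdot t_d) \) for an \( N \)-stage ring oscillator together with the delay expression (15); the relation, as well as the swing, load capacitance, and bias current values, are assumptions for illustration and are not fitted to the simulated oscillator.

```python
def stage_delay(v_swing, c_load, i_bias):
    """Per-stage delay t_d = V_swing * C_load / I_bias in seconds."""
    return v_swing * c_load / i_bias


def ring_frequency(n_stages, t_delay):
    """First-order ring oscillator frequency f = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * n_stages * t_delay)


# Illustrative values only.
N_STAGES = 7
V_SWING = 0.2     # V
C_LOAD = 10e-15   # F

for i_bias in (1e-9, 2e-9, 10e-9):
    f_osc = ring_frequency(N_STAGES, stage_delay(V_SWING, C_LOAD, i_bias))
    print(f"I_bias = {i_bias:.0e} A -> f = {f_osc / 1e3:.1f} kHz")
```

In this estimate the frequency scales linearly with the bias current, which is one way to see why the bias current has to be raised when a higher oscillation frequency is targeted.
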

![Figure 38: Seven-stage STSCL ring oscillator output at 0.5 V supply.](image)

To better understand the advantage of STSCL over other logic styles, similar oscillators have been designed using different logic blocks, and the results are compiled in Table 1. The bias current, $I_{\text{bias}}$, has been varied for the STSCL oscillator to get proper oscillations at the output. One notable point about the table is the difference in output frequency between the oscillators implemented with different logic inverter gates. For the STSCL inverter the output oscillation frequency is lower, and with a lower supply voltage the frequency is lowered further, which is understandable, but the problem lies in the comparison with the CMOS and the CMOS-CVL. Even though the power consumption is lower for the STSCL, the output frequency differs, so to a certain extent this comparison is not viable. This is, however, not a matter of high concern from an application perspective, as it is unlikely that the STSCL will be used for any application with an operating frequency above the megahertz range.

Note: The output frequency is measured in the gigahertz range, whereas STSCL is intended for very low voltage and low frequency applications.

![Figure 39: Oscillation frequency range and respective power consumption over different supply voltage for an STSCL oscillator.](image)

**Table 1: Power consumption comparison for seven-stage ring oscillators for 65 nm process**

<table>
<thead>
<tr>
<th>Supply 0.7 V</th>
<th>CMOS</th>
<th>CMOS-CVL</th>
<th>STSCL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Output frequency (GHz)</td>
<td>1.67</td>
<td>1.34</td>
<td>0.99</td>
</tr>
<tr>
<td>Power consumption (μW)</td>
<td>12.6</td>
<td>10.19</td>
<td>6.48</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Supply 0.5 V</th>
<th>CMOS</th>
<th>CMOS-CVL</th>
<th>STSCL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Output frequency (GHz)</td>
<td>0.6778</td>
<td>0.5400</td>
<td>0.3993</td>
</tr>
<tr>
<td>Power consumption (μW)</td>
<td>3.21</td>
<td>2.09</td>
<td>1.01</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Supply 0.4 V</th>
<th>CMOS</th>
<th>CMOS-CVL</th>
<th>STSCL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Output frequency (GHz)</td>
<td>0.2982</td>
<td>0.2378</td>
<td>0.1895</td>
</tr>
<tr>
<td>Power consumption (μW)</td>
<td>0.7285</td>
<td>0.5897</td>
<td>0.3577</td>
</tr>
</tbody>
</table>
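
Since the three oscillators in Table 1 run at different frequencies, one possible way to put the numbers on a common footing is to divide the power by the oscillation frequency, which gives the energy per oscillation period. This normalization is an addition for illustration and is not part of the original analysis; the values are copied from Table 1.

```python
# (frequency in GHz, power in microwatts) per logic style, from Table 1.
TABLE_1 = {
    0.7: {"CMOS": (1.67, 12.6), "CMOS-CVL": (1.34, 10.19), "STSCL": (0.99, 6.48)},
    0.5: {"CMOS": (0.6778, 3.21), "CMOS-CVL": (0.5400, 2.09), "STSCL": (0.3993, 1.01)},
    0.4: {"CMOS": (0.2982, 0.7285), "CMOS-CVL": (0.2378, 0.5897), "STSCL": (0.1895, 0.3577)},
}


def energy_per_cycle_fj(freq_ghz, power_uw):
    """Energy per oscillation period in femtojoules (uW / GHz = fJ)."""
    return power_uw / freq_ghz


energy = {
    vdd: {style: energy_per_cycle_fj(*values) for style, values in row.items()}
    for vdd, row in TABLE_1.items()
}

for vdd, row in energy.items():
    summary = ", ".join(f"{style}: {value:.2f} fJ" for style, value in row.items())
    print(f"V_dd = {vdd} V -> {summary}")
```

With this normalization the STSCL oscillator shows the lowest energy per period at all three supply voltages in Table 1.
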
6. SUB-THRESHOLD SOURCE COUPLED LOGIC

In this chapter the focus is on different gates implemented using the STSCL. The circuit-level diagrams of all the gates are given along with their output responses. All the circuits are simulated and analyzed under conditions ranging from worst case to normal. The results indicate the performance of each gate under these conditions, which allows us to understand the limitations of STSCL, so that nominal STSCL parameters that give an optimum output for the corresponding system can be used during system design.

6.1. STSCL logic gates

The logic gates that are designed by using the STSCL are mainly an XOR gate, AND gate, OR gate, and a D-latch. The schematics are given in the following figures.

![Figure 40: Schematic of an STSCL XOR gate.](image1)

![Figure 41: Schematic of an STSCL AND gate.](image2)
The D-flip-flop is designed by cascading two D-latches that are fed with opposite-phase clock signals, as shown in Figure 44. This arrangement is called master-slave, since the second latch changes in response to the first one. The D-flip-flop is negative-edge triggered: when the clock makes a transition from logic high to low, the master latch stores the input value as its clock goes low. At the same time the clock of the slave latch, which is inverted, goes high, allowing the stored value to pass from the master latch to the slave latch.

Figure 42: Schematic of an STSCL OR gate.

Figure 43: Schematic of an STSCL D-latch.
6.2. Responses for the STSCL logic gates

In this section the simulated responses of the logic gates from the previous section are shown. The results are compiled in the tables below for a supply voltage of 0.2 V and a bias current $I_{bias}$ of 250 pA. The supply voltage has been scaled down to 0.2 V in order to check gate performance and operation at $-20^\circ C$. The gates operate properly with a 0.2 V supply over the temperature range from $-20^\circ C$ to $70^\circ C$. The results for the four gates are shown in Tables 2 through 5, respectively.

The results in the tables indicate that the delay gets longer at temperatures below $0^\circ C$, while the power consumption is lower. In this chapter we show the input/output responses for only one input pattern, with a clock period of 25 $\mu s$; simulations have, however, been carried out for other scenarios as well. The results in Figures 45 through 48 for the XOR, AND, OR, and DFF, respectively, are captured at $-20^\circ C$, mainly to verify that the gates function correctly at the most critical corner.

It is seen that for a 0.2 V input swing the output responses of all the gates and the latch can be resolved, and proper operation is thus guaranteed.

<table>
<thead>
<tr>
<th>Temperature (°C)</th>
<th>-20</th>
<th>70</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delay ($\mu s$)</td>
<td>2.52</td>
<td>0.1</td>
</tr>
<tr>
<td>Power (nW)</td>
<td>0.15</td>
<td>1.11</td>
</tr>
</tbody>
</table>

Table 2: Performance comparison of an STSCL XOR gate at different temperatures.

<table>
<thead>
<tr>
<th>Temperature (°C)</th>
<th>-20</th>
<th>70</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delay ($\mu s$)</td>
<td>1.42</td>
<td>0.14</td>
</tr>
<tr>
<td>Power (nW)</td>
<td>0.15</td>
<td>1.15</td>
</tr>
</tbody>
</table>

Table 3: Performance comparison of an STSCL AND gate at different temperatures.
Figure 45: Output response for an STSCL XOR gate.

Figure 46: Output response for an STSCL AND gate.
Table 4: Performance comparison of an STSCL OR gate at different temperatures

<table>
<thead>
<tr>
<th>Temperature (°C)</th>
<th>-20</th>
<th>70</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delay (μs)</td>
<td>1.49</td>
<td>0.13</td>
</tr>
<tr>
<td>Power (nW)</td>
<td>0.16</td>
<td>1.15</td>
</tr>
</tbody>
</table>

Table 5: Performance comparison of an STSCL D-flip-flop at different temperatures

<table>
<thead>
<tr>
<th>Temperature (°C)</th>
<th>-20</th>
<th>70</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delay (μs)</td>
<td>2.97</td>
<td>0.15</td>
</tr>
<tr>
<td>Power (nW)</td>
<td>0.44</td>
<td>3.37</td>
</tr>
</tbody>
</table>

The output of the D-flip-flop makes the transition from logic low to high more slowly than the transition from high to low. One reason is the master-slave configuration, which gives this gate a comparatively higher capacitive load than the previously mentioned gates. In addition, the logic network is implemented with NMOS transistors, so the charging period takes a little longer than the discharging period (note that an NMOS transistor passes a strong logic low but a weak logic high).
A few things have to be considered for these gates. The bias current, $I_{\text{bias}}$, is chosen to be 250 pA at a supply voltage of 0.2 V in order to obtain an output response that meets the minimum specifications. Below this bias current the gates no longer provide satisfactory outputs; in other words, they do not operate. Even though some articles suggest that the bias current can be lowered further into the picoampere range, we conclude that this can only be achieved for MOS devices with longer channels (90 or 180 nm), where the impact of gate leakage is also significantly smaller. Below 65 nm the gate leakage rises, as shown for the NMOS in chapter 1, and thus puts a further limit on the scaling of the bias current $I_{\text{bias}}$.

The supply voltage is taken down to 0.2 V, and all circuits designed in the later sections are operated at this voltage at temperatures of $-20^\circ C$ and $70^\circ C$. The power consumption shown for each gate in its respective table is an average value, calculated over different input patterns with a period of 100 $\mu s$. For the D-flip-flop, the clock period was 25 $\mu s$ while the data rate was kept the same as before.

6.3. Implemented digital circuits

The full adder was used in our digital filter implementations; some background on the full adder and on digital filters is given in this section.

6.3.1. Full adder

A full adder adds three single-bit binary inputs: two operand bits and a carry-in bit. It is needed whenever a carry from a previous stage has to be added to the two operand bits to obtain the correct result. A full adder can be built from two half adders and one OR gate, as shown in Figure 49.
Here A and B are the inputs of gate 1 and gate 2, which together form the first half adder. The output of gate 1 is one input of the second half adder, which is formed by gate 3 and gate 4. The second input of that half adder is the carry from the previous stage. The outputs of gate 2 and gate 4 are the inputs of gate 5, whose output is the carry-out of the full adder. A truth table for the adder is shown in Table 6. For STSCL the design is the same, except that the inputs and outputs of each gate are differential (dual-rail).

Table 6: Full adder truth table

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>Cin</th>
<th>Sum Out</th>
<th>Carry Out</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
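
The construction from two half adders and an OR gate can be checked against Table 6 with a small behavioral sketch. It is a single-ended Python model, not the differential STSCL circuit, and the mapping of gates 3 and 4 to XOR and AND in the comments is an assumption based on the description of Figure 49.

```python
from itertools import product


def half_adder(a, b):
    """Return (sum, carry) of two bits: an XOR gate and an AND gate."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Full adder built from two half adders and one OR gate."""
    s1, c1 = half_adder(a, b)      # gates 1 and 2 (first half adder)
    s, c2 = half_adder(s1, cin)    # gates 3 and 4 (second half adder, assumed order)
    cout = c1 | c2                 # gate 5 (OR) combines the two carries
    return s, cout


if __name__ == "__main__":
    print("A B Cin | Sum Cout")
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        print(f"{a} {b} {cin}   | {s}   {cout}")
```

The loop enumerates the inputs in the same order as the rows of Table 6, so the printed rows can be compared directly with the table.
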

6.3.2. Digital filter

A digital filter is a system that performs mathematical operations on a sampled, discrete-time signal in order to attenuate or enhance specific aspects of that signal. Digital filters are typically divided into two types: FIR (finite-length impulse response) and IIR (infinite-length impulse response) filters.

FIR filters are used largely because of their linear-phase response, which is suitable in communication systems and in speech and image processing systems, among others. Figure 50 shows an example architecture for an FIR filter.
The general transfer function of an FIR filter can be expressed as

\[ H(z) = \sum_{n=0}^{M} h[n] \cdot z^{-n} = \left( h[0] z^M + h[1] z^{M-1} + \ldots + h[M-1] z + h[M] \right) \cdot \frac{1}{z^M} \]  

(17)

where \( h[n] \) is the impulse response and \( M \) is the order of the filter.
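
In the time domain, the transfer function in (17) corresponds to the sum y[n] = h[0]·x[n] + h[1]·x[n-1] + ... + h[M]·x[n-M]. The following minimal sketch evaluates this sum directly. The function name `fir_filter` and the three example coefficients are hypothetical and are not the coefficients used in this thesis.

```python
def fir_filter(h, x):
    """Direct evaluation of y[n] = sum_{k=0}^{M} h[k] * x[n-k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:          # samples before the start of x are taken as zero
                acc += hk * x[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    h = [0.25, 0.5, 0.25]           # hypothetical second-order example (M = 2)
    impulse = [1, 0, 0, 0, 0]
    # Feeding a unit impulse returns the coefficients followed by zeros,
    # which shows that the impulse response has finite length.
    print(fir_filter(h, impulse))
```
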

The IIR filters have an impulse response which is non-zero over an infinite length of time. The transfer function of an IIR filter is generally given by

\[ H(z) = \frac{\sum_{i=0}^{P} b_i z^{-i}}{1 + \sum_{j=1}^{Q} a_j z^{-j}} \]  

(18)

Since the impulse response of the IIR filter is infinite, there has to be a feedback path in the filter. An IIR filter is more effective than an FIR filter when narrow transition bands are required. IIR filters are also used in noise cancellation schemes. Figure 51 shows an example of an IIR filter architecture.
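
The feedback path can be illustrated with the difference equation that corresponds to (18), y[n] = Σ b_i·x[n-i] - Σ a_j·y[n-j]. The sketch below is a plain Python model under that sign convention. The function name `iir_filter` and the first-order example coefficients are assumptions chosen only for illustration.

```python
def iir_filter(b, a, x):
    """y[n] = sum_i b[i]*x[n-i] - sum_j a[j]*y[n-j].

    b holds b_0..b_P and a holds a_1..a_Q, matching a denominator of the
    form 1 + sum_j a_j z^-j.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, bi in enumerate(b):              # feed-forward part (zeros)
            if n - i >= 0:
                acc += bi * x[n - i]
        for j, aj in enumerate(a, start=1):     # feedback part (poles)
            if n - j >= 0:
                acc -= aj * y[n - j]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Hypothetical first-order example: y[n] = 0.5*x[n] + 0.5*y[n-1].
    # The impulse response halves at every step and never becomes exactly zero.
    print(iir_filter([0.5], [-0.5], [1, 0, 0, 0]))
```
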

**A comparison between FIR and IIR filters**

The FIR filter is characterized by

1. The output of an FIR filter depends only on the current and previous input samples, not on previous output samples. As a result, the impulse response of the filter is always finite.

2. An FIR filter can have linear-phase characteristics. This means that the group delay is constant, so all frequency components are delayed by the same amount of time. FIR filters are often designed to be linear-phase for applications that require this property. The linear-phase characteristic is obtained by using a coefficient sequence that is symmetric around a center coefficient. This means that the first coefficient is identical to the last, the second coefficient is equal to the second last, and so on.

3. The major problem with FIR filters is that they require more memory than IIR filters, as FIR filters are not recursive and have only zeros. Thus, for the same specification an FIR filter needs to store more coefficients. This also leads to a longer computational delay for FIR filters.

4. The FIR filters do not require any extensive stability analysis with respect to e.g. round-off errors.

![Figure 51: A block diagram of IIR filter. The 'D' block is a unit delay. The coefficients and number of feedback paths are implementation-dependent.](image)

The IIR filter is characterized by

1. The output of an IIR filter, unlike that of an FIR filter, depends on both previous input and previous output samples.

2. IIR filters do not in general have a linear-phase response, hence they are used in applications where this is not a major concern. As a consequence, the group delay of an IIR filter varies with frequency, which causes phase distortion.

3. IIR filters can be unstable since they have poles as well as zeros, unlike FIR filters, which have only zeros. A poor design can place poles outside the unit circle of the z-plane and hence cause instability.

4. As IIR filters are recursive and have both zeros and poles, they require less memory than FIR filters. This allows IIR filters to have shorter delays and hence higher computational efficiency. Also, since IIR filters are recursive, feedback is involved in their design, and thus they serve as a better alternative for digital feedback systems compared to FIR filters.

5. The major problem with IIR filters is the complexity of designing and implementing them, as any distortion or flaw in the design alters the pole and zero positions. Such an alteration can eventually result in instability of the filter.
In our case, the FIR filter was chosen due to its simple design method and since it also provides better output responses even though it uses more space.

6.4. **Performance measurement parameters**

We outline some typical performance parameters for a digital system in this section, namely the power consumption, the power-delay product (PDP) and slew rate (SR).

6.4.1. **Power consumption**

The power consumption of a circuit or system is a global property, although it can be considered a performance parameter from both a local and a global perspective. In this work the power consumption is a crucial parameter, as we compare a sub-threshold differential logic style with traditional CMOS.

As described in chapter 2, the two main sources of power dissipation in digital circuits are dynamic and static power consumption. In CMOS there is an additional power dissipation when the PMOS and NMOS networks conduct simultaneously during switching, also known as short-circuit power dissipation. In STSCL gates this occurs, however, only for a very short period of time. The dynamic consumption stems from the continuous charging and discharging of the capacitive loads while a system performs computation. The static consumption occurs when a system has no input or is not performing any computation. All of these power contributions depend on the intrinsic capacitances of the MOS devices, so MOS sizing is crucial when reducing power consumption is a prime focus.

In this thesis work, both dynamic and static power consumption are calculated for all the digital circuits designed in STSCL and CMOS. The dynamic power dissipation was calculated for different input patterns for each circuit, and the average of those values was taken as the average dynamic power. The static power was obtained for each circuit by applying no input and measuring the consumed power. The sum of the average dynamic and the static power consumption was taken as the total power consumption of each circuit.

6.4.2. **Power delay product (PDP)**

The performance of digital circuits can also be evaluated in terms of their power-delay product (PDP). The PDP is essentially the energy that the circuit consumes to complete one task, in other words the product of the power consumption during one computation and the time taken to complete that computation. This parameter portrays the efficiency of a system with respect to both power and delay. The PDP is calculated by multiplying the propagation delay of a circuit or system by its overall power consumption.

\[
PDP = P_{\text{total}} \cdot t_{\text{delay}},
\]

(19)

where \( P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}} \) is the combination of the dynamic and static power consumption and \( t_{\text{delay}} \) is the propagation delay.
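
The bookkeeping described in this section (average the dynamic power over several input patterns, add the static power, multiply by the propagation delay) can be written as a short sketch. All numerical values in the example are made-up placeholders and not simulation results from this work; the function names are likewise hypothetical.

```python
def average(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def total_power(dynamic_powers_w, static_power_w):
    """P_total = average dynamic power over all input patterns + static power."""
    return average(dynamic_powers_w) + static_power_w


def power_delay_product(p_total_w, t_delay_s):
    """PDP = P_total * t_delay, in joules."""
    return p_total_w * t_delay_s


if __name__ == "__main__":
    # Placeholder values only: three input patterns, one static measurement.
    p_dyn = [1.0e-9, 1.2e-9, 0.8e-9]    # W, one value per input pattern
    p_stat = 0.2e-9                      # W, measured with no input applied
    t_delay = 1.0e-6                     # s, propagation delay
    p_tot = total_power(p_dyn, p_stat)
    print(f"P_total = {p_tot:.3e} W, PDP = {power_delay_product(p_tot, t_delay):.3e} J")
```
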

The PDP for an STSCL gate, is calculated in the following way. First we start with the power dissipation as such:
\[ P_{\text{diss}} = V_{\text{dd}} \cdot I_{\text{bias}} \]  

(20)

where \( V_{\text{dd}} \) is the supply voltage and \( I_{\text{bias}} \) is the bias current passing through the whole gate. The \( t_{\text{delay}} \) (gate delay) of an SCL gate is equal to

\[ t_{\text{delay}} = \frac{\ln 2 \cdot C_L \cdot V_{\text{swing}}}{I_{\text{bias}}} \]  

(21)

where \( C_L \) is the load capacitance and \( V_{\text{swing}} \) is the output voltage swing. Equation (21) shows that the gate delay \( t_{\text{delay}} \) depends on the swing, the physical capacitance of the transistors, and the bias current. Since the voltage swing also depends on the bias current, the bias current has to be chosen such that the delay of the gate is kept under control, which guarantees a desirable performance of the gate. Thus, the bias current is an important parameter (assuming the physical capacitances of the gate transistors are not varied). Multiplying (20) by (21), the bias current cancels and the power-delay product equals

\[ PDP|_{\text{STSCL}} = \ln 2 \cdot C_L \cdot V_{\text{swing}} \cdot V_{\text{dd}} \]  

(22)

Further on, the maximum operating frequency for the STSCL gate is equal to

\[ f_{\text{max}} = \frac{1}{2 \cdot t_{\text{delay}}} \]  

(23)

From (21) it can be observed that the bias current controls the maximum achievable operating frequency of an STSCL gate, while the delay in (21) does not depend on the supply voltage. This allows us to achieve a lower PDP by reducing the supply voltage without changing the delay and performance of the gate. For CMOS logic gates this is not possible, as the PDP for CMOS equals

\[ PDP|_{\text{CMOS}} = \left( 1 + \frac{2 \cdot e \cdot V_{\text{dd}}}{\alpha \cdot C_L} \right) \cdot C_L \cdot V_{\text{dd}}^2 \]  

(24)

The equation above also shows a dependency on the supply voltage (just like for STSCL), but now the dependency is quadratic. However, the gate delay for CMOS, which equals

\[ t_{\text{delay}}|_{\text{CMOS}} = \frac{C_L \cdot (V_{\text{dd}} - V_T)}{K} \]  

(25)

also depends on the supply voltage, hence lowering the supply voltage to achieve lower PDP will cause the performance of the CMOS gate to degrade. This problem is eliminated for STSCL logic.

From the above equations one can see that for STSCL gates the attainable power-delay product per unit capacitance is proportional to \( V_{\text{swing}} \cdot V_{\text{dd}} \). This implies that for \( V_{\text{dd}} = 0.2 \) V and an output voltage swing just above \( V_{\text{swing}} = 0.15 \) V, the power-delay product per unit capacitance is approximately 0.03 fJ per fF of load capacitance.
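
The STSCL relations (20) to (23) can be collected in a short numerical sketch. The supply voltage, bias current, and swing follow the values quoted in the text, and the maximum frequency is taken as 1/(2·t_delay). The load capacitance of 1 fF is an assumption made only to obtain concrete numbers, and the function names are hypothetical.

```python
import math


def stscl_power(vdd, i_bias):
    """Static power of one STSCL gate, P = Vdd * Ibias (W)."""
    return vdd * i_bias


def stscl_delay(c_load, v_swing, i_bias):
    """Gate delay, t = ln(2) * C_L * V_swing / Ibias (s)."""
    return math.log(2) * c_load * v_swing / i_bias


def stscl_pdp(c_load, v_swing, vdd):
    """Power-delay product, PDP = ln(2) * C_L * V_swing * Vdd (J)."""
    return math.log(2) * c_load * v_swing * vdd


def stscl_fmax(t_delay):
    """Maximum operating frequency, taken as 1 / (2 * t_delay) (Hz)."""
    return 1.0 / (2.0 * t_delay)


if __name__ == "__main__":
    vdd, i_bias, v_swing = 0.2, 250e-12, 0.15   # values quoted in the text
    c_load = 1e-15                               # ASSUMED 1 fF load, for illustration
    t = stscl_delay(c_load, v_swing, i_bias)
    print(f"P     = {stscl_power(vdd, i_bias):.3e} W")
    print(f"delay = {t:.3e} s, f_max = {stscl_fmax(t):.3e} Hz")
    print(f"PDP   = {stscl_pdp(c_load, v_swing, vdd):.3e} J")
```

Note that the product \( V_{\text{swing}} \cdot V_{\text{dd}} = 0.03 \) V² omits the factor ln 2 from (22); including it gives roughly 0.021 fJ per fF under the same assumptions.
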
6.4.3. Slew rate (SR)

In digital circuits speed is one of the prime criteria for establishing the performance of a system. The gate delay has already been discussed; in addition to this delay, the slew rate also determines the obtainable speed of a gate. The slew rate (SR) is the maximum rate of change of the output voltage, and it therefore determines the time required for a signal to make the transition from 10% to 90% of its final (DC) value. High speed requires a high slew rate and vice versa. The slew rate is defined as

\[
SR = \frac{I_{\text{bias}}}{C_{gs}},
\]

(26)

where \(C_{gs}\) is the gate capacitance of the next stage, i.e., the load capacitance and \(I_{\text{bias}}\) is the bias current, i.e., the maximum output drive current. The slew rate should satisfy the condition

\[
SR > 2 \cdot \pi \cdot f_{\text{max}} \cdot V_{\text{swing}},
\]

(27)

where \(f_{\text{max}}\) is the maximum operating frequency. Theoretically, the maximum operating frequency for STSCL gates with 45 nm channel lengths, 675 nm widths, and a bias current of \(I_{\text{bias}}=250\) pA is approximately \(f_{\text{max}}=200\) kHz.
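
Rearranging (26) and (27) gives an upper bound on the operating frequency, f_max < I_bias / (2π · C_gs · V_swing). The sketch below evaluates this bound. The gate capacitance of 1 fF and the swing of 0.2 V are assumptions picked for illustration, since neither value is stated for this estimate; with them the bound lands close to the 200 kHz quoted above.

```python
import math


def slew_rate(i_bias, c_gs):
    """SR = Ibias / Cgs, in V/s."""
    return i_bias / c_gs


def max_frequency(i_bias, c_gs, v_swing):
    """Upper bound on frequency from SR > 2*pi*f_max*V_swing, in Hz."""
    return slew_rate(i_bias, c_gs) / (2.0 * math.pi * v_swing)


if __name__ == "__main__":
    i_bias = 250e-12    # bias current from the text
    c_gs = 1e-15        # ASSUMED next-stage gate capacitance (1 fF)
    v_swing = 0.2       # ASSUMED output swing (V)
    print(f"SR    = {slew_rate(i_bias, c_gs):.3e} V/s")
    print(f"f_max < {max_frequency(i_bias, c_gs, v_swing) / 1e3:.1f} kHz")
```
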

6.5. STSCL digital circuits

The purpose of the digital STSCL circuits in this section is to compare their power consumption with that of CMOS logic, in order to further support the use of STSCL in designing ultra-low power digital systems. The systems that we have investigated are designed in both CMOS and STSCL (in a 45 nm process technology) using the same underlying architecture, in order to get an acceptable comparison between the two logic styles. Quite conventional architectures are used, and no power optimization techniques, for example pipelining or voltage scaling per se, are applied to any of the systems. The designed digital circuits are:

- a 4-by-4 array multiplier,
- a fifth-order FIR filter, and
- a fifty-fifth-order FIR filter.

All the block diagrams are given in Figures 52 through 55. A digital filter was chosen mainly because it is a very common system in any DSP (digital signal processor) or mixed-signal system today.

The supply voltage used for the CMOS logic is \(V_{\text{dd}}=0.5\) V and for the STSCL it is \(V_{\text{dd}}=0.2\) V. The CMOS logic did not operate correctly at supply voltages below \(V_{\text{dd}}=0.4\) V.

The array multiplier in Figure 52 is a very conventional architecture, realized based on the tree-based multiplier structure [13]. The partial bit products are generated in parallel by AND gates and then added by full adders (FA). For a 4-by-4 multiplier, twelve full adders and sixteen AND gates are required. This architecture has a long critical path and consumes a fairly high amount of energy per computation.
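
The gate count can be illustrated with a bit-level behavioral sketch in which sixteen AND gates form the partial products and three rows of four ripple-connected full adders sum them. This is one possible arrangement that matches the stated counts; it is an assumption and not necessarily the exact wiring of Figure 52. The function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply_4x4(a, b):
    """Multiply two 4-bit unsigned values; returns (product, n_and, n_fa)."""
    a_bits = [(a >> j) & 1 for j in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    # 16 AND gates generate all partial-product bits in parallel.
    pp = [[a_bits[j] & b_bits[i] for j in range(4)] for i in range(4)]
    n_and, n_fa = 16, 0

    product_bits = [pp[0][0]]          # bit 0 needs no adder
    upper = pp[0][1:] + [0]            # remaining bits of row 0, padded to 4 bits
    for i in range(1, 4):              # three adder rows of four full adders each
        carry, sums = 0, []
        for j in range(4):
            s, carry = full_adder(upper[j], pp[i][j], carry)
            sums.append(s)
            n_fa += 1
        product_bits.append(sums[0])   # lowest sum bit of the row is final
        upper = sums[1:] + [carry]     # the rest moves on to the next row
    product_bits.extend(upper)         # last row delivers product bits 4 to 7
    value = sum(bit << k for k, bit in enumerate(product_bits))
    return value, n_and, n_fa


if __name__ == "__main__":
    print(array_multiply_4x4(13, 11))
```

Since the carries ripple through every row, the longest path passes through many adders in series, which is consistent with the long critical path mentioned above.
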

The five-bit serial-parallel multiplier shown in Figure 54 is used to apply the fixed coefficients that are multiplied with the input bit stream, \(x_i\). This five-bit multiplier is used in the fifth-order filter, whereas eight-bit serial-parallel multipliers are used in the fifty-fifth-order filter. In this kind of multiplier the input data bits enter serially and are multiplied in parallel with the fixed coefficients, which are coded in binary two's complement form. The serial-parallel block diagram in Figure 54 uses the concept of carry-save adders. The D-flip-flops have reset ports and have to be reset at the start of any computation. Further hardware resources could be saved in this design by using a canonical signed digit implementation [15] of the coefficients. Many full adders and AND gates could have been saved with that kind of digit representation; however, optimization was not the focus of this thesis, as the main target was to understand the STSCL style and its advantages over CMOS.
Figure 52: Block diagram of a 4-by-4 array multiplier.

Figure 53: Fifth-order FIR filter.

For both the fifth-order and fifty-fifth-order FIR filters, the chosen architecture is very common. In practice, FIR filters are realized by both recursive and non-recursive algorithms [15]. A non-recursive algorithm is normally chosen because of its stability and its ability to suppress parasitic oscillations. The structure used here is the transposed direct form, which has a serial input and a serial output and is derived by applying the transposition theorem to the direct-form structure of an FIR filter. From Figures 53 and 55 it can be seen that the structures of the two filters are the same, except that the number of multipliers, adders, and delay elements differs because of the different filter orders. For both filters an output is generated at the $D_{\text{out}}$ terminal (see Figures 53 and 55) in each clock cycle for an input sequence provided at the $D_{\text{input}}$ terminal.
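
The data flow of a transposed direct-form FIR filter can be sketched at word level as follows. Each input sample is multiplied by all coefficients at once, and the products are accumulated through a chain of delay registers. This is a behavioral Python model with hypothetical names and example coefficients; it does not model the bit-serial arithmetic used in the actual implementation.

```python
def fir_transposed(h, x):
    """Transposed direct-form FIR filter, one output sample per input sample."""
    order = len(h) - 1
    state = [0.0] * order               # contents of the delay registers
    y = []
    for sample in x:
        products = [hk * sample for hk in h]   # all multipliers see the same input
        out = products[0] + (state[0] if order else 0.0)
        # Each register takes its product plus the old value of the next register.
        for k in range(order - 1):
            state[k] = products[k + 1] + state[k + 1]
        if order:
            state[order - 1] = products[order]
        y.append(out)
    return y


if __name__ == "__main__":
    h = [1, 2, 3]                       # hypothetical coefficients
    print(fir_transposed(h, [1, 0, 0, 0]))   # impulse in, coefficients out
    print(fir_transposed(h, [1, 1, 1]))      # running sums of the coefficients
```

The registers are updated in increasing index order, so each update still reads the previous value of the following register, as the structure requires.
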

The lower-order filter was first implemented with STSCL gates to check whether it performs correctly. Test benches are designed for each gate and system. All simulations are run with the Cadence Spectre simulator (i.e., full SPICE), and the analyses are carried out to observe the system behavior and its variation with temperature, supply voltage, and bias current $I_{\text{bias}}$. A supply voltage of $V_{dd} = 0.2$ V was chosen, mainly to test the STSCL gates at their lowest possible configuration. Simulation results and further details are given in the next section. The results include the energy consumption and delay, along with the difference in area for STSCL in comparison with the CMOS logic style.
6.6. Simulation and analysis of the systems

The test bench architectures for the array multiplier, the fifth-order filter, and the fifty-fifth-order filter are provided in Figures 56 through 58, respectively. For the 4-by-4 array multiplier (test bench shown in Figure 56) the input sequence generator generates two sets of parallel input sequences (both inverted and non-inverted). Each sequence is four bits long and is fed to one of the two inputs of the array multiplier. The non-inverted eight output bits are passed through a binary-to-digital converter (on the right in the figure) to obtain the digital value (OUT) for verification of the multiplier.

For the two filter test benches in Figures 57 and 58, the testing method is quite similar. The input sequence generator generates a serial bit sequence at a clock frequency of 44.1 kHz. The output bits are also taken serially, and the binary values are dumped to a log file, which is later read with MATLAB to obtain the responses of the respective filters shown in Figures 59 and 60.

The filter architectures given in Figures 53 and 55 are quite conventional representations of FIR filters; hence no high-level RTL modeling was performed in this work, as the functionality of the systems could be verified quite easily with Cadence and MATLAB. The coefficients used for the fifth-order and fifty-fifth-order filters are given in the Appendix.

The magnitude responses for both filters are given in Figure 59 and 60, respectively. For the fifth-order filter a low pass response is observed according to the specifications in Table 7 and for the fifty-fifth-order a notch filter response is observed whose specification is provided in Table 8.
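
A magnitude response of this kind can also be computed directly from a coefficient set by evaluating H(e^{jω}) = Σ h[n]·e^{-jωn} at the frequencies of interest. The sketch below does so in Python rather than MATLAB. The six equal coefficients are a placeholder for a fifth-order filter and are not the coefficients from the Appendix; only the sampling frequency of 44.1 kHz is taken from the text.

```python
import cmath
import math


def magnitude_response_db(h, freq_hz, fs_hz):
    """Magnitude of H(e^{jw}) in dB at one frequency, for FIR coefficients h."""
    omega = 2.0 * math.pi * freq_hz / fs_hz
    resp = sum(hk * cmath.exp(-1j * omega * n) for n, hk in enumerate(h))
    # Clamp very small magnitudes so that exact zeros do not break log10.
    return 20.0 * math.log10(max(abs(resp), 1e-12))


if __name__ == "__main__":
    fs = 44100.0
    h = [1.0 / 6.0] * 6      # PLACEHOLDER six-tap (fifth-order) moving average
    for f in (0.0, 3000.0, 6000.0, 11500.0):
        print(f"{f / 1e3:5.1f} kHz: {magnitude_response_db(h, f, fs):7.2f} dB")
```
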

For the 4-by-4 multiplier several input sequences are applied, and both the four-bit parallel inputs and the output responses are taken in binary form. These are then converted to decimal values in order to compare the outputs with the expected products of the original inputs. To understand the performance of the array multiplier using STSCL, a propagation delay versus bias current plot is shown in Figure 61. The supply voltage in this example is $V_{dd}=0.4$ V. From the plot it is noticeable that the propagation delay of the multiplier decreases with increasing bias current, $I_{bias}$, which agrees well with the theory discussed in chapter 5, i.e., the speed of STSCL gates is proportional to the bias current, $I_{bias}$.

For the fifth-order filter, the power consumption and the PDP are shown in Table 9, which gives the difference in energy consumption between the filter designed in CMOS and in STSCL. For CMOS the supply voltage was 0.5 V and for STSCL it was 0.2 V.

Figure 57: Test bench for a fifth-order filter.

Figure 58: Test bench for a fifty-fifth-order filter.

Table 7: Output specification for fifth-order FIR filter

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Specification</th>
<th>Unit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Sampling frequency</td>
<td>44.1</td>
<td>kHz</td>
</tr>
<tr>
<td>Passband frequency</td>
<td>3</td>
<td>kHz</td>
</tr>
<tr>
<td>Stopband frequency</td>
<td>6</td>
<td>kHz</td>
</tr>
<tr>
<td>Stopband attenuation</td>
<td>-25</td>
<td>dB</td>
</tr>
</tbody>
</table>

Figure 59: Magnitude response of a fifth-order FIR filter.

Figure 60: Magnitude response of a fifty-fifth-order FIR filter.

For the 4-by-4 array multiplier Table 10 shows the differences between STSCL and CMOS in terms of energy consumption, propagation delay, and also approximate area. Also here, for CMOS, the supply voltage was 0.5 V and for STSCL it was 0.2 V.
Table 8: Output specification for fifty-fifth-order FIR filter

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Specification</th>
<th>Unit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Sampling frequency</td>
<td>44.1</td>
<td>kHz</td>
</tr>
<tr>
<td>Notch frequency</td>
<td>11.5</td>
<td>kHz</td>
</tr>
<tr>
<td>Stopband attenuation</td>
<td>-50</td>
<td>dB</td>
</tr>
</tbody>
</table>

Table 9: Power consumption and PDP comparison for fifth-order FIR filter

<table>
<thead>
<tr>
<th>Logic</th>
<th>STSCL</th>
<th>CMOS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Temperature</td>
<td>-20</td>
<td>70</td>
</tr>
<tr>
<td>Delay</td>
<td>9.47</td>
<td>1.06</td>
</tr>
<tr>
<td>PDP</td>
<td>600.4</td>
<td>45.81</td>
</tr>
</tbody>
</table>

Table 10: Power consumption and PDP comparison for 4-by-4 array multiplier

<table>
<thead>
<tr>
<th>Logic</th>
<th>STSCL</th>
<th>CMOS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delay</td>
<td>52.47</td>
<td>38.65</td>
</tr>
<tr>
<td>PDP</td>
<td>302.4</td>
<td>313.15</td>
</tr>
<tr>
<td>Area</td>
<td>330</td>
<td>100</td>
</tr>
</tbody>
</table>

The area required for STSCL is considerably larger than for CMOS, but it must be remembered that the STSCL logic runs at a very low supply voltage compared to CMOS. Even at such a low supply, the overall energy consumption of STSCL is comparatively lower. Furthermore, if the CMOS gates were run at a supply below 0.5 V, they would require much larger devices to meet the same requirements.

For the fifty-fifth-order filter, energy consumption is considered the prime focus when choosing a logic style. Using STSCL, the energy consumption came to 1.1388 nJ at a bias current of $I_{bias} = 250$ pA and a supply of $V_{dd} = 0.2$ V, compared to 1.3857 nJ for CMOS. It should also be mentioned that several bias circuits have been used to drive the STSCL logic. It would be possible to save even more energy by reducing the number of bias circuits. This is however left as future work.
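
The size of the saving follows from simple arithmetic on the two reported energies. The short sketch below computes the relative reduction; the function name `relative_saving` is hypothetical, while the two energy values are the ones reported in this section.

```python
def relative_saving(e_reference, e_new):
    """Fraction of energy saved relative to the reference design."""
    return (e_reference - e_new) / e_reference


E_CMOS_NJ = 1.3857     # fifty-fifth-order filter, CMOS, as reported
E_STSCL_NJ = 1.1388    # fifty-fifth-order filter, STSCL, as reported

if __name__ == "__main__":
    saving = relative_saving(E_CMOS_NJ, E_STSCL_NJ)
    print(f"Absolute saving: {E_CMOS_NJ - E_STSCL_NJ:.4f} nJ")
    print(f"Relative saving: {100.0 * saving:.1f} %")
```

This corresponds to a saving of 0.2469 nJ, or about 17.8 % of the CMOS energy.
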

The output waveforms simulated in Cadence may show surges or spikes during clock transitions, at least when the clock frequency is comparatively high. In such cases the spikes can be eliminated by using additional buffers at the output, although this may add to the total power consumption.

From the simulation results it can be observed that STSCL provides good performance compared to CMOS at very low supply voltages, but with the drawback of a larger area. If the application can tolerate a larger area, STSCL allows the system to run at very low supply voltages and low bias currents (scaling down to 250 pA), hence reducing the energy consumption to a very low level. It is important to note that the circuits designed in CMOS logic have not been run in the sub-threshold region in this thesis work; as mentioned earlier, it is very difficult to guarantee the functionality of CMOS logic in the sub-threshold region.
Figure 61: Propagation delay vs. bias current.
7. CONCLUSIONS

Sub-threshold operation is highly beneficial for reaching a very low power and energy consumption specification. This type of specification is required for implementing, e.g., modern distributed sensor networks. Designers nowadays apply power reduction techniques at every level of abstraction in order to achieve an optimum level of power reduction. These reduction schemes must be applied such that the system performance is not jeopardized.

This thesis deals with power and energy reduction methods at the circuit level of abstraction. Sub-threshold differential logic styles have been described and motivated by designing two FIR filters of different lengths. In practice, these filters can be used in the DSP section of, for example, smart dust sensors. The experiments have been performed for a fifth-order and a fifty-fifth-order filter (the fifty-fifth-order filter was chosen to test the operation and performance of STSCL logic in a larger system). The important aspect of the experiments has been to observe the impact of PVT variation on filter performance using an STSCL cell library. The results obtained for the STSCL-based filters operating at $-20^\circ C$ show that there is indeed a reduction in energy consumption compared to CMOS-based filters, without too much degradation in performance. This observation supports the use of STSCL for building smart dust sensors that have to operate under harsh conditions. Although the reduction in energy is not significant, one must remember that no effort has been made to improve the PDP of the STSCL gates in this thesis work. The digital designs that have been used are not energy efficient, and no output buffers have been introduced (in any phase of the design) to improve gate performance. The purpose of this work has partially been to suggest STSCL gates as a replacement for conventional CMOS logic gates in applications that require a very low operating voltage and ultra-low energy consumption. Table 11 shows a comparison of the parameters/specifications used to run CMOS and NCL (null convention logic) [22] in the sub-threshold region with those of STSCL.

The area required for designing the STSCL cell library is notably larger than for CMOS. The area drawback is tolerable when ultra-low energy consumption per computation is of high priority. Also, further minimization of the bias current is possible by resizing the differential NMOS network in the STSCL gates; this will lead to a further reduction in energy consumption.

The major problems that can be faced with STSCL gates in a given process technology are the presence of higher gate leakage (which can possibly be reduced by introducing devices with high-k dielectrics) and the fact that a reduction of the bias current requires wider NMOS devices, which can inadvertently increase the impact of the parasitic capacitance and might also increase the dynamic power consumption. Thus, device sizing has to be done properly before going to circuit implementation with STSCL standard cells [13].

Recent articles and papers show that current CMOS processes or CMOS standard cells can be tailored and modified to run in the sub-threshold region of operation. The problem with such modifications, however, has always been the difficulty of keeping the CMOS gates performing consistently in the sub-threshold region. The reason is that basic CMOS technology is scaled and fabricated to operate under normal threshold conditions [14]. Another problem is that process parameters, such as the sub-threshold slope factor, strongly affect CMOS logic operation in the sub-threshold region. To mitigate this problem, processes with a better slope factor are needed.

The impact of the different conditions mentioned above can be reduced, which in turn allows the CMOS gates to run in the sub-threshold region in a very stable manner, but at increased cost. The transition from basic bulk CMOS technology to SOI technology, the use of double-gate CMOS, variable-threshold-voltage sub-threshold CMOS [23], etc. are a few examples where device-level modifications have been made in an attempt to run CMOS in the sub-threshold region for low-voltage, low-power digital applications. All these device-level modifications will, however, in the end increase the fabrication cost dramatically and may turn out not to be cost effective.

In this work, the STSCL standard cell library has been designed using a standard bulk technology without any device-level modification. Thus, the fabrication cost will quite likely be lower than for sub-threshold CMOS. The only additional cost comes from the level shifters (both in terms of design area and design complexity), which are required to run the STSCL-based system in parallel with the CMOS-based systems, as these two sections of a system will run concurrently at different voltage levels. Other issues involving parasitic capacitance will also be less severe. Opting for STSCL to design the digital section of the smart dust sensor mote is therefore well motivated (as discussed in previous chapters), considering that these sections will operate at low frequency, typically in the kilohertz range.

Finally, the fact that STSCL-based digital logic gates operate at a supply voltage as low as 0.2 V without too much performance degradation, while consuming very low energy per computation, makes STSCL-based digital systems highly suitable for the digital part of smart dust sensors.

Table 11: Comparison of STSCL with other logic gates

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Architecture</th>
<th>Tech. (nm)</th>
<th>Applied logic</th>
<th>Temp. (°C)</th>
<th>Supply (mV)</th>
<th>Energy (pJ)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[21]</td>
<td>Ninth order FIR filter</td>
<td>65</td>
<td>Sub-CMOS</td>
<td>25</td>
<td>220</td>
<td>1.33</td>
</tr>
<tr>
<td>[22]</td>
<td>Fifth-order FIR filter</td>
<td>65</td>
<td>NCL</td>
<td>25</td>
<td>300</td>
<td>4.56</td>
</tr>
<tr>
<td>[23]</td>
<td>8-by-8 carry-save multiplier</td>
<td>350</td>
<td>VT-sub-CMOS</td>
<td>25</td>
<td>500</td>
<td>0.672</td>
</tr>
<tr>
<td>[1]</td>
<td>8-by-8 carry-save multiplier</td>
<td>180</td>
<td>STSCL</td>
<td>25</td>
<td>350</td>
<td>1</td>
</tr>
<tr>
<td>This work</td>
<td>4-by-4 array multiplier</td>
<td>45</td>
<td>STSCL</td>
<td>70</td>
<td>200</td>
<td>0.3</td>
</tr>
<tr>
<td>This work</td>
<td>Fifth-order FIR filter</td>
<td>45</td>
<td>STSCL</td>
<td>70</td>
<td>200</td>
<td>0.0458</td>
</tr>
<tr>
<td>This work</td>
<td>Fifty-fifth-order FIR filter</td>
<td>45</td>
<td>STSCL</td>
<td>-20</td>
<td>200</td>
<td>1139</td>
</tr>
</tbody>
</table>
8. FUTURE WORK

The motivation behind this work is primarily to understand the applicability of sub-threshold source coupled logic for designing digital systems. The results obtained reflect the advantages of using STSCL over CMOS. The bias circuit used for designing the STSCL gates contains an ideal amplifier and an ideal current source, and the implications of using transistor-level versions of the amplifier and current source have not been checked. It will thus be important, at the design level, to check the functionality and performance of the STSCL gates with a properly designed bias circuit composed of non-ideal components. The bias circuit is an important part of the STSCL design process; hence, the next stage of the work would involve a non-ideal bias circuit.

The next stage would include a further enlargement of the standard cell library by adding extra components (based on the STSCL design) such as multiplexers, ripple carry adders, TFF, three-input NAND/AND gates, three-input NOR/OR gates, 8-by-8 pipelined multipliers, etc.

This work contributes to the design of low-power systems at the circuit level only. Issues like sizing, parasitic capacitance, threshold variation, etc., are common design concerns that must be addressed during the design phase. Addressing such issues requires designing the system at the layout level as well. This would provide accurate figures for area, parasitic capacitance, and power consumption. Post-layout simulation will also provide more realistic values for the performance of the digital system (the FIR filter, in this work) designed with STSCL.

The next step after completing the layout will be to carry out a chip-level implementation of the filter. This will provide firmer grounds for the usability of STSCL over CMOS logic. The system performance and energy consumption can then be measured in practice under varying temperature and applied voltage. The verification, evaluation, and performance measurement of the implemented STSCL chip will also motivate designing the ADCs and DACs of the single smart dust sensor module with STSCL gates.

Other important work to be addressed in the future involves further increasing the resistance of the PMOS load devices in order to lower the bias current further. This can be achieved by using long-channel PMOS devices. This will eventually allow the bias current flowing through an SCL NMOS network to be lower than 250 pA. This may cause the maximum achievable operating frequency of the gates to be lower than the theoretical value of 200 kHz; the important parameter, energy consumption, will however be reduced.
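
The trade-off between bias current and energy can be illustrated with a minimal sketch. It assumes that the power of a gate is dominated by its constant bias current drawn from the supply, so that the power is the product of supply voltage and bias current, and that the system runs at a fixed clock frequency that the gates can still meet. Leakage and the bias circuit overhead are ignored. The 0.2 V supply and the 250 pA bias current are taken from the text; the function names and the 100 kHz clock are hypothetical and chosen for illustration only.

```python
def stscl_gate_power(v_dd, i_bias):
    """Static power (W) of one gate, assuming a constant bias current i_bias (A)
    is drawn from the supply v_dd (V). Leakage is ignored (assumption)."""
    return v_dd * i_bias


def energy_per_cycle(v_dd, i_bias, f_clk, n_gates=1):
    """Energy (J) drawn by n_gates gates during one clock period at f_clk (Hz)."""
    return n_gates * stscl_gate_power(v_dd, i_bias) / f_clk


if __name__ == "__main__":
    V_DD = 0.2      # supply voltage from the text (V)
    F_CLK = 100e3   # hypothetical clock frequency (Hz), illustration only
    for i_bias in (250e-12, 125e-12):
        power = stscl_gate_power(V_DD, i_bias)
        energy = energy_per_cycle(V_DD, i_bias, F_CLK)
        print(f"I_bias = {i_bias:.3e} A: P = {power:.3e} W, E/cycle = {energy:.3e} J")
```

Under these assumptions a gate biased at 250 pA from 0.2 V dissipates 50 pW, and halving the bias current at a fixed clock frequency halves the energy per cycle, provided that the slower gates still meet the timing requirement.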

A proper comparison between STSCL and CMOS must be established by also running the CMOS in the sub-threshold region. The area usage and energy consumption can then be calculated and measured to further solidify the advantage of STSCL over CMOS. In this work, CMOS running in the saturation region has been compared with STSCL. Even though the comparison gave better results for STSCL, a comparison with sub-threshold CMOS would be more reasonable.

The architectures of the digital circuits and systems used in this thesis work are not optimized to match the recent low-power trend. Thus, it is necessary to design the digital circuits and systems with performance-optimized architectures. Architectures that involve pipelining or interleaving techniques may be designed with STSCL gates and then evaluated with respect to energy consumption and area to observe the benefits of this new logic gate.
The goal of this work has been to study and suggest STSCL as a better replacement for CMOS and other sub-threshold logic designs. The designed systems will later be used for applications like smart dust sensors, where low-voltage and low-energy specifications are critical. This thesis work suggests that those specifications can be achieved, but only at the schematic level. A proper and satisfactory evaluation of the STSCL-based systems will therefore need chip-level implementation of the designs to serve the ultimate objective of this thesis.
9. REFERENCES


Appendix

The different coefficients which have been used to design the FIR filters are described in this appendix. The coefficients have been derived using the Digital Filter toolbox in MATLAB. Given below are the derived coefficients.

Filter coefficients (fifth-order):
\[
c1 = 0.0625; \\
c2 = 0.125; \\
c3 = 0.1875; \\
c4 = 0.1875; \\
c5 = 0.125; \\
c6 = 0.0625;
\]
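
As an illustration of how these coefficients act on a sample stream, the following is a minimal software sketch of a direct-form FIR computation. It is not the STSCL hardware implementation described in the thesis; the names `fir_filter` and `COEFFS_5TH` are hypothetical, and samples before the start of the input are assumed to be zero.

```python
# Fifth-order FIR coefficients c1..c6 copied from the list above (six taps).
COEFFS_5TH = [0.0625, 0.125, 0.1875, 0.1875, 0.125, 0.0625]


def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input (n - k < 0) are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7   # unit impulse, 8 samples
    step = [1.0] * 8              # unit step, 8 samples
    print("impulse response:", fir_filter(impulse, COEFFS_5TH))
    print("step response:   ", fir_filter(step, COEFFS_5TH))
```

Feeding a unit impulse reproduces the six coefficients, and a unit step settles to their sum, 0.75, which is the DC gain of the filter with these coefficients.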

Filter coefficients (fifty-fifth-order):
\[
a1 = 0; \\
a2 = 0.0078125; \\
a3 = 0.0078125; \\
a4 = 0; \\
a5 = -0.0078215; \\
a6 = -0.0015625; \\
a7 = -0.0015625; \\
a8 = 0.0234375; \\
a9 = -0.03125; \\
a10 = 0.0234375; \\
a11 = -0.0390625; \\
a12 = -0.015625; \\
a13 = -0.015625; \\
a14 = 0.03125; \\
a15 = 0.0625; \\
a16 = 0.0546875; \\
a17 = -0.0546875; \\
a18 = -0.09375; \\
a19 = -0.078125; \\
a20 = 0.078125; \\
a21 = 0.109375; \\
a22 = 0.0859375; \\
a23 = -0.0859375; \\
a24 = -0.125; \\
a25 = -0.1171875; \\
a26 = 0.1171875; \\
a27 = 0.2421875; \\
a28 = 0.25; \\
a29 = 0.25; \\
a30 = 0.2421875; \\
a31 = 0.1171875; \\
a32 = -0.1171875; \\
a33 = -0.125; \\
a34 = -0.09375; \\
a35 = 0.109375; \\
a36 = 0.078125; \\
a37 = -0.078125; \\
a38 = -0.09375; \\
a39 = -0.0546875; \\
a40 = 0.0546875; \\
a41 = 0.0625; \\
a42 = 0.03125; \\
a43 = 0.0625; \\
a44 = -0.015625; \\
a45 = -0.015625; \\
a46 = -0.0390625; \\
a47 = 0.0234375; \\
a48 = -0.03125; \\
a49 = 0.0234375; \\
a50 = -0.0015625; \\
a51 = -0.0015625; \\
a52 = -0.0078215; \\
a53 = 0; \\
a54 = 0.0078125; \\
a55 = 0.0078125; \\
a56 = 0;
\]