Single Event Upset Detection at Gate Level

Figure 4.1.: Presented Non-Concurrent Configurations. Panels: a) Unprotected, b) Detection (Reference), c) Detection (Parity Pair Latch), d) Detection & Localization.

The remainder of this chapter is organized as follows. Section 4.2 details the Parity-Pair Latch standard cell and shows how registers can be composed that allow efficient SEU detection at gate level (Block I in Figures 4.1-c and 4.1-d). Section 4.3 describes how the register parities are encoded by an error detecting code and how the code computation is implemented in an area-efficient way to allow the localization of the failing register (Block II in Figure 4.1-d). Finally, both parts are evaluated experimentally in Section 4.4 before the chapter concludes with a short summary.

4.2. Single Event Upset Detection at Gate Level

In order to detect Single Event Upsets affecting the state of a circuit stored in sequential elements, redundancy in the form of additional check bits is employed. To allow error detection, two properties need to hold:

▷ The checksum needs to depend on all data bits to be protected.

▷ A changing value of any one bit entails a change of the checksum.

4.2.1. Register Parity Protection

The shortest possible checksum to detect Single Event Upsets consists of one additional bit, which is used for an arbitrary number of data bits. Often, a parity bit is used, which indicates whether the number of data bits with value ’one’ is even or odd. For an even parity, a ’one’ parity bit is added for an odd number of ’ones’ in a register, thereby resulting in an even number of ’ones’ in the total set of bits. Correspondingly, a ’zero’ parity bit is employed for an even number of ’ones’ (see Definition 2.2.6).²

The even parity bit can be calculated efficiently as the exclusive OR (XOR) sum of the data bits, which evaluates to ’zero’ for an even number of ’ones’ and to ’one’ for an odd number of ’ones’.

For a register R⃗_i with n data bits, the parity p_i is calculated by n−1 two-input XOR gates, which are typically organized in a tree of depth log₂(n).
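The gate count and depth stated above can be checked with a small behavioural sketch. It is not a gate-level model; the function name `xor_tree_parity` and the example register content are illustrative assumptions, and the tree is built by simply pairing neighbouring signals level by level.

```python
from functools import reduce
from operator import xor


def xor_tree_parity(bits):
    """Evaluate a balanced tree of two-input XOR gates over `bits`.

    Returns (parity, gate_count, depth). Expects at least one bit.
    """
    level = list(bits)
    gates = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        # One XOR2 gate per pair of neighbouring signals.
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])
            gates += 1
        # With an odd number of signals, the last one passes through.
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], gates, depth


data = [1, 0, 1, 1, 0, 0, 1, 0]           # hypothetical register content, n = 8
parity, gates, depth = xor_tree_parity(data)
assert parity == reduce(xor, data)         # same result as a linear XOR chain
print(parity, gates, depth)                # 0 7 3, i.e. n-1 gates, depth log2(n)
```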

To detect Single Event Upsets affecting the register data during a clock gated phase, the parity bit is computed upon entry of the clock gated phase and stored in an additional register bit. From now on, this parity will be called the reference parity p_i of register R_i. While the clock is gated, the parity tree continuously recomputes the parity of the stored data, yielding the recomputed parity p^′_i. By comparing the reference and the recomputed parity with one additional XOR operation, the syndrome s_i is obtained.

s_i = p_i ⊕ p^′_i (4.1)

Whenever the reference and the recomputed parity differ, the syndrome s_i is one.

Two causes leading to such a non-zero syndrome can be distinguished:

▷ A flipped data bit manifests in the recomputed parity p^′_i and the syndrome correctly indicates a corruption.

▷ The stored reference parity bit p_i is directly affected by a Single Event Upset while the data bits are correct.
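Both causes can be reproduced with a minimal behavioural sketch of Eq. 4.1. The helper names (`parity`, `syndrome`, `flip`) and the register content are assumptions made for illustration only.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """Even parity bit: XOR sum of all data bits."""
    return reduce(xor, bits, 0)


def syndrome(data, reference_parity):
    """s_i = p_i XOR p'_i, with p'_i recomputed from the current data."""
    return reference_parity ^ parity(data)


def flip(bits, index):
    """Return a copy of `bits` with one bit inverted (models a single upset)."""
    out = list(bits)
    out[index] ^= 1
    return out


data = [1, 0, 1, 1]            # hypothetical register content
p_ref = parity(data)           # stored when the clock gated phase is entered

# No upset: reference and recomputed parity agree.
assert syndrome(data, p_ref) == 0

# Cause 1: any single flipped data bit changes the recomputed parity.
for i in range(len(data)):
    assert syndrome(flip(data, i), p_ref) == 1

# Cause 2: the stored reference parity bit itself is upset.
assert syndrome(data, p_ref ^ 1) == 1
```

The syndrome alone does not distinguish between the two causes; it only signals that one of the n + 1 stored bits has changed.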

For a protected register, any of the n + 1 bits can be affected by a Single Event Upset. However, the probability that a data bit is affected is n/(n + 1), which is significantly larger than the probability 1/(n + 1) of a corrupted parity bit. Hence, in rare cases, a Single Event Upset is indicated although all data bits are correct (false detection).³ Apart from a slightly reduced system performance, such a false detection is not an issue, as the triggered countermeasures (e.g. a recomputation) also rectify the corrupted parity.

² An odd parity bit, the inversion of an even parity bit, could be used in the same manner. In the following, the even parity will be used without loss of generality.

³ The number of false detections can be further reduced by increasing the minimum Hamming distance, e.g. by duplication of the parity bit (n + 2 total bits) and consensus checking (see Section 2.2.3.1).
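As a worked example of the two probabilities, assume a register width of n = 32 (a value chosen purely for illustration) and that a single upset hits each of the n + 1 stored bits with equal likelihood:

```latex
\[
P(\text{data bit affected}) = \frac{n}{n+1} = \frac{32}{33} \approx 0.970,
\qquad
P(\text{parity bit affected}) = \frac{1}{n+1} = \frac{1}{33} \approx 0.030
\]
```

Under these assumptions, roughly 3 % of all indicated upsets would be false detections, and this share shrinks as the register gets wider.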

The power consumption of a module during idle phases is reduced by clock gating. In contrast to normal operation, where the state is rewritten every once in a while, the vulnerability to Single Event Upsets rises during long periods of data retention. It follows that focusing the protection on the clock gated phase offers the highest potential to increase robustness. In order not to sacrifice power during operation, the parity computation itself is gated whenever the clock is enabled. To this end, NAND2 gates, chosen for their small area footprint, are attached to the leaves of the parity tree. A reference implementation of a register with four data bits and a gated parity computation is shown in Figure 4.2.
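The effect of gating the leaves can be sketched behaviourally. The signal name `parity_enable` and its polarity (high while the clock is gated) are assumptions of this sketch, not taken from the text. Note that a NAND2 leaf inverts the data bit when enabled; for an even register width the XOR sum of the inverted bits equals the XOR sum of the original bits, and in any case reference and recomputed parity pass through the same tree.

```python
from itertools import product


def nand2(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


def gated_leaves(q_bits, parity_enable):
    """Inputs seen by the parity tree behind the NAND2 gating cells.

    parity_enable is a hypothetical control signal: 1 while the clock is
    gated (parity is computed), 0 during normal operation.
    """
    return [nand2(q, parity_enable) for q in q_bits]


def parity(bits):
    p = 0
    for b in bits:
        p ^= b
    return p


for q in product([0, 1], repeat=4):
    # Normal operation: all leaves are constant, so data changes cause
    # no switching activity inside the parity tree.
    assert gated_leaves(q, 0) == [1, 1, 1, 1]
    # Clock gated phase: the tree sees inverted data; for even n the
    # resulting XOR sum equals the parity of the data itself.
    assert parity(gated_leaves(q, 1)) == parity(q)
```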

As the area overhead introduced to increase robustness against SEUs directly relates to the production cost, area is possibly the most important metric besides power consumption and delay. The area of the unprotected register depends on its width in bits as well as on the size of the sequential element used, and it serves as the area baseline in the following (Eq. 4.2). The area of the reference parity computation is larger due to the additional parity tree (XOR2 gates) and its gating (NAND2 gates) (Eq. 4.3).

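\[
\mathrm{Area}_{\mathrm{Unprotected}} = n \cdot A_{\mathrm{Latch}} \tag{4.2}
\]

\[
\mathrm{Area}_{\mathrm{Parity}} = n \cdot A_{\mathrm{Latch}} + n \cdot A_{\mathrm{NAND2}} + (n-1) \cdot A_{\mathrm{XOR2}} \tag{4.3}
\]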
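Equations 4.2 and 4.3 translate directly into two small functions. The cell areas used below are hypothetical values in arbitrary units and do not stem from any real standard cell library; they only illustrate how the relative overhead is obtained.

```python
def area_unprotected(n, a_latch):
    """Eq. 4.2: n latches."""
    return n * a_latch


def area_parity(n, a_latch, a_nand2, a_xor2):
    """Eq. 4.3: n latches, n gating NAND2 cells, n-1 XOR2 cells."""
    return n * a_latch + n * a_nand2 + (n - 1) * a_xor2


# Hypothetical cell areas (arbitrary units), chosen for illustration only.
A_LATCH, A_NAND2, A_XOR2 = 4.0, 1.0, 2.0

n = 32
base = area_unprotected(n, A_LATCH)
protected = area_parity(n, A_LATCH, A_NAND2, A_XOR2)
overhead = (protected - base) / base
print(f"baseline={base}, protected={protected}, overhead={overhead:.1%}")
```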

Figure 4.2.: Reference Parity Tree Implementation (n = 4). Cell labels in the schematic: Latch, NAND2, XOR2; signal labels: clk, dis.

4.2.2. Area Efficient Register Parity Computation - Parity-Pair Latch

While the parity computation for a register can be implemented straightforwardly as a tree of standard cells, as described in the previous section, such an implementation might not be as area efficient as desired. Rewriting Formula (4.3) reveals that a significant share of the area overhead is spent on implementing the first level of the parity computation.

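\[
\mathrm{Area}_{\mathrm{Parity}} = \underbrace{n \cdot (A_{\mathrm{Latch}} + A_{\mathrm{NAND2}}) + \frac{n}{2} \cdot A_{\mathrm{XOR2}}}_{\text{first level}} + \underbrace{\left(\frac{n}{2} - 1\right) \cdot A_{\mathrm{XOR2}}}_{\text{remaining levels}} \tag{4.4}
\]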
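As a consistency check, assuming an even register width n, the two XOR2 terms of Eq. 4.4 add up to the single XOR2 term of Eq. 4.3, so both formulas describe the same total area:

```latex
\[
\frac{n}{2} \cdot A_{\mathrm{XOR2}} + \left(\frac{n}{2} - 1\right) \cdot A_{\mathrm{XOR2}}
= (n - 1) \cdot A_{\mathrm{XOR2}}
\]
```

For n = 4, for instance, the first level holds 2 XOR2 cells and the remaining levels hold 1, which matches the 3 XOR2 cells of the reference implementation.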

Implementations with much less impact on power, delay and area can be found if the first level of the parity tree is merged with the latches. Figure 4.3 shows the schematic of the parity computation between two latches from [IWZ08a; IWZ08b], referred to as the Parity-Pair Latch (PPL)⁴. Each latch consists of a feedback loop with two inverters (INV1/2 resp. INV3/4) that drives the output Q, as well as two transmission-gates (TG1/2 resp. TG3/4) used to control each latch. To this end, either the feedback loop is disabled and a new value from input D is registered (latch-operation), or the registered value is stored within the enabled feedback loop (hold-operation). The parity computation can be implemented especially efficiently with two transmission-gates TG5 and TG6, as the latches already provide both polarities of their internal state.

Figure 4.3.: Schematic of the Parity-Pair Latch (PPL). Labels in the schematic: Latch 1, Latch 2; D_i, Q_i; D_i+1, Q_i+1; P; L, LB.

⁴ The Parity-Pair Latch from [IWZ08a; IWZ08b] implements local clock gating internally. The generalized implementation depicted here results in a lower area overhead if clock gating is not desired.


With these Parity-Pair Latches (PPL), a register is formed as shown in Figure 4.4. In order to reduce the power consumption during operation, clock gating is implemented by accompanying the Parity-Pair Latch cells with standard CMOS NAND2 cells controlled by the inverted clock gating signal. Hence, the parity is only computed during clock gated phases of the module, and the switching activity during operation is reduced to a minimum. As the PPL parity computation is performed inside the cell, the NAND2 gating cells are connected to the parity outputs of the PPL cells, where they also restore the signal levels of the transmission-gate based parity computation.

Compared to a standard CMOS XOR2 with 8 or 10 transistors, the parity computation in the Parity-Pair Latch requires only 4 transistors. The critical path of the PPL consists of just 4 inverters and 3 pass transistors, which is less than three times the delay through a latch and in the same range as any of the double latch solutions mentioned in Section 3.1.2. As in the reference implementation, the remaining levels of the parity computation are implemented as an XOR tree composed of standard XOR2 cells.
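The resulting two-stage structure can be sketched behaviourally: the first level is computed inside the PPL cells, one parity per latch pair, and only n/2 signals enter the remaining XOR2 tree. The function names are illustrative assumptions, and the NAND2 gating cells as well as all electrical aspects are deliberately left out.

```python
def ppl_pair_parities(q_bits):
    """First level: each PPL cell outputs the XOR of its two latch states."""
    if len(q_bits) % 2:
        raise ValueError("a PPL register needs an even number of bits")
    return [q_bits[i] ^ q_bits[i + 1] for i in range(0, len(q_bits), 2)]


def xor2_tree(signals):
    """Remaining levels: tree of XOR2 cells. Returns (value, cell_count)."""
    level = list(signals)
    cells = 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        cells += len(nxt)
        if len(level) % 2:          # odd signal passes through to next level
            nxt.append(level[-1])
        level = nxt
    return level[0], cells


def ppl_register_parity(q_bits):
    """Register parity from n/2 PPL cells followed by an XOR2 tree."""
    return xor2_tree(ppl_pair_parities(q_bits))


q = [1, 0, 1, 1, 0, 0, 1, 0]        # hypothetical register content, n = 8
value, xor2_cells = ppl_register_parity(q)
print(value, xor2_cells)            # 0 3, i.e. n/2 - 1 standard XOR2 cells
```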

In addition to the advantages with respect to power consumption and delay, such an optimized register parity computation has a reduced area overhead (Eq. 4.5)

▷ as soon as the area of a Parity-Pair Latch standard cell is smaller than the sum of the areas of the two latches and the XOR2 cell it replaces;

▷ in the presence of clock gating, as the number of gating cells at the leaves of the remaining parity tree is halved.

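\[
\mathrm{Area}_{\mathrm{PPL}} = \underbrace{\frac{n}{2} \cdot (A_{\mathrm{PPL}} + A_{\mathrm{NAND2}})}_{\text{first level}} + \underbrace{\left(\frac{n}{2} - 1\right) \cdot A_{\mathrm{XOR2}}}_{\text{remaining levels}} \tag{4.5}
\]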
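The two conditions for an area benefit can be made concrete by comparing Eq. 4.4 with Eq. 4.5. Subtracting the two formulas gives a saving of n/2 · (2 · A_Latch + A_NAND2 + A_XOR2 − A_PPL). The sketch below evaluates this with hypothetical cell areas in arbitrary units, including an assumed PPL area; none of these numbers come from a real library.

```python
def area_parity(n, a_latch, a_nand2, a_xor2):
    """Eq. 4.4: reference implementation with gated leaves (even n)."""
    first_level = n * (a_latch + a_nand2) + (n // 2) * a_xor2
    remaining = (n // 2 - 1) * a_xor2
    return first_level + remaining


def area_ppl(n, a_ppl, a_nand2, a_xor2):
    """Eq. 4.5: n/2 gated PPL cells plus the remaining XOR2 tree (even n)."""
    first_level = (n // 2) * (a_ppl + a_nand2)
    remaining = (n // 2 - 1) * a_xor2
    return first_level + remaining


# Hypothetical cell areas (arbitrary units), chosen for illustration only.
A_LATCH, A_NAND2, A_XOR2, A_PPL = 4.0, 1.0, 2.0, 9.0

n = 32
saving = area_parity(n, A_LATCH, A_NAND2, A_XOR2) - area_ppl(n, A_PPL, A_NAND2, A_XOR2)

# Closed form of the difference between Eq. 4.4 and Eq. 4.5.
closed_form = (n // 2) * (2 * A_LATCH + A_NAND2 + A_XOR2 - A_PPL)
assert saving == closed_form
print(saving)
```

With these assumed values, the PPL cell is one unit smaller than two latches plus one XOR2, and one NAND2 per pair is saved in addition, which together yields a saving of two units per latch pair.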

Figure 4.4.: Parity Tree Implementation utilizing Parity-Pair Latches (n=4). Cell labels in the schematic: PPL (Latch 1, Latch 2), NAND2, XOR2; signal labels: clk, dis.

From the document: Fault tolerance infrastructure and its reuse for offline testing: synergies of a unified architecture to cope with soft errors and hard faults (pages 89-94)