

# Power Supply Noise-Aware Scan Test Pattern Reshaping for At-Speed Delay Fault Testing of Monolithic 3D ICs\*

Shao-Chun Hung<sup>†</sup>, Yi-Chen Lu<sup>‡</sup>, Sung Kyu Lim<sup>‡</sup>, and Krishnendu Chakrabarty<sup>†</sup>

<sup>†</sup>Department of Electrical & Computer Engineering, Duke University, USA

<sup>‡</sup>Department of Electrical & Computer Engineering, Georgia Institute of Technology, USA

**Abstract**—Monolithic 3D (M3D) integration is an emerging technology that offers significant power, performance, and area benefits for integrated circuit (IC) design. However, a problem with the 3D power distribution network in such ICs is that it can lead to high power supply noise (PSN) during the capture cycles in at-speed scan testing for transition delay faults. Therefore, the failure of good chips (i.e., yield loss) resulting from the PSN-induced voltage droop is a major concern for M3D designs. In this paper, we first assess the PSN and voltage droop problems, and their impact on path delays for at-speed testing of benchmark M3D designs. Next, we present an analysis framework to identify test patterns that are most likely to lead to yield loss. We describe a test-pattern reshaping solution based on integer linear programming to make appropriate changes to the test patterns that cause yield loss. Simulation results for four M3D benchmarks highlight the effectiveness of the proposed solution.

**Keywords**—Monolithic 3D integration, Power distribution network, Delay fault testing, Voltage droop

## I. INTRODUCTION

Three-dimensional (3D) integration provides a path to go beyond Moore's law and achieve higher circuit performance and package density, as well as reduce power consumption. Monolithic 3D (M3D) is a promising technology enabled by fine-grained vertical interconnects, known as monolithic interlayer vias (MIVs). MIVs are one to two orders of magnitude smaller than the through-silicon vias (TSVs) used in today's 3D integration technology [1]. Despite these benefits, a number of test challenges need to be addressed before M3D integration can become ready for commercial exploitation. One of these challenges is related to power supply noise (PSN) during scan testing. Recent work based on both static and dynamic analysis has shown that compared with traditional two-dimensional designs, M3D suffers more from PSN problems [2]. A major problem with the 3D power distribution network (PDN) in M3D ICs is that it can lead to high PSN during the capture cycles in at-speed scan testing for transition delay faults. The PSN problem is more severe in test mode due to higher switching activities of circuit nodes compared to functional operation [3]. Therefore, the failure of good chips (i.e., yield loss) resulting from the PSN-induced voltage droop during scan testing is a major concern for M3D designs.

\*This research was supported in part by the DARPA ERI 3DSOC Program under Award HR001118C0096. The work of Shao-Chun Hung and Krishnendu Chakrabarty was also supported in part by the National Science Foundation under grant CCF-1908045.

At-speed scan testing is necessary for effective delay testing [4]. However, a problem with at-speed scan testing is that the current drawn from the PDN is much higher than what is included in functional-mode specifications for designing the PDN [5]. This problem is especially severe for the capture cycles because the rated functional clock frequency is used to simultaneously capture test responses in all scan flip-flops (FFs) in the design. Excessive power consumption and high current drawn from the PDN lead to voltage droop.

Various strategies have been proposed in the literature to mitigate the problem of high power consumption during testing; these methods include test scheduling [6], circuit modification [7], test-pattern modification [8], and scan-chain ordering [9]. Algorithms based on the filling of don't-care bits (X-filling) have been proposed to manipulate test patterns to reduce power consumption [10]–[12]. However, previously proposed X-filling algorithms are of limited effectiveness for M3D designs due to the differences in layout and the PDN.

In this paper, we first present an analysis framework to conduct dynamic power and rail analysis for an M3D design. Based on this analysis, we determine the PDN voltage droop and compute the increase in delay for logic paths sensitized by each pattern. This information is used to determine test patterns for which the slack on long paths becomes negative under the rated functional clock period. These test patterns are then appropriately reshaped through an X-filling technique. We present an algorithm based on integer linear programming to ensure that the test patterns are reshaped without any adverse impact on fault coverage.

The rest of the paper is organized as follows. Section II provides an overview of M3D integration and scan testing. Section III describes the design flow for M3D ICs, especially the PDN and current delivery. Section IV presents the proposed framework for dynamic power and rail analysis. A PSN-aware pattern reshaping algorithm is proposed in Section V. Section VI presents simulation results for benchmark M3D designs and a comparison with a baseline solution. Finally, Section VII concludes the paper.

## II. BACKGROUND

### A. Monolithic 3D Integration

M3D integration has been made possible by significant breakthroughs in low-temperature manufacturing processes [13]. Manufacturing the upper tiers of an IC with low-temperature processing avoids damage to transistors and interconnects in the bottom tiers. M3D design styles depend on the type of design-partitioning method employed; partitioning at the transistor level, gate level, and block level have been described in the literature [14]. In transistor-level M3D ICs, P-channel and N-channel transistors are divided into different tiers; in gate-level M3D, each tier is composed of standard cells. Functional blocks are partitioned into multiple tiers in a block-level M3D IC. Gate-level design appears to be the most promising because a cumbersome redesign of standard cells is required for transistor-level design, while block-level design does not fully exploit the benefits of high-density MIVs [15].

Despite advances in design techniques, much less effort has been devoted to the testing of M3D ICs. In [16], a dedicated layer was introduced as a design-for-test solution for M3D integration. A PDN design technique to alleviate reliability and PSN problems was presented in [17]. However, the problem of test generation for M3D ICs, especially under PSN constraints, has not been addressed in prior work.

### B. Power-Supply Noise

PSN is defined as the difference between the nominal supply voltage value and the voltage level at local receivers. PSN-induced voltage droop is composed of two components: IR-drop and $L\,di/dt$. When switching activities occur, instantaneous current flows through the PDN to cause transitions at the inputs of logic gates. The equivalent resistance along this conduction path causes IR-drop. In a high-speed circuit, rapid changes in the current drawn from the PDN and the parasitic inductance result in large $L\,di/dt$. For M3D IC design, considerable research efforts have been devoted to PDN optimization. In [2], system-level modeling and simulations were carried out in both the time and frequency domains. A comprehensive full-chip study was carried out in [18]. The design of a reliable PDN based on genetic programming is described in [17]. However, the PDN in [17] and other related work is optimized only for functional-mode operations. Such a PDN design overlooks the PSN in test mode and the impact of voltage droop on scan testing.

### C. Delay Testing

The transition delay fault model is commonly used by automatic test pattern generation (ATPG) tools [19]. There are two methods typically used to implement the launch of a transition, namely *launch-off-shift* (LOS) and *launch-off-capture* (LOC). The LOS technique makes it easier for ATPG to generate transition delay fault patterns because of the controllable launch path. However, at-speed testing requires the scan-enable signal to change at-speed within the functional clock period, which increases design cost and effort. The LOC solution increases ATPG runtime and leads to lower fault coverage, but it is more practical because the scan-enable signal does not have to switch at-speed. Therefore, we consider the LOC test-application method. Our goal is to evaluate the impact of PSN and voltage droop on LOC-based transition delay-fault testing, and reshape the test patterns to minimize yield loss with no adverse impact on fault coverage.

## III. DESIGN FLOW

We use the Compact2D flow in this work [20]. The original Compact2D flow does not consider PDNs; we have enhanced it to incorporate a PDN in the M3D design. To mimic the final 3D design during the 2D stages, Compact2D first scales the RC parasitics by $1/\sqrt{2}$ for placement and routing, and then projects the entire design onto a tier with half of the original footprint. After the projection, a bin-based min-cut tier-partitioning algorithm transforms the 2D design into 3D by assigning a z-location to every instance. This partitioning algorithm minimizes the overall connections between the two partitioned tiers while balancing the cell area in both tiers.

After tier partitioning, we build the 3D PDN before the original MIV planning stage so that signal MIVs are not placed on the PDN rails and do not overlap power MIVs, thereby preventing the PDN degradation caused by conventional MIV planning. Note that the pins of the cells are annotated with respect to the original cell locations, so that pins in the bottom tier use the original M1 layer and pins originally in the top tier use the M7 layer (assuming that each tier has six metal layers). The power vias that connect the top metal layer of the bottom tier to the bottom metal layer of the top tier are the power MIVs.

To determine the locations of signal MIVs, 3D global routing is performed on the stacked metal layers of both tiers. After MIV planning, legalization and timing-driven tier-by-tier routing are performed, resulting in fully placed and routed sub-designs in both tiers. Finally, the sub-netlists in both tiers are merged into a single final M3D design, on which timing/power and PDN analysis are performed.

## IV. DYNAMIC ANALYSIS

Our simulations were performed on four benchmark two-tier M3D designs, namely LDPC and Tate Bilinear Pairing (Tate) from OpenCores, and netcard and leon3mp from the ISPD 2012 benchmark suite. Table I provides the design matrix and ATPG results for these designs. Experimental results described below highlight the problem of voltage droop due to PSN from the PDN.

### A. M3D Power and Rail Simulation

We developed a framework to conduct dynamic power and rail analysis for M3D ICs with Cadence Voltus. We generated transition-delay patterns after place and route. The patterns were written out in STIL format and converted into a Verilog testbench by Synopsys TetraMAX. Next, we used Mentor Graphics ModelSim to conduct post-route gate-level simulation and dump value change dump (VCD) files that record the switching activities of each pattern. The VCD files were imported into Cadence Voltus to perform vector-based dynamic power and rail analysis.

TABLE I  
DESIGN MATRIX OF BENCHMARK M3D DESIGNS.

| Design  | Frequency (MHz) | Footprint ( $\mu\text{m}^2$ ) | # Cells | # Logic Gates | # FFs  | # Power MIVs | # Patterns | Fault Coverage |
|---------|-----------------|-------------------------------|---------|---------------|--------|--------------|------------|----------------|
| LDPC    | 650             | 263 $\times$ 262              | 92010   | 141940        | 2048   | 347          | 194        | 99.73%         |
| Tate    | 714             | 631 $\times$ 630              | 209425  | 555965        | 31409  | 12338        | 558        | 98.37%         |
| netcard | 500             | 816 $\times$ 815              | 250842  | 883625        | 67746  | 19649        | 43103      | 96.71%         |
| leon3mp | 333             | 1035 $\times$ 1035            | 397420  | 1460565       | 108720 | 15328        | 17087      | 98.87%         |



Fig. 1. Instantaneous voltage droop distributions in the worst-case scenario during testing: (a) LDPC; (b) Tate; (c) netcard; (d) leon3mp.

Because commercial tools do not consider M3D designs, we created a method to analyze two tiers in an M3D IC separately with the 2D power and rail analysis flow. For the top tier, the PDN design can be extended to a system-level model considering PCB, package, and C4 bumps [2]. The distance between two C4 bumps was set to 120  $\mu\text{m}$ . One major difference between traditional 2D ICs and M3D ICs is that the supply current for the bottom tier in M3D flows through the top tier. Therefore, additional power consumption and current demand are superimposed on the top tier. To simulate this scenario, we scaled the current in the PDN of the top tier during power and rail analysis. For the bottom tier, the locations and the parasitics of power MIVs, i.e., MIVs belonging to the PDN, were extracted during place and route. However, the reference voltage for the power MIVs was no longer the nominal value due to the voltage droop in the top tier. To analyze the worst-case scenario, we subtracted the worst-case voltage droop obtained in the rail analysis of the top tier from the nominal supply voltage, and utilized this new value as the power source of the PDN in the bottom tier.
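The two-pass setup described above can be summarized in a few lines of Python. This is a minimal sketch, not the Voltus flow itself. The function names and the current values are hypothetical, and the scale factor (I_top + I_bottom) / I_top is an assumption made for illustration: the text states only that the top-tier current is scaled and that the bottom-tier source equals the nominal voltage minus the worst-case top-tier droop.

```python
def top_tier_current_scale(i_top_ma: float, i_bottom_ma: float) -> float:
    """Assumed scale factor: the top-tier PDN carries its own current plus
    the current that passes through it on the way to the bottom tier."""
    if i_top_ma <= 0:
        raise ValueError("top-tier current must be positive")
    return (i_top_ma + i_bottom_ma) / i_top_ma


def bottom_tier_supply(v_nom: float, worst_top_droop: float) -> float:
    """Worst-case reference voltage (V) seen at the power MIVs."""
    return v_nom - worst_top_droop


# Hypothetical currents and droop, chosen only to exercise the functions.
scale = top_tier_current_scale(i_top_ma=120.0, i_bottom_ma=80.0)
v_bottom = bottom_tier_supply(v_nom=1.1, worst_top_droop=0.2)
print(f"scale = {scale:.3f}, bottom-tier source = {v_bottom:.3f} V")
```

Using the worst-case top-tier droop for every power MIV is pessimistic by construction, which matches the worst-case intent of the analysis.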

### B. M3D Dynamic Rail Analysis

Due to the limitations inherent in commercial tools with respect to M3D, the simulation window could not be extended to the complete test procedure. Total power consumption is proportional to the switching activity for each net multiplied by its fanout. The weighted switching activity (WSA) [21] is used to estimate power consumption during scan capture. We first calculate WSA for every pattern and extract patterns with large WSA values. Next, we simulate the extracted patterns to obtain the worst-case voltage droop during test application.
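The WSA-based pre-selection can be sketched as follows. This is an illustrative sketch under stated assumptions: each toggling net is weighted by 1 plus its fanout (the weight that appears in the objective function of Section V), and the netlist fragment, toggle sets, and function names are hypothetical.

```python
from typing import Dict, List, Set


def wsa(toggled: Set[str], fanout: Dict[str, int]) -> int:
    """Weighted switching activity: each toggling net counts 1 + its fanout."""
    return sum(1 + fanout[net] for net in toggled)


def select_high_wsa(pattern_toggles: Dict[int, Set[str]],
                    fanout: Dict[str, int], k: int) -> List[int]:
    """Return the IDs of the k patterns with the largest WSA."""
    ranked = sorted(pattern_toggles,
                    key=lambda pid: wsa(pattern_toggles[pid], fanout),
                    reverse=True)
    return ranked[:k]


# Hypothetical netlist fragment and per-pattern toggle sets.
fanout = {"a": 1, "b": 3, "c": 2, "d": 1}
pattern_toggles = {0: {"a", "b"}, 1: {"c"}, 2: {"a", "b", "c", "d"}}
print(select_high_wsa(pattern_toggles, fanout, k=2))
```

In a real flow the toggle sets would come from the VCD files of the capture cycles; only the selected patterns are then passed to the rail analysis.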

Fig. 1 shows the voltage droop distributions of the PDN in the worst-case scenario for scan capture, where the upper part of each figure refers to the top tier and the bottom part refers to the bottom tier. Note that the power source is different for each tier. For the top tier, the voltage is supplied from a dc power source with the nominal voltage; the voltage in the bottom tier is supplied from power MIVs with a reference voltage lower than the nominal value due to the voltage droop in the top tier, as discussed in Section IV-A. As the vertical connections are unique to the 3D design, our discussion focuses on the switching activities of the two tiers and their impact on the voltage droop during testing.

Note that the voltage droop problem in the bottom tier is less severe than in the top tier. As the size of the designs increases, this difference becomes more obvious. It has been demonstrated in [2] that high-density power MIVs provide a large number of current sources for the bottom tier, which prevents a large-magnitude current from flowing through power rails near power MIVs and therefore mitigates the IR-drop problem in the PDN. On the other hand, for the top tier, the number of C4 bumps is limited by the bump size. The reduction of footprint in an M3D IC compared with its 2D counterpart exacerbates this problem. Therefore, it is to be expected that the voltage droop problem for the bottom tier is less severe than for the top tier. A large design requires an increase in the chip footprint, enabling the M3D PDN to add more power MIVs between the two tiers to deliver current from the top to the bottom. Hence, this behavior can be observed more clearly in large designs.

To further examine this behavior in test mode, we calculate the WSA for nets in the top tier only and compare it with the voltage droop. The relation between the WSA of the top tier and the voltage droop for transition delay fault patterns is shown in Fig. 2. From these results, we conclude that there is a high positive correlation between the voltage droop in test mode and the switching activities in the top tier.

Fig. 2. WSA and voltage droop for the top tier for various test patterns: (a) LDPC; (b) Tate.

Fig. 3. Minimum slack per pattern for the LDPC benchmark: (a) without voltage droop; (b) with voltage droop.

Fig. 4. Minimum slack for patterns with excessive WSA (top tier) for the netcard benchmark: (a) without voltage droop; (b) with voltage droop.

Table II provides a comparison between three test patterns for LDPC that have similar total WSA but different WSA for the top tier. Pattern 41 and Pattern 6 have almost the same WSA for the whole design but have a 17 mV difference in voltage droop. The total switching activities are even larger for Pattern 14. However, the voltage droop is worse in Pattern 41 than in Pattern 14. Therefore, a key contributor to the voltage droop is the switching activities in the top tier, instead of total switching activities for the full M3D design. ATPG tools for 2D designs typically choose the easiest way to sensitize target faults and run dynamic compaction to reduce test set size and test power without considering layout information. Prior work on capture-power reduction for scan testing uses WSA for the whole design as the quality metric and optimization goal [10]–[12]. Those methods do not provide an optimal solution for M3D designs since they do not guarantee the minimum WSA for the top tier. Therefore, an M3D-specific pattern generation algorithm is needed to mitigate the voltage droop problem during scan capture for M3D designs; this method must consider the M3D layout information.
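The Table II observation can be checked numerically with a small correlation computation. The sketch below uses only the three LDPC data points listed in Table II; the helper `pearson` is written here for illustration and is not part of the analysis framework. With three samples the coefficients are indicative at best, so the example shows only the direction of the trend.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# LDPC patterns 41, 6, and 14 from Table II.
wsa_total = [130941, 130980, 131871]
wsa_top = [64149, 62253, 62695]
droop_mv = [264, 247, 247]

r_top = pearson(wsa_top, droop_mv)
r_total = pearson(wsa_total, droop_mv)
print(f"top-tier WSA vs droop: r = {r_top:.2f}")
print(f"total WSA vs droop:    r = {r_total:.2f}")
```

For these three patterns the top-tier WSA correlates positively with the droop, whereas the total WSA does not, which is consistent with the argument for a top-tier-specific metric.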

### C. Identification of Patterns that Cause Yield Loss

We next extract test patterns that cause excessive voltage droop, resulting in negative slack on sensitized paths. To calculate the additional delay due to voltage droop, we use a scale factor based on a first-order model of how gate delay varies with the supply voltage, assuming a velocity-saturation index $\alpha \approx 1$ in the nanometer regime [22]. Let $V_{droop}$ be the voltage droop obtained during the dynamic rail analysis. The scaled delay $T_{droop}$ is calculated as follows:

$$T_{droop} = T_{nom} \times \frac{1 - V_{th}/V_{nom}}{1 - V_{th}/(V_{nom} - V_{droop})} \quad (1)$$

where $T_{nom}$ is the delay without the voltage droop, $V_{nom}$ is the nominal supply voltage, and $V_{th}$ is the threshold voltage. In our experiments, $V_{nom}$ is 1.1 V and $V_{th}$ is 0.15 V according to the Nangate 45 nm Open Process Design Kit. Increased delay of sensitized paths for a pattern leads to a reduction in the minimum slack for these paths. Once the slack becomes negative, the corresponding path violates the setup-time constraint, which may cause an erroneous response at the output (or scan FF) and hence result in yield loss. Figs. 3 and 4 compare the minimum slack of patterns with and without the impact of voltage droop during scan capture, for LDPC and netcard, respectively. Note that for large designs with more than 10,000 test patterns, patterns with a low WSA for the top tier are unlikely to result in yield loss due to the voltage droop problem. Therefore, our experiments focus on a subset of patterns that have a relatively large WSA for the top tier. Since it is time-consuming to obtain the voltage droop for each pattern, we apply the largest voltage droop observed during testing to every pattern, assuming a worst-case scenario. This is a conservative strategy: any pattern that passes under the worst-case droop also passes under its own, smaller droop. The slack reports are obtained by conducting timing analysis using Synopsys PrimeTime. As shown in Fig. 3(b), 38% of the 194 patterns for LDPC generated by a commercial ATPG tool have a negative slack after the delay is scaled. Such patterns are identified to be susceptible to yield loss and need to be reshaped. Table III shows the number and percentage of patterns that are likely to lead to yield loss for each design. At least 21% of original patterns for Tate cause yield loss, which is clearly unacceptable. Therefore, a pattern reshaping procedure is necessary to obtain a new set of patterns with low dynamic voltage droop.

TABLE II  
RELATIONSHIP BETWEEN WSA AND VOLTAGE DROOP FOR THE LDPC BENCHMARK.

| Pattern ID      | WSA (total) | WSA (top tier) | Voltage Droop (mV) |
|-----------------|-------------|----------------|--------------------|
| 41              | 130941      | 64149          | 264                |
| 6               | 130980      | 62253          | 247                |
| 14              | 131871      | 62695          | 247                |
| Functional mode |             |                | 238                |

TABLE III  
YIELD LOSS ANALYSIS OF ATPG-GENERATED PATTERNS.

| Design  | # Patterns (Total) | # Patterns to be reshaped | Percentage of patterns to be reshaped |
|---------|--------------------|---------------------------|---------------------------------------|
| LDPC    | 194                | 74                        | 38 %                                  |
| Tate    | 558                | 117                       | 21 %                                  |
| netcard | 43103              | 26                        | 0.06 %                                |
| leon3mp | 17087              | 37                        | 0.22 %                                |
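Equation (1) and the negative-slack check can be sketched in Python as follows. The supply and threshold voltages, the 650 MHz LDPC clock, and the 264 mV droop of Pattern 41 are taken from the text and tables; the path delays, the function names, and the simplified slack model (clock period minus scaled path delay, with setup time ignored) are assumptions made for illustration.

```python
V_NOM = 1.1   # nominal supply voltage (V), from the text
V_TH = 0.15   # threshold voltage (V), from the text


def delay_scale(v_droop: float, v_nom: float = V_NOM,
                v_th: float = V_TH) -> float:
    """Scale factor T_droop / T_nom from Eq. (1)."""
    v_eff = v_nom - v_droop
    if v_eff <= v_th:
        raise ValueError("supply after droop must exceed the threshold voltage")
    return (1 - v_th / v_nom) / (1 - v_th / v_eff)


def slack_with_droop(clock_period_ps: float, path_delay_ps: float,
                     v_droop: float) -> float:
    """Slack after scaling the path delay (setup time is ignored here)."""
    return clock_period_ps - path_delay_ps * delay_scale(v_droop)


# LDPC runs at 650 MHz; 0.264 V is the droop of Pattern 41 in Table II.
period_ps = 1e6 / 650
# Hypothetical delay (ps) of the minimum-slack path for each pattern ID.
min_slack_path_delay = {1: 1400.0, 2: 1480.0, 3: 1520.0}
to_reshape = [pid for pid, d in min_slack_path_delay.items()
              if slack_with_droop(period_ps, d, 0.264) < 0]
print(f"scale = {delay_scale(0.264):.4f}, reshape = {to_reshape}")
```

For a 264 mV droop the scale factor is roughly 1.05, so a path that meets timing at the nominal voltage with less than about 5% margin is flagged for reshaping.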

## V. PATTERN RESHAPING

In this section, we describe our pattern reshaping algorithm based on integer linear programming (ILP). We first remove the extracted patterns from the original set and update the fault list. Next, ATPG is carried out to generate new patterns for undetected faults with don't-care bits left unfilled. During pattern reshaping, our goal is to fill the don't-care bits in each test pattern such that the voltage droop is minimized during scan capture.

TABLE IV  
CONSTRAINTS THAT INCORPORATE THE FUNCTIONALITY OF LOGIC GATES.

| Gate Type | Linear Inequalities                                | Gate Type | Linear Inequalities                                                                      |
|-----------|----------------------------------------------------|-----------|------------------------------------------------------------------------------------------|
| AND       | $y \geq x_1 + x_2 - 1$<br>$y \leq x_1, y \leq x_2$ | XOR       | $y \leq x_1 + x_2$<br>$y \geq x_1 - x_2$<br>$y \geq x_2 - x_1$<br>$y \leq 2 - x_1 - x_2$ |
| OR        | $y \leq x_1 + x_2$<br>$y \geq x_1, y \geq x_2$     | INV       | $y = 1 - x_1$                                                                            |



Fig. 5. An example circuit used to illustrate the ILP model.

The first step in ILP modeling is to declare all the variables in the circuit to be binary variables. Next, the functionality of each Boolean logic gate is realized by a set of linear constraints. Table IV shows the linear inequalities for four kinds of basic logic gates, where the inputs are denoted as  $x_i$  and the output is denoted as  $y$ . All standard cells in benchmark designs can be realized by a combination of the listed gate types. However, modeling large designs completely with variables in this manner requires an enormous number of constraints and consumes considerable runtime during optimization. To relax the constraints, we first carry out a forward implication with test patterns. Only inputs and gate outputs with an unknown signal are included in the ILP model. Note that for a delay-fault test pattern, both the initial state and the launch state need to be taken into consideration. An example circuit is shown in Fig. 5. With test vectors  $V_1$  and  $V_2$ , we perform forward implication twice by applying two vectors contiguously. Let  $n^i$  be the signal of net  $n$  with vector  $V_i$ . In this case, only  $(b^1, d^1, f^1, g^1, h^1, d^2)$  will be defined as binary variables in our ILP model, and the constraints corresponding to them are formulated using the linear inequalities mentioned above.
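The forward-implication step can be sketched with three-valued logic. The netlist below is a hypothetical three-gate circuit, not the circuit of Fig. 5, and the function names are illustrative; `None` stands for an unspecified (X) value. For a delay-fault pattern the same routine would be applied once per vector.

```python
from typing import Dict, List, Optional, Tuple

Value = Optional[int]  # 0, 1, or None for an unspecified (X) value

# Hypothetical netlist in topological order: (output, gate type, inputs).
NETLIST: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("d", "AND", ("a", "b")),
    ("e", "INV", ("c",)),
    ("f", "OR", ("d", "e")),
]


def eval_gate(kind: str, ins: List[Value]) -> Value:
    """Three-valued evaluation of the gate types listed in Table IV."""
    if kind == "INV":
        return None if ins[0] is None else 1 - ins[0]
    if kind == "AND":
        if 0 in ins:
            return 0          # a controlling 0 fixes the output
        return None if None in ins else 1
    if kind == "OR":
        if 1 in ins:
            return 1          # a controlling 1 fixes the output
        return None if None in ins else 0
    if kind == "XOR":
        return None if None in ins else ins[0] ^ ins[1]
    raise ValueError(f"unsupported gate type: {kind}")


def forward_implication(inputs: Dict[str, Value]) -> Dict[str, Value]:
    """Propagate the specified input bits through the netlist."""
    values = dict(inputs)
    for out, kind, ins in NETLIST:
        values[out] = eval_gate(kind, [values[i] for i in ins])
    return values


def ilp_variables(inputs: Dict[str, Value]) -> List[str]:
    """Nets still unknown after implication; only these need ILP variables."""
    return sorted(n for n, v in forward_implication(inputs).items()
                  if v is None)


# b is a don't-care bit in this hypothetical vector.
print(ilp_variables({"a": 1, "b": None, "c": 1}))
```

When a controlling value is present (for example `a = 0` on the AND gate), the unknown input no longer propagates, and the corresponding gate outputs drop out of the ILP model.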

After the circuit is modeled using the constraints described above, the objective function is formulated. To evaluate the switching activity of net  $n$ , we define a variable  $n_{\text{toggle}} = n^1 \oplus n^2$ . Whenever  $n_{\text{toggle}}$  equals 1, a transition occurs at net  $n$ . As discussed in Section IV-B, the dynamic voltage droop is greatly influenced by switching activities in the top tier. Therefore, the objective function for our ILP model is as shown below:

$$\text{Minimize } \sum_i i_{\text{toggle}} (1 + N_{\text{fanout},i}), \text{ for all top tier nets } i,$$

where $N_{\text{fanout},i}$ is the fanout of net $i$. Consider the example shown in Fig. 5. In this example, nets $a, b, e, g$ are located in the top tier. After constraint relaxation, it is obvious that $a_{\text{toggle}}$ and $e_{\text{toggle}}$ have already been determined. The inclusion of these two variables in the objective function is unnecessary. Therefore, the objective function is formulated as: Minimize $\{2b_{\text{toggle}} + 2g_{\text{toggle}}\}$. The solution to this ILP problem provides a fully specified test pattern with minimum WSA for the top tier. This ILP model is invoked for every test pattern that needs to be reshaped.
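The optimization can be illustrated on a toy instance. The sketch below is an exhaustive-search stand-in for the ILP solver, not the formulation used in the experiments: the two-gate circuit, the tier assignment, and the fanouts are hypothetical, and the two time frames are treated as independently specified vectors, so the dependency of the launch state on the first vector's response is not modeled. Gate consistency is checked with the AND and OR inequalities of Table IV.

```python
from itertools import product
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical circuit: d = AND(a, b), f = OR(d, c).
NETS = ("a", "b", "c", "d", "f")
TOP_TIER = {"b": 1, "d": 1}   # assumed top-tier nets and their fanouts

Frame = Dict[str, int]
Spec = Dict[str, Optional[int]]


def satisfies_gates(v: Frame) -> bool:
    """Table IV inequalities for the AND and OR gates of the toy circuit."""
    and_ok = (v["d"] >= v["a"] + v["b"] - 1
              and v["d"] <= v["a"] and v["d"] <= v["b"])
    or_ok = (v["f"] <= v["d"] + v["c"]
             and v["f"] >= v["d"] and v["f"] >= v["c"])
    return and_ok and or_ok


def frames(spec: Spec) -> Iterator[Frame]:
    """Yield every gate-consistent assignment that matches the specified bits."""
    for bits in product((0, 1), repeat=len(NETS)):
        v = dict(zip(NETS, bits))
        if all(spec.get(n) in (None, v[n]) for n in NETS) and satisfies_gates(v):
            yield v


def reshape(v1_spec: Spec, v2_spec: Spec) -> Optional[Tuple[int, Frame, Frame]]:
    """Minimize the sum of toggle * (1 + fanout) over the top-tier nets."""
    best = None
    for f1 in frames(v1_spec):
        for f2 in frames(v2_spec):
            cost = sum((f1[n] ^ f2[n]) * (1 + fo)
                       for n, fo in TOP_TIER.items())
            if best is None or cost < best[0]:
                best = (cost, f1, f2)
    return best


# b is unspecified in the first vector; the second vector is fully specified.
cost, f1, f2 = reshape({"a": 1, "b": None, "c": 0}, {"a": 1, "b": 1, "c": 0})
print(cost, f1["b"])
```

Filling `b` with 1 in the first vector avoids transitions on both top-tier nets, so the minimum cost is 0; filling it with 0 would toggle `b` and `d` for a cost of 4. An ILP solver reaches the same optimum without enumerating all assignments, which is what makes the approach usable on full designs.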

## VI. EXPERIMENTAL RESULTS

We utilized the C++ application programming interface and the ILP solver of the Gurobi optimizer. Even though the number of variables and constraints in the ILP model is linear in circuit size, runtime is a major concern for large designs, e.g., netcard and leon3mp. To find a trade-off between performance and efficiency, we define a runtime threshold depending on the size of the target circuit. With a negligible loss of fault coverage, patterns that exceed the runtime limit are removed during optimization. The CPU time required to generate the ILP model is negligible.

Table V shows the test pattern results obtained after reshaping. The number of patterns is expected to increase slightly because the new set, generated with don't-care bits before the ILP optimization, is not compacted. For the Tate benchmark, our solution leads to a reduction in the number of test patterns, with a negligibly small decrease in fault coverage. For all our benchmarks, the average WSA of patterns decreases with pattern reshaping, with little or no impact on fault coverage.

For the original ATPG-generated pattern set, many sensitized paths have negative slack due to voltage droop. These paths are likely to lead to yield loss. In Table V, we consider one path for each test pattern; this is the path with the minimum slack for the corresponding pattern. We consider the worst-case scenario and extract patterns with negative minimum slack, and then use our approach to reshape such patterns. If the minimum-slack path has positive slack, no sensitized path for that pattern has negative slack, and hence yield loss is eliminated. We leave the patterns that only sensitize paths with positive slack unchanged and extract the other patterns from the original pattern set. Next, we reshape the extracted patterns. A timing-analysis verification step is carried out to compute the slack after pattern reshaping. No sensitized path has negative slack for the reshaped patterns (Table V).

The reduction in the average WSA is modest. This is because the average WSA is dominated by patterns in the original test set that have a large WSA but sensitize only paths with ample slack. Such patterns are unlikely to fail a good chip due to the voltage droop problem and therefore do not have to be reshaped.

We next compare the reshaped patterns with a 2D baseline X-filling algorithm; see Table VI. We first carry out timing analysis to evaluate the reduction in yield loss after pattern reshaping. We also consider sensitized paths with the minimum slack for each pattern, and record the number of paths whose slack is no more than 3% of the functional clock period.
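The slack-margin metric can be sketched as a simple count. The slack values and the function name below are hypothetical; only the 3% threshold and the 650 MHz LDPC clock come from the text. Following the wording above, the count includes every pattern whose minimum slack is at most 3% of the clock period.

```python
from typing import Dict


def count_low_margin(min_slack_ps: Dict[int, float], freq_mhz: float,
                     margin: float = 0.03) -> int:
    """Count patterns whose minimum slack is at most `margin` of the period."""
    threshold_ps = margin * 1e6 / freq_mhz
    return sum(1 for s in min_slack_ps.values() if s <= threshold_ps)


# Hypothetical minimum slacks (ps) for a 650 MHz design such as LDPC.
slacks = {1: 12.0, 2: 46.0, 3: 47.0, 4: 210.0}
print(count_low_margin(slacks, freq_mhz=650))
```

At 650 MHz the clock period is about 1538 ps, so the threshold is about 46 ps and two of the four hypothetical patterns fall below it.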

In Table VI, the number of reshaped patterns is the number of patterns regenerated by ATPG. We update the undetected fault list after extracting patterns from the original set and regenerate a new set of patterns with don't-care bits unfilled. Therefore, this number is different from the number of patterns to be reshaped listed in Table III due to the lack of pattern compaction. Note that for Tate and leon3mp, the number of patterns is lower in Table VI due to optimizations carried out to reduce runtime. During ILP optimization, we remove patterns that require high runtime for analysis (using a timeout) and evaluate the remaining patterns. Since the loss of fault coverage is negligible, further optimization does not have to be conducted.

TABLE V  
COMPARISONS BETWEEN PATTERNS WITH AND WITHOUT RESHAPING.

| Design  | Results Without Pattern Reshaping |                     |                                      |                |                             | Results With Pattern Reshaping |                     |                                      |                |                             |
|---------|-----------------------------------|---------------------|--------------------------------------|----------------|-----------------------------|--------------------------------|---------------------|--------------------------------------|----------------|-----------------------------|
|         | # Patterns                        | Mean WSA (top tier) | Standard deviation of WSA (top tier) | Fault coverage | # Paths with negative slack | # Patterns                     | Mean WSA (top tier) | Standard deviation of WSA (top tier) | Fault coverage | # Paths with negative slack |
| LDPC    | 194                               | 58830               | 5818.91                              | 99.73%         | 74                          | 204                            | 57324.99            | 5470.82                              | 99.73%         | 0                           |
| Tate    | 558                               | 99629.05            | 11763.54                             | 98.37%         | 117                         | 460                            | 97398.82            | 14169.17                             | 98.16%         | 0                           |
| netcard | 43103                             | 42529.51            | 19153.94                             | 96.71%         | 26                          | 43163                          | 42461.70            | 19156.23                             | 96.71%         | 0                           |
| leon3mp | 17087                             | 35040.37            | 8985.69                              | 98.87%         | 37                          | 17110                          | 34984.42            | 9002.94                              | 98.85%         | 0                           |

TABLE VI  
COMPARISONS BETWEEN PROPOSED ILP-BASED METHOD FOR RESHAPING PATTERNS WITH A 2D BASELINE X-FILLING ALGORITHM [11].

| Design  | Proposed ILP-based optimization |                |                                             |                                                         | [11]                |                |                                             |                                                         |
|---------|---------------------------------|----------------|---------------------------------------------|---------------------------------------------------------|---------------------|----------------|---------------------------------------------|---------------------------------------------------------|
|         | # Patterns reshaped             | Fault coverage | # Patterns having paths with negative slack | # Patterns with min. slack below 3% of the clock period | # Patterns reshaped | Fault coverage | # Patterns having paths with negative slack | # Patterns with min. slack below 3% of the clock period |
| LDPC    | 84                              | 99.73%         | 0                                           | 0                                                       | 84                  | 99.73%         | 0                                           | 7                                                       |
| Tate    | 19                              | 98.16%         | 0                                           | 3                                                       | 228                 | 98.37%         | 1                                           | 18                                                      |
| netcard | 86                              | 96.71%         | 0                                           | 1                                                       | 86                  | 96.71%         | 3                                           | 10                                                      |
| leon3mp | 60                              | 98.85%         | 0                                           | 1                                                       | 440                 | 98.88%         | 16                                          | 32                                                      |

In [11], a probabilistic method is presented to ensure low capture power without any loss of fault coverage. However, this method occasionally sensitizes paths with small slack. Moreover, because [11] does not take the M3D layout and PDN into account, it can lead to considerable voltage droop for some test patterns during scan capture. As a result, the patterns obtained from [11] can lead to either negative slack or a considerably reduced slack margin. Negative slack will always lead to yield loss, while reduced slack will magnify the detrimental impact of small-delay defects and is also likely to cause a good chip with small process variations to fail. In the proposed method, the reshaped patterns do not cause negative slack on sensitized paths under voltage-droop conditions, and the problem of slack-margin reduction is also mitigated. Table VI shows that the proposed method leads to fewer sensitized paths with reduced slack margins. In some cases, the baseline solution [11] also results in a few paths with negative slack.
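The two slack-based columns of Table VI can be illustrated with a minimal sketch. It assumes that a droop-aware timing analysis has already produced, for each pattern, the minimum slack over its sensitized paths. The function name `classify_patterns`, the data class `SlackSummary`, and the input values are hypothetical and are not part of the proposed framework. The sketch keeps the two categories disjoint, which is an assumption: the table does not state whether patterns with negative slack are also counted in the 3% column.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SlackSummary:
    """Pattern IDs grouped by slack category (hypothetical container)."""
    negative_slack: List[str]
    low_margin: List[str]


def classify_patterns(
    min_slack_ns: Dict[str, float],
    clock_period_ns: float,
    margin_fraction: float = 0.03,  # 3% of the clock period, as in Table VI
) -> SlackSummary:
    """Group patterns by their minimum slack under voltage droop.

    min_slack_ns maps a pattern ID to the smallest slack (in ns) over the
    paths that the pattern sensitizes. These values are assumed inputs.
    """
    threshold_ns = margin_fraction * clock_period_ns
    negative = [p for p, s in min_slack_ns.items() if s < 0.0]
    # Assumption: non-negative slack only, so the two lists never overlap.
    low = [p for p, s in min_slack_ns.items() if 0.0 <= s < threshold_ns]
    return SlackSummary(negative_slack=negative, low_margin=low)


# Illustrative values only; not taken from the experiments.
example = {"p0": -0.05, "p1": 0.02, "p2": 0.50}
summary = classify_patterns(example, clock_period_ns=1.0)
print(len(summary.negative_slack), len(summary.low_margin))
```

Counting the entries in each list gives per-design numbers of the same kind as those in the two slack columns of Table VI.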

## VII. CONCLUSION

We have proposed a framework to identify test patterns that are likely to cause yield loss due to the droop-induced added delay on sensitized paths. We have presented an ILP-based X-filling algorithm for pattern reshaping. Experimental results show that the proposed method eliminates the PSN-induced yield loss problem during scan capture for M3D designs.

## REFERENCES

- [1] P. Batude et al. 3D sequential integration: A key enabling technology for heterogeneous co-integration of new function with CMOS. *IEEE JETCAS*, December 2012.
- [2] K. Chang et al. System-level power delivery network analysis and optimization for monolithic 3D ICs. *IEEE Trans. VLSI Systems*, 2019.
- [3] J. Saxena et al. A case study of IR-drop in structured at-speed testing. In *Int. Test Conf.*, 2003.
- [4] X. Lin et al. High-frequency, at-speed scan testing. *IEEE Design & Test of Computers*, 20(5):17–25, 2003.
- [5] P. Pant and J. Zelman. Understanding power supply droop during at-speed scan testing. In *IEEE VLSI Test Symposium*, 2009.
- [6] R. M. Chou, K. K. Saluja, and V. D. Agrawal. Scheduling tests for VLSI systems under power constraints. *IEEE Trans. VLSI Systems*, 1997.
- [7] B. Yang et al. A clock-gating based capture power droop reduction methodology for at-speed scan testing. In *DATE*, 2011.
- [8] S. Kajihara, K. Ishida, and K. Miyase. Test vector modification for power reduction during scan testing. In *Proc. IEEE VLSI Test Symposium*, pages 160–165, 2002.
- [9] Y. Bonhomme et al. Power driven chaining of flip-flops in scan architectures. In *Proc. Int. Test Conf.*, 2002.
- [10] X. Wen et al. On low-capture-power test generation for scan testing. In *23rd IEEE VLSI Test Symposium*, pages 265–270, 2005.
- [11] S. Remersaro et al. Preferred fill: A scalable method to reduce capture power for scan based designs. In *IEEE Int. Test Conf.*, 2006.
- [12] X. Wen et al. A novel scheme to reduce power supply noise for high-quality at-speed scan testing. In *IEEE Int. Test Conf.*, 2007.
- [13] S. Wong et al. Monolithic 3D integrated circuits. In *International Symposium on VLSI Technology, Systems and Applications*, 2007.
- [14] S. Panth et al. Design and CAD methodologies for low power gate-level monolithic 3D ICs. In *ISLPED*, pages 171–176, 2014.
- [15] S. Panth et al. Placement-driven partitioning for congestion mitigation in monolithic 3D IC designs. *IEEE Trans. CAD*, 2015.
- [16] A. Koneru and K. Chakrabarty. An inter-layer interconnect BIST and diagnosis solution for monolithic 3D ICs. *IEEE Trans. CAD*, 2019.
- [17] A. Koneru, A. Todri-Sanial, and K. Chakrabarty. Reliable power delivery and analysis of power-supply noise during testing in monolithic 3D ICs. In *IEEE VLSI Test Symposium*, 2019.
- [18] S. K. Samal et al. Full chip impact study of power delivery network designs in monolithic 3D ICs. In *ICCAD*, 2014.
- [19] J. A. Waicukauski et al. Transition fault simulation. *IEEE Design & Test of Computers*, 4(2):32–38, 1987.
- [20] B. W. Ku, K. Chang, and S. K. Lim. Compact-2D: A physical design methodology to build commercial-quality face-to-face-bonded 3D ICs. In *Proc. International Symposium on Physical Design*, 2018.
- [21] P. Girard. Survey of low-power testing of VLSI circuits. *IEEE Design & Test of Computers*, 19(3):82–92, 2002.
- [22] A. Ramalingam et al. Robust analytical gate delay modeling for low voltage circuits. In *Asia and South Pacific Conference on Design Automation*, 2006.