

National Technical University of Athens, School of Electrical and Computer Engineering, Division of Computer Science

### **3D Integrated Circuits IR-Drop Estimation:** Characterization, Extraction & Synthesis

### DIPLOMA THESIS

Dimitrios G. Anagnostos

Supervisor: Dimitrios Soudris, Assistant Professor

Athens, October 2012




Approved by the three-member examination committee on 12 October 2012.

Kiamal Pekmestzi, Professor; Dimitrios Soudris, Assistant Professor; Georgios Economakos, Assistant Professor


Dimitrios G. Anagnostos

Graduate in Electrical and Computer Engineering, National Technical University of Athens (NTUA)

Copyright © Dimitrios Anagnostos, 2012.

All rights reserved.

Copying, storing and distributing this thesis, in whole or in part, for commercial purposes is prohibited. Reprinting, storing and distributing it for non-profit, educational or research purposes is permitted, provided that the source is cited and this notice is retained. Questions concerning the use of this thesis for profit should be addressed to the author.

The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official positions of the National Technical University of Athens.

## Abstract

Three-dimensional circuit integration is a promising technology, able to ensure the continuation of Moore's Law and the production of highly dense silicon systems. Performance and power consumption benefit from the reduced wire lengths in the die, while silicon yield increases as the total area of the integrated circuit is reduced in favour of vertical manufacturing. Yet 3D technology is not mature enough to support mass production. Cost issues and the intrinsic problems of heat dissipation and vertical interconnection reliability are compounded by the lack of 3D-specific design automation and verification tools from major software vendors. Only recently, with the introduction of commercial 2.5D ICs, has the industry started to develop 3D-oriented EDA tools to assist designers.

This thesis describes specific details of the development of an *IR*-Drop estimation tool for memory-on-processor systems, as part of a collaborative, six-month project funded by the Integrated Systems Laboratory, EPFL. Reliable power delivery becomes an important issue when moving to 3D topologies, since all currents have to traverse the stack of dies before reaching the real power nodes. This effect leads to voltage drops that may exceed the margins for reliable operation. Moreover, memory-on-processor systems are expected to be among the first 3D circuits to reach the market, offering unparalleled performance. At the same time, though, memory circuits suffer greatly from reduced voltages, especially when in sleep mode.

The target of this tool is to offer designers an early estimation of the cells which are more prone to failure due to unexpected drops in power distribution. For that reason the tool utilizes models of devices and power delivery networks which are close to the actual physical design, resulting in fine-grained voltage distribution maps. The tool is also thermal-aware, meaning that it captures the effect of Joule heating on power delivery and adjusts all affected devices accordingly.

In the beginning, aspects of the tool creation process are discussed, followed by a presentation of the simulated systems. Extensive results are presented for 3D memory topologies and their effect on *IR*-Drop of large systems is explored. The thesis concludes with summarizing comments and some suggestions for future improvements of the tool.

*Key Words*: 3D integration, TSV, power delivery networks, *IR*-Drop, memory-on-processor, circuit characterization, benchmark synthesis, SRAM.

## Acknowledgements

First I would like to thank Prof. Dimitrios Soudris and Dr. Vasilis Pavlidis for their trust and cooperation. They both gave me valuable lessons and the chance to work in a fruitful environment, supporting my work at every step. I also have to thank the researchers of LSI, EPFL and MICROLAB, NTUA for their aid and company during my internship and before that. Special thanks go to Mr. Chaudhary, with whom I worked on the development of the tool.

The support of all my beloved friends and co-students cannot be neglected. Their presence has always been a potent catalyst in the formation of my character.

Finally and most importantly I express my deepest gratitude to my family. Without my parents' infinite trust and my brother's invaluable assistance none of this would be possible.

# Table of Contents

- Abstract
- Acknowledgements
- Table of Contents
- List of Figures
- List of Tables
- Chapter 1: Introduction
  - 1.1 Problem Statement
  - 1.2 Purpose of Thesis
  - 1.3 Importance of Study
  - 1.4 Scope of Study
- Chapter 2: Related Work
- Chapter 3: Characterization & Extraction
  - 3.1 Introduction
  - 3.2 Characterization
  - 3.3 PDN Extraction
- Chapter 4: Synthesis
  - 4.1 Memory Topologies
  - 4.2 Synthesis Rules
  - 4.3 Synthesis Options
- Chapter 5: Results
  - 5.1 Introduction
  - 5.2 Topology Verification
  - 5.3 System Exploration
  - 5.4 Comments on Thermal Effects
- Chapter 6: Conclusions
- References
- Appendix

# List of Figures

| Figure      | Caption |
|-------------|---------|
| Figure 1.1  | An abstract, heterogeneous, TSV based 3D Integrated Circuit [5] |
| Figure 1.2  | Flow diagram of the developed tool |
| Figure 1.3  | Schematic of the impending 3D DRAM modules |
| Figure 2.1  | Relative dimensions of two TSVs and two memory cells in different technologies. The area overhead makes vertical connections expensive in terms of used silicon. |
| Figure 2.2  | Flow utilized in [9], supporting thermo-electrical co-analysis |
| Figure 2.3  | (a) Ideal solutions of TSV tapering for electrical and thermal purposes (b) Combination of solutions |
| Figure 3.1  | Modeling of active power grids for DC (a) and AC (b) conditions |
| Figure 3.2  | Characterization process |
| Figure 3.3  | (a) Layout of a MeMaker produced memory (b) Abstract top view of a memory array. The arrows represent the dependencies between the circuits |
| Figure 3.4  | (a) Topology used for row circuitry characterization, load of 32 cells wide. (b) Signals forced on the circuits. From top to bottom: Decoder signal, global word-line, local word-line, global bit-line, global complementary bit-line |
| Figure 3.5  | (a) Topology used for column circuitry characterization, load of 32 cells tall. (b) Signals forced on the circuits. From top to bottom: Pre-charge signal, multiplexing signal, word-line signal, sense enable signal, write enable signal |
| Figure 3.6  | (a) Layout of the used "wide" cell. (b) Layout of a $32 \times 16$ cell array for leakage estimation purposes |
| Figure 3.7  | Leakage currents for the explored voltage-temperature variable space: (a) cell leakage (b) row circuits leakage (c) column circuits leakage, 16:1 MUX |
| Figure 3.8  | Active currents for two interesting cases: (a) operation at 166 MHz where the circuits fail (b) operation at 100 MHz with no failures |
| Figure 3.9  | Detail of the power delivery network and the estimation of *IR*-Drop direction, indicated by the arrows |
| Figure 3.10 | Pattern of contact sharing between adjacent cells, aiming at compact designs |
| Figure 4.1  | STACK topology, two tiers |
| Figure 4.2  | 3DWL topology, two tiers |
| Figure 4.3  | 3DBL topology, two tiers |
| Figure 4.4  | YY topology, two tiers |
| Figure 4.5  | XX topology, two tiers |
| Figure 4.6  | Separation of the components of a power grid in order to enable independent analysis for the two power nodes. (a) Original network (b) Transformed network, notice the divided current sources |
| Figure 4.7  | Part of a memory floorplan for different block dimensions. Block width and length can be unequal if a different aspect ratio is required. (a) $16 \times 16$ (b) $32 \times 32$ (c) $64 \times 64$ |
| Figure 5.1  | Floorplans of STACK system. (a) Tier 0 (b) Tier 1, closest to $V_{DD}$ |
| Figure 5.2  | Floorplans of 3DWL system. (a) Tier 0 (b) Tier 1, closest to $V_{DD}$ |
| Figure 5.3  | Floorplans of YY system. (a) Tier 0 (b) Tier 1, closest to $V_{DD}$ |
| Figure 5.4  | *IR*-Drop on ground power grids. (a) STACK (b) 3DWL (c) YY |
| Figure 5.5  | Total *IR*-Drop for a STACK system. (a) Tier 0 (b) Tier 1, closest to $V_{DD}$ |
| Figure 5.6  | Total *IR*-Drop for a 3DWL system. (a) Tier 0 (b) Tier 1, closest to $V_{DD}$ |
| Figure 5.7  | Total *IR*-Drop for a YY system. (a) Tier 0 (b) Tier 1, closest to $V_{DD}$ |
| Figure 5.8  | Relative maximum *IR*-Drop for the explored systems |
| Figure 5.9  | Improved voltage drop metrics through TSV density doubling |
| Figure 5.10 | Voltage differences on TSVs of 8-tiered systems |
| Figure 5.11 | Temperature map of an operating tier |

# List of Tables

| Table     | Caption                                             |
|-----------|-----------------------------------------------------|
| Table 4.1 | Examples of coordinate based, SPICE name generation |
| Table 4.2 | Important synthesis options, user defined           |

## 1. Introduction

### **1.1 Problem Statement**

With the dawn of the new millennium, a new dimension was added to Integrated Circuit (IC) design, literally. The term 3D Interconnect was introduced for the first time in the ITRS Roadmap in 2001 [1], signalling the beginning of intense research that aims to bring 3D circuits to the market as soon as possible. The reasons for this radical change are briefly outlined in [1], as are the main obstacles the scientific community has to face.

Since scaling below the 45nm technology node requires special fabrication techniques, such as double or even triple patterning [2], due to the slow progress in lithography, designers seek a cheaper way to increase the density of active devices in an IC. The trend that the industry has adopted, often called Moore's Law [3], requires the number of active devices in a single IC to double every 18 months. In nodes prior to 100nm this was achieved simply by scaling down the dimensions of the transistors. As already mentioned, however, this strategy is no longer as appealing and cost-efficient as it used to be, mainly because of extra fabrication costs and severe reliability issues [4].

By exploiting the third dimension, designers are able to produce dense ICs while benefiting from reduced wire lengths [5]. This reduction leads to considerable savings in power consumption and signal delay, since fewer repeaters are needed in a signal path and the wire loads of the drivers decrease. Another advantage of 3D integration is the possible heterogeneity of the different tiers. As presented in Figure 1.1, a 3D system can comprise layers with diverse functionality, from analog to digital, and most importantly layers manufactured in different technologies. For example, a memory-on-processor system could include a tier of processors in a 45nm process, while the memories are fabricated in a smaller node for improved capacity. By the same token, 3D technology also permits separate manufacturing and testing of the different tiers, thus improving the total yield [5].



Figure 1.1 An abstract, heterogeneous, TSV based 3D Integrated Circuit [5]

Apart from the aforementioned advantages, 3D integration also exhibits intrinsic difficulties, which were anticipated as early as [1] and have since been verified through simulations and measurements on fabricated samples. The major source of concern is heat flow through the stack of layers, followed by reliability issues in the electrical and mechanical domains of the IC. Another emerging obstacle is reliable power delivery through the stack, especially for 3D systems exploiting TSV-based vertical interconnects.

As the IC grows in the vertical direction, new silicon layers are added to the heat flow path. As a result, some modules are no longer adjacent to the heat sink, but instead pass the heat they produce to the next tier. Depending on the number of tiers and the circuits involved, temperature gradients of up to hundreds of degrees may manifest [5], leading to operating failures and aggravated reliability phenomena, such as electromigration.

Dual to the above is the power delivery problem: in a 3D stack the tiers close to the heat sink lie far from the nodes where power enters the IC from the outside, as presented in Figure 1.1. As a result, power reaches them only after the current has traversed all the preceding tiers, which are closer to the real $V_{DD}$. The effect is cumulative and leads to significant voltage drops as the number of tiers increases, causing unexpected circuit failures and degraded reliability metrics.
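The cumulative nature of this effect can be illustrated with a minimal sketch. It assumes a deliberately simplified model, which is not the one used by the tool: each tier draws a fixed current and is fed from the tier before it through a single lumped TSV resistance, so the segment nearest to the package carries the current of every tier behind it. The function name, the 10 mΩ segment resistance and the 1 A tier currents are hypothetical and chosen only for illustration.

```python
def stack_ir_drop(tier_currents, r_tsv):
    """Return the V_DD drop (in volts) seen by each tier of a 3D stack.

    tier_currents[0] belongs to the tier next to the package supply and
    tier_currents[-1] to the tier farthest from it (closest to the heat sink).
    Every tier is fed through one lumped TSV segment of resistance r_tsv.
    """
    drops = []
    drop = 0.0
    downstream = sum(tier_currents)  # current that still has to be delivered
    for current in tier_currents:
        # The segment feeding this tier carries its own current plus the
        # current of every tier stacked behind it.
        drop += downstream * r_tsv
        drops.append(drop)
        downstream -= current
    return drops


# Hypothetical example: four tiers drawing 1 A each through 10 mOhm segments.
drops = stack_ir_drop([1.0, 1.0, 1.0, 1.0], r_tsv=0.010)
for tier, drop in enumerate(drops):
    print(f"tier {tier}: {drop * 1e3:.0f} mV")
```

In this toy case the segment currents are 4 A, 3 A, 2 A and 1 A, so the tier farthest from the supply sees the sum of all four segment drops, even though every tier draws the same current.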

### **1.2 Purpose of Thesis**

This thesis describes the development of a Thermal Aware, *IR*-Drop Estimation Tool for 3D IC Analysis (referred to simply as "the tool" for the purposes of this document). This tool is the result of a six-month internship in the Integrated Systems Laboratory (LSI) of École Polytechnique Fédérale de Lausanne (EPFL), under the supervision of Dr. Vasileios Pavlidis and in collaboration with fellow intern Muhammad Waqas Chaudhary, from KTH, Sweden. Since this is joint work, only parts of it are presented thoroughly.

The purpose of the tool is to help designers locate *IR*-Drop related reliability issues in their circuits at design time and under varying temperature and voltage conditions. Although the estimation is performed at an early stage, its proximity to the physical design ensures reasonable accuracy of the results.

A flow diagram of the final tool is illustrated in Figure 1.2. A short description of the tool follows:

- First comes a preparatory stage of characterization and extraction, where the current drawn by each utilized block is calculated through simulations, for various operating voltages and temperatures. At the same time wire resistivity is extracted to be included in the Power Delivery Network (PDN). More details on these topics are given in Chapter 3.
- When the tool starts, a netlist is created, replacing wires with equivalent resistors and active devices with current sources, so that the *IR*-Drop can be estimated for the desired conditions. This synthesis step is elaborated in Chapter 4, where details of the simulated 3D systems are also provided.
- The electro-thermal simulations take place iteratively, updating power and temperature values for the system. It is worth mentioning that Mr. Chaudhary embedded a technique known as algebraic multi-grid into the simulators, considerably reducing simulation times.
- In the end a multitude of results is reported and plotted, such as the *IR*-Drop distribution throughout all the circuits, the temperature distribution in the stack, *etc*.



Figure 1.2 Flow diagram of the developed tool

The block named "Characterization and Extraction" implies that the simulations are performed at a level of abstraction very close to the actual physical design of the system. Another important detail is the nested electro-thermal iterations. As mentioned in the previous section, temperature gradients affect the power delivery path by changing the resistivity and power consumption of the circuits, while at the same time the dissipated power changes the heat distribution. The two nested loops presented in Figure 1.2 attempt to capture this intertwined relationship, iterating until a point of convergence is reached.
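The idea of iterating between the electrical and the thermal domain until convergence can be sketched as follows. This is only an illustration under strong assumptions: the two nested loops of the tool are collapsed into a single fixed-point iteration on one lumped wire, whose resistance grows linearly with temperature and whose Joule heating is removed through one lumped thermal resistance. All names and numerical values are hypothetical.

```python
def electro_thermal_fixed_point(i_load, r0, t_ref, alpha, r_th, t_amb,
                                tol=1e-6, max_iter=100):
    """Iterate wire resistance and temperature until the temperature settles.

    Illustrative lumped parameters (not data from the thesis):
      i_load : load current (A)
      r0     : wire resistance at t_ref (ohm)
      alpha  : temperature coefficient of resistance (1/K)
      r_th   : thermal resistance to ambient (K/W)
      t_amb  : ambient temperature (deg C)
    """
    temp = t_amb
    for iteration in range(1, max_iter + 1):
        # Electrical step: resistance at the current temperature, then IR-drop.
        resistance = r0 * (1.0 + alpha * (temp - t_ref))
        ir_drop = i_load * resistance
        # Thermal step: Joule heating in the wire raises its temperature.
        power = i_load * ir_drop
        new_temp = t_amb + r_th * power
        if abs(new_temp - temp) < tol:
            return ir_drop, new_temp, iteration
        temp = new_temp
    raise RuntimeError("electro-thermal loop did not converge")


ir_drop, temp, iterations = electro_thermal_fixed_point(
    i_load=0.5, r0=0.1, t_ref=25.0, alpha=0.004, r_th=40.0, t_amb=25.0)
print(f"IR-drop = {ir_drop * 1e3:.3f} mV at {temp:.3f} C "
      f"after {iterations} iterations")
```

With these values each pass changes the temperature far less than the previous one, so the loop settles within a handful of iterations; a stronger coupling between power and temperature would slow convergence or prevent it altogether.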

### **1.3 Importance of Study**

3D fabrication is still a hot topic in research, but also a reality in the industry. Numerous university teams report satisfactory results from fabricated 3D ICs [6], and Xilinx has been offering "2.5D" products since 2011, in what can be considered a step before real 3D integration. Tezzaron Semiconductor is also delivering 3D ICs, mainly memory-on-processor systems. Great expectations are placed on the imminent launch of Wide I/O DRAM memories [7], which are predicted to dominate the market thanks to their large bandwidth and compact size when combined with a processor (Figure 1.3).



Figure 1.3 Schematic of the impending 3D DRAM modules

Despite the progress in fabrication, tools for 3D design, early verification and simulation are still immature at a commercial level. Software developers have recognized this gap and are on the verge of releasing 3D IC tools, as was made clear at the D43D 2012 conference in Lausanne. Nevertheless, research teams are also assisting by providing either design space explorations for various 3D systems [8], [9], or tools and methods to extract various electro-thermal results [10]. The developed tool contributes to both areas by providing a systematic way to estimate *IR*-Drop for 3D ICs including thermal effects, while at the same time exploring various partitioning options for 3D memory-on-processor systems.

### **1.4 Scope of Study**

As mentioned in the previous section, one of the most promising applications for 3D fabrication is the memory-on-processor system. The proximity of large quantities of memory to a processor ensures unparalleled performance in terms of power consumption and delay, due to the reduced wire lengths. At the same time the system offers a very compact size, which also suits mobile applications. There have been many candidates for the memory part, from volatile DRAM [7] to non-volatile Resistive RAM (ReRAM) [11], but in this study the memory tiers are assumed to be SRAM. This choice is supported by the fact that SRAM layouts are easier to find and characterize than those of the other kinds.

Another clarification concerns the term "3D IC". For the purposes of this study, the term 3D implies the usage of Through Silicon Vias (TSV). There are many techniques for 3D packaging, like wire bonding, but the one that fully exploits the third dimension is TSV usage. Extensive research has reduced the diameter of the TSV down to a few micrometers, allowing dense and reliable vertical interconnects. An excellent overview of the packaging options for 3D systems is presented in [5].

## 2. Related Work

Out of the multitude of studies that can be found in the literature, four are selected for presentation. This selection cannot be considered exhaustive, though it suits the purposes of the current thesis.

In [8] a design space exploration for 3D SRAM cache is performed with the assistance of an extended version of Cacti [12], called 3D-Cacti. Fine level 3D partitioning, down to the transistor level, is rejected for reasons regarding the excessive area overhead imposed by vertical interconnections. Although TSV fabrication technology has improved radically since the publication of the paper, the same problem is also encountered in the current thesis as illustrated in Figure 2.1.



Figure 2.1 Relative dimensions of two TSVs and two memory cells in different technologies. The area overhead makes vertical connections expensive in terms of used silicon.

The major contribution of [8] arises from the introduction of novel topologies for 3D cache partitioning on a sub-array level, and the topologies explored in the current work are heavily based on those presented in [8]. By utilizing 3D-Cacti the authors of [8] report on metrics such as delay and energy per cycle for different topologies, though there is no mention of *IR*-Drop simulations. Another difference from the proposed work lies in the way circuits are modeled. In [8] all the results are produced analytically by means of hardcoded equations. The described tool, on the other hand, makes use of extracted and characterized circuits to build systems that are afterwards simulated, providing accuracy very close to post-layout simulations but with significant time savings.

Finally, [8] mentions thermal simulations which show that the maximum temperature rises with the number of tiers, by up to three times when going from one to sixteen tiers. This prediction does not account for a processor layer and therefore underestimates the results, a fact that is also recognized in the paper.

A more PDN related study is conducted in [13], where an electrical TSV model able to simulate the differences in current distribution inside the via is developed. This model is then inserted into 3D PDNs and results for *IR*-Drop are reported. Although *IR*-Drop maps are provided for the PDN of a 3D system, there is no mention of thermal effects being included in the process. Moreover, the explored 3D systems include only two tiers, limiting the scope of the results, whereas in the proposed tool the number of tiers is a free variable for the user to choose.

The remaining two publications both attempt an electro-thermal approach to the problem. In [10] the tight relation between the electrical properties of the PDN and the temperature conditions of the tiers is expressed through the governing differential equations. A multigrid method is then proposed for the simulation of the 3D system and temperature / *IR*-Drop maps are provided, along with details concerning the partitioning of the system for simulation purposes. A change of 29% in maximum *IR*-Drop due to thermal effects is also reported, which justifies the additional effort for a co-simulation tool.

The major similarities between this work and [10] are the ability to simulate systems with non-uniform power distribution maps and the utilization of a multigrid method. In contrast, [10] makes no comment on the structure of the PDNs and its TSV distribution is uniform for all tiers, while in the proposed tool the PDN is clearly defined, mimicking that of a real SRAM, and the TSVs can be placed at any coordinate. As with the current tool, a problem is encountered in [10] regarding the resolution of the grid for thermal simulations: in general the dimensions of TSV blocks are much smaller than those of the other circuit blocks of the systems. Consequently, high grid resolutions or non-uniform grids have to be imposed on the system during thermal analysis, in order to capture details concerning the TSVs.

The last work mentioned in this chapter is [9]. It conducts an extensive study of 3D systems, dealing with static and dynamic phenomena as well as reliability issues, and publishes various results. The flow used in [9] is illustrated in Figure 2.2, where many similarities can be observed with that of Figure 1.2. The electrical / thermal co-simulation is performed until a convergence state is reached, with the difference that in [9] the process also takes into account dynamic electrical phenomena, which is not the case for the presented tool.



Figure 2.2 Flow utilized in [9], supporting thermo-electrical co-analysis

The systems explored in [9] contain multiple tiers, up to ten, and are connected through TSVs which are modeled in high detail. This is a major differentiation from the previous works and describes a real 3D system more accurately. Apart from the recalculation of resistances in the PDN due to temperature changes, the effects of decoupling capacitances and inductances on the current paths are explored and trends are reported. This dynamic behavior allows the authors to also explore activity scenarios, where *IR*-Drop changes drastically depending on the active tiers, by as much as seven times.

Other reported metrics concern the Mean Time To Failure (MTTF) of the TSVs due to excessive current density, for which power gating is proposed as a remedy, and resonant frequency effects on the PDN. Lastly the authors investigate the potential improvement of both the electrical and the thermal behavior of the system through the usage of tapered TSVs. A simple observation on the duality of *IR*-Drop and temperature issues leads to the TSVs illustrated in Figure 2.3, which alleviate both effects by almost 30%.



Figure 2.3 (a) Ideal solutions of TSV tapering for electrical and thermal purposes (b) Combination of solutions

In spite of the overall completeness of the study in [9], two limitations in tier creation are worth noting. Firstly, the PDN grids are small ($300\mu m \times 300\mu m$) and rectangular, whereas the proposed tool supports larger and more complex grids. Secondly, the tiers are identical and contain only inverter circuits, whereas the proposed tool can combine different circuits and vary them between tiers.

## 3. Characterization & Extraction

### **3.1 Introduction**

Power Delivery Networks (PDN) are modeled by means of conductances and current sources. The netlists created by these elements are simple enough for efficient simulation, yet offer relatively accurate results and reveal the nodes more prone to reliability failures.

A simple example of a PDN is presented in Figure 3.1(a). The resistances capture the *IR*-Drop on the power delivery wires and metal vias, while the current sources emulate the active devices, connecting the $V_{DD}$ rail to the ground rail. The voltage of some known nodes has to be forced by independent voltage sources in order for the system to be solvable by simulators. Usually this voltage is the real $V_{DD}$ and is forced on the nodes where power enters from the package pins.
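A minimal sketch of how such a netlist can be solved is the following. It assumes a small square $V_{DD}$ mesh with identical segment resistances, one ideal current source per node and a single pad forced to $V_{DD}$. The mesh size, the element values and the Gauss-Seidel relaxation are illustrative assumptions and do not describe the solver used in the tool.

```python
def solve_vdd_grid(n, vdd, r_seg, i_node, pads, max_sweeps=20000, tol=1e-12):
    """Gauss-Seidel nodal analysis of an n x n resistive V_DD mesh.

    Every pair of adjacent nodes is joined by a resistor r_seg, every node
    that is not a pad sinks i_node to ground, and the nodes listed in `pads`
    are held at vdd by ideal voltage sources.
    """
    g = 1.0 / r_seg
    v = [[vdd] * n for _ in range(n)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for r in range(n):
            for c in range(n):
                if (r, c) in pads:
                    continue  # voltage is forced, nothing to solve
                neighbours = [v[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c),
                                             (r, c - 1), (r, c + 1))
                              if 0 <= rr < n and 0 <= cc < n]
                # Kirchhoff's current law at the node:
                # sum(g * (v_neighbour - v_node)) = i_node
                new_v = (g * sum(neighbours) - i_node) / (g * len(neighbours))
                max_change = max(max_change, abs(new_v - v[r][c]))
                v[r][c] = new_v
        if max_change < tol:
            break
    return v


# Hypothetical 4 x 4 mesh: 0.5 ohm segments, 1 mA per node, pad in one corner.
grid = solve_vdd_grid(n=4, vdd=1.0, r_seg=0.5, i_node=1e-3, pads={(0, 0)})
worst = min(min(row) for row in grid)
print(f"worst-case IR-drop: {(1.0 - worst) * 1e3:.2f} mV")
```

The resulting voltage map is the kind of output from which the nodes most prone to failure can be identified. For grids of realistic size a sparse direct or multigrid solver would replace the plain relaxation used here.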







Figure 3.1 Modeling of active power grids for DC (a) and AC (b) conditions

The described model of Figure 3.1(a) deals with steady state conditions, hence the current sources use the average value. In cases where transient effects need to be captured, the model in Figure 3.1(b) is more appropriate. Wire capacitances are also included as shunt capacitors and the current sources are time varying, usually in the form of a triangular pulse whose mean over a clock period equals the average current, whose peak equals the maximum current, and whose offset accounts for possible leakage currents. There are some cases, like in [9], where a smoother pulse is used, *e.g.* a Weibull distribution, purely to avoid the discontinuity of the derivative at the peak of the triangle, thereby enabling iterative simulation tools to converge without issues.
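As an illustration of how such a triangular source could be parameterized, the sketch below derives the base width of the pulse from the average, peak and leakage currents, and checks the result numerically. It assumes one symmetric pulse per clock period that starts at the beginning of the period; the function names and current values are hypothetical, while the 100 MHz clock matches the one used for characterization in this work.

```python
def triangle_pulse_width(i_avg, i_peak, i_leak, period):
    """Base width (s) of a triangular pulse whose mean over one period is i_avg.

    The pulse rises from the leakage offset i_leak to i_peak and back, so its
    mean over one period is i_leak + 0.5 * (i_peak - i_leak) * width / period.
    """
    if i_peak <= i_leak:
        raise ValueError("peak current must exceed the leakage offset")
    width = 2.0 * period * (i_avg - i_leak) / (i_peak - i_leak)
    if not 0.0 <= width <= period:
        raise ValueError("average current cannot be met within one period")
    return width


def triangle_current(t, i_peak, i_leak, width):
    """Current (A) at time t within one period; the pulse starts at t = 0."""
    if width <= 0.0 or t < 0.0 or t > width:
        return i_leak
    half = width / 2.0
    return i_leak + (i_peak - i_leak) * (1.0 - abs(t - half) / half)


# Hypothetical block: 2 mA average, 10 mA peak, 0.5 mA leakage, 100 MHz clock.
period = 10e-9
width = triangle_pulse_width(2e-3, 10e-3, 0.5e-3, period)
steps = 100_000
mean = sum(triangle_current((k + 0.5) * period / steps, 10e-3, 0.5e-3, width)
           for k in range(steps)) / steps
print(f"pulse width = {width * 1e9:.3f} ns, numerical mean = {mean * 1e3:.3f} mA")
```

The numerical mean recovers the requested average current, which confirms that the three characterized quantities (average, peak and leakage) are enough to fix the shape of the pulse under these assumptions.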

One very important detail is the additional inductance on the left of the circuit in Figure 3.1(b). This inductance represents the inductive behavior of the wire bond or the micro-bump and, although usually in the range of nanohenries, plays a significant role in dynamic *IR*-Drops. The voltage difference across an inductance is given by the well-known equation:

$$V(t) = -L \frac{di}{dt}$$

and since the slew rate of the current can reach tens or hundreds of mega-amperes per second, voltage drops of even hundreds of mV can occur.
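As a rough worked example with assumed values (they are illustrative and not taken from the characterized circuits), consider an inductance of $L = 1\,\mathrm{nH}$ and a supply current that rises by $100\,\mathrm{mA}$ within $1\,\mathrm{ns}$. The magnitude of the resulting voltage is:

$$\frac{di}{dt} = \frac{100\,\mathrm{mA}}{1\,\mathrm{ns}} = 10^{8}\,\mathrm{A/s}, \qquad |V| = L \frac{di}{dt} = 10^{-9}\,\mathrm{H} \times 10^{8}\,\mathrm{A/s} = 100\,\mathrm{mV}$$

A drop of this size is a considerable fraction of a nominal supply of about 1 V, which is why the inductive term cannot be ignored in transient analyses.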

Other dynamic elements, such as coupling capacitances and mutual inductances, may be modelled if metrics such as noise or crosstalk need to be calculated, but as the complexity of the grid increases so does the simulation time. For this particular tool only steady state simulations are performed; dynamic capabilities are a feature to be included in the future. In this work the grids utilize only resistances and current sources.

Regarding the values of the current sources, two options are available: analytical expressions derived from the circuit topologies and transistor equations, or characterization data from measurements / simulations. The main target of this tool is flexibility, so the second option is preferred. For every new topology the user wishes to try, only a table with current values has to be provided for the tool to work properly.

### **3.2 Characterization**

The process used to obtain the current values for all the circuits involved in this work is summarized in Figure 3.2. Layouts and netlists are automatically generated through Faraday Technology's Memory Maker (MeMaker) in UMC 90nm technology. The layouts and netlists of the desired blocks are then edited in Cadence Virtuoso 5.1.41 and extracted by Assura 4.1. The final extracted (post-layout) netlists are simulated in HSpice 2010.12 and the measured currents are kept in Look-Up-Tables for further use. Since this work targets systems which operate under variable voltage *and* temperature conditions, the characterization spans both domains. For example in this work, voltages are swept from 1V to 1.2V and temperatures from $0^{\circ}$C to $110^{\circ}$C. This translates into several measurements per circuit (approximately 600) and dictates automation of the final step, which is achieved by Perl scripting.
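A Look-Up-Table of this kind can be queried at operating points that fall between the characterized ones. The sketch below shows one possible way to do so, bilinear interpolation over voltage and temperature. The interpolation scheme, the function names and the current values are assumptions made for illustration; only the sweep ranges follow the text above.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (low index, high index, fraction) locating x on a sorted axis."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError("operating point outside the characterized range")
    hi = min(bisect_right(axis, x), len(axis) - 1)
    lo = hi - 1
    return lo, hi, (x - axis[lo]) / (axis[hi] - axis[lo])


def lookup_current(lut, voltages, temps, v, t):
    """Bilinearly interpolate lut[i][j] = I(voltages[i], temps[j])."""
    i0, i1, fv = _bracket(voltages, v)
    j0, j1, ft = _bracket(temps, t)
    # Interpolate along temperature on the two bracketing voltage rows,
    # then along voltage between the two results.
    low_v = lut[i0][j0] * (1.0 - ft) + lut[i0][j1] * ft
    high_v = lut[i1][j0] * (1.0 - ft) + lut[i1][j1] * ft
    return low_v * (1.0 - fv) + high_v * fv


# Hypothetical leakage table in microamperes: the axes follow the sweep
# ranges quoted in the text, the current values are made up.
voltages = [1.0, 1.1, 1.2]      # V
temps = [0.0, 55.0, 110.0]      # deg C
leakage_ua = [
    [1.0, 2.0, 4.0],
    [1.5, 3.0, 6.0],
    [2.0, 4.0, 8.0],
]
result = lookup_current(leakage_ua, voltages, temps, v=1.05, t=27.5)
print(f"interpolated leakage: {result:.3f} uA")
```

Keeping the table separate from the lookup routine reflects the flexibility argument made earlier: a new circuit only requires a new table, not a change in the code that consumes it.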



Figure 3.2 Characterization process

MeMaker is a commercial tool offered by Faraday Technology Corporation, which enables effortless automated design of SRAM memories in a variety of levels. It can produce layouts, netlists, hardware description files, abstract placement files, *etc.* in many well-known formats and for memory sizes ranging from 256 bits to 512 kbits. The utilized version cooperates with a 90nm technology process from UMC to deliver layouts similar to that in Figure 3.3(a).

The abstract view of the layout in Figure 3.3(b) is more suitable to describe the functionality of the produced memories. The whole memory is divided into four sub-arrays and surrounded by a power ring that delivers the necessary power to the circuits. In between the sub-arrays, two sets of row and column circuits are placed while the center area is reserved for timing circuitry and row pre-decoders. Row circuits contain decoders and word-line drivers. Column circuits combine a tree multiplexer for column multiplexing, sense amplifiers, pre-charging circuits and write circuits, all of which connect to the bitlines of the sub-arrays.



Figure 3.3 (a) Layout of a MeMaker produced memory (b) Abstract top view of a memory array. The arrows represent the dependencies between the circuits

One interesting detail about the auxiliary circuits is that they are shared between sub-arrays, in a way similar to [14]. Row circuits are shared between left and right sub-arrays and, similarly, column circuits serve both bottom and top sub-arrays. This relationship is represented in Figure 3.3(b) by bidirectional arrows. This technique saves area and produces more compact layouts.

The following three sub-sections describe the details concerning the post-layout simulations and measurements of the three main circuit blocks. All the waveforms are replicated from the timing circuits of a fully operating memory on a clock of 100 MHz.

#### A. Row Circuits

The circuits that drive the word lines of the memory have two modes of operation: idle, in which only leakage currents contribute to the total drawn current, and active. In the latter case the local decoders select a driver, which changes the state of the word line from low to high so that the access transistors of the cells can operate. Therefore the drawn current is expected to be many times larger than in the idle case, mainly due to the power-hungry drivers. One difference of the row circuitry with respect to the other blocks is that the active mode covers both reading and writing operations of the memory, since in both cases the access pattern to the row is similar.

Figure 3.4(a) is a screenshot of the actual layout extracted for characterization purposes. The row circuits on the left have a separate power source from the load on the right, so that current information is not mixed. The load on the right changes from 32 to 512 cells in powers of two (32, 64, 128, 256, 512) and for each case the measurements are repeated.



Figure 3.4 (a) Topology used for row circuitry characterization, load of 32 cells wide. (b) Signals forced on the circuits. From top to bottom : Decoder signal, global word-line, local word-line, global bit-line, global complementary bit-line

For idle measurements the circuits are kept inactive for the first 15 ns of simulation, so that all transient effects diminish and then, for the next 5 ns, the average current of the row circuitry is measured. This measurement corresponds to leakage currents. Alternatively, for active measurements, the waveforms presented in Figure 3.4(b) are forced on the netlist. As mentioned before, these waveforms are produced from a sample memory operating at 100 MHz and are replicated during characterization, resulting in accurate measurements. One important detail is that the dummy array is first set to the desired initial conditions, as shown by the two access cycles in the waveforms. During the first access '0' is stored in every cell, followed by a pre-charging of the bit-lines into '1' before the second access. This way the gate leakage current of the access transistors takes expected values. Finally, maximum and average current drawn by the row circuits are measured during the second access (4 ns to 6.5 ns).
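The windowed measurements described above can be sketched as follows. This is only an illustration in Python of a time-weighted average and a maximum over a measurement window; the function name and the sampled-waveform representation are assumptions, and the actual measurements in this work are taken inside the circuit simulator.

```python
def window_stats(times, currents, t_start, t_end):
    """Return (average, maximum) current between t_start and t_end.

    times: ascending sample times (any consistent unit, e.g. ns).
    currents: current samples at those times.
    The average is time-weighted (trapezoidal rule) over the samples
    that fall inside the window.
    """
    pts = [(t, i) for t, i in zip(times, currents) if t_start <= t <= t_end]
    if len(pts) < 2:
        raise ValueError("need at least two samples inside the window")
    area = sum((t1 - t0) * (i0 + i1) / 2
               for (t0, i0), (t1, i1) in zip(pts, pts[1:]))
    average = area / (pts[-1][0] - pts[0][0])
    return average, max(i for _, i in pts)


# Idle case: skip the first 15 ns, then average over the next 5 ns.
# The sample values below are made up for the example.
idle_avg, _ = window_stats([0.0, 10.0, 15.0, 17.5, 20.0],
                           [5.0, 3.0, 1.0, 1.0, 1.0], 15.0, 20.0)
```

For the active case the same function would be called with the window of the second access, 4 ns to 6.5 ns.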

#### B. Column Circuits

Faraday MeMaker offers three modes of column multiplexing, namely 4, 8 and 16 to 1 multiplexing. This means that the following process is repeated for every one of those three cases. Another point is that info regarding access currents for the cells is also gathered during column circuit characterization, in writing and reading modes. The column circuits are subsequently characterized for idle, writing and reading conditions.

The layout of a testcase is presented in Figure 3.5(a), with the column circuit on the bottom and the load on top of it. The load cells once more range from 32 to 1024 in powers of two (32, 64, 128, 256, 512, 1024) and have a separate power source, allowing concurrent measurements on the array and the column circuits.

Idle measurements are performed in a way similar to the row case, keeping the circuits inactive for 20 ns and measuring the average current during the last 5 ns of simulation, after transient effects have ceased. A second set of measurements captures operating currents, with the inputs shown in Figure 3.5(b). This process has three steps: initializing all the cells, writing in a column and finally reading the same column. In the last two phases maximum and average currents are measured, both for the column circuit and for the memory cell involved. Total time for each simulation is 11.15 ns.




Figure 3.5 (a) Topology used for column circuitry characterization, load of 32 cells tall. (b) Signals forced on the circuits. From top to bottom: Pre-charge signal, multiplexing signal, word-line signal, sense enable signal, write enable signal.

A simplification regarding the patterns of stored data and the state of the bit-lines is necessary in the described process. Generally, leakage and / or access currents of a cell differ between data patterns. For example, if a cell holds a '1' and a '0' is written in it, current values will increase compared to a case where a '1' is written. Many dependencies like this exist but cannot be modelled accurately because they would require a detailed record of all the data in the memory.

#### C. SRAM Cell

All the memories created by MeMaker for the purposes of this work make use of a six-transistor, wide SRAM cell, the layout of which is illustrated in Figure 3.6(a). The designs are intended for high performance systems, so there is no need for a cell suitable for low-voltage operation. Moreover, the wide topology of the layout is essential to the manufacturability and yield of the memory [15], reducing at the same time the loads on both word and bit-lines. Access currents for the memory cell are determined as described above in subsection B, so only leakage currents have to be measured in separate simulations. For this purpose, layouts like that in Figure 3.6(b) are created, which contain multiple cells. Simulations follow the same idea as in the other circuits: the cells are first initialized and, after sufficient time has passed for transient effects to virtually disappear, the average current is measured for the whole array. This value is then divided by the number of cells to obtain the leakage current per cell. Multiple simulations on arrays of different size, as well as cross-checking with measurements on individual cells, show that the described method experiences no accuracy loss due to the final division.



Figure 3.6 (a) Layout of the used "wide" cell. (b) Layout of a 32×16 cell array for leakage estimation purposes.

The measured leakage currents are plotted in Figure 3.7, where each sub-figure corresponds to one circuit block. Temperature and voltage are swept as described before, leading to the shown curves. One immediate observation is the exponential character of the curves, which is expected and highly justified [16]. In fact the quality of the curves in Figures 3.7(a) and (b) verifies that the used characterization technique successfully suppressed transient effects. On the contrary, in Figure 3.7(c) some inaccuracies manifest, probably due to unresolved transient charges, but again an exponential trend is adequately followed. Additionally, operating voltage has a linear impact on all cases.
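One way to check the exponential character of such curves is to fit a straight line through the logarithm of the measured currents. The sketch below does this with an ordinary least-squares fit; the model $I(T) = I_0 e^{kT}$, the function name and the sample values are assumptions for illustration and are not taken from the measured data.

```python
import math


def fit_exponential(temps_c, currents_a):
    """Least-squares fit of I(T) = i0 * exp(k * T) via a line through ln(I).

    temps_c: temperatures in degrees Celsius.
    currents_a: positive leakage currents in amperes.
    Returns (i0, k).
    """
    n = len(temps_c)
    logs = [math.log(i) for i in currents_a]
    mean_t = sum(temps_c) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(temps_c, logs))
    k = sxy / sxx
    i0 = math.exp(mean_l - k * mean_t)
    return i0, k


# Synthetic data only: a perfect exponential sampled from 0 to 110 degrees C.
temps = list(range(0, 111, 10))
currents = [1e-10 * math.exp(0.02 * t) for t in temps]
i0, k = fit_exponential(temps, currents)
```

Large residuals of the fit in log space would point to the kind of inaccuracies noted for Figure 3.7(c).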






Figure 3.7 Leakage currents for the explored voltage-temperature variable space: (a) cell leakage (b) row circuits leakage (c) column circuits leakage, 16:1 MUX

The quantity of data for active circuits is impractical for full representation but two plots are attached in order to reveal a significant problem which was encountered during this phase. The first characterization attempt for read/write conditions was performed for the nominal<sup>1</sup> frequency of 166 MHz and a plot is illustrated in Figure 3.8(a). For each operating voltage there is a temperature where a sudden discontinuity for the current is observed (and *vice versa*) signaling a failure of operation. Since in this work the full range of voltage and temperature is absolutely necessary, the only possible solution is down-scaling the frequency of operation. The corrected curve, captured at 100 MHz, is shown in Figure 3.8(b) with a slight decrease in average current, due to the reduced clock frequency.

### **3.3 PDN Extraction**

During the extraction process for simulation purposes, resistivity information is also captured in the netlists. This fact, along with the regularity of the design, especially in the arrays, allows the reproduction of the PDN. Another important target set before the development of the tool is capturing the voltage drop down to the cell level, which means that all the wires and vias of a power delivery path have to be taken into account.

First, a high level view of the PDN of a memory is presented in Figure 3.9. Column and row circuits form a chain that starts on the outer power distribution ring and ends in the center of the memory, containing wide wires with negligible resistance to support the increased need for current of those blocks. Alternatively, the grid of the array differs for power and ground distribution.  $V_{DD}$  is distributed by vertical wires only and does not contain any significant sub-grids, whereas ground wires create a mesh of both vertical and horizontal wires in multiple metal levels. Based on the previous description, an estimation for the direction of the IR-Drop is noted by the arrows in the figure for both power nodes.

<sup>&</sup>lt;sup>1</sup> According to the datasheet provided by Faraday.





Figure 3.8 Active currents for two interesting cases: (a) operation at 166 MHz where the circuits fail (b) operation at 100 MHz with no failures



Figure 3.9 Detail of the power delivery network and the estimation of *IR*-Drop direction, indicated by the arrows

The power sub-grids of the arrays mentioned in the previous paragraph have an effect on current source modeling too. For layout compactness reasons, several vias are shared between adjacent cells, creating the pattern illustrated in Figure 3.10. As a result all the cells, and their respective current sources, share their  $V_{DD}$  and ground contacts, creating dependencies between electrical nodes. Special attention is paid during system synthesis to preserve those relations.



Figure 3.10 Pattern of contact sharing between adjacent cells, aimed at compact designs

## 4. Synthesis

### 4.1 Memory Topologies

This work focuses on memory-on-processor 3D systems with the partitioning of SRAM in various 3D topologies at the block level. As a result, certain partitioning strategies have to be employed and applied, in order to report differences in *IR*-Drop behaviour. These partitioning schemes are heavily based on [8] but also reflect the restrictions imposed by the characterized circuits.

In [8] a generic sub-array is utilized and all the auxiliary circuits are available to every memory sub-block, thus providing the designer with flexibility. In this case though, circuit sharing prohibits the usage of many of the partitioning combinations described in [8], but at the same time the symmetries of the original 2D layout permit the introduction of two new schemes. All the supported topologies are described in the following paragraphs, along with their expected advantages and disadvantages.

### A. Stacking Topology (STACK)

The first option is a rough extension of the 2D design into 3D, which can be used as a baseline for all other topologies. In this case each tier contains a separate, fully operational memory, along with all the required circuits, that does not share any blocks or signals with other tiers. An abstract view of a 2-tiered STACK memory is presented in Figure 4.1. Although tiers do not share signals or circuits, it is apparent that power TSVs traverse all the tiers in the Z direction, providing current to all of them.

The employed strategy does not make full usage of 3D capabilities, adopting a simplistic migration to 3D integration of 2D circuits and so no gains are to be expected in terms of *IR*-Drop. On the contrary, such a naïve approach will possibly lead to worse voltage metrics as the number of tiers is increased and additional current is introduced to the power paths.



Figure 4.1 STACK topology, two tiers

On the positive side, the lack of signal TSVs (except for those that carry address and word bits) means that layouts can be used without the major changes that 3D integration requires due to TSV design rules. For the same reason the area overhead is negligible.

#### B. Word-Line Sharing (3DWL)

In the original work of [8], 3DWL, which stands for 3D Word-Line sharing, is a strategy to split each sub-array into two or more parts along the direction of the word-line and then partition them into two or more different tiers (or active layers). Row drivers are then decoded by the same signals across several tiers, so when a word is read or written, each part of it exists in a different sub-array. The choice of splitting row drivers or keeping them concentrated and using TSVs to operate on the word-lines depends on the area constraints of the design.

In the current work sub-arrays are not split into more than two tiers and row circuits are concentrated on one tier for the following reasons. Firstly, the characterized row circuits are shared between sub-arrays, effectively reducing the possible topologies of 3DWL. Furthermore, the captured row-circuit currents include a decoder and four drivers (there is a 4:1 row multiplexing scheme), making a separation into smaller sources impossible. Even when deciding to keep row circuits intact on a tier, a problem concerning signal TSV loading arises. According to [17], each TSV (approximate diameter of 1-5  $\mu$m) has a load which is equivalent to 30 cells in this case and is expected to increase for smaller technology nodes. Additionally, the characterized circuits can drive up to 512 cells. These two reasons combined lead to a decision of no sharing beyond two tiers for practical reasons.
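The loading argument can be made concrete with a small calculation. The sketch below assumes, for illustration only, that each additional tier adds one signal TSV to a word-line and that TSV and cell loads add linearly; only the figures of 512 cells and 30 cells per TSV come from the text.

```python
DRIVER_CAPACITY_CELLS = 512  # maximum load of the characterized row circuits
TSV_LOAD_CELLS = 30          # equivalent load of one signal TSV, per [17]


def cells_left_for_word_line(tiers):
    """Cells a driver can still serve when its word-line spans `tiers` tiers.

    Assumes one signal TSV per additional tier (a simplification made
    for this sketch, not a rule stated in the text).
    """
    if tiers < 1:
        raise ValueError("at least one tier is required")
    budget = DRIVER_CAPACITY_CELLS - TSV_LOAD_CELLS * (tiers - 1)
    return max(budget, 0)
```

Under these assumptions a two-tier word-line leaves room for 482 cells, while an eight-tier one leaves 302, which illustrates why sharing across many tiers quickly erodes the driving capacity.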

An abstract view of the used 3DWL topology is illustrated in Figure 4.2. The vertical arrows represent the signal TSVs that drive the word-lines of the bottom tier, while row circuits exist on one layer only. Improvement in *IR*-Drop is expected to be both intra-tier and inter-tier. Since the form factor of the memory changes, power wires in the X direction are halved and so each sub-array experiences less effective resistance in its ground path. On top of that, active blocks are split between tiers, so currents are distributed more uniformly on power TSVs and cause smaller *IR*-Drops on them.



Figure 4.2 3DWL topology, two tiers

#### C. Bit-Line Sharing (3DBL)

In the original work of [8], 3DBL, which stands for 3D Bit-Line sharing, is a strategy to split each sub-array into two or more parts along the direction of the bit-lines and then partition them into two or more different tiers (or active layers). The choice of splitting auxiliary column circuits or keeping them concentrated and using TSVs to operate on the bit-lines depends on the area constraints of the design.

In the current work sub-arrays are not split into more than two tiers and column circuits are concentrated on one tier for the same reasons described above. Another major problem is the required area overhead for signal TSVs. Each column contains both the normal and the complementary bit-line, which translates into two signal TSVs per column. The total area required by the TSVs to vertically connect two sub-arrays is, in this case, the size of the sub-array itself! This is the main reason why in the following chapters 3DBL, along with its counterpart XX, is not considered an efficient implementation and is not simulated at all.

An abstract view of the used 3DBL topology is illustrated in Figure 4.3. The vertical arrows represent the signal TSVs that drive the bit-lines of the bottom tier, while column circuits exist on one layer only. Improvement in *IR*-Drop is expected to be both intra-tier and inter-tier. Since the form factor of the memory changes, power wires in the Y direction are halved and so each sub-array experiences less effective resistance in its  $V_{DD}$  path. On top of that, active blocks are split between tiers, so currents are distributed more uniformly on power TSVs and cause smaller *IR*-Drops on them.



Figure 4.3 3DBL topology, two tiers

#### D. Symmetrical Word-Line Sharing (YY)

This topology is presented for the first time in this work but is directly derived from 3DWL. It is observed that the original 2D layout of the memory is symmetrical along the YY' and the XX' axes and this property leads to the proposed design. Instead of splitting the sub-arrays and moving them to different tiers, the layout is effectively folded along the YY' axis and each half is placed in a different active layer. Again, as in 3DWL, row circuitry is shared between tiers and both tiers are active during operation. Also, for the same reasons as before, the number of tiers for each memory is restricted to two.

An abstract view of YY topology is illustrated in Figure 4.4. The vertical arrows represent the signal TSVs that drive the word-lines of the bottom tier, while row circuits exist on one layer only. Improvement in *IR*-Drop is expected to follow the trends of 3DWL but with small variations.


Figure 4.4 YY topology, two tiers

#### E. Symmetrical Bit-Line Sharing (XX)

This topology is presented for the first time in this work but is directly derived from 3DBL. It is observed that the original 2D layout of the memory is symmetrical along the YY' and the XX' axes and this property leads to the proposed design. Instead of splitting the sub-arrays and moving them to different tiers, the layout is effectively folded along the XX' axis and each half is placed in a different active layer. Again, as in 3DBL, column circuitry is shared between tiers and both tiers are active during operation. Also, for the same reasons as before, the number of tiers for each memory is restricted to two.

An abstract view of XX topology is illustrated in Figure 4.5. The vertical arrows represent the signal TSVs that drive the bit-lines of the bottom tier, while column circuits exist on one layer only. Improvement in *IR*-Drop is expected to follow the trends of 3DBL but with small variations.



Figure 4.5 XX topology, two tiers

## **4.2 Synthesis Rules**

The generated netlists should follow a format which allows easy simulation and verification. For this work the SPICE-like format introduced in [18], for the creation of large power grid benchmarks, is preferred. Conventions from [18] concerning naming of devices and nodes are kept almost unchanged, as they provide a systematic method of name generation. Table 4.1 contains examples for all the devices that exist in a netlist, taken from a real benchmark.

By observing the table below, two of the three important rules become obvious: all names are coordinate based and the active devices are split into two parts. The first rule is very important when using solvers such as those described in [19], where coordinates provide the row and column information inside the grid.<sup>2</sup> The downside of this method is that for SRAM, a densely packed circuit, defining unique coordinates becomes problematic because of overlapping elements. On the contrary, the second rule does not add any difficulties in netlist creation, while at the same time it simplifies the simulation procedure. The power grid is separated into a ground part and a  $V_{DD}$  part, each with its related devices, and the two parts are then analysed separately, reducing computational effort. In this case currents are divided evenly between ground and  $V_{DD}$  grids but this should not affect the final result (Figure 4.6).

<sup>&</sup>lt;sup>2</sup> Before moving to a multi-grid solver, even this work used a previously coded iterative row-based solver for the first simulations. It was later replaced with the more efficient multi-grid solver but the naming conventions were not changed

| Description       | SPICE Name       | Node A     | Node B     | Value (SI units)  |
|-------------------|------------------|------------|------------|-------------------|
| Grid Resistance   | R_vdd2_4_16_3    | n2_4_16_3  | n2_4_17_3  | 0.4752175         |
| Grid Via          | V_via3_12_3_2    | n1_12_3_2  | n3_12_3_2  | 8.342317          |
| Cell Device (GND) | iBL_gnd_35_49_1  | 0          | n1_35_50_1 | 0.2064453125e-09  |
| Cell Device (VDD) | iBLB_vdd_53_57_2 | n0_53_58_2 | 0          | 0.2064453125e-09  |
| CPU Device        | iBCPU_vdd_68_2_0 | n2_68_2_0  | 0          | 0.0625            |
| TSV               | V_TSV_68_2_1     | n2_68_2_1  | n2_68_2_2  | 0.046951871657754 |

Table 4.1 Examples of coordinate based, SPICE name generation
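The two rules visible in Table 4.1 can be sketched in a few lines of Python. The field order (layer, x, y, tier) is inferred from the grid resistance row of the table, and the helper names, the `label` argument and the exact naming of the split sources are hypothetical; the cell device rows in the table follow slightly different conventions that are not reproduced here.

```python
def node_name(layer, x, y, tier):
    """Coordinate-based node name, e.g. n2_4_16_3 (field order assumed)."""
    return f"n{layer}_{x}_{y}_{tier}"


def grid_resistor(net, layer, x, y, tier, ohms, direction="y"):
    """One resistor line from (x, y) to its neighbour along `direction`."""
    a = node_name(layer, x, y, tier)
    if direction == "x":
        b = node_name(layer, x + 1, y, tier)
    else:
        b = node_name(layer, x, y + 1, tier)
    return f"R_{net}{layer}_{x}_{y}_{tier} {a} {b} {ohms}"


def split_cell_source(label, x, y, tier, vdd_node, gnd_node, amps):
    """Replace one VDD-to-ground source by two sources tied to node 0.

    The VDD half draws current out of the VDD grid, the ground half
    injects the same current into the ground grid, so each grid can be
    solved on its own.
    """
    vdd_half = f"i{label}_vdd_{x}_{y}_{tier} {vdd_node} 0 {amps}"
    gnd_half = f"i{label}_gnd_{x}_{y}_{tier} 0 {gnd_node} {amps}"
    return vdd_half, gnd_half


line = grid_resistor("vdd", 2, 4, 16, 3, 0.4752175)
```

With these inputs `line` reproduces the grid resistance entry of Table 4.1.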

The third rule is not enforced by any standard, but rather by the need for simplicity during the construction of the power grid simulator: the netlists are not hierarchical, meaning that all devices are flattened prior to analysis. Commercial tools, such as HSPICE, provide the ability to create large netlists with the help of sub-circuit modules, thus saving time and code lines for the designer. The netlist is then flattened just before simulation. Such a scheme, though practical, is not easy to implement in a university lab simulator, as it requires considerable effort and time. Therefore the simulator in this work requires an already flat netlist.

The troubling disadvantage of this restriction is that for large netlists an equally large text file has to be created from scratch. Experiments in this work include netlists with up to ten million electrical nodes, which correspond to a SPICE file of about one gigabyte. Apart from the time overhead of parsing such a file during simulation, there is also an efficiency issue at the time of creation. For this reason a modular approach is adopted, in which the netlist for each sub-block is first created with generic coordinates and current values, which are afterwards replaced before the block is appended to the final file.
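A minimal sketch of this modular approach is given below, assuming the block netlist is kept as a text template whose coordinates and current value are placeholders. The template contents, the placeholder names and the `append_block` helper are invented for the example and do not reproduce the actual tool.

```python
import io

# Hypothetical two-line block: one grid resistor and one cell current source.
# {x}, {y}, {y1}, {t} and {i} stand for the generic coordinates and current.
BLOCK_TEMPLATE = (
    "R_vdd1_{x}_{y}_{t} n1_{x}_{y}_{t} n1_{x}_{y1}_{t} 0.5\n"
    "iB_vdd_{x}_{y}_{t} n1_{x}_{y}_{t} 0 {i}\n"
)


def append_block(out, x, y, tier, current):
    """Fill in the placeholders and append the block to an open stream.

    Writing block by block keeps memory usage low, since the complete
    flat netlist never has to be held in memory at once.
    """
    out.write(BLOCK_TEMPLATE.format(x=x, y=y, y1=y + 1, t=tier, i=current))


# Usage: build a tiny flat netlist in memory (a real run would use a file).
netlist = io.StringIO()
for column in range(2):
    append_block(netlist, column, 16, 3, 2e-10)
```

The same pattern extends to larger templates, with one template per sub-block type.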



Figure 4.6 Separation of the components of a power grid in order to enable independent analysis for the two power nodes. (a) Original network (b) Transformed network, notice the divided current sources

## 4.3 Synthesis Options

The user interacts with the tool through an options file, the current version of which can be found in the appendix. A list of all the important variables together with their value range in this work is grouped in Table 4.2 below:

| Option              | Description                                                                                                                            |
|---------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| Total Size          | In KB, ranges from 1 KB to 64 KB per tier.                                                                                             |
| Word Length         | In bits, typical values of 16, 32, 64.                                                                                                 |
| Mux Factor          | Column multiplexing factor, values of 4, 8 and 16.                                                                                     |
| Topology            | Stack, 3DWL, 3DBL, XX, YY.                                                                                                             |
| BlockX              | Width of basic building block in cells, typical values of 4 to 64.                                                                     |
| BlockY              | Length of basic building block in cells, typical values of 4 to 64.                                                                    |
| Number of Dies      | Number of memory tiers, usually 2 to 8.                                                                                                |
| TSVs (array)        | Array with number of power TSVs per tier, minimum of 8 per tier.                                                                       |
| Temperature         | Initial temperature for thermal simulation.                                                                                            |
| Voltage             | Ideal V<sub>DD</sub> from the package.                                                                                                 |
| IR-Drop Iterations  | Number of iterations in the inner electrical simulation loop, at least 2 are needed.                                                   |
| Thermal Iterations  | Number of iterations in the outer thermal simulation loop, at least 1 is needed.                                                       |
| CPU Current         | Total current drawn from the CPU tier at the bottom of the system, usually 10 to 50 A/cm<sup>2</sup>.                                  |
| Ref. Temperature    | Reference temperature for resistivity recalculation, 20°C or 25°C.                                                                     |
| Hotspot Resolution  | Resolution of the grid imposed on the layout for thermal simulation purposes, depends on the layout size, typical value of 512×512.    |
| TSV Diameter        | Diameter of the (cylindrical) power TSVs, starting from 1 $\mu$m.                                                                      |
| Package Resistance  | The equivalent DC resistance of the conductor connecting the chip with the ideal power supply.                                         |

Table 4.2 Important synthesis options, user defined

The total size of an independent 3D memory system ranges from 1 KB to 64 KB per tier, mainly because of the restrictions imposed by the characterized circuits, which do not offer the freedom of horizontal partitioning into more than four sub-arrays. Therefore, for an 8-tiered IC, the tool can create the full power grid of a 512 KB memory. Word length and multiplexing factor have an impact mainly on the grouping of column circuits and the width of the memory. The only restriction for these variables is that their product should not exceed the maximum cell driving capacity of the row circuits, which in this case is 512 cells (or bits).
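The restrictions in this paragraph and in Table 4.2 can be expressed as a small validation routine. The function names and the returned messages are hypothetical; the limits themselves (1 KB to 64 KB per tier, multiplexing factors of 4, 8 and 16, and a word length times multiplexing factor product of at most 512) are the ones stated in the text.

```python
MAX_DRIVEN_CELLS = 512          # driving capacity of the row circuits
ALLOWED_MUX_FACTORS = (4, 8, 16)


def check_memory_options(size_kb_per_tier, word_length, mux_factor):
    """Return a list of violated restrictions (empty if the options are valid)."""
    problems = []
    if not 1 <= size_kb_per_tier <= 64:
        problems.append("size per tier must be between 1 KB and 64 KB")
    if mux_factor not in ALLOWED_MUX_FACTORS:
        problems.append("mux factor must be 4, 8 or 16")
    if word_length * mux_factor > MAX_DRIVEN_CELLS:
        problems.append("word length x mux factor exceeds 512 cells")
    return problems


def total_size_kb(size_kb_per_tier, tiers):
    """Total memory size of a system with identical tiers."""
    return size_kb_per_tier * tiers
```

For example, a 32-bit word with 16:1 multiplexing sits exactly on the 512-cell limit, while a 64-bit word with the same multiplexing factor violates it.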

Block dimension parameters are crucial during the creation of the netlist, as well as for thermal simulation purposes. These two parameters define the size of the pre-constructed blocks that are appended iteratively in order for the netlist to be created, so in general small blocks lead to smaller memory usage during execution. More importantly, the size of the block defines the resolution of the thermal analysis, since the block is considered the smallest unit for which a separate temperature can be defined. The examples in Figure 4.7 contain the same part of a system floor plan for different block sizes. Finer modeling of the circuit leads to better accuracy at the cost of more intensive simulations (Figure 4.7(a)), compared to less detailed but faster blocks of bigger size (Figure 4.7(c)).

Regarding the number of memory dies (or tiers) the tool is not limited in any way, so the user can request any integer value. For this work a reasonable limit of eight memory tiers plus one CPU tier is set, reflecting expectations in industry and research for 3D integration. Another variable significant to the final *IR*-Drop value is the TSV number array. Through this array the tool connects adjacent tiers with the defined number of TSVs, distributing them evenly along the periphery of the power ring. For example, a 4-tiered system with an array equal to [16,16,16] would use 16 TSVs between each pair of adjacent tiers and the vertical connections would eventually form a pillar. On the other hand, if the array is [8,16,32] then the number of TSVs would increase as the tiers go from the one furthest from ideal  $V_{DD}$  to the one closest to it, effectively creating a form of tapering that would assist in improving levels of *IR*-Drop but with fewer TSVs and less area overhead.
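The effect of tapering can be illustrated with a heavily simplified lumped model, in which every group of parallel TSVs between two adjacent tiers is a single resistance and each tier draws a fixed current. This is only a sketch under those assumptions (the tool itself solves the full grid); the function name and the per-tier currents are hypothetical, while the per-TSV resistance in the usage example is the TSV value listed in Table 4.1.

```python
from itertools import accumulate


def tsv_level_drops(tier_currents, tsv_counts, r_tsv):
    """Voltage drop across each group of inter-tier power TSVs.

    tier_currents: current drawn by each tier in amperes, ordered from the
        tier furthest from the ideal supply to the closest one.
    tsv_counts: number of parallel TSVs between adjacent tiers, in the
        same order (one entry fewer than tier_currents).
    r_tsv: resistance of a single TSV in ohms.
    """
    if len(tsv_counts) != len(tier_currents) - 1:
        raise ValueError("need one TSV count per pair of adjacent tiers")
    # Each level carries the current of every tier further from the supply.
    through = list(accumulate(tier_currents))[:-1]
    return [i * r_tsv / n for i, n in zip(through, tsv_counts)]


# Usage with made-up tier currents of 0.1 A each.
R_TSV = 0.046951871657754
uniform = tsv_level_drops([0.1, 0.1, 0.1, 0.1], [16, 16, 16], R_TSV)
tapered = tsv_level_drops([0.1, 0.1, 0.1, 0.1], [8, 16, 32], R_TSV)
```

In the uniform case the drop per level grows towards the supply because more current flows through the same number of TSVs, whereas the tapered array places more TSVs where the current is highest.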

Voltage and temperature variables are necessary in order to set the initial conditions in all blocks for the electrical and thermal solvers, even if after the simulations all ideal values in the system have been replaced by the real ones. As explained in previous chapters, voltages in this work range from 1V to 1.2V and temperatures from 0 °C to 110 °C, but the tool can support any values as long as the correct characterization files are provided.

|       | TSV                   | 0,516 0      |              |              |              | TSV 65       | 516 0        |              | TSV 130 516 0 |      |
|-------|-----------------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|---------------|------|
|       | array_0_31_0          | array_1_31_0 | array_2_31_0 | array_3_31_0 | array_4_31_0 | array_5_31_0 | array_6_31_0 | array_7_31_0 | row_0_31_0    | arra |
|       | array_0_30_0          | array_1_30_0 | array_2_30_0 | array_3_30_0 | array_4_30_0 | array_5_30_0 | array_6_30_0 | array_7_30_0 | row_0_30_0    | arra |
|       | array_0_29_0          | array_1_29_0 | array_2_29_0 | array_3_29_0 | array_4_29_0 | array_5_29_0 | array_6_29_0 | array_7_29_0 | row_0_29_0    | arra |
|       | array_0_28_0          | array_1_28_0 | array_2_28_0 | array_3_28_0 | array_4_28_0 | array_5_28_0 | array_6_28_0 | array_7_28_0 | row_0_28_0    | arra |
|       | array_0_27_0          | array_1_27_0 | array_2_27_0 | array_3_27_0 | array_4_27_0 | array_5_27_0 | array_6_27_0 | array_7_27_0 | row_0_27_0    | arra |
|       | array_0_26_0          | array_1_26_0 | array_2_26_0 | array_3_26_0 | array_4_26_0 | array_5_26_0 | array_6_26_0 | array_7_26_0 | row_0_26_0    | arra |
|       | array_0_25_0          | array_1_25_0 | array_2_25_0 | array_3_25_0 | array_4_25_0 | array_5_25_0 | array_6_25_0 | array_7_25_0 | row_0_25_0    | arra |
| TSV_Q | array_0_24_0<br>386_0 | array_1_24_0 | array_2_24_0 | array_3_24_0 | array_4_24_0 | array_5_24_0 | array_6_24_0 | array_7_24_0 | row_0_24_0    | arra |
|       | array_0_23_0          | array_1_23_0 | array_2_23_0 | array_3_23_0 | array_4_23_0 | array_5_23_0 | array_6_23_0 | array_7_23_0 | row_0_23_0    | arra |
|       | array_0_22_0          | array_1_22_0 | array_2_22_0 | array_3_22_0 | array_4_22_0 | array_5_22_0 | array_6_22_0 | array_7_22_0 | row_0_22_0    | arra |
|       | array_0_21_0          | array_1_21_0 | array_2_21_0 | array_3_21_0 | array_4_21_0 | array_5_21_0 | array_6_21_0 | array_7_21_0 | row_0_21_0    | arra |
|       | array_0_20_0          | array_1_20_0 | array_2_20_0 | array_3_20_0 | array_4_20_0 | array_5_20_0 | array_6_20_0 | array_7_20_0 | row_0_20_0    | arra |
|       | array_0_19_0          | array_1_19_0 | array_2_19_0 | array_3_19_0 | array_4_19_0 | array_5_19_0 | array_6_19_0 | array_7_19_0 | row_0_19_0    | arra |
|       | array_0_18_0          | array_1_18_0 | array_2_18_0 | array_3_18_0 | array_4_18_0 | array_5_18_0 | array_6_18_0 | array_7_18_0 | row_0_18_0    | arra |
|       | array_0_17_0          | array_1_17_0 | array_2_17_0 | array_3_17_0 | array_4_17_0 | array_5_17_0 | array_6_17_0 | array_7_17_0 | row_0_17_0    | arra |
|       | array_0_16_0          | array_1_16_0 | array_2_16_0 | array_3_16_0 | array_4_16_0 | array_5_16_0 | array_6_16_0 | array_7_16_0 | row_0_16_0    | arra |
| TSV_Q | 258_0                 |              |              |              |              |              |              |              |               |      |
|       | column_0_0_0          | column_1_0_0 | column_2_0_0 | column_3_0_0 | column_4_0_0 | column_5_0_0 | column_6_0_0 | column_7_0_0 |               | colu |
|       |                       |              |              |              |              |              |              |              |               |      |
|       |                       |              |              |              |              |              |              |              |               |      |

(a)

[Panels (b) and (c): floorplan images. The blocks are labelled `array_<x>_<y>_0` for sub-arrays, `row_0_<y>_0` and `column_<x>_0_0` for the peripheral circuits, and `TSV_<x>_<y>_0` for the power TSVs along the edges.]

Figure 4.7 Part of a memory floorplan for different block dimensions. Block width and length can be unequal if a different aspect ratio is required. (a) 16×16 (b) 32×32 (c) 64×64

## 5. Results

### **5.1 Introduction**

This chapter presents the output files produced by the tool, together with a small study of the estimated *IR*-Drop for several systems. First, the floorplans of the simulated topologies (STACK, 3DWL, YY) are verified against the schematics of Chapter 4, and all the circuits involved are outlined. The voltage drop distribution for the same systems is then illustrated with contour-like images, with the emphasis on intra-tier *IR*-Drop. These systems are all 2-tiered, with 16 KB per tier and no CPU current contribution, an initial simulation temperature of 60 °C and a TSV diameter of 1 µm. Block dimensions are  $64 \times 64$ . As for the operating conditions, each memory system writes a 32-bit word, which is the worst-case scenario and the one with the highest probability of failure.

After the analysis of each topology, the impact of different topologies and options on the maximum *IR*-Drop is explored by simulating multi-tiered systems of up to 256 KB. Reliability issues are reported along with possible solutions. An additional figure, which visualizes the electrical stress on the TSVs and the resulting electromigration effects, is also discussed.

Finally, some comments are given on the temperatures of the tiers and their effect on operation. Conclusions and future work follow in a separate chapter. Apart from the files described below, the tool also produces layer description files, power trace files and temperature trace files. Samples of these can be found in the appendix.

### **5.2 Topology Verification**

The visualized floorplans of the three topologies (STACK, 3DWL, YY) are shown in Figures 5.1, 5.2 and 5.3 respectively, each with two sub-figures, one for the top and one for the bottom tier. In the STACK topology the two independent memories include all the circuits required for operation, whereas in 3DWL and YY the bottom tiers (Tier 0) lack the row circuits, since the signals originate on the top tier. Instead of leaving the space blank, a dummy block "row\_gap" is inserted for thermal simulation reasons. Another major difference is the change of aspect ratio between layouts. The STACK floorplan is wider because each memory line is operated by column circuits in the same tier. In contrast, the word-line sharing topologies use column circuits in both tiers, resulting in narrower layouts.
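The check described above can be sketched in a few lines of Python. The sketch assumes the naming pattern visible in the floorplan labels, `<kind>_<x>_<y>_<tier>`, with the kinds `array`, `row`, `row_gap`, `column` and `TSV`. The function names and the shortened block lists are hypothetical and are not part of the tool.

```python
import re
from collections import Counter

# Assumed label pattern, inferred from the floorplan figures:
# <kind>_<x>_<y>_<tier>, e.g. "array_2_7_0" or "row_gap_0_3_0".
# "row_gap" is listed before "row" so that the longer kind is tried first.
BLOCK_RE = re.compile(r"^(array|row_gap|row|column|TSV)_(\d+)_(\d+)_(\d+)$")


def parse_block(name):
    """Split a block label into (kind, x, y, tier)."""
    match = BLOCK_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised block label: {name!r}")
    kind, x, y, tier = match.groups()
    return kind, int(x), int(y), int(tier)


def kinds_per_tier(names):
    """Count block kinds on each tier: {tier: Counter({kind: n})}."""
    counts = {}
    for name in names:
        kind, _, _, tier = parse_block(name)
        counts.setdefault(tier, Counter())[kind] += 1
    return counts


def shares_word_lines(names):
    """True if Tier 0 has dummy row_gap blocks and no row circuits,
    as the text describes for the 3DWL and YY topologies."""
    tier0 = kinds_per_tier(names).get(0, Counter())
    return tier0["row_gap"] > 0 and tier0["row"] == 0


# Hypothetical, heavily shortened block lists, for illustration only.
stack_blocks = ["array_0_0_0", "row_0_0_0", "column_0_0_0",
                "array_0_0_1", "row_0_0_1", "column_0_0_1"]
wl_blocks = ["array_0_0_0", "row_gap_0_0_0", "column_0_0_0",
             "array_0_0_1", "row_0_0_1", "column_0_0_1"]
```

With these lists, `shares_word_lines(wl_blocks)` holds while `shares_word_lines(stack_blocks)` does not, mirroring the difference between the word-line sharing topologies and STACK.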

[Floorplan images. (a) Tier 0 holds an 8 × 4 grid of sub-arrays (`array_0_0_0` to `array_7_3_0`), a central column of row circuits (`row_0_0_0` to `row_0_3_0`) between `array_3` and `array_4`, a strip of column circuits (`column_0_0_0` to `column_7_0_0`) between array rows 2 and 1, and power TSVs (`TSV_<x>_<y>_0`) along the periphery. (b) Tier 1 repeats the same arrangement with the tier suffix `_1`.]

Figure 5.1 Floorplans of STACK system. (a) Tier 0 (b) Tier 1, closest to  $V_{\text{DD}}$ 

[Floorplan images. (a) Tier 0 holds a 4 × 8 grid of sub-arrays (`array_0_0_0` to `array_3_7_0`), a central column of dummy blocks (`row_gap_0_0_0` to `row_gap_0_7_0`) between `array_1` and `array_2`, a strip of column circuits (`column_0_0_0` to `column_3_0_0`) between array rows 4 and 3, and power TSVs (`TSV_<x>_<y>_0`) along the periphery. (b) Tier 1 has the same arrangement with row circuits (`row_0_0_1` to `row_0_7_1`) in place of the dummy blocks.]

Figure 5.2 Floorplans of 3DWL system. (a) Tier 0 (b) Tier 1, closest to  $V_{\text{DD}}$ 

[Floorplan images. (a) Tier 0 holds a 4 × 8 grid of sub-arrays (`array_0_0_0` to `array_3_7_0`), a column of dummy blocks (`row_gap_0_0_0` to `row_gap_0_7_0`) on the right-hand side next to `array_3`, a strip of column circuits (`column_0_0_0` to `column_3_0_0`) between array rows 4 and 3, and power TSVs (`TSV_<x>_<y>_0`) along the periphery. (b) Tier 1 has the same arrangement with row circuits (`row_0_<y>_1`) in place of the dummy blocks.]

Figure 5.3 Floorplans of YY system. (a) Tier 0 (b) Tier 1, closest to  $V_{\text{DD}}$ 

Figure 5.4 gives more insight into the voltage drop distribution: the ground networks of the bottom tier are represented as colored maps. In all three cases a spot of high voltage drop starts from the middle of the periphery and extends through the rest of the sub-arrays, while the two upper sub-arrays behave worse than their bottom counterparts. This effect has the following causes. First, the memories are operating, which means that all column circuits on the tier are active and draw relatively large currents. The point where the column circuit power grid meets the outer power ring coincides with the observed maximum voltage drop, which is caused by the aggregated current at that point. A contributing factor is that, for these particular systems, TSVs are sparsely distributed, so larger currents pass through each of them. Additionally, the written word is located in the top part of the memory, so current is also drawn by the active cells in that area, which leads to the difference between the top and bottom parts.
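As a rough, first-order illustration of the TSV argument above (not the model used by the tool), assume that a tier draws a total current $I_{\text{tier}}$ that is shared equally by $N_{\text{TSV}}$ identical power TSVs, each of resistance $R_{\text{TSV}}$. All three symbols are introduced here for illustration only. Ohm's law then gives the drop across each TSV:

$$
\Delta V_{\text{TSV}} = \frac{I_{\text{tier}}}{N_{\text{TSV}}} \, R_{\text{TSV}}
$$

Under these assumptions, halving the number of TSVs doubles the current through each one and therefore the drop across it, which is consistent with the larger drops observed when the TSVs are sparse.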

Regarding the absolute values of the voltage drop, 3DWL has a maximum about 50% lower than that of STACK. This is explained by the partitioning of the power-demanding circuits in the word-line sharing topologies, whereas in STACK all the active circuits are located on the same tier, which increases the *IR*-Drop on the power TSVs. Although YY should also benefit from this effect, the use of bigger arrays together with the less accessible TSVs results in performance close to that of the STACK topology. It will be shown later that for bigger systems, with more TSVs, this effect is reversed.





(c)

Figure 5.4 IR-Drop on ground power grids. (a) STACK (b) 3DWL (c) YY

A final set of figures, 5.5, 5.6 and 5.7, illustrates the total *IR*-Drop experienced by the cells, combining both $V_{DD}$ and ground values. Most of the previous comments also apply here, with the exception of tier 1 of the STACK topology in Figure 5.5(b). Due to the partitioning, this layer stays idle while the one below it operates, which gives a unique opportunity to examine intra-tier voltage drop for a non-operating memory. Apart from a distinguishable offset of approximately 1.5 mV between top and bottom sub-arrays, which is fully explained by the previous notes on operating layers, the tier exhibits almost negligible drops. Unfortunately this observation cannot be generalised, since it is directly related to the process node used. This work uses a 90 nm process, one of the last with an adequate $I_{ON}$ to $I_{OFF}$ ratio. For more recent technologies this idle voltage drop may become more important to the reliability of the circuit, especially if the operating voltage is also reduced.
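
To make the notion of total drop concrete, the sketch below adds the supply-side drop and the ground-side rise seen by one cell. It is a minimal Python illustration (the tool itself is written in Perl); the function name and the node voltages are hypothetical, and only the 1.2 V nominal supply is taken from this work.

```python
def total_ir_drop(v_nominal, v_vdd_node, v_gnd_node):
    """Total IR-drop seen by a cell, in volts.

    v_vdd_node: voltage at the cell's VDD pin (below nominal because of the drop).
    v_gnd_node: voltage at the cell's ground pin (above 0 V because of the drop).
    """
    vdd_drop = v_nominal - v_vdd_node
    gnd_rise = v_gnd_node - 0.0
    return vdd_drop + gnd_rise


# Hypothetical node voltages for a single cell
drop = total_ir_drop(1.2, 1.18, 0.015)
print(f"total drop: {drop * 1e3:.1f} mV")
```

The effective supply across the cell is then the nominal voltage minus this total, which is the quantity mapped in the figures that follow.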





(b)

Figure 5.5 Total IR-Drop for a STACK system. (a) Tier 0 (b) Tier 1, closest to  $V_{DD}$ 



(b)

Figure 5.6 Total IR-Drop for a 3DWL system. (a) Tier 0 (b) Tier 1, closest to $V_{DD}$



(a)



(b)

Figure 5.7 Total IR-Drop for a YY system. (a) Tier 0 (b) Tier 1, closest to  $V_{DD}$ 

### **5.3 System Exploration**

All previous results refer to small systems without a CPU layer, whose presence can drastically alter the simulation conditions. This section takes a different approach and deals with systems of up to eight tiers and 256 KB of total memory, which also include a CPU with a power density of $25\ \mathrm{W/cm^2}$ according to [20]. All memories are assumed to operate in write mode, which represents a worst-case scenario, and the maximum power TSV pitch is 120 µm. The results for the three examined topologies are summarized in Figure 5.8, where the dashed line represents the usual voltage drop margin of 10%.
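
The two numeric conventions used in this section, the power density unit and the relative voltage drop margin, can be sketched as follows. This is an illustrative Python fragment rather than part of the tool; the function names and the example node voltages are hypothetical, while the 25 W/cm² density, the 1.2 V supply and the 10% and 5% margins come from the text.

```python
def w_per_cm2_to_w_per_m2(pd_w_per_cm2):
    """Convert a power density from W/cm^2 to W/m^2 (1 m^2 = 1e4 cm^2)."""
    return pd_w_per_cm2 * 1e4


def relative_ir_drop(v_nominal, v_node):
    """Voltage lost between the nominal supply and a node, as a fraction of nominal."""
    return (v_nominal - v_node) / v_nominal


def exceeds_margin(v_nominal, v_node, margin=0.10):
    """True when the relative drop at the node is larger than the margin."""
    return relative_ir_drop(v_nominal, v_node) > margin


# CPU power density of this section, expressed in W/m^2
print(w_per_cm2_to_w_per_m2(25))

# Hypothetical worst-case node voltages checked against the 10% and 5% margins
for v_node in (1.15, 1.12, 1.05):
    print(v_node, exceeds_margin(1.2, v_node), exceeds_margin(1.2, v_node, margin=0.05))
```

The converted density matches the `25e4` W/m² value that appears in the options file of the Appendix.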



Figure 5.8 Relative maximum IR-Drop for the explored systems

Several conclusions can be drawn from the histograms. First, an increase in system size inevitably leads to an increase in the maximum *IR*-Drop and to possible reliability issues. This increase can be separated into two factors: additional circuits and accumulated TSV resistance in the current path. The former appears as an increase between systems with the same number of tiers but different size, *e.g.* a 4-tiered 64 KB system has approximately 1% less drop than a 4-tiered 128 KB system because fewer circuits draw current. The latter factor, however, seems to dominate the voltage drop, as can be observed from the sudden change between a 4-tiered and an 8-tiered system of the same total memory size. This behavior is explained by the fact that the power TSVs are the actual current bottlenecks of the system. The total current of one tier crowds through them to reach the next layer and, finally, the package connections. Consequently, stacking more TSVs in the path of such a large current raises the voltage drop at these nodes.
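
The bottleneck argument can be illustrated with a deliberately simplified series model, sketched below in Python. It is not the solver used by the tool: it assumes that every tier, including the one closest to the supply, is fed through one level of identical parallel TSVs, and that each level carries the current of its own tier plus all tiers further away. The function names are hypothetical; the copper resistivity and the TSV dimensions are the example values from the options file in the Appendix.

```python
import math


def tsv_resistance(resistivity, length, diameter):
    """Resistance of a cylindrical TSV: rho * L / (pi * d^2 / 4), in ohms."""
    return resistivity * length / (math.pi * diameter ** 2 / 4.0)


def stacked_tsv_drops(tier_currents, r_tsv, tsvs_per_level):
    """Cumulative TSV voltage drop seen by each tier, in volts.

    tier_currents[0] belongs to the tier closest to the supply.
    """
    r_level = r_tsv / tsvs_per_level  # identical TSVs in parallel
    drops = []
    total = 0.0
    for k in range(len(tier_currents)):
        # Level k carries the current of tier k and of every tier beyond it.
        level_current = sum(tier_currents[k:])
        total += level_current * r_level
        drops.append(total)
    return drops


r_tsv = tsv_resistance(1.68e-8, 10e-6, 1e-6)  # copper, 10 um long, 1 um wide
# Same hypothetical total current (0.1 A) split over 4 or 8 tiers
drops_4 = stacked_tsv_drops([0.025] * 4, r_tsv, 16)
drops_8 = stacked_tsv_drops([0.0125] * 8, r_tsv, 16)
print(f"R_TSV = {r_tsv:.4f} ohm")
print(f"farthest tier, 4 tiers: {drops_4[-1] * 1e3:.3f} mV")
print(f"farthest tier, 8 tiers: {drops_8[-1] * 1e3:.3f} mV")
```

Even with the total current held constant, the farthest tier of the 8-tier stack sees a larger accumulated drop in this model, which is consistent with the jump between 4-tiered and 8-tiered systems described above.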

In terms of actual numbers, all three topologies perform similarly, with differences between them staying below 1%. Of all the systems only the largest one exceeds the 10% margin, but under stricter conditions, 5% in some cases, half of the systems would face reliability issues. To alleviate this, two strategies are investigated: increasing the size or the density of the TSVs. Both techniques offer better results, as shown by the new lines in Figure 5.9, which correspond to a doubling of the total TSV area. The difference between the two lies in the total area overhead and manufacturability. Increasing the density of the TSVs gives a smoother distribution of the currents, whereas simply increasing the TSV diameter decreases its resistance but does not address current crowding around it. Moreover, integration technologies usually support a single TSV diameter for the whole IC, so if power TSVs become larger, signal TSVs do too.
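
The equal-area comparison can be made explicit with a lumped DC model. The expression below is a sketch that treats the $N$ power TSVs of one tier as identical copper cylinders in parallel; the symbols $\rho$, $L$, $d$, $N$ and $A_{tot}$ are introduced here for illustration only, and the model deliberately ignores current crowding.

$$R_{eq} = \frac{1}{N}\cdot\frac{\rho L}{\pi d^{2}/4} = \frac{\rho L}{A_{tot}}, \qquad A_{tot} = N\,\frac{\pi d^{2}}{4}$$

Under this assumption, doubling $A_{tot}$ halves $R_{eq}$ whether it is achieved through $N \rightarrow 2N$ or through $d \rightarrow \sqrt{2}\,d$. Any difference between the two strategies therefore has to come from effects outside the lumped model, such as the lateral current distribution in the grid around each TSV.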



Figure 5.9 Improved voltage drop metrics through TSV density doubling

Another interesting result is based on Figure 5.10, which captures the voltage difference between the nodes of the TSVs in an 8-tiered system for all three topologies. Word-line sharing and auxiliary circuit partitioning evidently have another positive effect on the system. Since the active circuits are partitioned equally between tiers, current loads also have a smoother distribution. STACK systems, on the other hand, may include tiers which are completely active or inactive. This translates into the linear behavior exhibited by 3DWL and YY, whereas STACK topologies strain a number of TSVs with additional current that can reach 50%. Electromigration effects are enhanced by excessive currents, and the Mean Time To Failure (MTTF) metric decreases. In conclusion, depending on the case, smart partitioning may actually increase the lifetime of the IC.

Figure 5.10 Voltage differences on TSVs of 8-tiered systems
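
The link between TSV current and lifetime can be illustrated with Black's equation, a commonly used empirical electromigration model. This work does not state which lifetime model applies to the TSVs studied here, so the expression and the exponent range below are assumptions for illustration only.

$$\mathrm{MTTF} = A\,J^{-n}\exp\!\left(\frac{E_a}{k_B T}\right)$$

Here $J$ is the current density, $E_a$ the activation energy, $k_B$ the Boltzmann constant, $T$ the absolute temperature, and $A$ and $n$ are fitted constants. At a fixed temperature, a TSV carrying 50% more current has its MTTF scaled by $1.5^{-n}$, which is roughly 0.67 for $n = 1$ and 0.44 for $n = 2$.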

A final scenario is tested on a 4-tiered 3DWL memory system, where the bottom tier operates in write mode and all others are in sleep mode, with $V_{DD\_SLEEP}$ equal to half the nominal value of 1.2 V. The aim of this study is to capture the effect of *IR*-Drop on reliable data retention in the memory. For that purpose the Read Static Noise Margin (RSNM) [21] of the SRAM cell used is related to its power supply value. A degradation of 9% is eventually reported for the RSNM due to static *IR*-Drop. It should be noted that the characterized SRAM circuits do not fully support low-power operation, so this result may not be completely accurate, but it does demonstrate the capabilities of the tool.

### **5.4 Comments on Thermal Effects**

The tool uses HotSpot [22] to capture the effect of Joule heating on the power grid, which means it can also report the temperature of each block. Not surprisingly, throughout the many simulations performed, temperatures remain at similar levels inside the stack of silicon dies and no vertical gradient exceeding 6 °C is reported. The reason is that the circuits involved, although extensively modeled for power generation, do not produce excessive heat. The only part with enough heat dissipation to make a difference is the CPU, which remains the same in all cases. This could change in a future version of the tool, where larger systems will be simulated.

Regarding intra-tier temperature distribution, areas close to the TSVs show a small increase in temperature, as expected, because of the higher thermal conductivity of copper. Operating areas also show an increase in temperature, originating from their high power consumption. An example of an operating tier is presented in Figure 5.11, where the horizontal gradient does not exceed 3%.



Figure 5.11 Temperature map of an operating tier (normalized temperature distribution)

## 6. Conclusion

As became evident at the Design for 3D (D43D) 2012 conference in Lausanne, the semiconductor industry expects to prolong the established dominance of silicon through 3D integration before moving to More-than-Moore devices. At the same time, the lack of tools targeting 3D design is mentioned repeatedly, despite the many academic and industrial teams revealing prototypes of what is to come. The presented tool aims to cover a small part of this gap by offering designers a way to test the PDN reliability of their design at an early stage, with as much accuracy as possible.

Apart from giving specific details on the creation of the tool, this thesis also introduces memory-on-processor systems which are partitioned in a way that takes advantage of 3D integration. Voltage drop trends across those systems are illustrated through detailed voltage distribution maps for multi-tiered topologies. In addition, an exploration of bigger systems that include a CPU tier is performed, offering valuable results and advice on how to improve the reliability of the circuits.

Although the tool has reached a satisfactory point of development, being flexible and accurate enough to support all the presented results, it leaves considerable room for improvement. First, more advanced memory circuits should be characterized in a technology node below 45 nm, so that leakage currents contribute meaningfully to *IR*-Drop. Moreover, auxiliary circuits with no dependencies should be used where possible, which would increase the flexibility of the created systems. Another idea for future extension is the introduction of an interposer, allowing the creation of hybrid 2.5D-3D systems. Finally, with some additional effort on the thermal simulation part, larger systems could be explored, leading to even more useful results and conclusions. This addition would most probably have to be followed by changes in the *IR*-Drop simulator, so that dynamic simulations and hierarchically built netlists are supported.

## References

[1] The International Technology Roadmap for Semiconductors 2001: Interconnect, [Online], Available: www.itrs.net.

[2] B. Yu, K. Yuan, B. Zhang, D. Ding *et al.*, "Layout Decomposition for Triple Patterning Lithography," *Proceedings of the International Conference on Computer-Aided Design*, pp. 1-8, 2011.

[3] G. E. Moore, "Cramming more components onto integrated circuits," *Electronics*, Vol. 38, No. 8, April 1965.

[4] D. Frank, R. Dennard, E. Nowak, P. Solomon *et al.*, "Device Scaling Limits of Si MOSFETs and Their Application Dependencies," *Proceedings of the IEEE*, Vol. 89, No. 3, March 2001.

[5] V. F. Pavlidis and E. G. Friedman, *Three-Dimensional Integrated Circuit Design*, Morgan Kaufmann Publishers, 2009.

[6] D. Kim, K. Athikulwongse, M. Healy, M. Hossain *et al.*, "3D-MAPS: 3D Massively Parallel Processor with Stacked Memory," *Proceedings of the IEEE International Solid-State Circuits Conference*, 2012.

[7] J. Kim, C. Oh, H. Lee, D. Lee *et al.*, "A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With 4 × 128 I/Os Using TSV Based Stacking," *IEEE Journal of Solid-State Circuits*, Vol. 47, No. 1, pp. 107-116, January 2012.

[8] Y.-F. Tsai *et al.*, "Design Space Exploration for 3-D Cache," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 16, No. 4, pp. 444-455, April 2008.

[9] A. Todri *et al.*, "A Study of Tapered 3-D TSVs for Power and Thermal Integrity," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, in press.

[10] J. Xie and M. Swaminathan, "Fast Electrical-Thermal Co-simulation using Multigrid Method for 3D Integration," *Proceedings of IEEE Electronic Components and Technology Conference (ECTC)*, pp. 651-657, June 2012.

[11] M.-F. Chang *et al.*, "Challenges and Trends in Low-Power 3-D Die-Stacked IC Designs Using RAM, Memristor Logic, and Resistive Memory (ReRAM)," *Proceedings of IEEE International Conference on ASIC*, pp. 299-302, October 2011.

[12] S. Wilton and N. Jouppi, "CACTI: An Enhanced Cache Access and Cycle Time Model," *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 5, pp. 677-688, May 1996.

[13] X. Zhao, M. Scheuermann, and S. K. Lim, "Analysis of DC Current Crowding in Through-Silicon-Vias and its Impact on Power Integrity in 3-D ICs," *Proceedings of ACM/EDAC/IEEE Design Automation Conference*, pp. 157-162, June 2012.

[14] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu *et al.*, "SRAM Design on 65-nm CMOS Technology With Dynamic Sleep Transistor for Leakage Reduction," *IEEE Journal of Solid-State Circuits*, Vol. 40, No. 4, pp. 895-901, April 2005.

[15] A. Pavlov, *CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies: Process-Aware SRAM Design and Test*, Springer, 2008.

[16] O. Semenov, A. Vassighi, and M. Sachdev, "Impact of technology scaling on thermal behavior of leakage current in sub-quarter micron MOSFETs: perspective of low temperature current testing," *Microelectronics Journal*, Vol. 33, pp. 985-994, 2002.

[17] I. Savidis and E. G. Friedman, "Electrical Modeling and Characterization of 3-D Vias," *Proceedings of IEEE International Symposium on Circuits and Systems*, pp. 784-787, May 2008.

[18] S. Nassif, "Power Grid Analysis Benchmarks," *Proceedings of the 2008 Asia and South Pacific Design Automation Conference*, pp. 376-381, 2008.

[19] Y. Zhong and M. Wong, "Fast algorithms for IR drop analysis in large power grid," *Proceedings of IEEE/ACM International Conference on Computer-Aided Design*, pp. 351-357, 2005.

[20] G. Taylor, "Energy Efficient Circuit Design and the Future of Power Delivery," [Online], Available: cseweb.ucsd.edu/classes/wi10/cse241a/slides/Energy.pdf.

[21] Z. Guo, A. Carlson, L. Pang, K. Duong *et al.*, "Large-Scale SRAM Variability Characterization in 45 nm CMOS," *IEEE Journal of Solid-State Circuits*, Vol. 44, No. 11, pp. 3174-3192, November 2009.

[22] K. Skadron *et al.*, "Temperature-Aware Microarchitecture," *Proceedings of Annual International Symposium on Computer Architecture*, pp. 2-13, June 2003.

# Appendix

#### Options File, "options.pl":

```
#### GENERAL OPTIONS
our $TOTAL_SIZE = 128;  # (KB)
our $WORD_LENGTH = 32;  # (bits)
our $MUX_FACTOR = 16;   # [4,8,16]
####
#
# Topology options :
# "default" --> normal SRAM stacking
# "3dwl"    --> word-line sharing
# "3dbl"    --> bit-line sharing
# "xx"      --> xx' flipping
# "yy"      --> yy' flipping
#
####
our $TOPOLOGY = "3dwl";
our $CELLX = 64;  # name lost in the listing, restored by analogy with $CELLY
our $CELLY = 64;
our $NUM_DIES = 8;
#### Num. of power TSVs per tier
our @TSVS = (16,16,16,16,16,16,16,16);
#### Initial Temperature (C)
our $TEMPERATURE = "80";
#### Initial Voltage (V)
our $VOLTAGE = "1.20";
#### Voltage Drop Iterations
our $IRDROP_ITERATIONS = 1;
#### Thermal Iterations
our $THERMAL_ITERATIONS = 1;
#### CPU Total Current (Ampere)
our $CPU_Current = 0.2;
#### CPU Power Density (W/m^2)
our $CPU_PD = 25e4;
#### Each row circuit drives so many word lines, row multiplexing
our $Row2WLRatio = 4;
#### Resistances extracted for 25 C (Ohms)
our $RingRes = 0.001;
our $RowCircRes = 0.01;
our $ColCircRes = 0.01;
our $ArrVDDUpMetRes = 0.433;
our $ArrVDDViaRes = 17.7616;
our $ArrGNDUpMetRes = 0.615;
our $ArrGNDUpViaRes = 1.3;
our $ArrGNDMidMetRes = 0.433;
our $ArrGNDMidViaRes = 15.2024;
#### Coefficients for resistance recalculation
our $alpha = 0.0039;  # Copper
our $Tref = 25;
#### Hotspot GRID MODE Resolution
our $HOTSPOT_resolution = 512;
#### Current Unit for Sources
our $unit = "n";
#### Circuit Dimensions (meters)
our $CellWidth = 0.00000184;
our $CellHeight = 0.000001;
our $RowWidth = 60 * $CellWidth;
our $RowHeight = $Row2WLRatio * $CellHeight;
our $ColumnWidth = $MUX_FACTOR * $CellWidth;
our $ColumnHeight = 50 * $CellHeight;
#### TSV info (meters)
our $TSVDiameter = 0.000001;
our $TSVPitch = 0.00001;
our $TSVLength = 10 * $TSVDiameter;
#### TSV Resistance for copper (Ohms)
our $TSVRes = 1.68e-08 * $TSVLength / (3.1416 * $TSVDiameter * $TSVDiameter / 4);
#### Package Connection Resistance (Ohms)
our $PKGRes = 0.05;
```
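
The options `$alpha` and `$Tref` above are described as coefficients for resistance recalculation. A plausible reading, sketched below in Python, is the usual linear temperature model for metal resistance. The recalculation code of the tool is not listed here, so the formula and the function name are assumptions; the numeric values are taken from the options file.

```python
def resistance_at(r_ref, temp_c, alpha=0.0039, t_ref_c=25.0):
    """Linear temperature scaling: R(T) = R_ref * (1 + alpha * (T - T_ref)).

    r_ref is the resistance extracted at t_ref_c (degrees Celsius).
    """
    return r_ref * (1.0 + alpha * (temp_c - t_ref_c))


# Upper-metal VDD resistance of the array (0.433 ohm at 25 C) at the initial 80 C
r_80 = resistance_at(0.433, 80.0)
print(f"{r_80:.4f} ohm")
```

With these values the resistance grows by about 21% between 25 °C and 80 °C, which indicates why the voltage drop and thermal iterations are coupled in the wrapper script.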

#### Wrapper File, "main.pl":

```
#use warnings;
require 'create_maps.pl';
require 'create_netlist.pl';
require 'update_voltages.pl';
require 'round_maps.pl';
require 'stats.pl';
require 'options.pl';
require 'create_lcf.pl';
require 'update_temperatures.pl';

our $ir_iteration = 0;
our $temp_iteration = 0;
my $start_time = time();
my $netlisting_time = 0;
my $ir_time = 0;
my $hotspot_time = 0;

print "Creating Maps\n";
create_maps();

# Outer loop: thermal analysis. Inner loop: voltage drop analysis.
for ($temp_iteration = 0; $temp_iteration <= $THERMAL_ITERATIONS; $temp_iteration++)
{
  print "Thermal Iteration : $temp_iteration\n";
  for ($ir_iteration = 0; $ir_iteration <= $IRDROP_ITERATIONS; $ir_iteration++)
  {
    print "\tVoltage Drop Iteration : $ir_iteration\n";
    print "\t\tQuantizing Maps\n";
    round_maps();
    my $start = time();
    print "\t\tCreating Floorplans, Traces, Netlist\n";
    create_netlist();
    $netlisting_time += int((time() - $start)/60);
    # The last pass only regenerates the netlist and maps; the solver is skipped.
    if ($ir_iteration < $IRDROP_ITERATIONS)
    {
      my $start = time();
      print "\t\tExecuting Voltage Drop Tool\n";
      my @args = ("tcsh", "-c", "./irdrop.exe memory.pg $NUM_DIES $VOLTAGE $TSVRes $PKGRes > irdrop.log");
      system(@args) == 0 or die "system @args failed: $?";
      $ir_time += int((time() - $start)/60);
    }
    print "\t\tUpdating Voltage Maps\n";
    update_voltages();
  }
  print "\tCreating Layer File\n";
  create_lcf();
  my $start = time();
  # HotSpot expects the ambient temperature in Kelvin
  my $temp = $TEMPERATURE + 273;
  print "\tExecuting Thermal Analysis Tool\n";
  my @args = ("tcsh", "-c", "./hotspot -c hotspot.config -f memory0.flp -p memory.ptrace -grid_layer_file memory.lcf -steady_file memory.ttrace -model_type grid -grid_rows $HOTSPOT_resolution -grid_cols $HOTSPOT_resolution -ambient $temp > hotspot.log");
  system(@args) == 0 or die "system @args failed: $?";
  $hotspot_time += int((time() - $start)/60);
  print "\tUpdating Temperature Maps\n";
  update_temperatures();
}

print "Collecting stats\n";
stats();
my $finish = int((time() - $start_time)/60);
print "\nTotal Execution Time : $finish min.\n";
print "-----\n";
print "Netlisting Time : $netlisting_time min.\n";
print "IR Tool Time : $ir_time min.\n";
print "Hotspot Time : $hotspot_time min.\n";
```

#### Example of Layer Configuration File, "memory.lcf":

```
#Layer 0 : Silicon
0
Y
Y
1750000
0.01
9e-06
memory1.flp
#Layer 1 : Thermal Interface Material
1
Y
N
4000000
0.25
1e-06
memory1.flp
#Layer 2 : Silicon
2
Y
Y
1750000
0.01
9e-06
memory0.flp
#Layer 3 : Thermal Interface Material
3
Y
N
4000000
0.25
1e-06
memory0.flp
#Layer 4 : Silicon
4
Y
Y
1750000
0.01
500e-6
cpu.flp
#Layer 5 : Thermal Interface Material
5
Y
N
4000000
0.25
20e-6
cpu.flp
```
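
Each layer in the file above occupies seven lines after its comment line. The Python sketch below reads such a file into records. The field meanings (layer number, lateral heat flow flag, power dissipation flag, specific heat capacity, thermal resistivity, thickness, floorplan file) follow the HotSpot grid-model layer file convention as understood here and should be checked against the HotSpot documentation; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    index: int
    lateral_heat_flow: bool
    dissipates_power: bool
    specific_heat: float  # J/(m^3 K)
    resistivity: float    # (m K)/W
    thickness: float      # m
    floorplan: str


def parse_lcf(text):
    """Parse layer records of seven fields each, skipping comments and blank lines."""
    fields = [ln.strip() for ln in text.splitlines()
              if ln.strip() and not ln.lstrip().startswith("#")]
    if len(fields) % 7 != 0:
        raise ValueError("incomplete layer record")
    layers = []
    for i in range(0, len(fields), 7):
        num, lateral, power, heat, res, thick, flp = fields[i:i + 7]
        layers.append(Layer(int(num), lateral.upper() == "Y", power.upper() == "Y",
                            float(heat), float(res), float(thick), flp))
    return layers


# First two layers of the example file
SAMPLE_LCF = """#Layer 0 : Silicon
0
Y
Y
1750000
0.01
9e-06
memory1.flp
#Layer 1 : Thermal Interface Material
1
Y
N
4000000
0.25
1e-06
memory1.flp
"""

for layer in parse_lcf(SAMPLE_LCF):
    print(layer)
```

Reading the file this way makes it easy to check, for instance, that only the silicon layers are marked as dissipating power.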

#### Example of Power Trace File, "memory.ptrace":

| array_0_0_1 | array_0_1_1 | array_0_2_1 | array_0_3_1 | array_1_0_1 | array_1_1_1 |
|--------|----------------|--------------------|-----------------|---------------|------------------------|
|        | array_1_2_1    | array_1_3_1        | array_0_4_1     | array_0_5_1   | array_0_6_1            |
|        | array_0_7_1    | array_1_4_1        | array_1_5_1     | array_1_6_1   | array_1_7_1            |
|        | array_2_0_1    | array_2_1_1        | array_2_2_1     | array_2_3_1   | array_3_0_1            |
|        | array_3_1_1    | array_3_2_1        | array_3_3_1     | array_2_4_1   | array_2_5_1            |
|        | array_2_6_1    | array_2_7_1        | array_3_4_1     | array_3_5_1   | array_3_6_1            |
|        | array_3_7_1    | column_0_0_1       | column_1_0_1    | column_2_0_1  | column_3_0_1 row_0_0_1 |
|        | row_0_1_1      | row_0_2_1          | row_0_3_1       | row_0_4_1     | row_0_5_1 row_0_6_1    |
|        | row_0_7_1      | TSV_0_0_1          | TSV_0_516_1     | TSV_65_0_1    | TSV_65_516_1           |
|        | TSV_130_0_1    | TSV_130_516_1      | TSV_195_0_1     | TSV_195_516_1 | TSV_0_2_1              |
|        | TSV_260_2_1    | TSV_0_130_1        | TSV_260_130_1   | TSV_0_258_1   | TSV_260_258_1          |
|        | TSV_0_386_1    | TSV_260_386_1      | array_0_0_0     | array_0_1_0   | array_0_2_0            |
|        | array_0_3_0    | array_1_0_0        | array_1_1_0     | array_1_2_0   | array_1_3_0            |
|        | array_0_4_0    | array_0_5_0        | array_0_6_0     | array_0_7_0   | array_1_4_0            |
|        | array_1_5_0    | array_1_6_0        | array_1_7_0     | array_2_0_0   | array_2_1_0            |
|        | array_2_2_0    | array_2_3_0        | array_3_0_0     | array_3_1_0   | array_3_2_0            |
|        | array_3_3_0    | array_2_4_0        | array_2_5_0     | array_2_6_0   | array_2_7_0            |
|        | array_3_4_0    | array_3_5_0        | array_3_6_0     | array_3_7_0   | column_0_0_0           |
|        | column_1_0_0   | column_2_0_0       | column_3_0_0    | row_gap_0_0_0 | row_gap_0_1_0          |
|        | row_gap_0_2_0  | row_gap_0_3_0      | row_gap_0_4_0   | row_gap_0_5_0 | row_gap_0_6_0          |
|        | row_gap_0_7_0  | TSV_0_0_0          | TSV_0_516_0     | TSV_65_0_0    | TSV_65_516_0           |
|        | TSV_130_0_0    | TSV_130_516_0      | TSV_195_0_0     | TSV_195_516_0 | TSV_0_2_0              |
|        | TSV_260_2_0    | TSV_0_130_0        | TSV_260_130_0   | TSV_0_258_0   | TSV_260_258_0          |
|        | TSV_0_386_0    | TSV_260_386_0      | CPU             |               |                        |
| 2 1050 | (200000022- 0) | 2 1050             |                 | 0 0014        | 0024000000000          |
| 3.1836 | 2 19566200000  | 0220 06            | 2 19566200000   |               | 2 1956520000023        |
|        | 0 001/003/088  | 900023 3 1856      | 2.10200232-00   | 3 1 2 5 6     | 6399999330-06          |
|        | 2 10566200000  | 000023 3.1030      | 3 1956630000    | 0330-06       | 2 195663000000330-06   |
|        | 3 18566399999  | 9330-06            | 3 18566399999   | 9330-06       | 3 185663999999332-06   |
|        | 3 18566399999  | 9330-06            | 3 18566399999   | 9330-06       | 3 18566399999933e-06   |
|        | 0 00149934988  | 800023 3 1856      | 63999999933e-06 | 5 3 1856      | 6399999933e=06         |
|        | 3 18566399999  | 933e-06            | 0 00149934988   | 800023 3 1856 | 6399999933e-06         |
|        | 3 18566399999  | 933e-06            | 3 18566399999   | 1933e-06      | 3 18566399999933e-06   |
|        | 3.18566399999  | 933e-06            | 3.18566399999   | 933e-06       | 3.18566399999933e-06   |
|        | 3.18566399999  | 933e-06            | 3.18566399999   | 933e-06       | 0.0014688 0.0014688    |
|        | 0.0014688      | 0.0014688          | 2.05584e-06     | 2.05584e-06   | 0.001435927352.05584e- |
| 06     | 2.05584e-06    | 2.05584e-06        | 2.05584e-06     | 2.05584e-06   | 0.00154085861887501    |
|        | 0.00154085861  | 887501 0.0015      | 4085861887501   | 0.00154085861 | 887501                 |
|        | 0.00154085861  | 887501 0.0015      | 4085861887501   | 0.00154085861 | 887501                 |
|        | 0.00154085861  | 887501 0.0015      | 4085861887501   | 0.00154085861 | 887501                 |
|        | 0.00154085861  | 887501 0.0015      | 4085861887501   | 0.00154085861 | 887501                 |
|        | 0.00154085861  | 887501 0.0015      | 4085861887501   | 0.00154085861 | 887501                 |
|        | 2.93017599999  | 953e-06            | 2.93017599999   | 953e-06       | 0.00139957428199982    |
|        | 2.93017599999  | 953e-06            | 2.93017599999   | 953e-06       | 2.93017599999953e-06   |
|        | 0.00139957428  | 199982 2.9301      | 7599999953e-06  | 5 2.9301      | 7599999953e-06         |
|        | 2.93017599999  | 953e-06            | 2.93017599999   | 953e-06       | 3.18566399999933e-06   |
|        | 2.93017599999  | 953e-06            | 2.93017599999   | 953e-06       | 3.18566399999933e-06   |
|        | 3.18566399999  | 933e-06            | 3.18566399999   | 933e-06       | 2.93017599999953e-06   |
|        | 0.00139957428  | 199982 2.9301      | 7599999953e-06  | 5 2.9301      | 7599999953e-06         |
|        | 2.93017599999  | 953e-06            | 0.00139957428   | 199982 2.9301 | 7599999953e-06         |
|        | 2.93017599999  | 953e-06            | 3.18566399999   | 933e-06       | 3.18566399999933e-06   |
|        | 3.18566399999  | 933e-06            | 2.93017599999   | 953e-06       | 2.93017599999953e-06   |
|        | 3.18566399999  | 933e-06            | 3.18566399999   | 933e-06       | 0.001389745            |
|        | 0.001389745    | U.UU1389745        | 0.001389745     | U U           | U U U O O              |
|        | U U            | 0.000/0260134      | 5499954         | 0.000/0260134 | 5499954                |
|        | 0.000/0260134  | 5499954            | 0.000/0260134   | 5499954       | 0.000702601345499954   |
|        | 0.00070260134  | 5499954            | 0.00070260134   | 5499954       | 0.000702601245499954   |
|        | 0.00070260134  | 5499954            | 0.00070260134   | 5499954       | 0.000702601245499954   |
|        | 0.00070260134  | 5499934<br>5400057 | 0.00070260134   | 5499934       | 0.000/020013434999934  |
|        | 0.000/0200134  | 5499904            | 0.000/0200134   | 5499904       | 0.00220304             |
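
A power trace of this kind consists of one line of block names followed by lines of power values in watts, one value per block. The Python sketch below shows how such a file could be read and summed. It is an illustration only; the function name is hypothetical and the three-block sample uses made-up values rather than the data listed above.

```python
def parse_ptrace(text):
    """Return (names, rows) where each row maps a block name to its power in watts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].split()
    rows = []
    for ln in lines[1:]:
        values = [float(v) for v in ln.split()]
        if len(values) != len(names):
            raise ValueError("value count does not match block count")
        rows.append(dict(zip(names, values)))
    return names, rows


# Hypothetical three-block trace with a single power sample
SAMPLE_PTRACE = """array_0_0_1 column_0_0_1 CPU
3.0e-06 0.0015 0.2
"""

names, rows = parse_ptrace(SAMPLE_PTRACE)
total_power = sum(rows[0].values())
print(names)
print(f"total power: {total_power:.6f} W")
```

Checking that every value line has as many entries as the header is a cheap way to catch a trace that was truncated or wrapped while being generated.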

#### Example of Temperature Trace File, "memory.ttrace":

```
layer_0_array_0_0_1 336.86
layer_0_array_0_1_1 336.84
layer_0_array_0_2_1 336.95
layer_0_array_0_3_1 336.86
layer_0_array_1_0_1 336.86
layer_0_array_1_1_1 336.82
layer_0_array_1_2_1 336.91
layer_0_array_1_3_1 336.83
layer_0_array_0_4_1 336.81
layer_0_array_0_5_1 336.77
layer_0_array_0_6_1 336.76
layer_0_array_0_7_1 336.77
layer_0_array_1_4_1 336.78
layer_0_array_1_5_1 336.74
layer_0_array_1_6_1 336.73
layer_0_array_1_7_1 336.80
layer_0_array_2_0_1 336.80
layer_0_array_2_1_1 336.81
layer_0_array_2_2_1 336.91
layer_0_array_2_3_1 336.83
layer_0_array_3_0_1 336.82
layer_0_array_3_1_1 336.83
layer_0_array_3_2_1 336.94
layer_0_array_3_3_1 336.85
layer_0_array_2_4_1 336.78
layer_0_array_2_5_1 336.74
layer_0_array_2_6_1 336.73
layer_0_array_2_7_1 336.74
layer_0_array_3_4_1 336.81
layer_0_array_3_5_1 336.76
layer_0_array_3_6_1 336.75
layer_0_array_3_7_1 336.73
layer_0_column_0_0_1 336.95
layer_0_column_1_0_1 336.91
layer_0_column_2_0_1 336.90
layer_0_column_3_0_1 336.94
layer_0_row_0_0_1 336.83
layer_0_row_0_1_1 336.80
layer_0_row_0_2_1 336.87
layer_0_row_0_3_1 336.79
layer_0_row_0_4_1 336.75
layer_0_row_0_5_1 336.73
layer_0_row_0_6_1 336.73
layer_0_row_0_7_1 336.77
layer_0_TSV_0_0_1 337.56
layer_0_TSV_0_516_1 337.41
layer_0_TSV_65_0_1 337.62
layer_0_TSV_65_516_1 337.57
layer_0_TSV_130_0_1 337.65
layer_0_TSV_130_516_1 337.59
layer_0_TSV_195_0_1 337.44
layer_0_TSV_195_516_1 337.38
layer_0_TSV_0_2_1 337.65
layer_0_TSV_260_2_1 337.57
layer_0_TSV_0_130_1 337.61
layer_0_TSV_260_130_1 337.60
layer_0_TSV_0_258_1 337.77
layer_0_TSV_260_258_1 337.77
layer_0_TSV_0_386_1 337.61
layer_0_TSV_260_386_1 337.60
layer_1_array_0_0_1 336.85
layer_1_array_0_1_1 336.83
layer_1_array_0_2_1 336.93
layer_1_array_0_3_1 336.85
layer_1_array_1_0_1 336.85
layer_1_array_1_1_1 336.81
layer_1_array_1_2_1 336.90
layer_1_array_1_3_1 336.83
layer_1_array_0_4_1 336.81
layer_1_array_0_5_1 336.77
layer_1_array_0_6_1 336.76
layer_1_array_0_7_1 336.77
layer_1_array_1_4_1 336.78
layer_1_array_1_5_1 336.74
layer_1_array_1_6_1 336.73
layer_1_array_1_7_1 336.79
layer_1_array_2_0_1 336.79
layer_1_array_2_1_1 336.81
layer_1_array_2_2_1 336.90
layer_1_array_2_3_1 336.82
layer_1_array_3_0_1 336.82
layer_1_array_3_1_1 336.82
layer_1_array_3_2_1 336.93
layer_1_array_3_3_1 336.85
layer_1_array_2_4_1 336.78
layer_1_array_2_5_1 336.74
layer_1_array_2_6_1 336.73
layer_1_array_2_7_1 336.74
layer_1_array_3_4_1 336.80
layer_1_array_3_5_1 336.76
layer_1_array_3_6_1 336.75
layer_1_array_3_7_1 336.73
layer_1_column_0_0_1 336.93
layer_1_column_1_0_1 336.89
layer_1_column_2_0_1 336.89
layer_1_column_3_0_1 336.93
layer_1_row_0_0_1 336.82
layer_1_row_0_1_1 336.80
layer_1_row_0_2_1 336.86
layer_1_row_0_3_1 336.79
layer_1_row_0_4_1 336.75
layer_1_row_0_5_1 336.73
layer_1_row_0_6_1 336.73
layer_1_row_0_7_1 336.77
layer_1_TSV_0_0_1 337.33
layer_1_TSV_0_516_1 337.19
layer_1_TSV_65_0_1 337.38
layer_1_TSV_65_516_1 337.33
layer_1_TSV_130_0_1 337.39
layer_1_TSV_130_516_1 337.34
layer_1_TSV_195_0_1 337.23
layer_1_TSV_195_516_1 337.17
layer_1_TSV_0_2_1 337.42
layer_1_TSV_260_2_1 337.35
layer_1_TSV_0_130_1 337.40
layer_1_TSV_260_130_1 337.39
layer_1_TSV_0_258_1 337.51
layer_1_TSV_260_258_1 337.50
layer_1_TSV_0_386_1 337.36
layer_1_TSV_260_386_1 337.35
layer_2_array_0_0_0 336.83
layer_2_array_0_1_0 336.82
layer_2_array_0_2_0 336.90
layer_2_array_0_3_0 336.83
layer_2_array_1_0_0 336.83
layer_2_array_1_1_0 336.80
layer_2_array_1_2_0 336.87
layer_2_array_1_3_0 336.81
layer_2_array_0_4_0 336.79
layer_2_array_0_5_0 336.76
layer_2_array_0_6_0 336.75
layer_2_array_0_7_0 336.76
layer_2_array_1_4_0 336.77
layer_2_array_1_5_0 336.74
layer_2_array_1_6_0 336.73
layer_2_array_1_7_0 336.77
layer_2_array_2_0_0 336.79
layer_2_array_2_1_0 336.80
layer_2_array_2_2_0 336.87
layer_2_array_2_3_0 336.81
layer_2_array_3_0_0 336.81
layer_2_array_3_1_0 336.81
layer_2_array_3_2_0 336.89
layer_2_array_2_4_0 336.77
layer_2_array_2_5_0 336.73
layer_2_array_2_6_0 336.73
layer_2_array_2_7_0 336.73
layer_2_array_3_4_0 336.79
layer_2_array_3_5_0 336.75
layer_2_array_3_6_0 336.74
layer_2_array_3_7_0 336.73
layer_2_column_0_0_0 336.89
layer_2_column_1_0_0 336.86
layer_2_column_2_0_0 336.86
layer_2_column_3_0_0 336.89
layer_2_row_gap_0_0_0 336.81
layer_2_row_gap_0_1_0 336.79
layer_2_row_gap_0_2_0 336.82
layer_2_row_gap_0_3_0 336.78
layer_2_row_gap_0_4_0 336.75
layer_2_row_gap_0_5_0 336.73
layer_2_row_gap_0_6_0 336.73
layer_2_row_gap_0_7_0 336.75
layer_2_TSV_0_0_0 337.18
layer_2_TSV_0_516_0 337.07
layer_2_TSV_65_0_0 337.20
layer_2_TSV_65_516_0 337.14
layer_2_TSV_130_0_0 337.21
layer_2_TSV_130_516_0 337.15
layer_2_TSV_195_0_0 337.10
layer_2_TSV_195_516_0 337.05
layer_2_TSV_0_2_0 337.23
layer_2_TSV_260_2_0 337.18
layer_2_TSV_0_130_0 337.22
layer_2_TSV_260_130_0 337.21
layer_2_TSV_0_258_0 337.29
layer_2_TSV_260_258_0 337.29
layer_2_TSV_0_386_0 337.17
layer_2_TSV_260_386_0 337.16
layer_3_array_0_0_0 336.82
layer_3_array_0_1_0 336.81
layer_3_array_0_2_0 336.87
layer_3_array_0_3_0 336.82
layer_3_array_1_0_0 336.82
layer_3_array_1_1_0 336.80
layer_3_array_1_2_0 336.85
layer_3_array_1_3_0 336.80
layer_3_array_0_4_0 336.79
layer_3_array_0_5_0 336.76
layer_3_array_0_6_0 336.75
layer_3_array_0_7_0 336.75
layer_3_array_1_4_0 336.77
layer_3_array_1_5_0 336.74
layer_3_array_1_6_0 336.73
layer_3_array_1_7_0 336.76
layer_3_array_2_0_0 336.78
layer_3_array_2_1_0 336.79
layer_3_array_2_2_0 336.84
layer_3_array_2_3_0 336.80
layer_3_array_3_0_0 336.80
layer_3_array_3_1_0 336.80
layer_3_array_3_2_0 336.86
layer_3_array_3_3_0 336.81
layer_3_array_2_4_0 336.76
layer_3_array_2_5_0 336.73
layer_3_array_2_6_0 336.73
layer_3_array_2_7_0 336.73
layer_3_array_3_4_0 336.78
layer_3_array_3_5_0 336.75
layer_3_array_3_6_0 336.74
layer_3_array_3_7_0 336.73
layer_3_column_0_0_0 336.86
layer_3_column_1_0_0 336.84
layer_3_column_2_0_0 336.83
layer_3_column_3_0_0 336.86
layer_3_row_gap_0_0_0 336.80
layer_3_row_gap_0_1_0 336.78
layer_3_row_gap_0_2_0 336.81
layer_3_row_gap_0_3_0 336.78
layer_3_row_gap_0_4_0 336.75
layer_3_row_gap_0_5_0 336.73
layer_3_row_gap_0_6_0 336.73
layer_3_row_gap_0_7_0 336.75
layer_3_TSV_0_0_0 337.00
layer_3_TSV_0_516_0 336.91
layer_3_TSV_65_0_0 337.00
layer_3_TSV_65_516_0 336.95
layer_3_TSV_130_0_0 337.00
layer_3_TSV_130_516_0 336.95
layer_3_TSV_195_0_0 336.95
layer_3_TSV_195_516_0 336.89
layer_3_TSV_0_2_0 337.04
layer_3_TSV_260_2_0 337.00
layer_3_TSV_0_130_0 337.03
layer_3_TSV_260_130_0 337.03
layer_3_TSV_0_258_0 337.06
layer_3_TSV_260_258_0 337.05
layer_3_TSV_0_386_0 336.97
layer_3_TSV_260_386_0 336.96
layer_4_CPU 336.75
layer_5_CPU 334.93
hsp_CPU 333.10
hsink_CPU 333.02
inode_0 333.02
inode_1 333.02
inode_2 333.02
inode_3 333.02
inode_4 333.02
inode_5 333.02
inode_6 333.02
inode_7 333.02
inode_8 333.01
inode_9 333.01
inode_10 333.01
inode_11 333.01
```
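
A temperature trace of this form, one block name and one steady-state temperature in kelvin per line, can be summarized per layer to inspect the vertical gradient discussed in Section 5.4. The Python sketch below is an illustration only: the function names are hypothetical, and the four sample lines are a small excerpt of the kind of data listed above.

```python
import re
from collections import defaultdict


def parse_ttrace(text):
    """Map each block name to its temperature (K); lines without two fields are skipped."""
    temps = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            temps[parts[0]] = float(parts[1])
    return temps


def per_layer_mean(temps):
    """Average temperature of the blocks whose names start with layer_<n>_."""
    groups = defaultdict(list)
    for name, value in temps.items():
        match = re.match(r"layer_(\d+)_", name)
        if match:
            groups[int(match.group(1))].append(value)
    return {layer: sum(vals) / len(vals) for layer, vals in groups.items()}


SAMPLE_TTRACE = """layer_0_array_0_0_1 336.86
layer_0_TSV_0_0_1 337.56
layer_3_TSV_0_0_0 337.00
hsink_CPU 333.02
"""

means = per_layer_mean(parse_ttrace(SAMPLE_TTRACE))
vertical_spread = max(means.values()) - min(means.values())
print(means)
print(f"spread between layer means: {vertical_spread:.2f} K")
```

Entries that do not belong to a numbered layer, such as the heat sink and internal nodes, are ignored by the per-layer average.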