<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Analog Transistor Placement Optimization Considering Nonlinear Spatial Variations</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>03/25/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10569062</idno>
					<idno type="doi">10.23919/DATE58400.2024.10546584</idno>
					
					<author>Supriyo Maji</author><author>Sungyoung Lee</author><author>David Z Pan</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Analog circuit performance can degrade due to random and spatial variations. While random variations can be mitigated using larger-sized devices, such devices tend to have more spatial variations. To address this, a common technique involves employing symmetric layout like the common-centroid, which effectively reduces linear variations or first-order effect. However, achieving high performance in analog systems often necessitates mitigating nonlinear spatial variations, for which common-centroid layout is unsuitable. In response, this work introduces an efficient approach based on simulated annealing for transistor placement, with a particular focus on mitigating nonlinear spatial variations. Importantly, our proposed method can also handle important layout constraints, including routing complexity, layout-dependent effects, and diffusionsharing within the optimization. Experimental results show the proposed method beats state-of-the-art in all important parameters while minimizing nonlinear spatial variations. Moreover, our approach gives users better control over optimization objectives than existing methods.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>The performance of various analog circuits depends significantly on the precise matching of elements <ref type="bibr">[1]</ref>. Matching of elements can be impacted by two types of variations: random and spatial. Random variations can be reduced by increasing the area of the device <ref type="bibr">[2]</ref> <ref type="bibr">[3]</ref>. However, larger device may have more spatial variations as the distance between the devices increases <ref type="bibr">[3]</ref>. Several factors, including process variations, thermal distribution or uneven mechanical stress from other circuit layers, can contribute to spatial variations <ref type="bibr">[3]</ref>. The spatial variations can be modeled by Taylor series, where the lower order terms have greater impacts on the mismatch among devices <ref type="bibr">[3]</ref>[4] <ref type="bibr">[5]</ref> <ref type="bibr">[6]</ref>. Commoncentroid (CC) is a special layout technique which has been shown to be beneficial in reducing first-order variations <ref type="bibr">[6]</ref> <ref type="bibr">[7]</ref>[8] <ref type="bibr">[9]</ref>. However, for high performance systems, layout must be optimized considering nonlinear or higher order effects <ref type="bibr">[3]</ref>[4] <ref type="bibr">[5]</ref> <ref type="bibr">[6]</ref>[10] <ref type="bibr">[11]</ref>.</p><p>Implementing any type of symmetricity in the layout can bring various challenges. Routing becomes more complicated with CC type layout, circuit area can increase if diffusion region is not shared, layout dependent effect can affect device threshold voltage, mobility and circuit performance <ref type="bibr">[7]</ref> <ref type="bibr">[12]</ref> <ref type="bibr">[13]</ref>. Sharma et. al. <ref type="bibr">[14]</ref> <ref type="bibr">[7]</ref> have proposed custom algorithm for sym-metric transistor placement, which handles layout constraints. However, this method is limited to producing layouts of the CC type, making it inefficient in handling nonlinear variations. The method also lacks a welldefined objective in the problem formulation, hindering the ability to prioritize one constraint over another. Furthermore, the layout constraints are handled separately through post-processing steps after placement, which can lead to sub-optimal results.</p><p>Several prior studies have considered nonlinear spatial variations in device placement optimization. Lin et. al. <ref type="bibr">[4]</ref> perform global placement of capacitors by adopting a conjugate gradient method and then legalizing the placement with an integer linear programming formulation. McAndrew <ref type="bibr">[6]</ref> cancels the effect of nonlinear variations by swapping symmetrical devices. However, this approach can only manage two devices, and a dimension constraint requires the number of unit cells in each column to be a multiple of four. Vadipour <ref type="bibr">[10]</ref> has proposed a ring-like structure to compensate for quadratic error. However, the techniques reported in <ref type="bibr">[3][4]</ref>[10] <ref type="bibr">[11]</ref> may not apply to general transistor circuits, and some of the approaches <ref type="bibr">[4]</ref>[5] <ref type="bibr">[6]</ref>[10] dealing with device placement, focus on spatial variations and do not consider important layout constraints.</p><p>We present a simulated annealing based general transistor placement algorithm which can optimize nonlinear spatial variations while handling important layout constraints. Simulated annealing, a popular optimization technique <ref type="bibr">[15]</ref>, has found applications in both digital <ref type="bibr">[16]</ref>[17] and analog placement <ref type="bibr">[3]</ref>[18] <ref type="bibr">[19]</ref>. The general idea behind these approaches revolves around the random selection of sub-blocks, followed by swapping to enhance performance. However, these approaches have limitations -they are inherently customized to specific systems and are not well-suited for addressing the general transistor placement problem. This is particularly evident when considering the absence of critical layout constraints within their formulations.</p><p>We adopt a strategy similar to random selection and swap, yet our approach is more general. It can handle transistor placement of varied configurations, such as current mirror, differential input pair, load pair, so can find application across a wide spectrum of analog systems. Our approach starts with an initial placement of unit cells, subsequently fine-tuning it through a dynamic process involving random selection and swapping. At each iteration, the placement is evaluated with models, developed for this work, guiding the process. The major contributions are as follows.</p><p>&#8226; We propose a simulated annealing based general approach for analog transistor placement considering nonlinear spatial variations.</p><p>&#8226; Compared to the state of the art <ref type="bibr">[14]</ref> <ref type="bibr">[7]</ref>, where layout constraints are handled separately through post-processing steps, our approach considers the constraints within the optimization.</p><p>&#8226; Experimental results show the proposed approach beats the state-of-the-art in important parameters while minimizing nonlinear spatial variations.</p><p>&#8226; The proposed approach provides users with better control over the objectives than the state-of-the-art. The rest of the paper is organized as follows. Section II formulates the problem, Section III describes the proposed approach, Section IV presents experimental results, and Section V concludes the paper. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. Transistor Placement Problem Formulation</head><p>Spatial variations can cause mismatch issues in analog circuits. Spatial variations have linear and nonlinear components. In Fig. <ref type="figure">1(b)-(c</ref>), two different layouts of a differential pair (Fig. <ref type="figure">1(a)</ref>) are shown. While the layout in Fig. <ref type="figure">1(b</ref>) is effective when only considering the firstorder effect, it is not the best layout for minimizing nonlinear variations. To reduce nonlinear variations, as a general rule, devices must be placed with the highest possible dispersion <ref type="bibr">[6]</ref>, as shown in Fig. <ref type="figure">1(c</ref>). It is important to note that both these layouts are CC type. While it is possible to generate a CC layout that performs well for nonlinear variations involving a few units and devices, in general, as the number of units or devices increases, a non-CC layout consistently outperforms a CC layout. However, a layout that is designed to perform well for spatial variations may not align with critical layout constraints such as diffusion break, routing complexity, and layout-dependent effects. Therefore, our transistor placement formulation problem encompasses all these constraints to achieve a comprehensive and balanced optimization approach.</p><p>Based on the mismatch in spatial variations function &#119872;&#119881;(&#8226;), routing complexity function &#119877;&#119862;(&#8226;), mismatch in length of diffusion function &#119872;&#119868;&#119871;&#119863;(&#8226;), and diffusion break function &#119863;&#119861;(&#8226;), the optimization objective is defined as follows.</p><p>Here, &#119891; &#119901; is a feasible layout solution and &#120578; &#119888; , &#119888; &#8712; {&#119872;&#119881; , &#119877;&#119862;, &#119872;&#119868;&#119871;&#119863;} denotes the coefficients. A hard constraint is put on diffusion sharing as this parameter has the most impact on the quality of the layout, directly or indirectly. A layout that does not share diffusion region can lead to worse routing length and may degrade spatial variations due to increased layout area. Consequently, our proposed method can ensure diffusion-break-free layout which is not possible with other methods <ref type="bibr">[14]</ref>[20] <ref type="bibr">[8]</ref>. Importantly, <ref type="bibr">[14]</ref> is more focused on generating CC layout, constraints are handled separately through post-processing steps. We handle layout constraints directly within the optimization framework, this can lead to better results. Moreover, our formulation allows users control over the objectives through coefficients &#120578; &#119888; , which is not possible by other methods due to a fixed-type formulation that lacks a well-defined objective.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. The Proposed Approach to Transistor Placement</head><p>We first discuss the details on the funtions in Eqs. ( <ref type="formula">1</ref>) and ( <ref type="formula">2</ref>) before presenting the optimization algorithm in Section III-E to solve them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Mismatch in Spatial Variations</head><p>Thermal-induced and technology-related spatial variations can be approximated using a Taylor series, which takes the following form <ref type="bibr">[3][4]</ref>[5] <ref type="bibr">[6]</ref>.</p><p>Here, (&#119909;, &#119910;) is the location of a unit cell and &#119862; is a constant independent of &#119909; and &#119910;. &#119866; &#119894; (&#119909;, &#119910;) is the &#119894; &#119905; &#8462; order variations component and &#119901; &#119899; (&#119909;, &#119910;) is the sum of such components. Our objective is to minimize the difference (or mismatch) in spatial variations for all the devices. We propose the following model.</p><p>Here, &#119873; is the number of devices and &#119899; &#119897; , &#119899; &#119898; is the number of units for the device &#119897; and &#119898;, respectively. &#119901; &#119906; (&#119909;, &#119910;) is the sum of first and second-order spatial variations with coefficients &#119892; 1,0 , &#119892; 0,1 , &#119892; 2,0 , &#119892; 1,1 , &#119892; 0,2 being modeled as multivariate Gaussian random variables. By iterating mismatch calculation following Eq. ( <ref type="formula">5</ref>), &#119872; times, &#119872; number of mismatch values are obtained and the standard deviation of these values represents the magnitude of difference in spatial variations over all devices, referred to as &#119872;&#119881;(&#8226;). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Routing Complexity</head><p>The routing complexity is affected by the position of the unit cells in the layout, making it essential to take routing considerations into account during the placement optimization. Sharma et al. <ref type="bibr">[14]</ref> address routing complexity, but the approach has a limitation in that it handles routing after placement step, which can lead to sub-optimal results. In our approach, routing model is integrated within the placement step.</p><p>We use rectilinear minimum spanning tree (RMST) to calculate routing complexity separately for every net in the netlist and sum the values. For illustration purpose, we use the circuit in Fig. <ref type="figure">2</ref>, where each device has 4 units. First, we construct an undirected graph in which a unit cell connected to the net is represented by a node, and rectilinear edges are used to connect every pair of nodes on the net. For example, net1 has unit cells from &#119872;2, so a total of 4 nodes and 6 edges, net3 has unit cells from &#119872;2 and &#119872;0, so a total of 8 nodes and 28 edges. We determine the RMST that connects all the units on the net with the fewest possible edges. For example, the routing complexity of net1 and net3 are 7 and 9 units, respectively. Total routing complexity is determined by summing the edge weights of the spanning trees for all the nets, which in this case is 56. A pseudocode is presented below. The time complexity of our algorithm is &#119874;(&#119899; 2 ) with &#119899; being the number of units. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Mismatch in Length of Diffusion</head><p>Among several layout dependent effects (LDEs), the most critical is length of diffusion (&#119871;&#119874;&#119863;), which captures the variations in the threshold voltage (&#119881; &#119905; &#8462; ) of a transistor due to stress effect <ref type="bibr">[12]</ref>[13] <ref type="bibr">[14]</ref>.</p><p>Here, &#119899; is the number of unit cells, &#119871; &#119892; is the gate length of the unit. &#119878;&#119860; &#119894; and &#119878;&#119861; &#119894; are the distances from polygate of the unit cell &#119894; to the diffusion/active edge on either side of the layout. Two unit cells with different &#119878;&#119860; &#119894; , &#119878;&#119861; &#119894; values will lead to the mismatch in threshold voltage shift (&#916;&#119881; &#119905; &#8462; ). To reduce variations induced by &#119871;&#119874;&#119863;, we aim to minimize the difference (i.e. mismatch) of the mean values of 1 &#119871;&#119874;&#119863; over all devices, referred to as &#119872;&#119868;&#119871;&#119863;(&#8226;). We propose the following model for &#119872;&#119868;&#119871;&#119863;(&#8226;).</p><p>&#119873; is the number of devices and &#119899; &#119894; is the number of unit cells of the device &#119894;. Hence, there are &#119873; &#119894;=1 &#119899; &#119894; number of unit cells in total. ( 1 &#119871;&#119874;&#119863; ) &#119894; is the 1 &#119871;&#119874;&#119863; value of the device &#119894;. For the comparison between different placement, it is not essential to calculate the exact value of 1 &#119871;&#119874;&#119863; , rather the following simplification of Eq. ( <ref type="formula">7</ref>) is sufficient.</p><p>(</p><p>Here, &#119909; &#119906; is the column of unit cell &#119906; and &#119908; is the number of unit cells in each row, and these values can be easily obtained with changing placements. For example, if the unit cell &#119906; is placed at the lowest left corner of 4 &#215; 6 placement grid, &#119909; &#119906; is 1 and &#119908; is 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Diffusion Break</head><p>The diffusion region can be shared between two units of the same device and between units from different devices if their drain/source regions are connected. This practice can significantly reduce layout area, thereby improving routing length and spatial variations. In cases where sharing is not possible, there is diffusion break. Our objective is to have zero diffusion break.</p><p>Below, we present pseudocode for counting the number of diffusion breaks. The algorithm iterates through each unit in the row following the layout pattern. In each iteration, we maintain two units: 'curr' and 'next,' each comprising three fields: 'name,' 'left' terminal, and 'right' terminal. The process begins by checking whether starting from a drain ('D') as the leftmost terminal would result in a diffusion-break-free solution. Following the netlist, if two consecutive units are found to be not connected, the process is repeated with the source ('S') as the leftmost terminal. If we still reach two consecutive units having no connection, it indicates the presence of a diffusion break. To resolve this, a dummy unit is inserted between the units. Now, the left terminal of the unit, next to the dummy, becomes the leftmost terminal. The process continues until the end of the row is reached. The process is repeated for every row, and the diffusion break numbers &#119899;_&#119889;&#119887;(&#119894;), &#119894; &#8712; {1, 2, ..., &#119903;&#119900;&#119908;} are summed up to determine the total number of diffusion breaks. The time complexity of our algorithm is &#119874;(&#119899;), where &#119899; is the number of units. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 2 Diffusion Break Model</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Optimizing Placement</head><p>Simulated annealing (SA) is a probabilistic optimization technique based on a global search metaheuristic <ref type="bibr">[15]</ref>. SA is particularly useful when the search space is discrete. While SA typically performs well for small to medium size problems, it may have problems for large size problems, potentially getting stuck in local minima. Additionally, due to its iterative nature, SA can be time-consuming when applied to large problems. The motivation behind using SA is that the search space in our problem is not large enough that SA would be considered inefficient, and it is also not small enough that a brute force method would work. Note that there are</p><p>possible placement solutions, where &#119899; &#119894; is the number of units for the device &#119894; and &#119873; is the number of devices.</p><p>Let &#119891; : &#119982; &#8594; R be an objective function defined in the solution space &#119982;, and &#119977;(&#119904;) be the neighbor function for &#119904; &#8712; &#119982;. SA starts with an initial solution &#119904; &#8712; &#119982;. At each iteration, neighboring solution &#119904; &#8242; &#8712; &#119977;(&#119904;) of the current solution &#119904; is selected through a predefined perturbation strategy. The metropolis criterion determines the acceptance probability for the system's transition from the current solution &#119904; to a candidate counterpart &#119904; &#8242; according to the following.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#119875;(&#119904;, &#119904;</head><p>where &#119891; (&#8226;) is the objective function defined in Eq. ( <ref type="formula">1</ref>) and &#119905; &#119896; represents the temperature value at the &#119896; &#119905; &#8462; iteration. The metropolis criterion guides the system toward a state of lower cost.</p><p>1) Initial Solution: The initial step in our proposed algorithm involves determining the number of rows and columns based on the width and height of the unit cell and the total number of units from the netlist. It is understood that a square-like pattern offers the best matching performance <ref type="bibr">[14]</ref> <ref type="bibr">[11]</ref>; therefore, we strive to achieve such a pattern, even if it necessitates the inclusion of dummy devices. The initial pattern is then generated with a sequential placement of device (with units of the same device placed together) with two consecutive devices sharing either drain and source, or source and source, or source and drain in the netlist. For example, for the circuit in Fig. <ref type="figure">2</ref>, we can place units of &#119872;2, &#119872;0, &#119872;1, and &#119872;3 in the four rows, from top to bottom, respectively. This sequential arrangement ensures there is no diffusion break to begin with. We have observed that this technique of initial pattern generation yields better final results than starting from a completely random placement. In cases where we begin from a completely random placement, the algorithm may fail to converge to a diffusion-break-free solution.</p><p>2) Random Perturbation: A neighboring placement solution is produced by some predefined perturbation strategy. We follow the below rules for the perturbation.</p><p>&#8226; Swap only if the pair is from different devices.</p><p>&#8226; Swap only if the new solution does not degrade diffusion break. &#8226; Swap if there is an improvement in the objective.</p><p>&#8226; Swap a worse solution with some probability defined in Eq. <ref type="bibr">(10)</ref>. We measure the quality of a swap using models developed in prior sections. When we encounter a candidate that improves the current best objective (Eq. ( <ref type="formula">1</ref>)) without degrading diffusion break, best solution is updated. We update the current solution if a candidate improves the current objective without degrading diffusion break. We may also update the current solution with the candidate solution even if it deteriorates the objective. The likelihood of that happening decreases as the temperature decreases. Note that a solution is never accepted if it degrades diffusion break. Fig. <ref type="figure">3(a)</ref> shows the execution of the algorithm, how best, current, and candidate objectives change over the course of the iterations as temperature changes. Below, we present a pseudocode of the placement optimization algorithm. Algorithm 3 Optimize Placement Input : netlist, init_pattern, n_iteration, temp, &#120578; &#119888; , &#119888; &#8712; {&#119872;&#119881; , &#119877;&#119862;, &#119872;&#119868;&#119871;&#119863;} Output : best_pattern 1: best.obj = f(init_pattern, netlist, &#120578; &#119888; ); best.DB = &#119863;&#119861;(init_pattern, netlist) 2: curr = best 3: for i=1 to n_iteration do 4: Generate random number, x and y 5: cand = curr 6: tmp.pattern = cand.pattern 7: cand.pattern(x.row, x.col) = curr.pattern(y.row, y.col) 8: curr.pattern = tmp.pattern 9: cand.obj = f(cand_pattern, netlist, &#120578; &#119888; ); cand.DB = &#119863;&#119861;(cand_pattern, netlist) 10: if cand.obj &lt; cand.obj &#8743; cand.DB &#8804; cand.DB then best = cand 11: &#119905; &#119894; = temp/(i+1) 12: diff.obj = cand.obj -curr.obj 13: diff.DB = cand.DB -curr.DB 14: metropolis = exp(-diff.obj/&#119905; &#119894; ) 15: Generate random number, z 16: if (diff.obj &lt; cand.obj &#8743; diff.DB &#8804; 0) &#8744; (z &lt; metropolis &#8743; cand.DB &#8804; best.DB) then 17: curr = cand 18: return best_pattern</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. Experimental Results</head><p>We have implemented the proposed algorithm in Python, referred to as SA here. We contacted the authors of the paper <ref type="bibr">[14]</ref> and discussed the implementation of their algorithm, which we refer to as SOTA (State-of-the-Art). Some of the results presented for the SOTA approach are based on this discussion and others are taken directly from <ref type="bibr">[14]</ref>. Since it is difficult to obtain the exact values of the coefficients in Eq. ( <ref type="formula">6</ref>) <ref type="bibr">[3]</ref>, and the second order terms can be as significant as the first order terms <ref type="bibr">[6]</ref>, we define the five random variables in Eq. ( <ref type="formula">6</ref>), with a mu of [0 0 0 0 0], and a positive semidefinite covariance matrix of [0.9 0.8 0.7 0.6 0.5; 0.8 0.9 0.8 0.7 0.6; 0.7 0.8 0.9 0.8 0.7; 0.6 0.7 0.8 0.9 0.8; 0.5 0.6 0.7 0.8 0.9], following the multivariate statistics. Correlation of these variables is shown in Fig. <ref type="figure">3(c</ref>). &#120578; &#119888; for &#119888; &#8712; {&#119872;&#119881; , &#119877;&#119862;, &#119872;&#119868;&#119871;&#119863;}, and &#119905;&#119890;&#119898;&#119901; in the simulated annealing algorithm are set to 1, and 100, respectively. In all the tests, our algorithm converges within 1000 iterations (&#119899;_&#119894;&#119905;&#119890;&#119903; &#119886;&#119905;&#119894;&#119900;&#119899;). All experiments are conducted on a Linux environment with an Intel Core 3.3&#119866;&#119867;&#119911; CPU with 128&#119866;&#119861; memory.</p><p>To compare the proposed approach with SOTA, we use five configurations of the current mirror structure, as reported in <ref type="bibr">[14]</ref>, referred to as CM:1-5 in Table <ref type="table">I</ref>. Our algorithm outperforms SOTA. Both algorithms produce layout free of diffusion break (&#119863;&#119861;). We further validate quality of the layout using standard industry tool. We place the devices following the pattern generated by the algorithms and auto-route the circuit. The total route length ('Router' in Table <ref type="table">I</ref>) is estimated using a script provided by industry representative. Our algorithm consistently performs better than SOTA, thereby validating correctness of the routing model. It is important to note that a shorter route length should result in reduced parasitics and IR drop, an important requirement in analog design. However, standard industry tools do not factor in spatial variations, as foundry models lack sensitivity to the distance between devices. While these tools do consider &#119871;&#119874;&#119863; effect, quantifying its isolated impact is challenging due to the presence of other layout effects and random variations. LDEs are considered in post-layout simulation in <ref type="bibr">[14]</ref>, nevertheless, the improvement in circuit offset performance mainly comes from the dummies, placed around the CC structure. In Fig. <ref type="figure">4</ref>(b), we present the layout pattern of the circuit in Fig. <ref type="figure">4</ref>(a). Fig. <ref type="figure">4</ref>(c) presents histogram plot of the mismatch in spatial variations. The actual layout of this example is presented in Fig. <ref type="figure">4(d)</ref>.</p><p>We have created six more tests in Table <ref type="table">II</ref>, two from each of the three different configurations, CM (Fig. <ref type="figure">4</ref>(a)), DIPC (Fig. <ref type="figure">2</ref>), and DLPC (Fig. <ref type="figure">4</ref>(e)). The proposed algorithm outperforms SOTA. Importantly, the</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Authorized licensed use limited to: University of Texas at Austin. Downloaded on January 30,2025 at 20:21:13 UTC from IEEE Xplore. Restrictions apply.</p></note>
		</body>
		</text>
</TEI>
