We have been discussing power dissipation in CMOS circuits, and we have also discussed why designing circuits for low power is so relevant in the current era; we have looked at a variety of power-aware system designs. We also looked at what causes power dissipation in circuits, and we identified three kinds of power: dynamic (switching) power, short-circuit power, and last but not least, leakage power. In earlier technologies it was typically found that around 75 percent of the power went into dynamic power, 20 percent into short-circuit power, and 5 percent into leakage. But with the advancement of technology over the last 10 to 15 years we have gone from the 0.25 micron process to almost the 28 nanometer process, and may soon move to the 16 nanometer process. Because of this the devices have become very small, both in length and in width, and that has created other problems, particularly the power lost as leakage. I showed you last time that at the 32 nanometer node and below, the standby leakage power may become larger than the dynamic power, and in that case the major research effort must go into controlling the leakage power. Now, let me quickly recap the last slide from the previous lecture, where I said that leakage power has the following contributors. The first and foremost is the reverse leakage current of the diode formed by the source or drain junction with the substrate.
The second contributor we discussed is the subthreshold current. I said last time that even when the device's VGS is less than VT, the channel is still in weak inversion, and in this state a current still flows between source and drain; this we call the subthreshold current. That means the device we thought was off is in fact not fully off: it is still leaking current. The third is the oxide tunneling current. As technologies have scaled down, the gate oxide of the MOS transistor has thinned so much that carriers can tunnel through it, driven by the large vertical electric field across it. The fourth arises because the channels have become very short, we are talking about nanometer technologies, and the electric field at the drain end is so high, because of field crowding there, that carriers can gain enough energy to be injected across the oxide toward the gate; this is called hot carrier injection. The fifth is what we call gate-induced drain leakage (GIDL). The problem here is that since the source and drain are much more heavily doped than the substrate, there are large depletion layers on both the drain side and the source side.
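Coming back to the subthreshold contributor for a moment, here is a minimal sketch of the standard exponential subthreshold model, just to show numerically that the current does not vanish below VT. The prefactor I0, the slope factor n, and all the voltages are assumed example values, not data for any particular process.

```python
import math

def subthreshold_current(vgs, vt_th, n=1.5, i0=1e-7, temp_k=300.0):
    """Exponential subthreshold model: I = I0 * exp((VGS - VT) / (n * kT/q)).

    i0 (amperes) and n (slope factor) are illustrative placeholders.
    """
    v_thermal = 1.380649e-23 * temp_k / 1.602176634e-19  # kT/q in volts
    return i0 * math.exp((vgs - vt_th) / (n * v_thermal))

# Even at VGS = 0 V with VT = 0.3 V the current is nonzero: the "off"
# device still leaks, which is exactly the subthreshold leakage above.
i_off = subthreshold_current(0.0, 0.3)
i_edge = subthreshold_current(0.3, 0.3)
```

Note that raising VT pushes `i_off` down exponentially, which is the common thread behind all the threshold-raising techniques discussed later.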
Now, these large depletion layers create large electric fields. Even if the gate is at zero or a low voltage, so that there is no inversion layer, there is an accumulation layer at the surface before inversion, and this effectively changes the doping at the surface; because the effective doping is higher there, excessive electric fields appear around the drain end, and a larger current flows between the drain and the substrate through this accumulation region. This is called gate-induced drain leakage. GIDL is very relevant now because in scaled technologies the dopings are in any case being increased to adjust the threshold voltage. Finally, since the channel lengths are becoming extremely small, the source and drain depletion layers can touch each other even without any gate voltage applied, which effectively short-circuits the source and drain; we call this punch-through, and it can create a large current because of the resistive path formed between source and drain. Having said all that, here is a typical figure. If you look at I1, it is nothing but the diode leakage current; the same will of course be true on the source side as well. So I1 is the diode leakage current between substrate and drain, while I2, I3 and I6 are three currents, one of which flows simply because of the subthreshold conduction.
Then I4 is the GIDL current; I7 and I8 are due to oxide tunneling and hot carrier effects; I6 is the punch-through component; and I3 may be due to drain-induced barrier lowering (DIBL), which occurs because of the large fields created by the source and drain. So these are the possible currents, up to eight of them, which may contribute to the leakage even when VGS is less than VT. These are the numbers I have given you and the mechanisms I have already explained. Now one can see what affects the leakage. First, the body effect: a change in the substrate (body) bias changes the threshold voltage, and with it the leakage current; we shall go into this in a little more detail a few minutes later. The second issue occurs particularly in short-channel devices, at technologies of 32 nanometers or lower. Because VDD is not scaling as fast as the geometry, the applied drain bias keeps the source and drain depletion widths very large, and because they are large, part of the bulk charge that appears in the threshold voltage expression (which we shall see later) is already present even without any VGS. That means a smaller VGS is now required to create the inversion channel, because part of the depletion layer is already supplying the bulk charge; in other words, the large drain bias causes a reduction in the threshold voltage. And finally, all the leakage currents except the tunneling component depend exponentially on temperature, so the higher the temperature, the higher the leakage currents.
Please remember that in most short-channel devices below 45 nanometers, the major worry right now is the rise in temperature. As we have already seen in earlier lectures, the power density may rise to levels comparable to a nuclear reactor or a rocket nozzle. Self-heating experiments, ours and others', show that the actual temperature on a chip is not the 27 degrees centigrade one assumes but around 77 degrees centigrade, and this enhances the leakage current through its exponential temperature dependence. We also discussed last time that as the CMOS technology generation shrinks, the junction temperature rises. You can see that even at 90 nanometers we are already very high; since this is a normalized temperature, it is already 4 or 5 times, and at lower technology nodes it may become more than 10 times. This means that if I want to keep the junction temperature below, say, 70 degrees, I must remove the heat: the thermal resistance of the substrate as well as of the packaging has to be adjusted so that the junction temperature does not rise above that limit. Now, having established that there is a leakage problem, particularly for sub-45-nanometer technologies, we want to know whether we can reduce this leakage power by circuit techniques. Device techniques we have already discussed; once the devices are made and the chips are fabricated there is not much we can do at that level. But during design, and during operation of the chip, can we actually reduce the leakage current?
So we now move to the area of leakage current control using circuit techniques. There are a number of ways. The sleep transistor is one method of reducing the leakage power. Another is dual-threshold-voltage CMOS; instead of dual it can be multi-threshold-voltage CMOS, as we shall see later. Dual was the word used initially, but it was then found that different devices can be given a number of different thresholds. There can also be variable threshold voltage, since the threshold can be varied continuously. Then there is the technique of body-biased transistors: by applying a substrate bias, mostly reverse bias but sometimes forward bias, we can change the threshold voltage and thereby control the leakage currents. Then, of course, the foremost way to reduce all kinds of power, whether dynamic, short-circuit or leakage, is to reduce the power supply voltage; leakage power is proportional to the supply voltage, so if you scale it down the power obviously goes down. So is it possible in circuits to reduce the voltage, for particular blocks or at particular times, so that a net power reduction results, certainly when the circuit is in standby mode rather than active mode? Then there is another method which has been tried with considerable success: transistor stacks, and we will look into these, including multi-ratio transistor stacks, as one way of reducing the leakage power. And of course, if the technology possibilities already exist, then at least during design you need not use the same minimum channel length for every transistor.
You can use devices with larger channel lengths, and we have already seen that short-channel effects become very strong only when the device dimensions fall below about 100 nanometers; they start appearing around 0.25 micron, but the effects are very strong below 90 nanometers. So if you use a device with a larger channel length, many of these problems can be avoided. However, the effect of a larger channel length is an immediate increase in propagation delay and therefore a reduction in speed. So can we increase the channel lengths only of devices on paths where speed is not a criterion, that is, outside what we call the critical paths? These, then, are the techniques we will use, mostly from the circuit side and to some extent from the device side, and we shall see how the leakage power can be controlled with them. Now, before we go into the details of sleep transistors, let me quickly go through the list I mentioned. One method of reducing the leakage: why does leakage flow from the power supply to ground at all? Because the normal devices have low thresholds. Please remember that normal transistors have to have low threshold voltages because we want high speed; and the leakage is, in some sense, inversely related to the threshold voltage value: the larger the threshold, the smaller the leakage, as we know. Because of that, we now provide additional hardware: one p-channel transistor and one n-channel transistor, large W/L devices which have higher thresholds by design, driven by the signals sleep and sleep-bar. We will look into this specifically when we talk about sleep transistors.
Basically you can think of it like this: when sleep is 0, the p-channel device conducts, and since sleep-bar is then 1, the n-channel device conducts as well. Since these are very large W/L transistors, the voltage drops across them are small, so the internal rails are not very different from VDD and VSS; in active mode the circuit therefore behaves as if normal VDD and VSS were supplied, and it can function at high speed. But in standby mode, when the circuit is not operating, the program can make sleep 1 and sleep-bar 0; both the p-channel and the n-channel sleep transistors are then cut off, and since they have higher thresholds, the leakage through them is very small, so the current through the circuit is also small and the leakage power is minimized. Lower leakage, as I said, comes from the higher thresholds. The disadvantages: as mentioned, the sleep devices are large, and there is some finite drop across both the p-channel and the n-channel transistors, giving a slightly smaller effective VDD and a slightly larger effective VSS, which means a reduced voltage swing. This can be minimized, but there is still a penalty in silicon area, because larger devices mean larger area; and since the effective supply, and hence the swing, is smaller, the drive current available from the logic will be a little smaller. So this is a very popular technique, but it has its own advantages and disadvantages; we will come to it in more detail later. For the other techniques I will first go through the slides and then talk about the theory behind them. The next technique: you can identify the transistors which lie in the critical paths, that is, which are slower and whose speed you want to improve.
We can give those transistors in the critical path lower thresholds, whereas in other areas, where speed is not so important because the data has to wait for other paths anyway, the transistors can have higher Vth; and take it from me, in the off state those higher-Vth transistors will have lower leakage. To create these different Vth values, technology-wise you have to add another mask, that is, an extra implant step. This one extra mask plus the additional process steps is said to cost about a million dollars. The third possibility, and I will come to the theory a little later, is body bias. An MOS transistor's substrate is normally grounded or connected to the source; but here, if I apply a substrate bias, say for a p-substrate a negative (reverse) bias, though it can even be a forward bias, then the applied reverse bias widens the depletion layers between the source/drain and the substrate and increases the depletion charge under the gate. This larger bulk charge means a larger gate voltage is needed to create the inversion channel, so the threshold voltage of a back-biased (body-biased) device rises, and with it the off-state leakage falls. The other way around, a forward bias will reduce the threshold and therefore improve the speed, while a reverse bias will increase the threshold and reduce the leakage. Please remember: additional charge in the bulk means larger Vt; lower charge in the bulk means lower Vt.
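To put rough numbers on the body-bias idea, here is a small sketch of a linearized model: the threshold shift is taken as gamma_b times the bias change, and the off-current follows 10 to the power of minus VT over the subthreshold swing S. Both gamma_b and S are assumed illustrative values, not device data.

```python
def off_current_ratio(delta_vsb, gamma_b=0.2, s_v_per_dec=0.09):
    """Ratio I_off(body biased) / I_off(unbiased).

    Linearized model: dVT = gamma_b * delta_vsb, where delta_vsb > 0 means
    reverse body bias (VT rises, leakage falls) and delta_vsb < 0 means
    forward bias. gamma_b and the swing s are illustrative placeholders.
    """
    delta_vt = gamma_b * delta_vsb
    return 10.0 ** (-delta_vt / s_v_per_dec)

# Half a volt of reverse bias raises VT by 100 mV in this model and cuts
# the off current by roughly an order of magnitude; forward bias does the
# opposite, trading leakage for speed.
```

The asymmetry is the whole design trade-off: reverse bias for standby leakage, forward (or zero) bias where speed matters.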
So when you have a forward bias, the depletion charge near the source and drain is smaller, Vt goes down, and one gets faster circuits. Either FBB (forward body bias) or RBB (reverse body bias) can thus be used to modulate the threshold. The biggest disadvantage one sees is that creating such a scheme may require separate wells in the CMOS process; beyond the usual n-wells you may need triple wells and sometimes quadruple wells as well. The bias circuitry requires additional area, and since you have to create a bias control circuit, you may need additional pins. What we see is that the change in off-current relative to the nominal off-current is proportional to the exponential of gamma_b times delta VSB divided by the thermal voltage kT/q, where gamma_b is the body (back-bias) coefficient; the notation is slightly confusing, but kT/q is simply the thermal voltage. So depending on the VSB value I can change the off-current; that is the idea behind body-biased transistors. VSB can be positive or negative depending on the bias direction, and therefore I can change the off-current ratio at will. The next possibility for reducing the leakage is supply voltage scaling, which has a two-fold advantage. We know the dynamic power goes as VDD squared, so there is no question that if the supply voltage is reduced, the dynamic power is reduced, since it follows a square law. Now look at the leakage power: it is nothing but VDD times the leakage current Ileak, and reducing VDD also reduces Ileak itself. We know DIBL, drain-induced barrier lowering, occurs because of the large VDS available to you; your VDS becomes smaller when VDD is smaller.
So obviously the DIBL effect goes down, the drain-induced barrier lowering is reduced, and because of that the threshold voltage effectively becomes higher; and if the threshold is higher, the leakage becomes smaller. Therefore by scaling down the supply voltage one can reduce the leakage power as well as the dynamic power. But the fact remains that, contrary to the scaling law Moore envisaged, we have been unable to scale the supply down by the same 0.7 factor per node, and because of that the fields are very high and the DIBL effect is not very small. However, as soon as the threshold voltage increases, the current available to me, the on-state or active-mode current Ion, decreases, and therefore the speed goes down. Now the fourth possibility, and as I said I will come back to each of these individually. Here is an interesting case: if you have a single transistor and break it into a series combination of two transistors, then with the same channel length the widths combine as 1/W = 1/W1 + 1/W2; so if I have 2W here and 2W here, each with length L, together they act as a single W/L device.
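The width bookkeeping just described can be written in one line; this simply encodes the 1/W = 1/W1 + 1/W2 rule for two series transistors of equal channel length.

```python
def series_width(w1, w2):
    """Effective width of two series transistors of equal L:
    1/W = 1/W1 + 1/W2, since series conductances combine like
    parallel resistances do."""
    return 1.0 / (1.0 / w1 + 1.0 / w2)

# Two 2W devices in series behave like a single W device, as in the
# lecture's example of splitting one transistor into a stack.
```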
So a single transistor of W/L can be replaced by two series transistors of 2W/L each. Now, even when the gate voltages are at 0 (or near the subthreshold region, slightly higher but still below Vt), a leakage current flows down through this stack. Since the lower transistor's source is grounded, that current produces a voltage drop across the lower transistor's VDS; the intermediate node voltage therefore rises, and this rise has several interesting effects on the upper transistor. In particular it increases the threshold voltage of the upper transistor, as we shall see in detail shortly, and if the threshold voltage of the upper transistor increases, the leakage current through it has to go down, and therefore the leakage power goes down. One can see that in this case the reduction in off-current relative to the nominal off-current goes as the exponential of Ioff times Roff times (1 + gamma + eta), where Roff is the off-state resistance of the lower transistor, gamma is the back-bias (body) coefficient and eta is the DIBL coefficient. So one can obtain a smaller off-current provided the Vt of the lower transistor is adjusted strongly enough that the DIBL effect on the upper transistor is lowered; and if that DIBL effect is lowered, we shall see that the leakage power is reduced. This is called the stack effect, a very important method, and in real circuits you need not actually divide one transistor into two; this example is only to show the point that with series transistors the leakage current can be lower because the effective DIBL contribution reduces. Now for the expression in more detail: this is the intermediate node voltage Vx at this point.
So if I look at the expressions, the subthreshold current I1 is proportional to 10 raised to a combination of the change in gate voltage, the change in substrate bias (weighted by the body coefficient) and the change in drain voltage (weighted by the DIBL coefficient), all divided by S, the subthreshold slope. The currents through the upper and lower transistors are written with this same intermediate voltage Vx, each scaled by the width of that transistor. Since the same current flows through both transistors in the off state, equating the two expressions gives the intermediate node voltage Vx, and one can see that a larger Vx lowers the effective DIBL contribution and therefore raises the threshold. To increase Vx, one can see from the expression that eta should be larger, S should be larger (more than the ideal 60 millivolts per decade), and the upper transistor should have a larger width than the lower transistor. If we do all this, Vx increases; and comparing the stack to the single transistor one gets a typical ratio involving 10 to the power u, where u is a constant obtained from this expression. So I can figure out what the ratio of the W/L values of the upper and lower transistors should be, and once I choose that ratio I can adjust the effective DIBL contribution, hence increase the threshold, and hence reduce the leakage current. The fifth possibility, as I already said, depends on the channel length: the threshold voltage is a function of channel length, and its fall as you reduce the channel length is called Vt roll-off. This is an old slide, but it does not matter; the trend extends further down to 45 and 30 nanometers.
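Returning to the stack expressions for a moment, the balancing of the two subthreshold currents can be sketched numerically. The model below and every parameter in it (eta, gamma_b, S, the widths) are illustrative assumptions, and the node voltage is found by bisection rather than from the closed-form expression on the slide.

```python
def stack_node_voltage(vdd=1.0, w_up=1.0, w_low=1.0,
                       eta=0.1, gamma_b=0.2, s=0.1):
    """Find the intermediate node voltage Vx of a two-transistor stack by
    equating the two subthreshold currents (illustrative model):

      I_up  ~ w_up  * 10**((-vx - gamma_b*vx + eta*(vdd - vx)) / s)
      I_low ~ w_low * 10**((eta * vx) / s)

    The upper device sees VGS = -vx, a reverse body bias vx, and a reduced
    VDS; the lower device sees only the DIBL boost from VDS = vx.
    """
    def mismatch(vx):
        i_up = w_up * 10 ** ((-vx - gamma_b * vx + eta * (vdd - vx)) / s)
        i_low = w_low * 10 ** ((eta * vx) / s)
        return i_up - i_low

    lo, hi = 0.0, vdd
    for _ in range(60):          # bisection: mismatch decreases with vx
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Consistent with the lecture, widening the upper transistor relative to the lower one pushes Vx up, which is what strengthens the stack effect.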
So Vt goes down further, and this roll-off of Vt means that as the threshold voltage falls, the leakage current rises. Therefore we must see to it that, instead of using minimum-channel devices everywhere, at least the transistors under no speed pressure, that is, those not in critical paths, are given higher thresholds. One way to do this is to assign larger channel lengths to those transistors which, for almost all possible input data, will not be in the critical path; the larger channel length gives a larger threshold voltage, the off-current is reduced correspondingly, and those devices have lower leakage. So, I repeat, the basic idea in all the techniques I am suggesting is to raise the threshold voltage of a transistor whichever way you can: the larger the threshold voltage, the smaller the subthreshold current, which is the main constituent of the off-current here. Before I summarize, let me go back over what I have said so far and talk through it once more. The first effect arises when you stack the transistors. Here is a two-input NAND gate as an example: I have a power supply, I have ground, I have two p-channel devices in parallel (for the NAND operation) with inputs A and B, and two n-channel transistors in series with inputs A and B; this is the output of the NAND gate, and the voltage I am talking about is the intermediate node voltage Vm. Now let us take A and B both 0: obviously the p-channel devices are on and the output goes high, as the NAND function requires.
However, M2 and M1 are switched off, or so we think; if M1 and M2 were really off, in the ideal case there should be no current at all, but we have just discussed that there are many mechanisms by which current leaks through M1 and M2 even when VGS is less than VT or close to VT, so a leakage path through M2 and M1 exists. Now, what happens due to this stacking of the NMOS devices? Between the n-channel transistors M1 and M2 there is a node M with an intermediate voltage Vm, which arises as an IR drop: the leakage current times the off-resistance of the lower transistor gives a voltage at this node. Since the current flows downwards, this node potential is always above ground, which means Vm is positive. A positive Vm leads to three major effects; let me discuss these three effects for the upper transistor. Please remember my naming: I repeat, the lower transistor is M1 and the upper one is M2. First, VGS2 (for input A) is nothing but VG2 minus VS2, which equals VG2 minus Vm; and since we are keeping VG2 very small, close to 0, we take the worst case: VGS2 = 0 minus Vm, that is, minus Vm. Now we know the subthreshold current of M2 becomes smaller when its VGS takes a small negative value, and since the subthreshold current of M2 reduces, the net leakage current must reduce as well.
So the first and simplest thing that has happened is the VGS effect: VGS2 has become negative, so M2 has much smaller leakage; and since only one current can flow in the series path, M1 can carry only that same current and no more. The second effect: look at the bulk terminal B of the n-channel transistor M2, which I have not shown here. Assuming the substrate is grounded, there is now a voltage between the bulk and the source of M2 equal to 0 minus Vm, that is, a reverse bias of minus Vm appears across the source-to-substrate junction. We know that any reverse body bias enhances the threshold voltage, so Vt of M2 increases (Vt or Vth, I sometimes write the subscript h and sometimes not), and we know that the larger the threshold voltage, the lower the leakage current, as the exponential expressions we just derived show. So there are two effects so far: first the VGS effect and second the bulk bias (body) effect. The third is not quite as trivial: look at the drain-to-source voltage of the same transistor M2; it is VD2 minus VS2, that is, VD2 minus Vm, and since Vm is positive, VDS2 is now smaller. Since VDS2 is smaller, the drain effect, namely drain-induced barrier lowering, goes down; and by the same expression I wrote earlier, if the DIBL contribution goes down, the threshold voltage rises.
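The three effects can be folded into one rough reduction factor for M2's leakage: the negative VGS counts with weight 1, the reverse body bias with weight gamma_b, and the reduced DIBL with weight eta, all divided by the subthreshold swing. This is a sketch under assumed example coefficients, not a calibrated device model.

```python
def stack_leakage_factor(vm, gamma_b=0.2, eta=0.1, s=0.1):
    """Combined leakage reduction of the upper stacked transistor for an
    intermediate node voltage vm, folding in the lecture's three effects:
      - VGS2 = -vm        (gate drive goes negative, weight 1)
      - VSB2 = +vm        (reverse body bias, weight gamma_b)
      - VDS2 drops by vm  (less DIBL, weight eta)
    Returns I_off(stacked) / I_off(single); parameters are placeholders.
    """
    return 10.0 ** (-(1.0 + gamma_b + eta) * vm / s)
```

Even a Vm of a few tens of millivolts gives a substantial reduction, which is why the stack effect is so attractive: it costs no extra control signal at all.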
Which means that the larger the Vm, then by all three of the reasons I gave, the threshold voltage of M2 rises, and the increased threshold of M2 reduces the leakage current. So one can see that the leakage power due to subthreshold current can be minimized simply by stacking two devices; and in a NAND gate this stack occurs naturally, since two of the transistors are always in series, so for the worst-case input of 0 0 one sees the smallest current through it. Now, if one of the inputs is 1, only one transistor of the pair is off and the stack effect is lost; in that case you may actually have to break that single off transistor into two series transistors of suitable width to still create the stack effect, and the net difficulty is a larger area, so some penalty is paid for reducing the leakage power. The other technique I discussed is dual threshold, which is a specific example of the multiple-threshold techniques. The threshold can be varied in a number of ways, one of which is a change in the channel doping. Let me recapitulate the expressions for you. The threshold voltage of an n-channel or p-channel MOS transistor is VT = phi_ms +/- 2 phi_F - Qox/Cox +/- QB/Cox, where phi_ms is the metal-semiconductor (or doped-poly-to-semiconductor) work function difference; phi_F is the Fermi potential, (kT/q) ln(NB/ni), with NB the substrate doping, and the sign is plus for a p-substrate and minus for an n-substrate; Qox is the fixed positive oxide charge density, which these days is controlled extensively, so the Qox/Cox term is not so dominant, but it still exists; and Cox is the oxide capacitance per unit area, epsilon_ox/tox, with epsilon_ox the oxide permittivity and tox the oxide thickness.
Please remember this permittivity can be different for different gate dielectrics: high-k dielectrics have a larger permittivity, and therefore tox can be proportionately increased while keeping the same Cox. Finally, the bulk charge prior to threshold is QB = q NB xd_max, where xd_max is the maximum depletion width; for a step-junction approximation I can write xd_max = sqrt(2 Ks epsilon_0 (2 phi_F) / (q NB)), with NB the substrate concentration and 2 phi_F twice the Fermi potential. Of course, if you apply a reverse bias, 2 phi_F becomes 2 phi_F + VSB, which increases the depletion layer and therefore the bulk charge. In the threshold expression this term increases with doping, and since it sits under a square root, the increase in VT is essentially proportional to the square root of NB. So if you change the doping of the substrate, or at least the doping near the channel, I can assure you the threshold voltage can be increased. This is exactly the first technique: different transistors can be given different channel dopings. Remember this means an additional mask and additional cost, but all critical paths get lower thresholds and all non-critical paths get higher thresholds, adjusted through the doping of those transistors. The second possibility is that VT can be changed through Cox: since Cox = epsilon_ox/tox, the bulk-charge term of the threshold is proportional to tox, and indeed as tox reduces with scaling, the threshold reduces. So the threshold can be adjusted this way as well: you can have transistors with multiple gate oxide thicknesses, where larger oxide thicknesses give larger thresholds and thinner oxides give smaller thresholds.
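Going back to the depletion-width numbers for a moment, the step-junction relations can be checked directly. The doping, Fermi potential and oxide capacitance below are assumed for illustration; with phi_F held fixed, the QB/Cox term indeed grows as the square root of NB, and a reverse bias VSB enlarges it further.

```python
import math

Q = 1.602176634e-19        # electron charge, C
EPS0 = 8.8541878128e-14    # vacuum permittivity, F/cm
KS = 11.7                  # relative permittivity of silicon

def xd_max(nb_cm3, phi_f, vsb=0.0):
    """Maximum depletion width (step-junction approximation), in cm:
    xd_max = sqrt(2 * Ks * eps0 * (2*phi_f + VSB) / (q * NB))."""
    return math.sqrt(2 * KS * EPS0 * (2 * phi_f + vsb) / (Q * nb_cm3))

def bulk_charge_term(nb_cm3, phi_f, cox, vsb=0.0):
    """QB/Cox contribution to VT, in volts: q * NB * xd_max / Cox."""
    return Q * nb_cm3 * xd_max(nb_cm3, phi_f, vsb) / cox

# Illustrative values: phi_F = 0.35 V, Cox ~ 1.7e-6 F/cm^2.
term = bulk_charge_term(1e17, 0.35, 1.7e-6)
```

Strictly, phi_F itself also shifts slowly (logarithmically) with NB, so the real VT increase is slightly faster than the pure square root; the sketch keeps phi_F fixed for clarity.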
So one can keep doing the same thing; as I say, the theory is again and again the same: assign lower thresholds to transistors in critical paths and higher thresholds wherever speed does not matter so much. In any case, the transistors with higher threshold will have lower leakage currents. The other possibility is called multiple body bias; we have already seen the body (substrate) bias effect. Here is the technique: we know the threshold voltage under a back (body) bias applied to the substrate, positive or negative but generally reverse. The threshold voltage at any substrate bias is V T = V T0 + gamma ( sqrt(2 phi F + V SB) - sqrt(2 phi F) ), where V T0 is the zero-bias threshold voltage, gamma is the back-bias or body-bias coefficient, and phi F is the Fermi potential, (kT/q) ln(N A / n i) or (kT/q) ln(N D / n i). So one can see that the larger the reverse V SB, the more this term adds and V T will rise; on the contrary, a forward value reduces V T. So by forward or reverse biasing the body I can move V T away from its initial value V T0 and accordingly assign a V T to each transistor. Now you decide: wherever transistors should have larger thresholds, apply a larger reverse bias; wherever you require lower thresholds, apply a small forward bias or even zero bias. In that way you can have multiple body biases for different speed requirements, and one can create a number of V T's for different transistors as required. But this is something very crucial: done this way it is so data dependent that for different data someone has to apply the technique adaptively, and probably that is what I will show you at the end, that the control will ultimately be adaptive rather than a fixed control.
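The body-bias expression above translates directly into code. This is a minimal sketch; the parameter values V T0, gamma, and 2 phi F are illustrative assumptions, not taken from any real process.

```python
import math

def vt_body_bias(vt0, gamma, phi_f2, vsb):
    """V_T = V_T0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F)).

    vsb > 0 models reverse body bias (raises V_T); vsb = 0 gives V_T0.
    vt0 in volts, gamma in V^0.5, phi_f2 = 2*phi_F in volts.
    """
    return vt0 + gamma * (math.sqrt(phi_f2 + vsb) - math.sqrt(phi_f2))

# Illustrative numbers only
VT0, GAMMA, PHI2 = 0.40, 0.4, 0.8   # V, V^0.5, V
standby = vt_body_bias(VT0, GAMMA, PHI2, 1.2)   # deep reverse bias in standby
active  = vt_body_bias(VT0, GAMMA, PHI2, 0.0)   # zero bias in active mode
assert standby > active
```

With these numbers the standby threshold comes out around 0.6 V against 0.4 V at zero bias, which is exactly the behaviour VTCMOS (discussed below) exploits: low V T when speed is needed, high V T when leakage must be suppressed.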
The possibility of using multi-threshold CMOS is well known and is called MTCMOS. An example is shown here: you have a p-channel sleep transistor and correspondingly an n-channel sleep transistor, and both have higher thresholds, and we have already said that higher threshold means lower leakage current. Then there is the actual logic circuitry of p-channel and n-channel devices kept between them; the effective series resistance of the sleep devices can be adjusted, because multiple thresholds are available, and one can in fact set a desired R by applying the corresponding zeros and ones. Now what happens when sleep = 0 and sleep-bar = 1? Both MN and MP, the n-channel and p-channel sleep transistors, are turned on, and since their area is large, W very large, even though they are high-threshold devices, the large W means a small on-resistance. So for the real logic we create a supply which is called virtual VDD (VDDV), which is VDD minus the drop across the p-channel sleep transistor; similarly there is a virtual ground (VSSV), which is the true ground plus the drop across the n-channel sleep transistor. Depending on the sizing and thresholds, these voltage drops can be made minimal, as close to VDD and VSS as you like, if the sizing is done correctly. When sleep goes to 1 and sleep-bar goes to 0, the sleep transistors cut the logic off from the rails; otherwise, in the active mode, the CMOS logic functions with virtual VDD and virtual VSS, and therefore with a slightly lower VDD and a slightly higher VSS for the device.
So it may reduce the speed a bit, because the swings are smaller, but it will certainly reduce the leakage current when sleep is asserted; that is, when you are not actually using the logic, the sleep transistors cut off the leakage path. In a nutshell, the disadvantages of MTCMOS are that it requires larger area, as we have already said, and to some extent it reduces performance, since, as we just discussed, the swing is smaller; so there is a reduction in speed. Now, a modification of the so-called multiple-threshold CMOS is called variable-threshold CMOS (variable-Vth CMOS). It is similar to what we just did, but instead of having sleep transistors, the back bias is made variable: in the active mode the back bias is normally set to 0, and in the standby mode the back bias is driven to the highest reverse value. Therefore the thresholds of those transistors remain small in the active mode but increase when the back bias is reversed, and because of that the leakage current goes down. So the idea is: if I have a variable threshold, which I can get by body biasing, I can apply different reverse biases to different transistors and therefore adjust the leakage current differently; or, if I combine this with MTCMOS, then there is the advantage that I have a sleep mode along with the back bias, and the two together can probably minimize the leakage current while continuing to give higher speeds in the active mode.
So this is the way of doing it. Dual threshold also belongs to the multiple- or variable-threshold family: if you choose only two values of threshold, you say it is a dual-threshold system; if you use multiple fixed values, as we did, it is MTCMOS; and if the threshold is variable, you say it is variable-VT CMOS. All of these are essentially similar; basically we are using two techniques, sleep transistors and back bias, and if you use them together you have control over the leakage power. I just said that different data will put different transistors into on and off states, so the critical paths may not be the same at all times for all kinds of data in a datapath, particularly if you are looking at a processor with 64-bit inputs; the design cannot be for one fixed threshold. If you keep a fixed threshold, on average the leakage power will still go down, but the better technique, which is now followed in most processors, in ARM, in newer Intel processors aiming at low power, and in the 686-equivalent parts from AMD, all of which are low-power, or at least low-standby-power, processors, is based on whatever I discussed so far and is called adaptive biasing. The most common method, which was first tried way back, is called dynamic voltage and frequency scaling (DVFS). We know that dynamic power is proportional to VDD squared and also proportional to the clock frequency: P dyn = alpha C eff VDD^2 f. So dynamic power increases with f and with the square of VDD, and of course one would not like to reduce f, because the clock frequency is what gives performance.
If f is not to be scaled down, then the dynamic power is governed only by VDD. Reducing VDD will of course reduce the leakage power as well, but we also know that if you reduce VDD the effective threshold situation changes through the body coefficient. So here is the technique we apply: two closed control loops are available in a DVFS system. One is the dynamic voltage control (DVC) loop: depending on the data requirement and the power budget you have set, the voltage is scaled down or up. The other is the dynamic frequency control (DFC) loop: for the speed you have assigned, with f fixed, it finds, for a given voltage, the attainable speed, and for that speed calculates what voltage is required, and the two loops keep iterating. In a nutshell there are a few steps (I am not giving full details; they are available in many papers of recent origin, from 2009 through 2012, the last three or four years): the DFC monitors chip activity, that is, how many 0-to-1 transitions are taking place for the current data, and from this activity it decides the frequency to work at. Once the frequency is decided, the DVC loop gets this information and changes VDD to correspond to that frequency, under the condition that at least the critical path, your slowest path, still meets timing. The delay resulting from that voltage change is fed back to the DFC, which again determines the frequency from the activity, and when the two loops settle to one stable pair of VDD and f, the system automatically works at some lower VDD for the given speed. So this was the technique which was quite popular; however, please remember the cost here is that there are two loops, and therefore a little
hardware cost, and it will slightly slow things down, because the system has to go around the two loops at least three or four times to settle to its values, and the clock frequency therefore has to be backed off, since the adjustment has to happen within that time. Now there is another technique, similar to what we just said, called dynamic voltage scaling (DVS). In dynamic voltage scaling there is a single loop: VDD is adjusted for a given, fixed speed. How do we know the voltage-frequency relation? Instead of readjusting frequency and supply online, we create a lookup table beforehand containing VDD and frequency values; as I change VDD, I go into the lookup table and find what frequency I can operate at, and using this technique we can probably arrive at a reasonable VDD for the given frequency requirement. This DVS is to date a very commonly used technique for adaptive power biasing. Please remember 'voltage' here means two things: one is the voltage we give to the power supply, the other is the back-bias voltage; both voltages are modulated as per the frequency requirement. Finally, there is a technique, not really new, from the last three or four years maybe, called dynamic voltage and threshold scaling (DVTS). It is an improved version of the DVS we just talked about: in this technique, along with VDD, the threshold is also changed. Using the lookup table we know what voltages are required, for the substrate bias as well as for VDD, to get the frequency of operation, and for those voltages we know how the thresholds vary; so you adjust your threshold for leakage power, figure out what frequency range
you can attain, and from that how much voltage you should apply as substrate bias and as VDD. Of course this is again a loop system, and you need a fair amount of hardware; it is called a power management unit (PMU), which essentially creates the different VDDs and bias voltages and therefore allows you to have different thresholds at different points, dynamically, depending on the data as well as on the architecture one uses. The biggest advantage of this DVTS algorithm, which needs a small processor or small controller unit to run it, is that if you achieve it, something great follows: it becomes essentially independent of technology. This technique can be applied to almost any node, whether you go from 45 to 32, 28, 22, or 16 nanometers. Of course, whenever things are very good, there is extra additional hardware, and the catch is that the cost of, or the power dissipated in, this additional control hardware should not exceed the power you are trying to save in the whole design; otherwise the whole purpose is defeated. So, having shown you a variety of circuit techniques to reduce the leakage power as well as the dynamic power, please also note that the short-circuit power is proportional to (VDD - 2 VT) cubed and also proportional to the W/L ratio of the n-channel and p-channel devices. So one figures out that the short-circuit current, or short-circuit power, is also minimized if the threshold rises. Short-circuit power is therefore not controlled separately: if you reduce your VDD or increase your threshold, then in either case both the dynamic power and the short-circuit (switching) power go down.
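The lookup-table DVS scheme described above is easy to sketch. This is a minimal illustration, and the voltage/frequency pairs in the table are invented for the example; a real table would be characterized per chip and per process.

```python
# Minimal DVS sketch: a precomputed voltage/frequency lookup table,
# searched for the lowest V_dd whose attainable clock still meets the target.
# Table values are purely illustrative.
VF_TABLE = [   # (vdd_volts, fmax_mhz), sorted by increasing vdd
    (0.8, 200),
    (0.9, 350),
    (1.0, 500),
    (1.1, 650),
    (1.2, 800),
]

def pick_vdd(required_mhz):
    """Return the smallest supply voltage whose fmax meets the target clock."""
    for vdd, fmax in VF_TABLE:
        if fmax >= required_mhz:
            return vdd
    raise ValueError("requested frequency not attainable at any table V_dd")

assert pick_vdd(300) == 0.9   # 0.8 V is too slow, 0.9 V suffices
```

A DVTS version would carry a third column, the back-bias voltage, so that the same lookup simultaneously sets V_dd and the threshold, exactly as the lecture describes.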
Of course, the catch there is that the rise and fall of the input pulse should be fast compared to the propagation delay, which is a device-dependent phenomenon, essentially decided by the threshold and the technology. So, coming back, in a nutshell what we can say is: if you are designing in deep sub-micron, that is below 45 or 65 nanometers, we are forced to scale down voltages in the interest of device reliability and power, and with the supply voltage being reduced, the threshold voltage also needs to be reduced, since the current is a function of the gate drive. But the threshold voltage cannot be arbitrarily reduced to increase current drive, since the device must have good turn-off characteristics. The parameter one worries about in turn-off is the sub-threshold swing: the efficiency with which a device turns from on to off is captured by the sub-threshold swing (or slope) S, which can be given as S = 2.3 (kT/q) (1 + C j / C ox), where C j is the junction (depletion) capacitance from source/drain to substrate and C ox is the oxide capacitance. Typically this S is 60 millivolts per decade, meaning the gate voltage must change by about 60 mV for every decade of change in drain current. One can see from here that if I want a sharper on-to-off transition I must make S as small as possible, and if that cannot be achieved one has to find a better device, because otherwise the short-circuit currents will be larger, since turn-off to turn-on is not very fast. Please remember also that a smaller S means the sub-threshold current is reduced at low voltages, and therefore the leakage power goes down further. So this is another issue one has to worry about: in your DVFS or DVS or DVTS regimes, the algorithms must take care of how to account for this S value. If we lower the threshold too far, the device will exhibit severe leakage currents at VGS equal to 0, as we just discussed.
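The swing formula above can be evaluated in a couple of lines; this is a minimal sketch, with the capacitance ratio as a free illustrative parameter.

```python
def subthreshold_swing(cj_over_cox, temp_k=300.0):
    """S = 2.3 * (kT/q) * (1 + C_j/C_ox), returned in mV per decade of current."""
    k_over_q = 8.617e-5  # Boltzmann constant over electron charge, V/K
    return 2.3 * k_over_q * temp_k * (1.0 + cj_over_cox) * 1000.0

# The ideal limit (C_j -> 0) is ~60 mV/decade at room temperature;
# any real junction capacitance degrades (increases) the swing.
ideal = subthreshold_swing(0.0)
real  = subthreshold_swing(0.3)
assert real > ideal
```

Evaluating the ideal case gives about 59.5 mV/decade at 300 K, which is the familiar "60 mV/decade" limit for a conventional MOSFET; a nonzero C j / C ox only makes it worse, which is one of the motivations for the FinFET discussed at the end of this lecture.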
However, we must keep the threshold at around one fourth of the supply voltage in order to maintain acceptable current drive, because the on-current is also required. So you can neither reduce it too much nor increase it too much: if you want drive, that is on-current, then some amount of VGS minus VT must remain available to you. A simpler leakage-current model than the expression I gave earlier is: I leakage = 10 microamperes per micron, times the width W of the transistor, times 10 to the power (minus VT / 95 millivolts). What I am essentially saying is that by adjusting both the widths of the transistors and the threshold voltages, one can adjust the leakage currents; you require such models in your algorithms, so I thought I should provide you one. In the case of logic, where we are examining a more complex block than a simple inverter, take the 2-input NAND, which is not very complex but is still much more complex than an inverter. There are 4 possible input combinations even for a 2-input NAND gate: 0 0, 0 1, 1 0, 1 1, and for each of these we can examine the amount of leakage current that flows and assign an effective gate width, because for the worst case we must find what effective W corresponds to the leakage current. Logically one can express this in terms of the number of transistors in the series stack (2 here, out of 4 transistors total); the expression is actually a fit function derived from measured characteristics. So one can see that the effective width of the logic is related to the device width through this kind of expression. Adjust the widths in your series combinations correctly, so that you have a relatively good drive current and at the same time lower leakage currents.
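The simple leakage model just quoted can be coded directly. This is a sketch of that model only; the function name is mine, and the numbers come straight from the expression above (10 uA/um prefactor, 95 mV per decade of V T).

```python
# Simple leakage model from the lecture:
#   I_leak ~= 10 uA/um * W * 10^(-V_T / 95 mV)
def leak_ua(width_um, vt_volts):
    """Per-transistor subthreshold leakage in microamperes."""
    return 10.0 * width_um * 10.0 ** (-vt_volts / 0.095)

# Raising V_T by 95 mV cuts the leakage by exactly one decade,
# and leakage scales linearly with the drawn width W.
i_lo_vt = leak_ua(1.0, 0.200)
i_hi_vt = leak_ua(1.0, 0.295)
assert i_lo_vt > i_hi_vt
```

This is why the dual-threshold assignment pays off so strongly: a modest V T increase on non-critical gates buys an order of magnitude in leakage, while the width term is what the NAND effective-width argument above is adjusting.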
Particularly for IO drivers, we can use the logical-effort technique over the chain of buffers; if you are using a single driver, similar arguments hold for the n-channel and p-channel devices, operating on the buffer factor, which is the ratio of the load capacitance to the input capacitance. One can find the required widths from how many pads you have, the actual lengths of the lines drawn from the pads to the device, and how many stages you go through for this buffer factor. IO design is a genuinely difficult design problem, though it is stated very trivially here, but one can see that at least the input and output buffers consume a very large amount of power. Many a time, in our hurry to design a circuit, we keep forgetting that there will be a huge loss of power at the two ends, at the input and the output, and one must take more care there than in the normal circuit design, so that the net power is minimized. Then another issue comes in with clock drivers: if your clock line is too long, you are driving a lot of capacitive load and the dynamic power is very large. So the RC time constant of the interconnect through which the clock travels has to be adjusted accordingly, and the clock should run on the highest metal layer in a normal technology; the maximum clock frequency allowed is set by this RC constant. You figure out, for that driver, the optimal channel widths so that it can deliver the required clock, and you should always create an H-tree for clock distribution. All of this is shown to you to somehow reduce the dynamic power in these additional circuits, which normal circuit designers at times do not realize may be the very ones creating large power dissipation. The other point is where the limitations come in: we have said that if you reduce VDD everything goes well. Fine, but there are limits.
So here is something one must look into before we do VDD reduction: the cycle time. It depends on what is called the logic depth: in a chain of logic, if 3 gates are driven one by the other, the logic depth L D is 3. So T cycle = L D * C avg * VDD / I on, where C avg is the average load capacitance seen per gate, VDD is the power supply, I on is the on-current, and T cycle, the period for a transition to propagate, is 1 / f clock. If I substitute correspondingly, because we know I on is proportional to VDD squared, one can see that T cycle is inversely proportional to VDD. So obviously, if you reduce VDD because you want a low-voltage design, the first effect is that your speed is going to be lower, because your delay is going to increase, whatever you do. This is exactly why we said you must do some kind of adaptive control: reduction in VDD may be helpful in some way, but it actually lowers your speed. Now, another technique for operating with a low supply voltage is called pipelining. Since the delay increases due to scaling of VDD, we break the combinational logic and introduce storage elements, that is, latches or flip-flops (registers), between logic stages. For example, normally you have data coming through a register into the logic, blocks A and B, maybe NAND or whatever functions, AND-OR planes like an FPGA or PLA, and finally the output is captured in another register clocked at f2 or the same f, with non-overlapping clocks or otherwise. So you have input and output registers around the logic; this is the normal reference design.
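The cycle-time relation above can be checked numerically. This is a minimal sketch assuming the square-law on-current the lecture uses (I on proportional to VDD squared); the drive constant k_ion is an arbitrary illustrative value.

```python
# T_cycle = L_D * C_avg * V_dd / I_on, with the lecture's square-law
# assumption I_on = k * V_dd^2, so that T_cycle is proportional to 1/V_dd.
def t_cycle(logic_depth, c_avg_f, vdd, k_ion=1e-3):
    """Cycle time in seconds.

    logic_depth: number of gates in series (L_D)
    c_avg_f:     average load capacitance per gate, farads
    vdd:         supply voltage, volts
    k_ion:       illustrative drive-strength constant, A/V^2
    """
    i_on = k_ion * vdd ** 2
    return logic_depth * c_avg_f * vdd / i_on

# Halving V_dd doubles the cycle time, i.e. halves the attainable clock
slow = t_cycle(10, 1e-14, 0.6)
fast = t_cycle(10, 1e-14, 1.2)
assert slow > fast
```

This inverse dependence is the quantitative reason the lecture turns to pipelining and parallelism next: they recover throughput that plain VDD scaling gives away.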
Instead, you feed the input through a register into logic block A, put an additional register between A and B, and continue like this. It is now quite obvious that the delay between flip-flops is reduced, because the critical path is now governed by the shorter stage plus the two flip-flop delays. However, by increasing f here, please remember this is like putting the data in a pipeline: at first the data may take time to come out, but once data starts coming, every clock cycle you have a result. So one can see that here, after only two clock cycles in this case, every clock cycle will deliver a data item, even though VDD is reduced. So your logic is going slower, but it does not matter, because now the data flow is governed essentially by the pipelining, and every clock you have an output; this is essentially the throughput rate available to you. So one method is that even with a scaled VDD, the logic should be organized as a pipelined data flow. The other possibility, of course, if you lower VDD, is that you can have extra hardware and multiplexers to support you: you divide your work into a number of parallel paths, and every clock cycle, depending on the select signal, one of them delivers its result. This is essentially similar to pipelining, except that the data is partitioned by you, and it needs a real effort to partition it equally, or at least so that no path exceeds the time within which the select signal changes. So essentially the critical path among them will decide the select-signal timing.
However, each path can run at its own supply voltage, because at the end of the day the slowest one decides the throughput, and hence the clock driving the select signals. Therefore, in parallelism too you can use a lower supply voltage, and we have already said that lower supply voltage minimizes the power in all three components: dynamic, short-circuit, as well as leakage. So you can do architectural thinking; these are called architectural techniques: either use pipelines or use parallel processing. What is the penalty you are paying? Additional hardware. So the catch is: whenever you put in any additional hardware, can you afford it? If it consumes more power than you save, the whole game is lost, so one has to keep finding out what additional power you are going to spend in order to reduce the net power. If any of us are doing architectural power reduction, here are the issues related to pipelining or parallelism. There is the issue of latency: latency is essentially the net delay from input to output. One must see for how many cycles data will not be available to you, that is, the depth of your register pipeline, and whether that is acceptable, because you cannot wait an arbitrary length of time even for the first data output to arrive. So there is a latency issue, and we also know from our general understanding of systems that throughput rate and latency are somewhat related; therefore, when you design any circuit using either of these architectural techniques, one must take care of latency as well as throughput rate. Obviously, you are also putting extra registers everywhere, and the register depth may be more than one flip-flop; in that case you have additional area, and this overhead circuitry will consume power, so one has to worry about how much.
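The bookkeeping for this architectural trade-off can be sketched as below. The capacitance-overhead factor and the voltages are illustrative assumptions (duplicated hardware plus muxes costs a bit more than 2x capacitance; the relaxed per-path speed is assumed to permit a 0.8 V supply), not measurements.

```python
# Architectural trade-off sketch: N parallel units each run at f/N,
# which permits a lower V_dd; dynamic power is C_eff * V_dd^2 * f.
def dynamic_power(c_eff, vdd, f):
    """Dynamic switching power, watts."""
    return c_eff * vdd ** 2 * f

C, F = 1e-12, 500e6                      # reference design: 1 pF switched at 500 MHz
p_ref = dynamic_power(C, 1.2, F)         # single unit at full V_dd

# 2-way parallel version: assumed ~2.2x capacitance (duplication + muxes),
# each path clocked at F/2, assumed to meet timing at V_dd = 0.8 V
p_par = dynamic_power(2.2 * C, 0.8, F / 2)

assert p_par < p_ref   # net win despite the extra hardware
```

With these numbers the parallel design dissipates roughly half the reference dynamic power at the same throughput, which is the lecture's point: the quadratic V_dd term can outweigh the linear capacitance overhead, but only if the overhead stays modest.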
However, there is something that was getting worrisome, which is part of why this pipelining was thought of: technology-driven scaling is creating a problem. Due to the continuous scaling of channel lengths, while VDD is not scaling in the same way, the electric fields are increasing and velocity-saturation effects are seen. Even if a higher VDD is kept, the current does not increase quadratically but only linearly, so the delay becomes almost independent of VDD. This is actually fantastic: technology scaling is helping you to reduce power, because you can then reduce VDD anyway without losing much speed. But it has to be a combination of all the techniques I discussed, along with pipelining and parallelism, to see that the net power is minimized at no cost of increased delay. I also told you one method of reducing power is a smaller swing: instead of 0 to VDD, the signal may swing only part of the way, staying a little above VSS. For example, the level may go down by a factor of 1/n: you have your register pipeline, your logic, then a driver which does not swing fully onto the bus capacitance, and at the receiving end an amplifier that restores the level by the factor n. Of course, the noise margin must remain sufficient so that the signal can be brought back to the full logic levels before passing through the next register. The driver reduces the voltage, the receiver amplifies back to rail-to-rail, and it can be shown that there is a net power reduction in the bus; this is essentially used in bus data transmission, and this is essentially the bus part. A bus has a large capacitance; please remember, I have not discussed it so much, but interconnect is the major worry as of now for both speed and power, speed essentially because the capacitance is large.
And R is also large these days because of the thin wires in the technology I am using, so the RC time constant of the interconnect or bus is very high, and because of that the delay goes up; and since C is large, the power dissipation is obviously large as well. So, to achieve low power on the bus, one technique is the reduced-voltage swing just described. Please remember: the additional registers and the additional driver and receiver circuits consume power themselves, and how much they consume decides whether to use this kind of structure. Then there is the possibility of a clock-gated pipeline: you use an enable signal to turn off the clock; after all, the clock is driving everything all the time. When the clock is not required, it is fed through an AND gate with one input as enable; when enable goes to 0, the clock does not toggle. So essentially the logic can be held, and both input and output registers can be switched off when no data is expected to flow. Dynamic power is proportional to the duty cycle, how long the system is used, that is, the activity coefficient; so if I stop switching whenever I am not needed, then obviously I am reducing the power. In the next part we shall see that whenever we do this on/off gating, the major worry is what we call glitches; these may result in what we call false clocking. There is also a clock-gated pipeline with further power-down: the enable signal comes through a p-channel gate, with enable-bar at its gate, and the supply voltage itself can be reduced; as I said, this is like a sleep transistor, whose voltage drop makes a lower virtual VDD available. When enable-bar is 0 and enable is 1, it is like an active mode with a slightly lower VDD, and when enable is 0 and enable-bar is 1, the transistor is switched off and the full logic is off.
So essentially you disconnect the logic from the power supply when the clock is off, eliminating the leakage; because the sleep transistor is a very large, high-threshold p-channel device, its own leakage is very small, and one sees that the leakage power also goes down. So it reduces the net power. The next point, about what I just discussed, putting registers in between: the advantage is that the clock frequency stays the same. In pipeline-driven voltage scaling you reduce the voltage to meet the relaxed per-stage frequency constraints; the increased clock load offsets the power reduction somewhat, and in general you cannot pipeline beyond a single-gate granularity. This is how one can use pipelining: you cannot have too deep a pipeline, because then the latency will become very, very large. Now, the next part of this power discussion, not really the last, concerns the power dissipation in a CMOS circuit, with an example taken from VLSI Design conference and other conference papers from Professor Vishwani Agrawal's group at Auburn University; these slides were provided by him to me, and of course they are probably available on the web as well: you can google the name Vishwani Agrawal and probably get some of them. A typical CMOS charging/discharging transient is shown here, and we see that dynamic power is due to charge and discharge; but even while the input is in transition, there is momentarily a static (short-circuit) current path through both devices, and during that time the supply has to feed current through both; in the same way the discharge has to flow out through the n-channel device.
So, the net power dissipation we have discussed so far: for the 0-to-1 transition the dynamic term is C L VDD^2 f(0->1); the short-circuit term is t sc * VDD * I peak * f(0->1), where t sc is the time for which the short circuit exists; and finally the leakage, when the device is so-called switched off but is really not off, gives the leakage power. In old technologies, at 0.25 micron, as I told you, about 75 percent of the power was dynamic, 20 percent short-circuit, and only 5 percent leakage; and I have shown you earlier a table where, at the 32-nanometer node, leakage is becoming 60 or 70 percent of the power and dynamic around 30 percent. So now one worries: if this leakage power increases, what do we do? But that apart, we have already seen how to reduce leakage power. However, there is a worry inside the dynamic power that is not shown so prominently: an unnecessary transition in a circuit is essentially called a glitch. If you remember, I have already discussed these issues, particularly in the case of the logical-effort equal-delay design; if the delays to the two inputs of a gate are not equal, one can see there will be one additional transition occurring at its output, which was not logically required; the output may switch even though no actual switching of the final value was needed. This spurious switching may account for as much as about 30 to 70 percent of the power consumption, and this glitch power is coming up very much now, essentially because your frequency of operation is going to gigahertz and even the smallest line delays can cause glitches. Please remember these are, after all, metal lines; different lengths of metal or poly lines create their own delays, and that can give a huge glitch power.
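The three components just listed can be written down as a small model; all the numerical values below are illustrative placeholders, not figures for any real node.

```python
# The three CMOS power components as stated in the lecture.
def dynamic_p(c_load, vdd, f01):
    """P_dyn = C_L * V_dd^2 * f(0->1)."""
    return c_load * vdd ** 2 * f01

def short_circuit_p(t_sc, vdd, i_peak, f01):
    """P_sc = t_sc * V_dd * I_peak * f(0->1)."""
    return t_sc * vdd * i_peak * f01

def leakage_p(vdd, i_leak):
    """P_leak = V_dd * I_leak (the device that is 'off' but not really off)."""
    return vdd * i_leak

# Illustrative numbers: 1 pF load, 1 V supply, 100 MHz activity,
# 50 ps short-circuit window with 100 uA peak, 1 uA standby leakage.
p_dyn  = dynamic_p(1e-12, 1.0, 1e8)
p_sc   = short_circuit_p(50e-12, 1.0, 1e-4, 1e8)
p_leak = leakage_p(1.0, 1e-6)
total  = p_dyn + p_sc + p_leak
assert total > p_dyn > p_sc
```

Note that a glitch simply multiplies f(0->1) in the first two terms: every spurious transition is a full charge/discharge plus a short-circuit window, which is why glitch power can reach the 30 to 70 percent figure quoted above.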
Now, these are some papers Vishwani Agrawal has published, so one can go and look at those; their essential effort was the optimization of cell-based design, how to improve cell selection and so on for low glitch power. This is their earlier work; these slides you can always collect. The essential technique available here is: redesign the gates so that a glitch is suppressed whenever the inertial delay of a gate exceeds the differential delay of its inputs. So redesign all gates in the circuit for an inertial delay greater than the differential input delay; this is the so-called filtering effect, from an old paper available from the VLSI Design conference. As I said, there is prior work done by many others, including Vishwani. The method poses an objective function: minimize the sum of the inserted buffer delays, that is, minimize the net delay over all buffers j, subject to a glitch-removal constraint, d g > T g - t g for all gates g (the inertial delay of gate g must exceed the spread between its latest and earliest input arrival times), and a maximum-delay constraint: the overall delay must not exceed the permitted net propagation delay. Therefore, new transistor-sizing procedures have to be used: you can do cell optimization as they suggest, transistor sizing with multiple drive strengths, balanced rise and fall times, and power optimization by minimizing the parasitic capacitances. Of course, only a discrete set of variants is possible: you create different cells which give you options beyond the normal design. And then, particularly, cells are not circuit-specific for all possible hardware; the number of cells available may not be sufficient.
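The filtering condition just stated, inertial delay greater than the differential input delay, can be checked per gate with a few lines; the delay values used below are hypothetical.

```python
# Glitch-filtering check: a gate suppresses a glitch when its inertial delay
# exceeds the differential delay (skew) between its input arrival times.
def glitch_suppressed(inertial_delay, input_arrivals):
    """True if the gate filters the glitch: d_g > (T_g - t_g),
    i.e. the inertial delay exceeds the max-minus-min input arrival spread."""
    skew = max(input_arrivals) - min(input_arrivals)
    return inertial_delay > skew

# Hypothetical gate: 120 ps inertial delay, inputs arriving at 100 ps and 180 ps
assert glitch_suppressed(120e-12, [100e-12, 180e-12])      # 80 ps skew: filtered
assert not glitch_suppressed(50e-12, [100e-12, 180e-12])   # too fast: glitch passes
```

The buffer-insertion optimization described above is then the search for the cheapest set of inserted delays that makes this predicate true at every gate without violating the circuit's maximum-delay constraint.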
So, that may be a large cost. The newer glitch-removing solutions balance the differential delay at the cell input itself, using what are called feedthrough cells, and automate the generation of delay elements and their insertion into the circuit; if you do this, the glitches can be minimized. And this is the typical design flow they suggest: start with design entry, do technology mapping, remove glitches by the suggested techniques, and then go for layout, with feedthrough-cell generation that is fully automated and scalable to any IC size; layout generation from the modified netlist can use any place-and-route tool. This is essentially work from computer-science people: they do not want to play so much with the technology, since whatever SPICE or other models are available are taken as given. Given a design entry, can we still do glitch power reduction, or at least create cell IPs for the various drive-current requirements with equalized differential delays, so that the glitches are minimized?

Last but not least, let me quickly go into a new MOSFET structure which has appeared in the last 10 to 15 years, called the FinFET, which is going to be used in almost every low power circuit. The slides I am presenting here are courtesy of Niraj Jha at Princeton University. These slides are, I trust, available on his webpage, but in any case Jha, being a good old friend, was kind enough to give them to me many years ago, so I am showing them here for the first time. So, what is the motivation? The traditional view of CMOS power consumption has an active mode, which includes dynamic switching, short-circuit power and glitching, and a standby mode, which is the leakage power.
The problem, as we have all just seen, is that active-mode power is only about 40 percent even at the 17 nanometer bulk CMOS node; some 60 percent is really going into the leakage power. This is essentially from an old paper of 2002. So, what are the techniques for standby leakage? Sleep transistors, clock gating as we have just seen, and applying leakage vectors; glitching, of course, we have seen already. These techniques switch parts of the circuit off and on during idle periods, but they interfere with circuit operation and they do not address active-mode leakage: during the active mode do not try to play too much with the VTs for leakage, because in that mode you want the higher drive current to be available. Active-mode circuit optimization will include gate sizing and multiple supply and threshold voltages; we have both multiple-VDD and multiple-VT techniques, and I have already discussed all of them. They respect circuit operation and timing constraints and can be used to reduce active-mode leakage. However, all of this assumes a standard transistor, a normal n-channel or p-channel MOSFET, or CMOS in the general case, as we have used so far. What opportunity is there, then, in a new structure called the FinFET over the normal MOSFET? A FinFET is a device whose characteristics can be leveraged for low power design: static threshold-voltage control through back-gate bias, much as we could do with body-bias techniques in normal MOSFETs; area-efficient design through merging of parallel transistors, another feature by which a FinFET can take less area than multiple normal transistors; and independent control of the FinFET gates, where you can either connect all the gates together or control them independently, which opens up novel circuit design opportunities.
So, this is how FinFETs were thought of as a replacement for the normal MOSFET, and since the threshold can be controlled, area efficiency achieved and capacitance minimized, one can probably have low power design using FinFETs. Here is the typical picture: up to say the 32 nanometer node you have bulk CMOS, and at the other end you have non-silicon nano devices which may come in at 10 nanometers and below; many things are being tried there, but, being the strongest supporter of silicon, I say that for the next 30 years we are with silicon, come what may. To bridge this gap, DGFETs, double-gate FETs, can be used instead of bulk CMOS: you have double or multiple gates, as in FinFETs. DGFETs are an extension of CMOS, and the manufacturing process is essentially the same as CMOS. The key limitations of CMOS scaling are addressed through better control of the channel by the transistor's gates: reduced short-channel effects; a better Ion to Ioff ratio, which, as I discussed, a good high-performance circuit needs; and, because of the additional parameters now under your control, an improved subthreshold slope, which essentially reduces the leakage power. One can probably also get away from the random dopant fluctuation problems that occur in normal MOSFETs. There are several structures: the planar DG MOSFET; multiple fins connected together, which is called the FinFET; and the same structure in vertical form, called the vertical DG MOSFET. These are standard figures; a typical fin-type DG MOSFET is shown here with its two sides. Please remember, this is your gate, and these are your sources and drains: the two sources on this side and the two drains on the back side.
If you see this figure, this is your gate shown here, and these are the contacts to your source and drain; this is one FET. Now one can see from here why we say it is double gate: one can control the channel from this side, from that side, and from the top. So the channel forms not only along the channel length but along the fin sidewalls, which act as the transistor width. We now have additional control: there is a gate here, a gate here, and a gate on the top, so essentially you have a double gate. If this gate and that gate are given separate biases, both gates can be independently controlled; we then call it an IG FinFET, an independent-gate FinFET, and of course it requires an extra process step. One gate is called the back gate and the other the front gate, with the gate oxide in between, and these are your source and drain. Please remember, the thickness of the green region here is called the fin thickness, the silicon fin, and that is the most important feature; that is why it was named FinFET. You can have a number of fins and connect all the gates together as shown here; this becomes a common-gate, single-gate structure called an SG FinFET. If you have independent control you say IG FinFET. When you connect them all, the effective channel width depends on how many fins there are: with n fins the width is 2·n·h, where h is the height of the fin, because that is where the channel forms, so with fins 1, 2, 3 here the channel width in a FinFET is quantized. Width quantization is a design issue if fine control of transistor strength is needed. This is certainly very helpful in building good memories, though we shall not look into that in this course.
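The width quantization just described can be made concrete. This is a sketch under the lecture's simple W = 2·n·h model, with an optional term for the fin top (tri-gate operation) that the lecture does not use; the dimensions below are illustrative, not from any real process.

```python
# Width quantization sketch for a FinFET: each fin contributes 2*h_fin of
# sidewall channel width (plus t_fin if the fin top also conducts), so the
# effective width only comes in discrete steps. Dimensions are hypothetical, nm.
import math

def finfet_width(n_fins, h_fin, t_fin=0.0, top_conducts=False):
    """Effective channel width of an n-fin FinFET."""
    per_fin = 2 * h_fin + (t_fin if top_conducts else 0.0)
    return n_fins * per_fin

def fins_needed(w_target, h_fin):
    """Smallest fin count whose quantized width meets a target drive width."""
    return math.ceil(w_target / (2 * h_fin))

# To realize W >= 200 nm with 30 nm tall fins (2*h = 60 nm per fin), you
# cannot get 200 nm exactly: you must round up to 4 fins, i.e. 240 nm.
assert fins_needed(200, 30) == 4
assert finfet_width(4, 30) == 240
```

This rounding-up is exactly why the text calls width quantization a design issue: sizing algorithms that assume continuous widths must be reformulated over integers, which is the convex integer formulation mentioned later.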
Here are four possible NAND gate structures: one with single-gate (SG) devices, one with independent-gate (IG) devices, and their low power (LP) variants. One can see that in the normal SG FinFET the back gate is tied to the front gate itself, and this is the standard NAND gate; in plain SG mode we have no separate back-gate control. In the low power variants you have separate supplies for the back gates: a pull-up bias voltage for the p-channel devices and a pull-down bias voltage for the n-channel devices; one can be forward biased and the other reverse biased, so one can adjust the threshold of the pull-up network and of the pull-down network independently, and one can even clock these biases. In the active mode the gates then behave normally; in the off mode the back bias raises VT so much that the leakage current goes down. A similar technique was tried in the independent-gate case, and all four variants have been tried for implementing NAND gates. It becomes very difficult to use the IG-mode gates in very complex logic, because the connectivity runs to too many places. However, this has been tried, and it is one of the major techniques for reducing power in the newer circuits below, say, the 32 nanometer node. Here is a comparison of SG, SG-LP, IG and IG-LP, just to give you an idea: the SG-LP case gives a very low leakage current, about 85 nanoamps, whereas plain SG has very high leakage, about one microamp, because the back gate is tied to the front gate; the IG-LP case has leakage larger than SG-LP, not so low but still comparatively small because the width is small. So you can see the speed of a circuit is essentially best with SG, while its LP version also gives you much better leakage.
However, many other blocks, switched-capacitor analog circuits for example, are best designed for low power using IG-LP: you can have higher or lower leakage depending on how you bias and match the pull-up and pull-down devices. So there are advantages and disadvantages: plain SG has the worst property, very high leakage, while its LP version has low leakage; but as soon as you go for low leakage, the speed goes down, that is the trade-off. So, depending on whether you want only low power, only high performance, or a standby-friendly design, one of the possible combinations can be chosen. This plot shows what I am trying to say: red shows the delay and green shows the power, plotted only for the SG case as you adjust the back-gate bias for low power. One can see that as you increase the back-gate bias the leakage goes down, but the delay rises. So you can now decide how much back-gate bias to apply for the leakage and the delay or speed you want, and tailor your biases correspondingly, so that the on-current to off-current ratio is of your choice and the power is minimized. There are a variety of challenges in FinFET-based circuit design: no comprehensive circuit-level comparisons are available; there are not enough design tools at the higher levels; there are not enough standard cells available, so you cannot readily synthesize for optimal or sub-optimal operation. FinFET width quantization leads to a complex convex integer formulation which, though I have described it simply, is in fact extremely hard to solve; adding the many variability issues on top makes it more complex still. And it does not handle all logic styles: you cannot have domino and every other style in FinFETs; some things you can have, but not all of them.
The last part I will go through quickly, since I am already running short of time. One other technique of reducing power arises from the interconnect power consumption. Today we build systems on chip (SoCs) out of intellectual property (IP) blocks; a lot of SoCs and IPs are being marketed, they are normally firm blocks, and they act like black boxes. For the interconnect layer between any of them, issues of timing, power and area have to be solved. So in this part I am going to talk about interconnects, in an SoC or in any circuit per se. Interconnect consumes large power: as much as 60 percent of the power of current processors, DSP processors in particular, is consumed in the interconnect. Obviously, all the device techniques and all the architectural techniques we discussed were valid only for device-related power; if additional power is coming from the interconnect, one should worry about the interconnect power as well. One major worry is that this power is not scaling down, because the RC time constants of the wires are not scaling down. So how do you reduce at least the dynamic power? We know the dynamic power is alpha·C·VDD²·f, where alpha is the activity coefficient. So can I reduce alpha? Our goal, then, is to reduce the number of transitions on the bus. Techniques explored in the past aimed to reduce the L·di/dt switching noise on the output pads; that problem is of course always present even now and cannot be easily solved. The other power-reduction approach is bus coding, such as bus-invert coding or limited-weight coding.
Now, there has to be a trade-off between the reduced activity and the circuit overhead. We say: you reduce alpha, but if you put in additional circuitry to do that, how much power goes into the additional circuitry? The overhead is what matters most, and you also need extra wires. The encoding circuitry on the bus can be complicated, and the decoding equally so, and it may itself consume large power, so one has to worry about this in a low power interconnect. A typical bus can be modeled as an LCR circuit, like a transmission line. The kind of model I am showing is one that a student of mine worked on around the late 1990s, and similar models have been chosen by many people: between two adjacent lines of a bus there is a coupling capacitance, and between each line and ground there is a substrate capacitance. So for each line in the bus there is a lateral capacitance to the neighbouring wires, called Cc, and a vertical capacitance to the substrate, called Cs. We then write the energy as E = ½(y·Cc + x·Cs)·VDD², where x is the number of transitions charging the substrate capacitances and y the number of transitions on the coupling capacitances; what x and y are is decided by the data. So this is the energy; let us see how to reduce it. If we reduce VDD, we reduce the energy, and on top of that we can use coding: Hamming-distance-based schemes together with reduced VDD attack the α·y·Cc·VDD²·f and α·x·Cs·VDD²·f terms. So there are different techniques: bus-invert coding reduces x; alternate bus-invert coding reduces y. The frequency f is not a control knob, because if you reduce f your speed goes down. So I am only interested in the transitions into Cc and into Cs.
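The energy model just written down can be sketched in code. This is one common simplification of the model, assuming x counts wires that toggle and y counts adjacent wire pairs that toggle to opposite values (the worst case for the coupling capacitance); the exact weighting of the coupling term varies between published models.

```python
# Sketch of the bus energy model above: x counts transitions on the substrate
# capacitance C_s, y counts opposite-direction toggles on adjacent wires,
# which stress the coupling capacitance C_c. One simplified counting rule.

def bus_transitions(prev, curr):
    """Return (x, y) for a bus going from bit-vector prev to curr."""
    n = len(prev)
    x = sum(prev[i] != curr[i] for i in range(n))  # wires that toggle
    y = sum(1 for i in range(n - 1)
            if prev[i] != curr[i] and prev[i + 1] != curr[i + 1]
            and curr[i] != curr[i + 1])  # neighbours toggling to opposite values
    return x, y

def bus_energy(x, y, c_s, c_c, vdd):
    """E = 0.5 * (x*C_s + y*C_c) * VDD^2, per the model in the text."""
    return 0.5 * (x * c_s + y * c_c) * vdd ** 2

# Wires 0 and 1 toggle in opposite directions: x = 2 self transitions and
# y = 1 coupling transition on the adjacent pair.
x, y = bus_transitions([0, 1, 0, 1], [1, 0, 0, 1])
```

Coding schemes then try to choose bus representations of the data that minimize x (bus-invert), y (alternate bus-invert), or a weighted combination of both, exactly as the comparison in the text lays out.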
So it depends: I can do coding on the data arriving on a bus, and we may use different methods to reduce the power; in particular I am interested in the reduction of x and y. Different techniques have been suggested in the literature: many people only do VDD scaling together with Hamming codes, which we shall see; bus-invert coding reduces x, and alternate bus-invert coding reduces y. What we have done is use a new technique based on modified Huffman codes, in which both x and y can be minimized, and if you can minimize both, the net energy lost on the interconnect can be reduced further. I will skip the details, because they would require much more effort to explain, but let us look at power versus transition probability. Depending on the load capacitance value, 0.5 pF here versus 5 pF there, and on the kind of coding you use, single-event versus multiple-event errors for example, one can see that bus transitions play a minor role when the capacitance is low, whereas they play a large role when the load capacitance is very high. If there are many transitions on high-capacitance buses, large power is lost; this particularly occurs in codec systems, where the switching goes on constantly as the data streams at very high speed. The basic idea of bus-invert coding is to simply invert all the wire values: if you have a 1, you make it 0. What is the advantage? Suppose the last value on a wire was 0: if the next data bit is also 0, there is no transition; likewise a wire holding 1 which must receive 1 again has no transition. The aim is to send, between the data and its complement, whichever causes fewer transitions.
So, by inverting, I expect that at least the self transitions, and often the coupling transitions too, will be reduced, so that for a particular data word I will have a smaller number of transitions. This is essentially what bus-invert does; in alternate bus-invert (ABI) we treat the odd and even bit wires separately and explicitly. How do we decide when to invert? Here is the simple technique. We compute the Hamming distance between the word currently on the bus and the next data word, that is, the number of bit positions in which they differ, which is the number of wires that would toggle. If the Hamming distance is larger than n/2, you set the extra invert signal to 1 and put the inverted data on the bus, so that the transitions are minimized; otherwise you set invert to 0 and send the data as it is, because fewer than half the wires toggle anyway, and you prefer to keep it. So every time a new word arrives, there must be circuitry that computes the Hamming distance from the last bus value to the new data and correspondingly decides whether the bus should carry the inverted or the non-inverted code. Once coded, you pass the data down the line and decode it at the far end so that the original data is restored; that is essentially the bus-invert technique. The graph shown here plots effective transitions versus the ratio B = Cc/(Cc + Cs): at B = 0 the coupling capacitance Cc is zero, and at B = 1 the substrate capacitance Cs is zero.
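The bus-invert procedure just described fits in a few lines. This is a minimal sketch of the scheme as stated in the text (invert when more than n/2 wires would toggle, signal the choice on one extra wire); the function names are my own.

```python
# Minimal bus-invert encoder/decoder sketch for the scheme described above:
# if more than half the wires would toggle, send the complemented word with
# invert=1 on one extra wire; the receiver undoes the inversion.

def bus_invert_encode(prev_bus, data):
    """Return (bus_word, invert_bit) minimizing toggles from prev_bus."""
    n = len(data)
    toggles = sum(p != d for p, d in zip(prev_bus, data))  # Hamming distance
    if toggles > n // 2:
        return [1 - d for d in data], 1   # inverting saves transitions
    return list(data), 0

def bus_invert_decode(bus_word, invert_bit):
    """Recover the original data at the receiving end of the bus."""
    return [1 - b for b in bus_word] if invert_bit else list(bus_word)

prev = [0, 0, 0, 0]
data = [1, 1, 1, 0]                       # 3 of the 4 wires would toggle
word, inv = bus_invert_encode(prev, data)
assert inv == 1 and word == [0, 0, 0, 1]  # only 1 transition instead of 3
assert bus_invert_decode(word, inv) == data
```

Note the worst case is now n/2 toggles plus the invert wire itself, which is the peak-power halving, and the diminishing average saving for large n, that the text discusses next.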
So B is Cc, the coupling capacitance between wires, divided by the net capacitance, which is Cc plus the substrate capacitance Cs. At B = 0 there is no coupling capacitance, and at B = 1 there is no substrate capacitance. Three cases are plotted: the uncoded bus shows large activity; alternate bus-invert coding, the red curve as implemented, lies below it; and the original bus-invert coding, the triangles, is lower than uncoded but higher than ABI. One curve is the theoretical evaluation and another the actual implementation, and the implemented red curve is the important one. Going from uncoded to coded: theoretically one feels that ABI should do best when Cc is roughly equal to Cs, but when we did the theoretical evaluation and then implemented it on simulated hardware, we found this is not quite true. Alternate bus-invert actually reduces the effective transitions from 1 down to about 0.75 to 0.8; the curve rises with B, but not by much, and so does the original bus-invert coding, which stays above alternate bus-invert throughout. So I would say the best option for almost any interconnect is to do alternate bus-invert coding, and you get the activity coefficient reduced. Now, a caveat: the maximum number of transitions is reduced from n to n/2, so assuming uniform and independent bits the peak dynamic power is cut to half; but with invert coding, n/2 becomes the most likely Hamming distance, at which inverting the data values makes no difference. As n gets bigger the average power saving becomes smaller, and beyond that the saving drops further because the overhead power of the scheme starts to dominate; the scheme is optimal only for small overhead, and one extra wire is also required to carry the invert signal.
We suggested another technique, for which you can go into the probability and information theory behind this area. One of the most famous codes for data transmission, based on the probability of occurrence of the alphabet symbols, is the Huffman code: s denotes the symbols, p their probabilities, and the construction creates a codeword for each. The code is not unique; we can, for example, allot 0, 10, 110, 111 to the symbols. However, the codeword length keeps rising as you increase the number of symbols, so what we did was truncate it to only 3 bits, giving what we call a modified Huffman code. We applied it to standard buses and compared our results on a typical on-chip processor-with-memory configuration, with the bus length in millimetres ranging over the cache bus and the longer memory bus. The green and blue curves are the normal instruction-bus and address-bus encodings, ABI is shown as well, and using our modified Huffman code we figured out that, for any bus length, the reduction in activity coefficient or energy is on average around 5 to 10 percent more than with the other codes. Last but not least, one of the simplest techniques, which everyone tries first for a sequential data stream, is Gray coding: only one wire out of n transitions in any given cycle, and the only extra circuitry is the code conversion, since in a Gray code only one bit changes between consecutive values.
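As a hedged illustration of the starting point for these codes, here is the standard Huffman construction over symbol probabilities; the lecture's truncation to 3-bit codewords is a further, unspecified modification not reproduced here, and the probabilities below are made up.

```python
# Sketch of standard Huffman code construction (the basis of the modified
# Huffman bus codes mentioned above). Rarer symbols get longer codewords.
import heapq
from itertools import count

def huffman_codes(probs):
    """Map each symbol to a binary prefix-free codeword, given probabilities."""
    tie = count()  # tie-breaker so heapq never has to compare the code dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
# The most frequent symbol gets a 1-bit code, the rarest a 3-bit code,
# matching the 0 / 10 / 110 / 111 example allotment in the text.
```

For a bus code, frequent data patterns mapped to short, low-weight codewords are what cut the average number of transitions; truncating the depth bounds both the decoder complexity and the worst-case codeword length.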
So, if you convert the data to Gray code, only one wire out of n transitions in any given cycle; some extra circuitry and area are therefore required. This is useful for address traces, which tend to be sequential, like the program counter, FIFO pointers, or indices for arrays stored in RAM, and for many sequential finite state machines, where the state transitions may also use Gray codes. A mix of Gray coding and Hamming-distance-based bus-invert coding can probably give power reduction on both sequential and random traces, and one can have low power interconnect.

In conclusion, for any processor design one can reduce power at every level. At the algorithm level you can apply transformations, for filters for example, so that glitches are reduced and the amount of computation itself is minimized. At the architectural level you do architecture-driven voltage scaling, minimize transitions using the coding just discussed, and use the adder input bit swaps discussed earlier for the arithmetic; you minimize glitches as much as possible by equalizing the delays. At the layout and logic levels you do place-and-route optimization, bus bit ordering, low-voltage support circuitry, and logic-level power-down and gated clocks. If all the techniques I have discussed are applied, one can design a low power processor, which requires all kinds of circuits in all kinds of architectures. Please remember, one of the major efforts all across the world right now is to create very, very low power processors; ARM has one of them, and Intel also has some in tablets. We are trying to reduce the power because, for these iPads and tablets, everyone wants extremely low power circuits; many parts of them are not very high performance circuits, but some parts are.
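The Gray-code claim above, one wire toggling per cycle on a sequential trace, is easy to check. A minimal sketch using the standard binary-reflected Gray code:

```python
# Sketch of the Gray-code idea above: consecutive addresses differ in exactly
# one bit, so a sequential address bus toggles one wire per cycle instead of
# several (e.g. 7 -> 8 flips four wires in plain binary).

def to_gray(n):
    """Binary-reflected Gray code of a nonnegative integer."""
    return n ^ (n >> 1)

def bit_toggles(a, b):
    """Number of bus wires that switch when the value goes from a to b."""
    return bin(a ^ b).count("1")

# Binary 0111 -> 1000 toggles 4 wires; the Gray-coded pair toggles only 1.
assert bit_toggles(7, 8) == 4
assert bit_toggles(to_gray(7), to_gray(8)) == 1
```

This is why the text recommends Gray coding for program counters and FIFO pointers: the saving exists only when the trace is (mostly) sequential, which is what motivates mixing it with bus-invert for random data.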
So you need threshold control for those high-performance architecture parts. Overall, however, the effort is to create low voltage, low power processors, and all the techniques I have discussed with you so far can lead to such designs. Some of my graduate students who helped me over this long time are listed here: Pandit and Bhima Rao were my PhD students, and there are many MTech students not listed. Those who worked directly with me in the low power area are listed here: Saurabh Benglani, Agashe, Saural Rajachadri, Gurvinder, Sri Hari Bhama, Kapil Jain, Raghunandan, Gulwani, Pumbhurya, Mahaja and many others; I apologize if some names are missing. Thank you. These are some of the references which I will be providing to you, and I may advise a few books for low power design. The first is Low-Voltage/Low-Power Integrated Circuits and Systems by Edgar Sánchez-Sinencio and Andreas Andreou. The second is Low-Voltage, Low-Power VLSI Subsystems by Kiat-Seng Yeo and Kaushik Roy of Purdue University. The third, very popular book, the best textbook for VLSI design and the one I use in my first course, is Digital Integrated Circuits: A Design Perspective by Jan M. Rabaey of Berkeley, Anantha Chandrakasan of MIT, and Borivoje Nikolić of Berkeley. And the fourth, one of the standard books I keep using in my VLSI design courses, is Principles of CMOS VLSI Design: A Systems Perspective by Neil Weste and Kamran Eshraghian.
If you use these four books, the last two will give you the basics of power consumption and low power design, and the papers I gave you will supply the actual details of the work I showed here; together they make clear why power reduction has become so very relevant in the present era. Thank you for the day.