We have been discussing the design of fast CMOS circuits, and the technique we have been using so far is logical effort. We will continue with that theme, and today we may finally finish off what exactly logical effort is and why it matters in designing circuits. Last time we discussed calculating the logical effort and also the electrical effort. Now we shall look at the net result of all the effort, which is what we call delay estimation.

So, here is a simple circuit. It is a ring oscillator, an N-stage ring oscillator, and one wants to know the delay, or essentially the frequency, of this ring. Since all the stages are inverters, each has a logical effort g of 1. Since all the inverters are identical, the input capacitance of the next stage, which is the output load of this stage, is the same for all of them, and therefore the electrical effort h = C_out/C_in is 1 in this case. The parasitic delay p of an inverter is 1 unit. Therefore the stage delay is d = gh + p = 1 × 1 + 1 = 2. For one period of oscillation the signal has to go around the ring twice, once for the 1-to-0 transition and once for the 0-to-1 transition, so the oscillator frequency is 1/(2 N d tau).

There is something interesting about the next circuit, as I said last time. Circuits are normally designed before the actual load is known, so we use what is called the fan-out-of-4 inverter design. It essentially means that an inverter is driving a load equivalent to 4 inverters, as shown here in the figure: there are 4 inverters driven by one single inverter. The logical effort of this inverter is g = 1. Since there are 4 inverters at the output, C_out is 4 times C_in, so the electrical effort h = C_out/C_in is 4. The parasitic delay of this inverter is 1. Therefore the stage delay is d = gh + p = 1 × 4 + 1 = 5 units.
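As a quick numerical check of the two results above, here is a minimal sketch in Python. The function names, the 31-stage ring and the value tau = 10 ps are assumptions chosen only for illustration; they are not from the lecture.

```python
def stage_delay(g, h, p):
    """Normalized delay of one stage, d = g*h + p, in units of tau."""
    return g * h + p


def ring_oscillator_frequency(n_stages, tau):
    """Frequency of an N-stage ring of identical inverters: 1 / (2 * N * d * tau)."""
    d = stage_delay(g=1, h=1, p=1)  # each inverter drives one identical inverter
    return 1.0 / (2 * n_stages * d * tau)


d_ring = stage_delay(g=1, h=1, p=1)  # ring oscillator stage
d_fo4 = stage_delay(g=1, h=4, p=1)   # fan-out-of-4 inverter

print("ring stage delay (tau units):", d_ring)
print("FO4 stage delay (tau units):", d_fo4)
# N = 31 and tau = 10 ps are hypothetical values, not from the lecture.
print("ring frequency (Hz):", ring_oscillator_frequency(31, 10e-12))
```

To get an absolute number you only need tau for your technology; everything else is fixed by the topology.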
So, in this fashion we can derive the stage delay. Let us now calculate for a larger circuit, a logic network with multiple stages; an example is shown here. There is an inverter whose input capacitance is 10 units, and the output capacitance for the whole network is 20 units. This inverter drives a two-input gate, one of whose inputs is driven by it; let us say that gate has an input capacitance proportional to x. Similarly, that gate drives one input of a two-input NAND gate, which has an input capacitance proportional to y. Finally, this NAND gate drives a buffer stage, an inverter, whose input capacitance is z, and the output load, as I said, is 20 units.

Now, if I want to find the delay for a signal starting at the input side, that is, how much time it takes to charge the capacitor at the output, the technique that the logical effort people have suggested is the following. We first ascribe a logical effort to each of the stages 1, 2, 3, 4; this is a 4-stage circuit. For the first stage, the inverter, g1 is 1. The second stage is a two-input NOR gate, whose logical effort we know is 5/3. The third stage, a two-input NAND gate, has a logical effort of 4/3. The last is an inverter, so g4 is 1. So we now know g1, g2, g3, g4. We can also look at the local electrical efforts: since x is the load on the first stage, x/10 is its electrical effort, then y/x for the second, z/y for the third and 20/z for the fourth. However, as I repeatedly told you last time, the net electrical effort of the path is not something to be built up stage by stage; it is simply C_out of the path divided by C_in of the path, which is 20/10. So in our case it is 2. (In this unbranched chain the intermediate x, y and z cancel, so the stage values happen to multiply to the same 2; once the path branches, that is no longer true.)
So, if you want to calculate the net path delay, the method is to first calculate the net logical effort along the path. The net path logical effort G is the product g1 × g2 × g3 × g4. This stems from the fact that in logical effort we have chosen as our template a single inverter whose p-channel device is twice the size of the n-channel device. Any other circuit could act as the template, and the logical efforts would then be calculated with reference to that. So the net path logical effort is G = g1 g2 g3 g4, and the net path electrical effort H, as I keep saying, is just the output capacitance of the path divided by its input capacitance. The total path effort F is obtained by taking the product of the stage efforts, g1h1 × g2h2 × g3h3 × g4h4, and for a path without branches we can write it as F = GH. Please remember, as I keep repeating, that H should not be calculated as the product h1 × h2 × h3 × h4 and so on.

Here is an example you may like to look at. One inverter drives two other inverters, each with an input capacitance of 15; the first inverter has an input capacitance of 5, and each of the two driven inverters has an output capacitance of 90 units. If I want the net G along one path, g1 for the first inverter is 1 and g2 for the second inverter is also 1, so G is 1. H is of course 90/5, so it is 18, and therefore GH is 18 × 1 = 18. One can also find the stage values in between. Looking only along the path, 15/5 is 3 and 90/15 is 6, and their product is again 18. But notice that the first inverter actually has to drive both 15-unit inverters, so its real load is 30 and its real stage electrical effort h1 is 30/5 = 6, while h2 is 90/15 = 6. The product of the real stage efforts is then 36, which is twice GH. This is exactly why H must not be obtained by multiplying the stage h's; the extra factor of 2 is the branching effort, to which we will come back shortly.
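The bookkeeping for this small fork can be sketched as follows. This is only an illustration: the variable names are mine, and it assumes the first inverter's load is the sum of the two 15-unit inverters it drives.

```python
from math import prod

# On-path stages: the first inverter, then one of the two inverters it drives.
g = [1.0, 1.0]                # logical effort of each inverter
c_in = [5.0, 15.0]            # input capacitance of each on-path stage
c_load = [15.0 + 15.0, 90.0]  # total capacitance each stage really drives

G = prod(g)                                    # path logical effort
H = c_load[-1] / c_in[0]                       # path electrical effort, C_out(path) / C_in(path)
h = [cl / ci for cl, ci in zip(c_load, c_in)]  # true stage electrical efforts
F = prod(gi * hi for gi, hi in zip(g, h))      # product of the stage efforts g_i * h_i
B = F / (G * H)                                # the leftover factor is the branching effort

print("G =", G, " H =", H, " stage h =", h, " F =", F, " B =", B)
```

The point of the sketch is that the product of the stage efforts is G × B × H, so it equals G × H only when there is no branching.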
Now, to take this a little more seriously, we can compute the delay of a multi-stage network. The path effort delay, D_F, is the sum of the stage efforts f1, f2, f3 and so on over all the stages, where each f_i is g_i h_i. By the same logic, the path parasitic delay P is the sum of all the parasitic delays along the path. Please remember that the parasitic delay of an inverter is 1, of a 2-input NAND or NOR gate is 2, of a 3-input gate is 3, and so on. Therefore the path delay is D = D_F + P, where P is that sum. Now one must evaluate D_F, the path effort delay, and once we know it we will be able to calculate the net D.

So, how do we calculate it? One can prove this by theory; I am not proving it here, you can look at the theory in the book. What Sutherland, Sproull and Harris have suggested, and it can be proved rather easily, is that if you want the delay to be minimized, every stage on the path of the network should have the same stage effort. That is, the f-bar I showed here must satisfy f-bar = g1h1 = g2h2 = g3h3 and so on for every stage. The path effort F is the product of the stage efforts, g1h1 × g2h2 × and so on, and if they are all equal then F = f-bar^N, so f-bar = F^(1/N) if N stages are being used. If I substitute this, the minimum delay of an N-stage path has N stages each with an effort delay of F^(1/N), plus the net parasitic delay. So we say D = N F^(1/N) + P.
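The minimization that is left to the book here can be sketched in a few lines. This is a standard argument using the arithmetic-geometric mean inequality, written with the symbols used above ($f_i = g_i h_i$, $\hat f$ for the common stage effort, written f-bar in the text); it is an outline under those definitions, not the book's exact proof.

```latex
\begin{align*}
D &= \sum_{i=1}^{N} f_i + P, \qquad \prod_{i=1}^{N} f_i = F \quad \text{(fixed by the path)} \\
\frac{1}{N}\sum_{i=1}^{N} f_i &\ge \Bigl(\prod_{i=1}^{N} f_i\Bigr)^{1/N} = F^{1/N},
  \quad \text{with equality iff } f_1 = f_2 = \dots = f_N = \hat f \\
\hat f &= F^{1/N}, \qquad D_{\min} = N F^{1/N} + P
\end{align*}
```

Since $P$ does not depend on the sizing, only the sum of the stage efforts can be reduced, and the inequality shows that the sum is smallest when all the stage efforts are equal.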
The key result of logical effort is exactly this: I can evaluate the net path delay if I know the number of stages and if I say they are optimized for minimum delay. I can evaluate F, which is nothing but GH, and as I go through the blocks along the path I can evaluate the net P = p1 + p2 + p3 up to pN. Once I know these, I know the path delay. What is the advantage of this? As I have already said, for the optimum the stage effort is known: it can be found from F, and since F is known and 1/N is known, I know the optimal stage effort gh. If I know g for a stage, I can then evaluate h for that stage, and h means the ratio of capacitances, and therefore the ratio of sizes, on the two sides of the gate.

So, here is the next step in the analysis: at each gate we apply the capacitance transformation. Since gh is common to all stages and equal to f-bar, C_in of a stage is equal to g × C_out / f-bar for that stage. For each gate I know g, I know the output load, and I know the stage effort because I know the net F. Therefore I can evaluate C_in of the last stage, and I keep back-calculating like this until the first stage is reached.

I will give an example. Let us say there are 3 NAND gates; for simplicity, the assumption is that all the gates have identical structure with 2 inputs only. The C_in of the first stage is known to me: this is where I am putting my signal, and its C_in is C. For the next NAND gate, which is driven by this one, I do not know the C_in. This second gate drives another NAND gate, whose C_in I also do not know. But I know the C_out, and right now I assume that the output capacitance to be driven is the same as the input capacitance, C.
Now, I want to know the value of C_in here and the value of C_in here. Knowing C_in essentially means I know the width, because capacitance is proportional to width, so I can size the transistors here and here. Let us size the NAND gates for the 3-stage path shown here. What is the net path logical effort? There are 3 stages, so it is g0 × g1 × g2. Since all of them are 2-input NAND gates, each has a logical effort of 4/3, so it is 4/3 × 4/3 × 4/3, and if I do the calculation this turns out to be 2.37.

Right now I assume that each gate is driving only one input. There is a possibility that a gate would also have driven another gate as fan-out, but right now we say there are no branches; each gate drives just one gate along the path. So we say the branch effort is 1; we will come back to branching a little later. So B is 1. The electrical effort H is C_out/C_in, which is C/C, which is 1. The product GBH is the path effort, and the path effort is therefore 2.37.

We have just said that the minimum delay occurs when each stage has the same stage effort, and using that analysis we derived that the minimum delay achievable from input to output is N, which is 3, times (GBH)^(1/N), that is F^(1/3), plus the parasitic delay. Since these are 2-input NAND gates, each has a parasitic delay of 2 p_inv, and since there are 3 of them it is 3 × 2 p_inv. Substituting the values, D = 3 × (2.37 × 1 × 1)^(1/3) + 3 × 2, with the inverter parasitic delay taken as 1. The first term is 3 × 4/3 = 4 and the second is 6, so 6 + 4 means 10.
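The arithmetic of this three-NAND example can be reproduced with a short sketch. The function name and argument layout are my own choices for illustration, and p_inv is taken as 1 as in the lecture.

```python
from math import prod


def min_path_delay(g, p, B, H):
    """Return (F, f_hat, D_min) for a path.

    g: logical effort of each stage, p: parasitic delay of each stage,
    B: path branching effort, H: path electrical effort C_out / C_in.
    """
    n = len(g)
    F = prod(g) * B * H        # path effort F = G * B * H
    f_hat = F ** (1.0 / n)     # equal stage effort that minimizes delay
    return F, f_hat, n * f_hat + sum(p)


# Three 2-input NAND gates, no branching, C_out equal to C_in.
F, f_hat, D = min_path_delay(g=[4 / 3] * 3, p=[2, 2, 2], B=1, H=1)
print("F =", round(F, 2), " f_hat =", round(f_hat, 3), " D =", round(D, 2))
```

The same function applies to any unbranched or branched path once G, B, H and the stage parasitics are known.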
So, the minimum delay achievable for this circuit is 10. Once I know the path effort I also know the small f, f-bar, since f-bar = F^(1/3) with F = GBH, and from there I will be able to work the capacitances backwards. The effort of each stage, which we call the optimal f, f-minimum or f-bar, is (GBH)^(1/3), and if I do the calculation it comes out to be 4/3. C_in of the last gate should be equal to g × C_out / f-minimum for that gate, as we just derived. So it is 4/3 × C divided by 4/3, that is, C. So what we now say is that the last stage should have an input capacitance the same as the output capacitance. Using that for the next stage, the middle gate, we again calculate g × C_in of the last gate / f-minimum, which is again 4/3 × C divided by 4/3, that is, C. So it turns out that, to get the minimum delay, a total path delay of 10, the capacitances are the same at all the inputs, and that we could derive.

Another example, very slightly different from the last one. Here is something I am adding: one can see that in this circuit there is a 2-input NAND gate which is driving two 2-input NAND gates. The output of the first stage is driving not only this gate but also this other gate. The second NAND gate drives not only the last NAND gate on our path but also another 2 NAND gates, which are shown here. So the output here is branched between 2 gates and the output here is branched among 3 gates. Therefore the contribution to the capacitance at each node comes not only from the gate on our path but also from these other gates, and we must find how much additional current, or additional path delay, is needed because we are hanging these additional loads on the path for which we want the delay.
We have assumed here that the output capacitance on every output is 4.5C, and let us now calculate the path delay, the minimum path delay, from A to B. Let us do the same analysis once again. From A to B the logical effort: this is a 2-input NAND gate, so 4/3 is the first, 4/3 is the second and 4/3 is the third. So G is the product 4/3 × 4/3 × 4/3. The electrical effort is the output capacitance divided by the input capacitance; I forgot to show it, but the input capacitance here is 1. So H is 4.5 divided by 1, which is 4.5.

The branch effort must now take care of all the branches, 2 here and 3 here. The way we calculate it is this: the actual capacitance at this node is y + y, whereas along our path we are driving only one of them. So (y + y)/y is the real effort required to drive this node, because you supply twice the current but only one part of it is useful on our path. That is multiplied by the branch effort at the next node, because the efforts are in series. There the load is z + z + z = 3z, and the one we are looking at is only z, so it is (z + z + z)/z. So the branch effort is 2 due to the first node and 3 due to the second, and 2 × 3 is 6. Now we know G, we know H and we know the branch effort: G is 4/3 × 4/3 × 4/3, H is 4.5 and B is 6. If I multiply G, B and H, I get F = 64.

Now, the best stage effort, the minimum stage effort f-bar shown here, is nothing but (GBH)^(1/3), because there are 3 stages on the path. So it is 64^(1/3), which means f-bar, or f-minimum, is 4. The delay along the path is the number of stages, 1, 2, 3, multiplied by the f-bar we have calculated, plus the parasitic delay of each gate: a 2-input NAND gate gives 2 inverter-equivalent delays, so this one gives 2, this one 2 and this one 2.
So, it is essentially 2 + 2 + 2, which is 6 p_inv. If I substitute the value 4 for f-bar and 6 for the parasitic delay, I get 3 × 4 = 12 and 3 × 2 = 6, and 12 + 6 is 18. So the minimum path delay going from input A to output B is 18 units. If I want the actual delay, I multiply by tau and get an absolute delay of 18 tau, if tau is known from the technology.

Now, coming back to find what the capacitances should be here, because that is the sizing of the gates or transistors sitting here: I must evaluate the values of z and y. Right now I assume that the branches at each node are identical. They need not actually be the same; one could be 2z, another could be 4z, and even then it can be calculated. It is not my point to say that all should be equal; for simplicity the example is taken such that the gates at each node are identical.

To calculate z, take C_out times g, which is 4/3 because this is a 2-input NAND gate, divided by 4, because that is the f-bar we obtained. I repeat: C_out multiplied by the g of the gate you are calculating for, divided by f-minimum, which is 4. So it is 4.5 × (4/3)/4, and if we calculate this it becomes 1.5C; C is 1 in our case. Similarly, once I know z, the y calculation can be performed identically. The output capacitance seen by the y gate is 3 times z, because it is z + z + z, so C_out for this gate is 3z. The gate is again a 2-input NAND, so g is 4/3, and we divide by the best stage effort, which is 4. So y = 3z × (4/3)/4 = z, and z is 1.5 as we calculated. So y also turns out to be 1.5.
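The backward capacitance calculation for this branched path can be sketched as below. The helper name and the `fanout` list are my own constructs; the sketch assumes, as the example does, that the off-path gates at each node are identical copies of the on-path gate.

```python
from math import prod


def size_path(g, fanout, c_in_path, c_load):
    """Return (f_hat, sizes): optimal stage effort and input capacitance of each on-path stage.

    fanout[i] is the number of identical gates hanging on the output of stage i
    (1 for the last stage, which drives only the final load).
    """
    n = len(g)
    F = prod(g) * prod(fanout) * (c_load / c_in_path)  # F = G * B * H
    f_hat = F ** (1.0 / n)
    sizes = [0.0] * n
    c_next = c_load                    # capacitance of one on-path load downstream
    for i in reversed(range(n)):
        c_out = fanout[i] * c_next     # total load, including the off-path copies
        sizes[i] = g[i] * c_out / f_hat  # C_in = g * C_out / f_hat
        c_next = sizes[i]
    return f_hat, sizes


f_hat, (first, y, z) = size_path(g=[4 / 3] * 3, fanout=[2, 3, 1], c_in_path=1.0, c_load=4.5)
delay = 3 * f_hat + 3 * 2  # N * f_hat + P, with P = 2 + 2 + 2
print("f_hat =", round(f_hat, 3), " y =", round(y, 3), " z =", round(z, 3),
      " first-stage C_in =", round(first, 3), " delay =", round(delay, 2))
```

A useful sanity check is that the back-calculated input capacitance of the first stage reproduces the given C_in of the path.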
So, I can evaluate the actual capacitance seen by each stage without knowing it beforehand, provided I know the output capacitance, the input capacitance, how many stages the signal propagates through, what kind of gates I am driving, and how much fan-out there is at each of them. If I know these numbers, I will be able to tell what the capacitances should be here and here. Once I know the capacitances, I know they are proportional to the widths of the transistors, so I can evaluate the widths, and I can say that I get the best delay if I put these values of W/L in the respective transistors. This example is the crux of everything I have said: given a network, I should be able to size the transistors such that the delay is minimum. That is the whole idea of designing a circuit.

We could always do all this analysis by putting the circuit on SPICE with some arbitrary values of y and z and starting to simulate. However, one has to assume some initial value of y as well as z to start the simulation. If your guesswork for those values is not correct, or not close to what you are going to get, like 1.5 and 1.5 for y and z, then the simulator may never converge to these values, or even if it converges it will take a very long time to find the W/L values corresponding to y and z. With this kind of analysis, in a very short time, a back-of-the-envelope kind of calculation, I was able to find the value of 1.5. Now, I am not trying to say that this takes care of every parasitic in real life. But since the parasitics cannot be larger than the actual capacitances, what you do now is take these values as the starting point.
So, once we know these values of x, y, z or whatever along the network, we can put the corresponding values of W/L everywhere and then start simulating, and the SPICE result will come in a very short duration even for very large circuits. This is exactly what I said: once I know the typical gate we are talking about, a NAND gate for example, I know how its W/L relates to the input capacitance C, so the widths of those transistors can be chosen accordingly.

This is a table which gives you a catalog of gates. The first part gives the logical effort of static CMOS gates: inverter, NAND gate, NOR gate, multiplexer, XOR and XNOR, against the number of inputs 1, 2, 3, 4, 5 or n. For an inverter there is only one input, no question, and the logical effort is 1. An n-input NAND gate has a logical effort of (n + 2)/3: for 2 inputs it is 4/3, for 3 inputs (3 + 2)/3 is 5/3, for 4 inputs (4 + 2)/3 is 6/3, for 5 inputs (5 + 2)/3 is 7/3, and in general (n + 2)/3. By a similar argument, a NOR gate has a logical effort of (2n + 1)/3, so for 2 inputs it is 5/3, for 3 inputs 7/3, for 4 inputs 9/3 and so on. If you look at the multiplexer I designed earlier, an n-way multiplexer is an array of identical stages, so its logical effort is independent of the number of inputs, and the multiplexer has a typical logical effort of 2. For XOR or XNOR, with 2 inputs you have a logical effort of 4, with 3 inputs 12 and with 4 inputs 32, so it rises extremely fast as the number of inputs rises.
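The catalog just read out can be captured in a small lookup, sketched below. The function name and the gate labels are hypothetical; the formulas and the XOR values are only those quoted in the table above, and no value is extrapolated beyond them.

```python
from fractions import Fraction


def logical_effort(gate, n_inputs=1):
    """Logical effort per input of a static CMOS gate, as quoted from the table."""
    if gate == "inverter":
        return Fraction(1)
    if gate == "nand":
        return Fraction(n_inputs + 2, 3)       # (n + 2) / 3
    if gate == "nor":
        return Fraction(2 * n_inputs + 1, 3)   # (2n + 1) / 3
    if gate == "mux":
        return Fraction(2)                     # independent of the number of ways
    if gate in ("xor", "xnor"):
        quoted = {2: 4, 3: 12, 4: 32}          # only the values given in the table
        return Fraction(quoted[n_inputs])
    raise ValueError(f"unknown gate type: {gate}")


for n in (2, 3, 4):
    print(n, "inputs:", "NAND", logical_effort("nand", n), " NOR", logical_effort("nor", n))
```

Using exact fractions keeps values such as 4/3 and 7/3 readable when they are multiplied along a path.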
Corresponding to the theory I just developed for the inverter, this table gives the parasitic delays for static CMOS. The inverter has one p_inv, which is one unit. An n-input NAND gate has n times p_inv, and an n-input NOR gate likewise has n p_inv. For an n-way multiplexer it is 2n p_inv, and a 2-input XOR has 4 p_inv. So, once I know the parasitic delay of each kind of gate I am going to use in static CMOS, I know the logical effort, and if I know the electrical effort for a given network, I should be able to design any network for its best performance.

Having covered static CMOS and normal combinational logic so far, one interesting feature in the design of a circuit is what we call a fork, or an amplifier chain of stages. This is very important for the simple reason that in normal digital hardware the clock is distributed all across the chip. The clock is a signal moving on an interconnect line, and the interconnect line acts more like a transmission line. So there is a delay from the clock-generating pad to the end of the chain, wherever the clock is going. It may happen that because of this delay the clock does not remain the clock; at some stages it may effectively become clock-bar. Therefore, every now and then we have to amplify it and bring it back to its original clock timing, and for that we need so-called amplifiers.

There is another issue which occurs in practice. You have a network driven from some input; it has an input capacitance C_in and it is driving an output capacitance C_L, or C_out. That is a typical circuit, but the problem right now is that C_out/C_in is very, very high.
This case occurs most of the time, because this is the output of a chip, and the output load may be very high, something we may not even be aware of when we design, whereas the input is decided by the stages we have already put in. So this network will be a buffer stage: it acts like a buffer whose input capacitance is C_in and whose output capacitance to be driven is a very large C_L. Now the question arises: if I want to retain speed, I should be able to charge the output node faster; say this is my V0. The problem is that, since the load capacitance C_L is large, C_L dV0/dt, which is the current required to charge this capacitance, will be very large. That means this network must be able to provide or sink this large current. A larger current can only be supplied or sunk if the sizes in the output stage of this network are very large, because only when W/L is very large can it provide large currents.

If this buffer stage has a very large W/L, obviously its C_in will also increase. So when the signal comes in, it will take much longer even to charge the input node. This means that every time you want to improve things, there is no way to increase the current without increasing the input capacitance of the last stage. That has to be driven by another block, and since it has increased, I need larger currents from the earlier stage. If those are to be larger, that stage's capacitance becomes large, and there is no end to it. Now, how do we get rid of this problem?
So, what the logical effort method suggests is that this buffer can be broken into a chain of inverter stages, with the ratio of this capacitance to this capacitance chosen so that the W/L does not increase drastically from one stage to the next but does increase. In stages, you can see that if this is x, this will be 2x and this will be 4x, at least doubling I am saying, and finally, after maybe N stages, you have your C_L. If you build a staged chain of inverters like this, then individually no stage has to drive too much, because the electrical effort ratio of each stage is not very large; since the next stage is also larger, the ratio there is again small, and you can optimize the delay along this path as we did just now.

Now, the other problem, the one I am calling the fork, is that the clock and, at times, also the clock-bar are required simultaneously. In this circuit, this is clock-bar and this is clock. One can see that there is the delay of one inverter here and the delay of two inverters there. So the two are not simply out of phase. Normally what I expect is that if this is my clock, CK, this should be CK-bar. But if the delays are unequal, then for this input clock the circuit may in the worst case show you this kind of waveform, and the two may never be complements of each other at a given time. This situation can be sorted out if somehow the delay of the two inverters is made equal to the delay of the one inverter, so that when clock and clock-bar appear they are exactly out of phase, as desired. This is called a fork.

So, coming to forks, whatever I said is already shown here and I do not have to repeat it. For example, multiplexer and XOR circuits require complementary signals, so they need forks. In a general fork, one leg inverts and one does not, as shown here and as I just discussed.
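Returning to the inverter chain described above, the staging can be sketched by reusing the earlier result D = N F^(1/N) + P with g = 1 for every stage. The function name and the numbers C_in = 1, C_L = 64 are hypothetical, chosen only for illustration, and the sketch ignores whether the chain must invert or not.

```python
def best_inverter_chain(c_in, c_load, max_stages=10, p_inv=1.0):
    """Try 1..max_stages inverters and return (n, delay, sizes) for the fastest chain."""
    H = c_load / c_in                    # path electrical effort; G = 1 and B = 1 for a plain chain
    best = None
    for n in range(1, max_stages + 1):
        f_hat = H ** (1.0 / n)           # equal stage effort for an n-stage chain
        delay = n * f_hat + n * p_inv    # N * F^(1/N) + P
        if best is None or delay < best[1]:
            best = (n, delay, f_hat)
    n, delay, f_hat = best
    sizes = [c_in * f_hat ** i for i in range(n)]  # each stage is f_hat times the previous one
    return n, delay, sizes


n, delay, sizes = best_inverter_chain(c_in=1.0, c_load=64.0)  # hypothetical load ratio of 64
print("stages:", n, " delay:", round(delay, 2), " input capacitances:", [round(s, 2) for s in sizes])
```

For this hypothetical ratio of 64 the best chain has three stages growing by a factor of 4 each, rather than the factor of 2 mentioned as a lower bound.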
So, there can be a number of ways in which this fork can be created. One is that one leg of the network has 2 inverters and the other has 1; if you can equate the delays on both sides, I will have clock-bar as I need it. Alternatively, I may have 3 inverters on one leg and 2 on the other; it depends on the load I am going to drive at the outputs. The first is called a 2-1 fork, the second a 3-2 fork, it can be a 4-3 fork, and so on. How these numbers should be chosen so that the delay is minimized is the aim of this part of the lecture.

A general fork is therefore shown here. You have load capacitances, and the output capacitances on the two sides need not be the same. This is very relevant, because clock may be driving one part of the circuit and clock-bar some other part, and their loads, that is, the input capacitances of what they drive, may not be the same. So, let us say C_a and C_b are the loads on the upper and lower legs. This leg has N inverter stages with load C_a, and this one has N - 1 stages with an output capacitance of C_b; together they present a net input capacitance of C_in. We want to make the delays equal for the N-stage and the (N - 1)-stage legs. We believe that if I can size the chain of inverters on this side and the chain of inverters on this side such that the path delay along one leg is the same as the path delay along the other, then clock-bar will always exist, even if the output load capacitances are not equal.

The net output capacitance is C_a + C_b. The input capacitance is also divided into two parts, C_in_a and C_in_b. The total electrical effort from the input side to the output is what we call H = C_out/C_in. The individual electrical efforts can then be calculated: for this leg H_a = C_a/C_in_a, and for the second leg H_b = C_b/C_in_b.
So, I can individually calculate the electrical effort for this leg and the electrical effort for this leg. And even if, let us say, C_a is equal to C_b, which is the best case, H_a and H_b may still not be equal. This is a very relevant point, because when I equate the delays, the input capacitances I get for the two sides will not be the same, since the numbers of stages are not the same. Therefore, even if C_a is equal to C_b, H_a may not be the same as H_b; that is, C_in_a will not be equal to C_in_b. So the evaluation should not assume that equal loads settle the matter, because even then one still has to calculate C_in_a and C_in_b.

Here is an example. We have to design a 2-1 fork with an input capacitance of 10 units and a total output capacitance of 200. What is the total delay in the fork? We take C_in as 10, and let us take the simple case, C_a equal to C_b with a total of 200. So we give 50 percent to each of them: C_a = C_b = 100. Now, we already said the two legs will have C_in_a and C_in_b as their input capacitances, and we know that C_in_a plus C_in_b is equal to the net C_in, which is 10. So we say the fraction of the input capacitance given to the 2-inverter leg is beta, and 1 - beta is therefore given to the other leg, and then we calculate the path delay along the two legs.

Let us look at the circuit for the 2-inverter leg. The load is 100, and the input capacitance of this leg is 10 times beta, so its electrical effort is 100/(10 beta); g for the inverters is 1. There are 2 stages, so the exponent is 1/2 and the multiplier is 2. So the delay is 2 × (100/(10 beta))^(1/2) + 2 p_inv. Please remember, this is the path delay for the 2-inverter side.
This is the path delay for the 1-inverter side, and what we are saying is that if we want the circuit to perform as clock and clock-bar, the two should be equal. So for the 2-1 fork we say 2 × (100/(10 beta))^(1/2) + 2 p_inv must be equal to 100/(10 (1 - beta)) + p_inv, where 10 (1 - beta) appears because only that part of the input capacitance is given to the single inverter, and there is one p_inv because there is one inverter. Now we equate them. This is a simple equation: I know p_inv, and I know everything else except beta, and if I solve it I get beta equal to 0.258.

Since I have divided my input capacitance by the fraction beta, C_in_a for the 2-inverter leg is 10 times beta, which is 10 × 0.258, roughly 2.6. Correspondingly, the capacitance of the 1-inverter leg is 10 × (1 - beta), which is 10 × (1 - 0.258) = 10 × 0.742, that is 7.4. If I substitute beta in either expression, they are of course equal, so using this formula one knows the delay along either path. Taking the simpler one, 100/(10 × (1 - 0.258)) + 1, the number comes out to be about 14.5 units of delay. Since they are equal, both arms of the 2-1 fork have an equal delay of 14.5 units. The 2-inverter leg should have an input capacitance C_in_a of 2.6, whereas C_in_b should be 7.4 units, and the net capacitance seen by the driver stage is 2.6 + 7.4, which is 10 units as given.

So, one can now make a choice between 2-1 and 3-2 in the same way. In the case of 3-2, the longer leg becomes 3 × (100/(10 beta))^(1/3) + 3 p_inv and the shorter leg becomes 2 × (100/(10 (1 - beta)))^(1/2) + 2 p_inv; we equate them and re-evaluate beta. The same can be done for 4-3 or any number of stages, and we can then evaluate the delay again for the 3-2 case.
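The equation for beta has no convenient closed form, so one way to solve it is by bisection, as in the sketch below. The function names are mine; the numbers (C_in = 10, C_a = C_b = 100, p_inv = 1) are those of the example. Note that the numerical root is close to the quoted 0.258 but not identical to the last digit.

```python
def long_leg_delay(beta, c_in=10.0, c_a=100.0, p_inv=1.0):
    """Two-inverter leg: 2 * (C_a / (beta * C_in))**(1/2) + 2 * p_inv."""
    return 2 * (c_a / (beta * c_in)) ** 0.5 + 2 * p_inv


def short_leg_delay(beta, c_in=10.0, c_b=100.0, p_inv=1.0):
    """One-inverter leg: C_b / ((1 - beta) * C_in) + p_inv."""
    return c_b / ((1 - beta) * c_in) + p_inv


def solve_beta(tol=1e-9):
    """Bisection on beta in (0, 1): the long leg gets faster and the short leg slower as beta grows."""
    lo, hi = 1e-6, 1 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if long_leg_delay(mid) > short_leg_delay(mid):
            lo = mid   # two-inverter leg still slower: give it more input capacitance
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta = solve_beta()
print("beta =", round(beta, 3))
print("C_in_a =", round(10 * beta, 2), " C_in_b =", round(10 * (1 - beta), 2))
print("delay of each leg =", round(long_leg_delay(beta), 2))
```

Bisection works here because the difference of the two leg delays decreases monotonically with beta, so there is exactly one crossing.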
If we find that the 3-2 fork has a lower path delay than the 2-1 fork, then we go for the 3-2 fork. We then also do the analysis for a 4-3 fork and re-evaluate the delay; if that delay does not improve on the 3-2 fork, then we need not go for the 4-3 fork and we continue with 3-2. If it does improve, we may try a 5-4 fork, and we keep going until the delay is minimized. As I just said, I do this calculation for 3-2, and one can see that this equation gives beta = 0.51 and a delay of 11.1. So, since the delay was 14.5 for 2-1 and it is 11.1 for 3-2, a 3-2 fork is clearly better than a 2-1 fork for this electrical effort. Please remember that for different electrical efforts this value may be different, and if C_out_a and C_out_b are not equal the values will again be different for 3-2 and 2-1; it may be found that 3-2 is not as good as 2-1, or that 2-1 is quite sufficient to work with.

Now, the question is: what is the best number of stages for a given electrical effort? Here is a table. People have already done the analysis I described for every such fork; the book by Sutherland, Sproull and Harris already gives these numbers. For example, if your electrical effort is up to 9.68, it is found that a 2-1 fork is good enough for minimum delay; you can use 3-2, but it will not improve the delay. If your electrical effort is from 9.68 to 38.7, then we say a 3-2 fork should be used. By similar logic, if it is 1970 and beyond, that is, if you are looking at very huge loads at the output, then you may require a 7-6 kind of fork. Depending on the break points, you must be able to decide which one of them should be used. So, in the case of buffers, if the load on the external side is very heavy, the buffer may need more than 3 or 4 stages to drive the output.
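The trial-and-compare procedure described above can be sketched for a general n-(n-1) fork of inverters. The function names are hypothetical and p_inv = 1 is assumed; the first call uses the example's numbers, and the second uses a lighter, made-up load to show a case where the 2-1 fork wins.

```python
def leg_delay(n, c_load, c_in, p_inv=1.0):
    """Minimum delay of a chain of n inverters: n * (C_load / C_in)**(1/n) + n * p_inv."""
    return n * (c_load / c_in) ** (1.0 / n) + n * p_inv


def solve_fork(n_long, c_in, c_a, c_b, p_inv=1.0):
    """Equalize an n_long-inverter leg (load c_a) and an (n_long - 1)-inverter leg (load c_b).

    Returns (beta, delay), where beta is the share of c_in given to the long leg.
    """
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):                       # plain bisection on beta
        beta = 0.5 * (lo + hi)
        d_long = leg_delay(n_long, c_a, beta * c_in, p_inv)
        d_short = leg_delay(n_long - 1, c_b, (1.0 - beta) * c_in, p_inv)
        if d_long > d_short:
            lo = beta                          # long leg too slow: give it more input capacitance
        else:
            hi = beta
    return beta, d_long


def best_fork(c_in, c_a, c_b, max_long=7):
    """Compare 2-1, 3-2, ... forks and return (n_long of the fastest, all results)."""
    results = {n: solve_fork(n, c_in, c_a, c_b) for n in range(2, max_long + 1)}
    n_best = min(results, key=lambda n: results[n][1])
    return n_best, results


n_best, results = best_fork(c_in=10.0, c_a=100.0, c_b=100.0)   # the lecture's example, H = 20
for n, (beta, delay) in results.items():
    print(f"{n}-{n - 1} fork: beta = {beta:.3f}, delay = {delay:.2f}")
print("best for H = 20:", f"{n_best}-{n_best - 1}")

n_light, _ = best_fork(c_in=10.0, c_a=20.0, c_b=20.0)          # hypothetical light load, H = 4
print("best for H = 4:", f"{n_light}-{n_light - 1}")
```

Sweeping the total load in the same way and looking for the value at which two neighbouring forks give the same delay is one way to recover break points like those in the table.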
So, is the point clear about the advantage of a fork? The fork analysis tells us not only that we can equate the delays, but also how many stages should be used in the upper and lower arms so that the delay is minimum and equal on both arms. One interesting feature you must note is that exactly at a break point the two neighbouring choices give the same value. So, at that electrical effort, whether you use 2-1 or 3-2 you get the same delay; you can verify this, and that is how the break points are actually obtained. Then there is another interesting issue, which is shown here: an input capacitance driving two paths with the same number of stages, say a 2-2 branch, but with unequal electrical efforts h1 and h2. Equating the delay equations for the two paths, and since the gates on them are identical, the ratio h1 by h2 comes out as C1 by C2. So, once I know one of the input capacitances, the other can easily be evaluated, and we can figure out what C1 and C2 should be. Now, if the path efforts are unequal, that is, the number or kinds of gates on one path is different from the other, even then you need not worry: F1 by C1 will be equal to F2 by C2, where F of course you can figure out in the usual path-effort way as the product of g and h, and if there is branching on the leg we include b as well. Once we have the path effort F for both paths, the stage effort is F to the power 1 by N. Since F1 and F2 are both known and one of C1 and C2 will be known, I can get the other, and using the stage effort we can keep working backwards till the first inverter is reached.
So, one can have branches with unequal electrical effort or unequal path effort and still do the analysis. Here is an example for minimum delay. You have a NAND gate which is 3-input, you have a NOR gate which is 3-input, and the rest are inverters. The g for the NAND is 5/3 and the g for the NOR is 7/3; please remember the NAND is (n + 2)/3 and the NOR is (2n + 1)/3, and n is 3 inputs. So, for the NOR, 2 into 3 is 6, plus 1, by 3, which is 7/3; for the NAND, 3 plus 2, by 3, which is 5/3. An inverter has a logical effort of 1. Each leg drives a separate load capacitance: let us say h1 is 144/12 and h2 is 192/12. The logical effort of the top leg is 5/3 into 7/3, which is 3.89; the bottom leg has only inverters, so it is 1 into 1 into 1, which is 1. This is one path and this is another path, and one can calculate F for each as g times h; b, of course, does not exist inside a leg. So, for the top leg F1 is g1 h1, which is 46.7, and for the lower leg F2 is g2 h2, which is 16. The overall path effort is the sum of the two, F1 plus F2. Seen from the top leg, b is (F1 + F2)/F1, which is 1.34, and F is g1 b h1, which is 62.7. There are 3 stages, so the stage effort is 3.97. Working backward from the loads, I can evaluate all the capacitances as we did earlier, using this minimum-delay stage effort at every step. Please remember the check on the last calculation: when you calculate Cin for the first stage using the stage effort, which is common to all stages, it must reproduce the input capacitance you started with. If it does not, your analysis or calculations have gone wrong, because Cin is fixed to start with and Cout is fixed to start with. So, when I do the back calculation from the loads to the input, the starting value must appear at the end of the day; if it does not, your earlier calculations are also wrong, so please go back and verify.
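The arithmetic of this example can be checked with a short sketch. The figure is not part of the transcript, so the topology below is an assumption: a shared first inverter with input capacitance 12 drives a top leg (3-input NAND, then 3-input NOR, load 144) and a bottom leg (two inverters, load 192). The function name `size_branching_example` is made up for illustration.

```python
def size_branching_example(c_in=12.0, load_top=144.0, load_bottom=192.0):
    """Path efforts, stage effort and back-calculated sizes for the example.

    Assumed topology (not confirmed by the transcript): one shared inverter,
    then NAND3 -> NOR3 -> load_top on the top leg and
    inverter -> inverter -> load_bottom on the bottom leg.
    """
    g_inv = 1.0
    g_nand3 = (3 + 2) / 3.0        # (n + 2) / 3 with n = 3
    g_nor3 = (2 * 3 + 1) / 3.0     # (2n + 1) / 3 with n = 3

    f_top = g_inv * g_nand3 * g_nor3 * load_top / c_in
    f_bottom = g_inv * g_inv * g_inv * load_bottom / c_in
    f_total = f_top + f_bottom
    b_top = f_total / f_top                  # branching effort seen by the top leg
    stage_effort = f_total ** (1.0 / 3.0)    # three stages on either path

    # Work backwards from the loads: C_in(stage) = g * C_out / stage_effort.
    c_nor = g_nor3 * load_top / stage_effort
    c_nand = g_nand3 * c_nor / stage_effort
    c_inv_last = g_inv * load_bottom / stage_effort
    c_inv_mid = g_inv * c_inv_last / stage_effort
    c_first = g_inv * (c_nand + c_inv_mid) / stage_effort

    return {
        "f_top": f_top, "f_bottom": f_bottom, "f_total": f_total,
        "b_top": b_top, "stage_effort": stage_effort,
        "c_nor": c_nor, "c_nand": c_nand,
        "c_inv_last": c_inv_last, "c_inv_mid": c_inv_mid,
        "c_first": c_first,
    }


if __name__ == "__main__":
    for name, value in size_branching_example().items():
        print(f"{name:12s} = {value:.2f}")
```

Under these assumptions the back calculation returns the input capacitance of 12 that the path started with, which is exactly the consistency check described above.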
For large parasitic delays, to get the best design we need to adjust the branching allocation. In a nutshell, this is what I said: the delay of one branch is N (F1 / C1)^(1/N) + P1 and that of the other is N (F2 / C2)^(1/N) + P2, and by equating them I can minimize the delay for any branch efforts, as I have already done. We can also have more than two branches. You may have a three-way branch with different F1, F2 and F3, and by the same method, instead of 2 branches there will now be 3, so when you equate the delays you get some kind of cubic equation; if you solve it, you can still optimize the paths. You can read the solved problem given here on your own, because I must finish this talk fully today. So, a typical summary for branching is: draw the network; buffer the non-critical paths with minimum-size gates; estimate the total effort along each path; verify the number of stages on each path; estimate the branching ratio; compute accurate delays including parasitics; and adjust the branching b to minimize these delays. So, this is essentially what I am saying: given a branch or fork system, any number of branches can be handled and the delays can be minimized. This is what a normal circuit will have; please remember that each circuit may drive more than one block, and therefore branches will always be present in hardware. So, analysis of the delay all along the path is crucial, and using logical effort it can be done in a much simpler and much faster way, and it also gives you the minimum delay and the capacitances that achieve it. Now to another part of logical effort. So far we have been using simple symmetric logic, in which the template inverter was 2 to 1 and every gate was sized to match its 1 and its 2. Many circuits may not have such symmetry, and in real life we may have to use asymmetric logic.
For example, the point I am making is this. If s is half, then 1/(1 - s) is 2 and 1/s is 2, and two transistors of width 2 in series make 1. So, if s is half there is no problem at all; we get the usual 2-to-1 template. But if s is less than half, the A input is favored, because 1/(1 - s) becomes smaller while 1/s becomes larger. If s is greater than half, the B input is favored. So, in one case this transistor is the limiting one, and in the other case that one is. Since the gate is not symmetric, one must evaluate the logical effort for both inputs and then get the delay for each case. With a P-to-N ratio of gamma, g_A is (1/(1 - s) + gamma)/(1 + gamma), g_B is (1/s + gamma)/(1 + gamma), and the total of the two, g_A plus g_B, is (1/(s (1 - s)) + 2 gamma)/(1 + gamma). Please remember that if I substitute s equal to half, 1/(s (1 - s)) is 4, so the total is (4 + 2 gamma)/(1 + gamma); if I take gamma equal to 2, that is 8 by 3, and each input has 4 by 3, which is what you get for a normal NAND gate with the equal 2-to-1 sizing. If s is made very small, this 4/3 per input changes drastically: for s equal to 0.01, for instance, the same formulas give a total of about 35, where individually g_A has a logical effort of almost exactly 1 whereas g_B has a logical effort of 34. Please remember that A is favored if s is less than 0.5 and B is favored if s is greater than 0.5, and this is the value which tells you by how much.
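The three formulas for the asymmetric 2-input NAND are easy to tabulate. The sketch below is only a check of the arithmetic; the function name `asymmetric_nand2` is made up, exact fractions are used to avoid rounding, and gamma = 2 is taken as the default P-to-N ratio as in the lecture.

```python
from fractions import Fraction


def asymmetric_nand2(s, gamma=2):
    """Logical efforts (g_A, g_B, total) of an asymmetric 2-input NAND.

    The pull-down widths are 1/(1 - s) for input A and 1/s for input B,
    each pull-up has width gamma, and the template inverter has an input
    capacitance of 1 + gamma.
    """
    s = Fraction(s)
    gamma = Fraction(gamma)
    width_a = 1 / (1 - s)
    width_b = 1 / s
    g_a = (width_a + gamma) / (1 + gamma)
    g_b = (width_b + gamma) / (1 + gamma)
    return g_a, g_b, g_a + g_b


if __name__ == "__main__":
    for s in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 100)):
        g_a, g_b, total = asymmetric_nand2(s)
        print(f"s = {s}: g_A = {float(g_a):.3f}, "
              f"g_B = {float(g_b):.3f}, total = {float(total):.3f}")
```

For s = 1/2 this reproduces the symmetric values 4/3, 4/3 and 8/3; for s = 1/4 it gives 10/9, 2 and 28/9, which is about 3.1.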
So, continue working with the new logical efforts if you are using asymmetric gates. One very important issue has been created, though: the charging and discharging transients will be different in the two cases, and therefore one has to derive the t_PHL paths separately from the t_PLH paths and then evaluate the net delay. It is not as trivial as I made it sound, but it can still be done analytically. For s equal to 1/4, the pull-down transistor widths will be 4/3 and 4, and therefore, for gamma of 2, g_A will be 10/9, about 1.11; similarly, g_B will be 2, and g total is 3.1, a little more than the 8/3 which you would have got otherwise. The stray capacitances must also be taken care of through the order of the transistors: we know that in normal designs the transistors which are smaller in W/L should be kept closer to the output node. So, assign A and B correspondingly; here s is less than half, which favors input A. Now, in this last-but-one part of the course, we have looked into normal static CMOS, we have looked at branching effort, and we have looked at forks; now we look at other circuit families. One of the famous circuit families in CMOS is called pseudo-NMOS. This is chosen particularly when you want a saturated kind of load on an N-channel device. It is like a saturated-load transistor, but it is not necessarily saturated, because depending on the output state this P-channel transistor, even though its gate is grounded, may move from saturated to unsaturated. So, it is a varying resistance. The sizes are fixed by us: the ratio is still kept 2 to 1, and 4/3 and 2/3 are the widths for the pseudo-NMOS inverter. Here is a NAND gate using pseudo-NMOS. What is the advantage of pseudo-NMOS? It can be seen from here: in static CMOS I would require 2 P-channel transistors, whereas in pseudo-NMOS I require only one.
In the case of the NOR, I have 3 transistors in pseudo-NMOS, whereas in static CMOS I would have put 4. The other things are kept the same. In the NAND each series path has to be 4/3, so each N-channel is doubled to 8/3 and 8/3, whereas the P-channel is kept the same at 2/3. Please remember that this is slightly different from a normal gate, since it is a load-inverter kind of circuit. We know the load should be smaller than the driver transistor; if the load were a resistor, its value should be higher, so that V_OL is lower than the required minimum, that is, less than V_T. To achieve this, we typically make the load transistor half the size of the driver. Please remember it is not like the template of static CMOS, where the N-channel was 1 unit and the P-channel was 2 units; here it is the other way round. The N-channel is kept at 4/3, whereas the P-channel is kept at 2/3, that is, half the N-channel size is used for the P-channel, so that the resistance of the P-channel is higher than that of the N-channel, and because of that V_OL can be less than the threshold. So, this is now the new template for us: in pseudo-NMOS the template is 4/3 and 2/3. If I convert it to a NAND, the N-channels are 8/3 and 8/3, which in series make 4/3, and the single P-channel is 2/3. In a NOR gate you need either path to be 4/3, so each N-channel, being in parallel, needs only 4/3 and not 8/3, because they are not in series. Now, once I have decided this, for each input one can evaluate g: 8/3 plus 2/3, which is 10/3, divided by 4/3 plus 2/3, which is 2, gives 5/3 per input for the NAND gate. By similar logic, as I already said, the P-channel and N-channel do not have the same resistance values, because we want to keep V_OL low.
That gives us the noise margin, but it also means that the rising and falling transients will have different logical efforts. Please remember this is very important: the charging transient occurs through the P-channel transistor and the discharging transient occurs through the N-channel chain, and since the two are not equal, the logical effort for the rise and fall transients is different. The table gives you all of them; remember that the load in our case is always the pseudo-NMOS P-channel device, which has a width of 2/3. So, a 2-input NAND has a rising logical effort of 8/3, the same 8/3 we just saw, and a falling logical effort of 8/9; the average, (8/3 + 8/9)/2, is 16/9. A 3-input NAND has 4 for rising and 4/3 for falling, and the average is 8/3. We can keep going: for example, for an n-way multiplexer it is 8/3 for rise, 8/9 for fall and 16/9 on average. For a NOR you have 4/3 for rise, 4/9 for fall, and the average is 8/9. Once I know the logical effort for these kinds of gates, the rest is identical; all that is different from before is that g is not that of static CMOS, and a new g is now available for the pseudo-NMOS gates you are using. Since I know the logical effort for these gates, GBH will again give me the path effort, GBH to the power 1/N is the stage effort, and N times that plus P will give me the path delay. So, there is no difference once I get the g values, which are different from static CMOS. Now, this is a symmetric NOR gate. This is very interesting: this is one inverter, this is another inverter, and one can then figure out how the circuit behaves when their outputs are connected together.
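The rise, fall and average values in the table can be reproduced with a few lines of arithmetic. The sketch below rests on assumptions that the transcript does not spell out: a mobility ratio of 2, a pull-down current reduced by the always-on pull-up current, and a reference inverter with input capacitance 3 delivering unit current. The function name `pseudo_nmos_effort` is hypothetical.

```python
from fractions import Fraction

W_N = Fraction(4, 3)      # effective pull-down width of the pseudo-NMOS template
W_P = Fraction(2, 3)      # width of the grounded-gate P-channel load
MOBILITY_RATIO = 2        # assumed: N-channel is twice as conductive per unit width
UNIT_INVERTER_CIN = 3     # static CMOS template: N = 1, P = 2, unit drive current


def pseudo_nmos_effort(series_transistors):
    """Return (g_rise, g_fall, g_average) for one input of a pseudo-NMOS gate.

    series_transistors is the pull-down stack height: n for an n-input NAND,
    1 for an inverter or a NOR of any width.
    """
    c_in = series_transistors * W_N        # each stacked device is widened
    i_up = W_P / MOBILITY_RATIO            # pull-up current: 1/3
    i_down = W_N - i_up                    # net pull-down current: 1
    g_rise = c_in / (UNIT_INVERTER_CIN * i_up)
    g_fall = c_in / (UNIT_INVERTER_CIN * i_down)
    return g_rise, g_fall, (g_rise + g_fall) / 2


if __name__ == "__main__":
    for name, stack in [("NOR", 1), ("2-input NAND", 2), ("3-input NAND", 3)]:
        rise, fall, avg = pseudo_nmos_effort(stack)
        print(f"{name}: rise = {rise}, fall = {fall}, average = {avg}")
```

Under these assumptions the sketch gives 8/3, 8/9 and 16/9 for the 2-input NAND, 4, 4/3 and 8/3 for the 3-input NAND, and 4/3, 4/9 and 8/9 for the NOR, matching the values quoted from the table.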
If I put a NOR gate in this fashion, as shown here, please remember what static CMOS would otherwise have: 2 P-channel transistors in series and 2 N-channel transistors in parallel. Here, using the asymmetric theory, the two inverters I have used are sized 2/3 and 4/3 each, and when I connect the outputs they still act like a NOR gate. If I then evaluate the falling and rising efforts, g_d and g_u, I find that for the rise transient, since they are in parallel, g_u is 1, and g_d is 2/3. So, the average logical effort is 5/6 for this kind of gate, which is smaller than that of a NOR gate in static CMOS. So, this method of paralleling inverters into a symmetric NOR gate, which was proposed by Johnson some time back, is a novel structure, and it shows that you can improve on the logical effort. If the logical effort has decreased, we know that GHB, which is the path effort, will also decrease; if the path effort decreases, the stage effort decreases; and if the stage effort decreases, the capacitance values will be smaller and you will also have lower path delays. Normally we believe that a NOR gate has to be the standard static gate. Here static inverters are used as a symmetric gate, and with this method one can reduce the path delay, so a speed-up is possible with a slightly different circuit. Please remember that in static CMOS also I would have required 4 transistors and here also I require 4, but now it is faster than normal static CMOS. The next problem with pseudo-NMOS is power. Since the P-channel device is always on, pseudo-NMOS dissipates power constantly whenever the N-channel is on, that is, except when V_GS is less than V_T; both the N-channel and the P-channel conduct current, and not only for a small time during a transient, and therefore there is constant power dissipation.
If you look at normal dynamic gates, which I will show shortly and which are not of the pseudo-NMOS kind, only the N-channel network is active at one time or only the P-channel is active at one time, and therefore the net power is minimized in a dynamic gate. The only difficulty is that you need precharge and evaluation phases, and all dynamic gates have some problems, such as charge sharing, or the logic level not being properly retained, or the next stage changing earlier than it should. All of these can of course be solved by modified dynamic gates called domino, NORA or zipper, and once one evaluates g for them, I will again be able to calculate the delays. Here is a domino circuit. This is a gate with a P-channel precharge transistor, this is the output inverter, this N-channel at the bottom is the evaluation transistor driven by the clock phi, and this is our input transistor. When the clock is 0, the path to ground is removed and charging occurs, which we call precharge. When the clock becomes 1, what happens to the precharged node depends on the input: if the input was 0, the 1 will remain; if it was 1, the node will discharge through this path to 0. Since the evaluation transistor was off while the node was charging, there was no current from the power supply to ground on this side, and while the node was discharging, the P-channel was off, so there was no additional current coming from it. By the same logic, I can put 2 N-channel transistors in series here to make an AND gate, and I can size them correctly for series and parallel combinations. Please remember that evaluation is on this side and charging is on that side. So, between the charging and the evaluation transistors, it is possible to have only one of them: this is another circuit, of the NORA or zipper kind, in which the lower N-channel transistor is removed and the clock is applied only here.
So, when the clock is 0, we assume that even if the input is still present at that time, the P-channel takes care of that current as well and still charges the node. This increases the power a little, but it removes one N-channel transistor; so the power may be slightly larger, but you have a lower number of transistors and hence the area is minimized. By similar thinking we can work out the logical effort for transmission gates. Here is an inverter whose output passes through a CMOS pass gate, or transmission gate. The inverter has the usual 1-to-2 ratio, but it is taken larger, 2 and 4, because it has to drive the capacitances of the transmission gate and finally the load. So, one normally uses somewhat larger W/L here. This is my data and this is my transmission gate: when S-bar is 0 and S is 1, the gate is on and the data is transmitted; when the opposite occurs, that is, S-bar is 1 and S is 0, the gate is off and the data is retained. The same kind of circuit can be re-implemented: there are 4 transistors there and 4 transistors here, and the same transistors can be reconfigured as you see here. This is the data, and these are the select controls for the multiplexer. If S-bar is 0 and S is 1, the inverted data, d-bar, is transmitted to the output, and when S-bar is 1 and S is 0, both of these transistors are off, so no data can be transferred to the output. Looking at the numbers given to you, for each data input the logical effort is (4 + 2)/3, that is 2; the logical effort for the select lines is (2 + 2)/3, that is 4/3; and the net logical effort is 2 plus 4/3, which is 10/3.
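The multiplexer numbers follow from adding up the gate widths that each signal drives and dividing by the template inverter's input capacitance of 3. The sketch below only restates that arithmetic; the widths are read off the lecturer's sums (4 and 2 on the data input, 2 and 2 on the select pair) and the helper name `logical_effort` is made up for illustration.

```python
from fractions import Fraction

TEMPLATE_INVERTER_CIN = 3  # static CMOS template: N-channel 1 + P-channel 2


def logical_effort(gate_widths):
    """Logical effort of a signal that drives transistors of the given widths."""
    return Fraction(sum(gate_widths), TEMPLATE_INVERTER_CIN)


# Assumed widths, taken from the sums quoted in the lecture.
g_data = logical_effort([4, 2])      # data drives a P-channel of 4 and an N-channel of 2
g_select = logical_effort([2, 2])    # S and S-bar each drive one transistor of width 2
g_total = g_data + g_select

if __name__ == "__main__":
    print("data input:", g_data)
    print("select lines:", g_select)
    print("total:", g_total)
```

This gives 2 for the data input, 4/3 for the select lines and 10/3 in total, as stated above.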
Now, by a similar argument, this table provides the logical efforts of all kinds of dynamic gates: inverter, NAND, NOR and multiplexers. The method I have just shown is used to evaluate them for 2, 3 and 4 inputs. All of these are with clocked evaluation transistors; if there is no clocked evaluation transistor, as I just said, the logical effort is smaller. So, before we leave this area, here is a typical flow chart which shows how to apply the logical effort method to your designs. You start with the block specification: I need to know the function, the output load and input capacitance, and the delay you are expecting. Then select a circuit family, and use static CMOS when in doubt. It has been shown again, from 2008 or 2009 onwards, that the best design is most easily obtained in static CMOS, at the cost of a bit of increase in gate area, but the designs are better in performance. So, if it is specified that you have to work in domino, or pseudo-NMOS, or zipper, that is another matter, but if nothing is given, always start with static CMOS. Then sketch the path for the whole network you are looking at and label each gate with its logical effort: along each path write g1, g2, g3 against the gates. Label the wire lengths and layers, defaulting to minimum pitch. So, you now know the wire lengths.
So, you will be able to calculate the capacitance associated with each wire and compute the wire RC delay, because you know the wire dimensions, the wire lengths and the layer. Layer means whether the interconnect is on poly, on diffusion, or on metal 1, metal 2 or metal 3; whichever layer you are using, you must give the specs for it, and based on this the R and C of the wire can be calculated, which gives the RC delay between blocks. Then you check whether the wire RC delay is reasonable. If it is not, increase the width and spacing of the wire or add repeaters, I mean additional buffers, until the wire delay is acceptable. Once you put in a repeater, this is essentially a buffer design, and here again you have to use the logical effort technique to find how many repeater stages you need in the chain, so that the RC delay is what you want and the delay of the repeater path itself is not larger. Once the RC delay is reasonable, label the nodes with the lumped wire capacitances. Estimate the stage effort from GBH, as I said, and compute the gate sizes going back from the output towards the input Cin. As I kept telling you last time, when you work back from the last stage, the Cin you evaluate from your stage effort should come out the same as the input capacitance you started with. If it does not, something is wrong and you obviously have to redo the analysis: reduce the stage effort, increase the stage effort, or add buffers. Please remember that, firstly, Cin should match, and secondly, it should be within the specification, since the specification already says that I cannot have more than this much Cin.
So, if Cin is less than the specification, it means that you can reduce the stage effort; you are unnecessarily carrying that additional stage effort. If you find that Cin is larger than the specification, you must increase the stage effort, so that each gate is smaller for the load it drives, and that will reduce the value of Cin. If the value you obtain is much larger, then, as I said, put in buffers till the Cin value is within your specification. Once the Cin value meets your specification, estimate and simulate the delay. If the circuit is too fast, increase the stage effort a bit and reduce Cin. If it is too slow, see whether you can improve the topology, and if you can, go back, sketch the new path and do all this analysis once again. If the delay is just what you wanted, you are done. There is a catch at the end: if it is too slow and you cannot improve the topology, or you do not have any other topology, then you have to say that with the kind of topology given the specifications cannot be met. If another topology is available, as I said, you go back and redesign till you meet the required delay. This is how high-speed circuits can be designed. At the end of the day, let me now outline my concluding remarks on logical effort. What is the importance of logical effort, which I have talked about so far? It allows you to compare alternative circuit topologies. Circuits are fastest when the effort delay of each stage is the same. One should then select the number of stages to make this effort about 4. This is a hand-waving calculation: we say the typical stage effort should be around 4, and you can see that if the number of stages is too high or too low, the stage effort becomes too small or too large, and therefore the capacitance values will be too high or too low.
So, 4 is a typical number and not necessarily the exact number. The path delay is very insensitive to modest deviations from the optimum, and this makes back-of-the-envelope calculations easy. Please remember that path delay is not a strong function of small deviations, and therefore you need not worry too much in a normal design; in any case you are going to simulate at the end of the day with all the other parasitics included. So, these values are good enough to start with, and they can cut your run time so much that your design comes out much faster. For a given path, here is an interesting formula. With a stage effort of 4, the best number of stages to use is the log of the path effort to the base 4, that is, log4 G plus log4 H, and that number times (4 + p) is the delay of a well-designed path. The logical effort of a gate increases as the number of inputs grows. This is obvious, because the input capacitance is larger, the branching is larger, and the current required is therefore larger. We should in practice limit ourselves to 4 series transistors in logic gates. This is true in any case: if there are too many series transistors, the size of each transistor has to be increased enormously, and the power supply may not be able to provide so much current every now and then; so split the gates, or split the logic itself into a number of blocks. The legs of a branching circuit should not differ by more than one gate, and it is better to use 1-2 or 2-3 forks instead of 0-1 forks. Please remember that a 0-1 fork is always wrong, because there is no design on the 0 side and one will never be able to match the delay of the 0 side. So, there is no question of saying that I can use one inverter to make clock-bar and take the input directly as clock, or as any other signal. Therefore 0-1 forks should never be used; the only way to have equal delays or equal efforts is by using 1-2 or 2-3 forks.
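The rule of thumb above is easy to try out on a calculator or in a few lines of code. The sketch below is a rough illustration under stated assumptions: the function names are made up, the stage count is simply log4 of the path effort rounded to the nearest integer, and every stage is given the same parasitic delay (1 by default, as for an inverter).

```python
import math


def best_stage_count(path_effort, target_stage_effort=4.0):
    """Number of stages that puts the stage effort close to the target (about 4)."""
    stages = round(math.log(path_effort) / math.log(target_stage_effort))
    return max(1, stages)


def path_delay(path_effort, n_stages, parasitic_per_stage=1.0):
    """Path delay N * F^(1/N) + P, with the same parasitic assumed for every stage."""
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages * stage_effort + n_stages * parasitic_per_stage


if __name__ == "__main__":
    path_effort = 62.7   # the branching example worked earlier in the lecture
    n_best = best_stage_count(path_effort)
    for n in (n_best - 1, n_best, n_best + 1):
        if n >= 1:
            print(f"N = {n}: stage effort = {path_effort ** (1.0 / n):.2f}, "
                  f"delay = {path_delay(path_effort, n):.2f}")
```

For a path effort of 62.7 the sketch picks 3 stages, and using 4 stages instead changes the estimated delay only slightly, which is the insensitivity the lecture refers to.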
Choose a P-to-N ratio equal to the square root of the ratio which gives equal rise and fall times in normal static CMOS. A P-to-N ratio of 1.5 works in many cases for virtually all processes, even down to 28 nanometers. Logical effort quantifies the benefits of different circuit families. As my flow chart showed you, one can have a number of topologies and a number of families, do the logical effort calculations, and then verify which of the families or which kind of circuit gives the best delay. Thank you for all that you have heard so far. Once again I repeat my acknowledgement of the people who have helped us get this far: Ivan Sutherland, Bob Sproull and David Harris, for their great work on logical effort. I always say I salute them. Their famous book on logical effort is the source of this lecture, and as I said earlier, I received its first version from the website in the late 90s, before publication. Thanks also to the publisher of the book, Morgan Kaufmann Publishers, for publishing such a book. Thanks are also due to Harris Merman of Stanford University for giving such a course for the first time and then making the slides available on the website free of charge. Also thanks to Rabaey, Chandrakasan and Nikolic, whose book on digital integrated circuit design is very often used by us as the first course in VLSI design. They have a chapter 13 where they discuss logical effort relatively briefly, in a section called effort; you can go through it as a first rundown to know what exactly logical effort is. And of course I am always thankful to my postgraduate students, and at times even undergraduate students, of my VLSI design course over the last 25 years, as they keep asking me very pertinent and interesting queries which have actually improved my understanding of many areas, particularly logical effort.
Afterwards, I will give you some solved problems on logical effort and also a list of unsolved problems. Thanks for the day.