Last time we were trying to see the importance of logical effort, and we discussed how to calculate the propagation delay of a CMOS inverter. Based on that, I suggested that we would come up with a newer and simpler technique to design high-speed circuits, where the average propagation delay from the input to the output of a circuit is what matters, because it decides the speed of the data flow. We also looked at actual transistor evaluations and figured out that the width of the transistor essentially decides the speed, because it decides the capacitance and it also decides the current charging or discharging those capacitances.
So today we start with the logical effort method suggested by Sutherland, Sproull and Harris in their book, which is published by Morgan Kaufmann and is available in the market. Please do have a look at it if you can. We will attempt to apply this method to the circuits we look at during the course, and we will look at static CMOS applications first.
Let us start with a model which we call the gate delay model. What it says is that delay will always be normalized to dimensionless units, to isolate the effects of the fabrication process. You know that tau, which is the delay of a minimum-size inverter driving another identical inverter with no parasitics, is decided by technology parameters. Therefore we will not use tau as a parameter in our evaluation; we will work only with D, which is D absolute divided by tau. So D is the delay normalized to tau, and that is the delay we are going to work with. Please remember, tau is not the no-load delay of an inverter; it is the delay of an inverter driving another identical inverter, with no parasitics.
Please remember these are terminologies which Sutherland and others have decided. Also, tau is not the delay of a real 1x inverter driving a 1x inverter, since that includes the delay contribution due to parasitics. So please be clear about the definition of D and the definition of tau; with them we will be able to evaluate the delay of a circuit as the normalized delay D.
Now, the delay of a logic gate is composed of two parts. One is the parasitic delay P, which we call the no-load delay. The other is the delay due to the load, which we call the effort delay or stage effort F. So the net normalized delay is D = F + P.
The stage effort F, which is the delay due to the load, can be expressed as a product of two terms: F = G H. What G and H are is of relevance, and we will see quickly what they mean. If we substitute F in our delay model, we get D absolute = (F + P) tau = (G H + P) tau. If we want only D, then it is G H + P, and that is the delay we are interested in evaluating.
Now, what are these terms G and H? G captures the properties of the logic gate and is called the logical effort, which is the most important part of all the analysis we follow now. I repeat, it captures the properties of the logic gate itself. The other term H captures the properties of the load and is therefore called the electrical effort. The load is external to the gate, so any value of load is taken care of through the term H, and any property of the gate, through which current and capacitance are evaluated, is taken care of through the term G.
On the surface, this does not look very different from the model we have already discussed. The logical effort model says D absolute = (G H + P) tau, whereas in our previous RC model, which is very popular because it is simple, the net gate delay is some constant K, which represents the ratio of pull-down to pull-up transistors, times the net load capacitance, plus the no-load delay.
It would help to see how a typical RC model can be used to derive the logical effort model, and that is what I am going to do now. The RC model behind the logical effort equation is shown here. R_ui and R_di are the pull-up and pull-down resistances of the transistors; the pull-up transistor is P-channel and the pull-down transistor is N-channel. The output of the inverter drives the net load, which is the parasitic capacitance plus the output load: C_L = C_pi + C_out.
We have shown two switches, one towards the VDD side and one towards the ground side. When the P-channel device is on during switching of the CMOS inverter, the upper switch is connected and the lower switch is open. When the pull-down starts, the upper switch is open and the lower switch is connected, so that the capacitor can discharge through R_di to ground. This is the RC model we will use in deriving the values of G and H.
I repeat again, tau is the absolute delay of a 1x inverter driving a 1x inverter with no parasitics. We assume equal pull-up and pull-down resistance, which we call R_inv, and C_inv is the input capacitance of the inverter, so that tau can be written as K R_inv C_inv, where K is a constant of the fabrication process.
So this is essentially an RC delay: tau is K times R_inv times C_inv. Once again I repeat, tau is not the no-load delay of an inverter, and it is also not the delay of a real 1x inverter driving a 1x inverter, because that would include parasitic delay. Here tau means 1x driving 1x without parasitics.
Now, a template circuit is chosen as the basis upon which other gates are scaled. If you look carefully at the normal CMOS inverter, the P-channel device has twice the width of the N-channel device, the channel length being the same for both. This ratio appears because we want symmetric transitions and equal resistance on both sides; the mobility ratio mu_n to mu_p is about 2, so to compensate we double the size of the P-channel device, although the ratio mu_n by mu_p need not be exactly 2. A gate is then obtained by scaling the widths of the template, and that scaling factor we call alpha.
C_t is the input capacitance of the template circuit, R_t is the pull-up and pull-down resistance of the template, and C_pt is the parasitic capacitance of the template. Then C_in = alpha C_t. The resistance is R_i = R_ui = R_di, because either the P-channel or the N-channel device will be on, and it is equal to R_t / alpha. The parasitic capacitance is C_pi = alpha C_pt. So when you scale with alpha greater than 1, the input capacitance scales up, the channel resistance scales down and the parasitic capacitance scales up.
Now D absolute from our simple theory is K R_i (C_out + C_pi). I replace R_i and C_pi from the expressions I wrote earlier, R_i = R_t / alpha and C_pi = alpha C_pt, and I rewrite C_out as C_in (C_out / C_in) with C_in = alpha C_t. So D absolute = K (R_t / alpha) C_in (C_out / C_in) + K (R_t / alpha) (alpha C_pt). Doing a little maths, D absolute = K R_t C_t (C_out / C_in) + K R_t C_pt.
Written in this form, we can see the relation to the logical effort model, where we said the absolute delay is tau (G H + P). There tau = K R_inv C_inv, G = R_t C_t / (R_inv C_inv), H = C_out / C_in and P = R_t C_pt / (R_inv C_inv). So the definition is clear: the logical effort G is the ratio of the RC time constant of the template circuit to the RC time constant of the normal inverter, R_inv C_inv. If you have a 1x inverter, that is R_t = R_inv and C_t = C_inv, then G = 1. But if R_t C_t is not the same as R_inv C_inv, G will be higher than 1, and we would like to calculate this logical effort G for different kinds of circuits.
Please remember that H is the ratio of the output capacitance to the input capacitance. What does that mean? In a chain of blocks, the in-between capacitances are not relevant in calculating this H. We are only interested in the input capacitance seen at the first driver stage and the output capacitance seen at the output stage.
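The derivation just described can be written out compactly. This is a reconstruction from the spoken steps, not a copy of the slide; the subscripted names (R_inv, C_inv, C_pt and so on) follow the lecture's wording, and the exact slide notation is an assumption.

```latex
\begin{align}
D_{abs} &= K\,R_i\,(C_{out} + C_{pi}) \\
        &= K\,\frac{R_t}{\alpha}\,C_{in}\,\frac{C_{out}}{C_{in}}
           + K\,\frac{R_t}{\alpha}\,\alpha\,C_{pt},
           \qquad C_{in} = \alpha\,C_t \\
        &= K\,R_t\,C_t\,\frac{C_{out}}{C_{in}} + K\,R_t\,C_{pt} \\
        &= \tau\,(G\,H + P)
\end{align}
\begin{align}
\tau = K\,R_{inv}\,C_{inv}, \quad
G = \frac{R_t\,C_t}{R_{inv}\,C_{inv}}, \quad
H = \frac{C_{out}}{C_{in}}, \quad
P = \frac{R_t\,C_{pt}}{R_{inv}\,C_{inv}}
\end{align}
```

Note that alpha cancels in both terms, which is why G and P do not depend on how much the gate has been scaled.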
So the question is how much current the inverter must provide to charge C_out for a given C_in. If C_out is very large compared to C_in, as in the case of a buffer stage, then for a faster circuit a larger current has to be provided by the inverter or buffer. If you use larger inverters, the input capacitance is larger, because the widths are larger in both the P-channel and N-channel devices. So the ratio of C_out to C_in is essentially what decides the speed at the end of the day. We also know that P = R_t C_pt / (R_inv C_inv), and the P of an inverter is normally taken as 1, if the parasitic capacitance is equal to the input capacitance of the inverter.
Here is an interesting graph. We know D = F + P = G H + P, so I plot the normalized delay D against the electrical effort H. D is not the absolute delay but D absolute by tau. The blue line is for the inverter. Since the inverter has a parasitic delay of 1, even at H = 0 there is a normalized delay D = 1; that is the intercept on the y-axis. As I increase H, the delay rises linearly along the blue line. Whatever delay you get above the parasitic delay line at 1 is because of charging the additional capacitance of the load; that is taken care of through the electrical effort, and that delay is called the effort delay. More complex gates have a greater logical effort and more parasitic delay, and that is what we are going to see.
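As a rough illustration of the straight lines in this graph, here is a minimal Python sketch of the delay model D = G H + P. The function names are made up for illustration. The inverter values (G = 1, P = 1) and the NAND2 values (G = 4/3 per input, P = 2) are the ones quoted in this lecture, and the value of tau is a purely hypothetical number.

```python
def normalized_delay(g, h, p):
    """Normalized stage delay D = G*H + P (dimensionless, in units of tau)."""
    return g * h + p


def absolute_delay(g, h, p, tau):
    """Absolute delay D_abs = (G*H + P) * tau, in the same time unit as tau."""
    return normalized_delay(g, h, p) * tau


# Logical effort and parasitic delay as quoted in the lecture.
INVERTER = {"g": 1.0, "p": 1.0}
NAND2 = {"g": 4.0 / 3.0, "p": 2.0}

# Hypothetical process constant, e.g. in picoseconds (an assumption, not lecture data).
TAU = 10.0

for h in (0, 1, 2, 4):
    d_inv = normalized_delay(INVERTER["g"], h, INVERTER["p"])
    d_nand = normalized_delay(NAND2["g"], h, NAND2["p"])
    print(f"H={h}: inverter D={d_inv:.2f}, NAND2 D={d_nand:.2f}, "
          f"inverter D_abs={absolute_delay(1.0, h, 1.0, TAU):.1f}")
```

The intercept of each line is P and the slope is G, so the NAND2 line starts higher and rises faster than the inverter line.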
Equivalently, though we will prove this point later, for a two-input NAND gate it can be shown that the parasitic delay is two, one for each input. So it has a normalized parasitic delay of 2, and it has a slope which is different from that of the inverter. This is important because in a static NAND gate you have two N-channel transistors in series and two P-channel transistors in parallel, and the net capacitance will be larger. Therefore, even at a lower electrical effort we may have a larger delay; we will look into this very soon.
Having given some introduction to the logical effort terminology, we will now look into logical effort estimation for different gates. This is very important, because if I know my G and I know my H, I will be able to calculate my delay, and speed is essentially one upon delay. So we are interested in the logical effort G as well as the electrical effort H, which is C_out by C_in.
Let us compare the logical effort of an inverter and a NOR gate. I first need a standard or template gate against which I will scale everything. On my left is a standard CMOS inverter where the N-channel width is 1 and the P-channel width is 2, assuming the channel lengths are the same, which has to be true in a technology. So we do not say W by L; we only say the widths are 2 and 1. If we compare this inverter with itself, the ratio of the two RC products is 1, and therefore G = 1. Next to it is a NOR gate. You have two P-channel devices in series and two N-channel devices in parallel; this is a simple static NOR. Now, the idea in all the derivations is the following.
Whether you have a series or a parallel combination, the minimum transistor width is 1. In the NOR gate the two N-channel transistors are in parallel, so you keep each of them at width 1, which is equivalent to the template, because either this branch or that branch will be operating, and in either case you need at least the equivalent of one. However, the two P-channel transistors are in series, and if I want the equivalent of W = 2, I must make each transistor double the normal P-channel width, that is W = 4 each; the series combination then gives an equivalent of 2. So I have two P-channel devices of width 4 in series and two N-channel devices of width 1 in parallel.
For each input, A and B, one can then evaluate the capacitance, which is proportional to the widths. For input A, the P-channel capacitance is proportional to 4 units and the N-channel capacitance to 1 unit, so the net capacitance at input A is proportional to 4 + 1 = 5. For the normal inverter the net capacitance is proportional to 2 + 1 = 3, which is the template capacitance. The ratio is therefore 5 by 3 for input A, and for B it is also 5 by 3. So the net logical effort of a two-input NOR gate is twice 5 by 3, which is 10 by 3.
Now we look at other gates by the same logic. Here is a complex gate which has three inputs A, B and C. The function it implements is the complement of A·(B + C).
So the function implemented is the complement of A·(B + C): B and C are in parallel in the pull-down network, and input A is in series with them. Now, the interesting feature is this. Each pull-down path must be equivalent to the template width W = 1. A path goes either through A and B in series or through A and C in series, so to make the N-channel chain equivalent to W = 1, the two transistors in series must each have width 2; two and two in series give W = 1. So A has width 2, B has width 2 and C has width 2, and this is equivalent to a pull-down chain of W = 1.
In the P-channel pull-up network the opposite occurs: B and C are in series, and A is in parallel with them. From the power supply to the output, each arm should give the equivalent of W = 2. Since A does not have anything in series with it, A must have a width of 2, whereas B and C must have a width of 4 each because they are in series; 4 and 4 in series give 2. So the path to the output is either through this 2 or through that 2.
Now I evaluate the logical effort per input. Input A has 2 here and 2 there, so 4, and the template is of course 3.
So 2 + 2 = 4, and 4 by 3 is the logical effort G for input A. By similar logic, for B and C you have 4 and 2, so 6, and 6 by 3 is 2. So the logical effort for inputs B and C, which are similar, is 2 each, and the net logical effort of such a complex gate is 2 + 2 + 4 by 3, which is 16 by 3. The intuitive result is that the worst-case G of a complex gate is always higher than that of a NAND2 or a NOR2 gate.
The value of the logical effort G depends on which gate is chosen as the template. In our case we have chosen the 2:1 inverter as the template gate, and therefore its G is 1. Please remember this is a simplification; you can choose any other template gate, assign it some value of G, scale other circuits with reference to it, and you will get another library of functions with different logical efforts.
So what is G doing? In my opinion, G captures the effect of the number of inputs A, B, C and of the transistor topology, series or parallel, compared to your template gate. A more complex gate requires more logical effort to produce the same output current as the template gate, and it also presents a higher input load. The logical efforts of 1x NAND2, 2x NAND2 and 4x NAND2 are all the same. The effect of the extra load presented by larger transistors is captured by the electrical effort parameter, not by G.
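The per-input bookkeeping used for the NOR gate and the complex gate can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the dictionary layout and function names are assumptions, while the transistor widths are the ones worked out above against the 2:1 template inverter (input capacitance proportional to 1 + 2 = 3).

```python
from fractions import Fraction

# Template inverter: N width 1 + P width 2, so input capacitance is proportional to 3.
TEMPLATE_INPUT_WIDTH = 3

# For each gate: input name -> (N-channel width, P-channel width) seen by that input.
GATES = {
    "inverter": {"A": (1, 2)},
    "NOR2": {"A": (1, 4), "B": (1, 4)},
    # Complex gate from the lecture: complement of A.(B + C)
    "complex": {"A": (2, 2), "B": (2, 4), "C": (2, 4)},
}


def per_input_effort(gate):
    """Logical effort of each input: (N width + P width) / template input width."""
    return {name: Fraction(n + p, TEMPLATE_INPUT_WIDTH) for name, (n, p) in gate.items()}


def total_effort(gate):
    """Net logical effort: sum of the per-input logical efforts."""
    return sum(per_input_effort(gate).values())


for name, gate in GATES.items():
    efforts = {k: str(v) for k, v in per_input_effort(gate).items()}
    print(name, efforts, "total =", total_effort(gate))
```

Using `Fraction` keeps the results as exact ratios such as 5/3 and 16/3 instead of rounded decimals.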
So whether it is a 2x NAND or a 4x NAND, the strength does not matter for G, whereas an increase in size increases the input capacitance and also the output capacitance, and that is decided through the electrical effort parameter H. I repeat, the electrical effort H is used to capture the driving capability of the gate via transistor sizing, and also the loading effect of transistor sizes. We say H = C_out / C_in, where C_out is the load capacitance and C_in is the input capacitance of the gate. Please note that H for a gate reduces as its transistors become wider, since C_in increases while C_out is assumed fixed.
Now, about the parasitic delay P. Note that P is constant and independent of transistor size: as you increase the transistor sizes, the capacitance of the source and drain areas also increases, which keeps the no-load delay constant. So how do we measure P? We must find a way to monitor P, and once it is known we can calculate tau very easily. We have two methods to suggest.
Method one uses the circuit shown here: a 1x inverter A driving another 1x inverter B. The delay of A is (G H + P) tau. The inverter has a logical effort of 1, and since it is 1x driving 1x, the output and input capacitances are the same, so H = 1. So A delay = (1 × 1 + P) tau, which means tau = A delay / (1 + P). In the other circuit, the delay of inverter C is similarly (G H + P) tau, but here a 1x inverter is driving a 2x inverter. Please remember that H will be larger, because the load is 2 times the capacitance while the input is only 1 times, so H for C is 2. So C delay = (1 × 2 + P) tau = (2 + P) A delay / (1 + P).
If we then use those two equations for A delay and C delay, we get P = (2 A delay - C delay) / (C delay - A delay). So by monitoring actual delays in a normal circuit I can evaluate the parasitic delay.
Sorry, this slide should have come earlier, but it does not matter. This is the case of a NAND gate, and we can see how its logical effort is different. My template gate is the inverter with W = 1 and W = 2, so G is 1. In the NAND gate the two N-channel devices are in series and the two P-channel devices are in parallel. Since the P-channel devices are in parallel and we need at least W = 2 on any path from VDD to the output, each can be just 2, because even in the worst case, with only A or only B conducting, the path has a width of 2. For the N-channel transistors, to make the series combination equal to 1, each must have double the size, so 2 and 2. I repeat, the logical effort is the ratio of the input capacitance of the NAND gate to the input capacitance of the inverter or template gate. For any input I have 2 here and 2 there, so 4, and for the template it is 2 + 1 = 3. So the logical effort per input of the NAND gate is 4 by 3; similarly for B it is 4 by 3. So the net is 8 by 3 for a two-input NAND gate.
Now, how many times should I do this? To save time, we can do one set of such calculations and measurements and then need not repeat them again and again. So here is method two, a better way to measure P and tau. I have a 1x inverter driving a 1x inverter, then a 1x inverter driving an inverter of unknown size, and the last stage is 1x again.
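Method one, described above, reduces to two small formulas, and a short Python sketch makes it easy to check them against each other. The function names are illustrative, and the sample delays are synthetic numbers generated from a hypothetical tau and P, not measurements from any real process.

```python
def parasitic_delay(a_delay, c_delay):
    """P = (2*A - C) / (C - A), from A = (1 + P)*tau and C = (2 + P)*tau.

    a_delay: measured delay of a 1x inverter driving a 1x inverter (H = 1)
    c_delay: measured delay of a 1x inverter driving a 2x inverter (H = 2)
    """
    if c_delay <= a_delay:
        raise ValueError("C delay must exceed A delay (the heavier load is slower)")
    return (2 * a_delay - c_delay) / (c_delay - a_delay)


def tau_from_a_delay(a_delay, p):
    """tau = A delay / (1 + P)."""
    return a_delay / (1 + p)


# Synthetic example: pretend tau = 10 and P = 1 (hypothetical values).
true_tau, true_p = 10.0, 1.0
a = (1 + true_p) * true_tau   # 1x driving 1x
c = (2 + true_p) * true_tau   # 1x driving 2x

p = parasitic_delay(a, c)
tau = tau_from_a_delay(a, p)
print(f"recovered P = {p}, tau = {tau}")
```

Because C delay - A delay is exactly tau in this model, tau can also be read off directly as the difference of the two measurements.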
The first two inverters are there so that we get a realistic input waveform; there are delays associated with the two inverters, so some wave shaping is done and you get a realistic waveform at this point. Having received this input, it passes through a 1x inverter which we call the device under test, and we keep varying the gate G3 that it drives. The load at the output of G3 is fixed, so that no Miller effect is passed through G3. So we fix that and vary only G3: 1x, 2x, 4x, 6x, 8x. The load ratio is the capacitance of G3 divided by the capacitance of the device under test. Once I have a number of such measurements, I can plot delay versus this capacitance ratio for 1x, 2x, 4x and so on. The intercept is the no-load delay, and as the load increases from 1x through 2x, 4x, 8x, the delay rises. Because this is a straight line, m x + b, by extrapolation I can evaluate both P and tau in one go.
Now I know how to calculate G, I know how to calculate H, and I also know how to find P. So I roughly know F, which is G H, and how much P is. Let us now look at how to evaluate G for a number of gates of different kinds.
Here is another interesting circuit. All of you are aware that an XOR gate implements the function A B bar + A bar B. To implement it you need two inverters to create A bar and B bar, then two AND gates, for A bar B and A B bar, and an OR gate, to make A B bar + A bar B.
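The extrapolation in method two is an ordinary straight-line fit. The sketch below assumes the device under test is an inverter with G = 1, so that D absolute = tau H + tau P: the slope of delay versus H is tau and the intercept is tau P. The least-squares helper, the function names and the delay values are all illustrative; the data are synthetic, generated from hypothetical values of tau and P.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept, in pure Python."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extract_tau_and_p(h_values, delays, g=1.0):
    """From D_abs = tau*(g*H + P): slope = tau*g, intercept = tau*P."""
    slope, intercept = fit_line(h_values, delays)
    tau = slope / g
    p = intercept / tau
    return tau, p


# Load gate sizes 1x, 2x, 4x, 6x, 8x relative to the 1x device under test.
h_values = [1, 2, 4, 6, 8]

# Synthetic "measurements" from hypothetical tau = 12 and P = 1.1.
delays = [12.0 * (h + 1.1) for h in h_values]

tau, p = extract_tau_and_p(h_values, delays)
print(f"tau = {tau:.3f}, P = {p:.3f}")
```

With real measurements the points will not lie exactly on a line, and the fit then averages out the measurement noise across all the load sizes.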
For a gate in CMOS you need as many transistors as the inputs demand; each AND will require 4 and the OR will require a further 4. By using the kind of hardware shown here I am able to reduce the number of transistors from, let us say, 12 to only 8. Let us see how the circuit functions. There are two N-channel devices and two P-channel devices in one arm, and there is another arm with another two N-channel and two P-channel transistors; the outputs of both arms are connected together as the output of the XOR. On the N-channel side the first arm has inputs A and B and the other has A bar and B bar. On the P-channel side we do the opposite: one arm has A bar and B and the other has A and B bar.
Now we start looking at the sizes, since there are two P-channel devices in series in either arm. There is a term gamma here which I have not discussed so far, so let me tell you what gamma is. The resistance of a transistor is proportional to its width to the power minus 1. In the current equation of a PMOS device you get a term mu_p, and for an NMOS device you get the N-channel mobility mu_n. Right now I assume V_tp and V_tn are equal; if they are not, there is another issue. If I want R_PMOS to be the same as R_NMOS, so that the rise and fall transients are the same, then the P-channel width must be (mu_n / mu_p) times W_n. Since the surface mobility ratio mu_n by mu_p is typically around 2, W_p is twice W_n; that is where the number 2 comes from.
So gamma is essentially the ratio mu_n by mu_p. However, this was under the assumption that V_tp is the same as V_tn. If they are not the same, gamma is taken as the ratio of the PMOS width to the NMOS width that equalizes the resistances, and it then takes care of the variation in thresholds as well as in mobility, so it need not be exactly 2. So for the generalized case, instead of using 2 we use the more general term gamma, which is normally 2 but need not be, if the P-channel and N-channel thresholds are not identical or the mobility ratio is not 2. I repeat, gamma is typically 2 in most of our designs, but this is not true all the time.
With this we come back to the XOR. The template P-channel width is gamma, so to make the series combination of two P-channel devices equal to gamma, each of them must have a size of 2 gamma. Similarly, the two devices in the other P-channel chain have 2 gamma each. For the N-channel chains the series combination should be equivalent to 1, so each device must be 2, and the same for the other N-channel chain. Once I have put in these sizes, I can evaluate the effort compared to the template inverter. For input A, the logical effort is (2 + 2 gamma) / (1 + gamma), which is 2. So the logical effort per input is 2. Then we define a bundle: A and A bar are identical in some sense because they are complements of each other, so the logical effort for A and A bar together is 2 + 2 = 4.
There are two more inputs, B and B bar, and that bundle has the identical logical effort of 4. So the net logical effort is 4 + 4 = 8, which can be verified directly: adding 2 + 2 gamma over the four inputs gives 8 + 8 gamma, and divided by 1 + gamma that is 8. So the logical effort of an XOR gate is very high: 8, as against 8 by 3 for the two-input NAND gate.
Another very popular combinational circuit is the multiplexer. A typical multiplexer is shown here; it is an n-way multiplexer with arms 1, 2, 3, up to n. The first arm has the select inputs S1 and S1 bar and the data input D1. Similarly there is data D2, then D3, and all the outputs are connected together. So we would like to see D1 passing, then D2, then Dn, as the selects S1, S2, up to Sn and their complements are applied. In each arm the central two transistors, one N-channel and one P-channel whose common point is the output, are driven by the select signals S1 and S1 bar, whereas the outer P-channel and N-channel devices form a standard CMOS inverter with input D1. Since there are two P-channel devices in series, to match the template timing each is 2 gamma, and the N-channel devices are 2 each. So per vertical chain you have (4 + 4 gamma) / (1 + gamma), which is 4, and if there are n such chains the total logical effort is n (4 + 4 gamma) / (1 + gamma), or 4n. If you are interested in the effort per data input, say D1, it is (2 + 2 gamma) / (1 + gamma), so it is only 2.
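One point worth checking numerically is that gamma cancels out of these results. The following Python sketch recomputes the XOR and multiplexer efforts for a few values of gamma, using a template inverter with N width 1 and P width gamma. The function names and the particular gamma values tried are assumptions made for illustration.

```python
from fractions import Fraction


def logical_effort(n_width, p_width_in_gammas, gamma):
    """Gate input width over template input width, with P widths given in units of gamma."""
    return (n_width + p_width_in_gammas * gamma) / (1 + gamma)


def xor2_efforts(gamma):
    """(per input, per bundle, total) for the two-input XOR sized 2 and 2*gamma."""
    per_input = logical_effort(2, 2, gamma)
    per_bundle = 2 * per_input      # A together with A bar
    total = 2 * per_bundle          # bundles A and B
    return per_input, per_bundle, total


def mux_efforts(n_ways, gamma):
    """(per data input, per arm, total) for an n-way multiplexer of tri-state arms."""
    per_data_input = logical_effort(2, 2, gamma)
    per_arm = logical_effort(4, 4, gamma)   # data plus select devices in one arm
    total = n_ways * per_arm
    return per_data_input, per_arm, total


for gamma in (Fraction(2), Fraction(3, 2), Fraction(5, 2)):
    print("gamma =", gamma, "XOR:", xor2_efforts(gamma), "4-way mux:", mux_efforts(4, gamma))
```

The printed efforts are the same for every gamma, because each P width was chosen as a multiple of gamma in the first place.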
Another interesting combinational block, used in many communication circuits, is the parity block, for parity detection or parity generation. The XOR gate I have shown can be generalized, since a parity gate is essentially a series of XORs. For n inputs there are 2 to the power n minus 1 pull-down chains, each with n transistors in series, each of width n, and there are 2 to the power n minus 1 pull-up chains, each with n transistors in series, where each P-channel pull-up device has a width of n gamma. The total logical effort is then n squared times 2 to the power n minus 1. Let me show you how.
Take a three-input XOR. Each pull-up chain, the P-channel chain, has three P-channel transistors in series, and each pull-down chain has three NMOS transistors in series. In the pull-down chain, since there are three in series and I want the equivalent of 1, I must make them 3, 3 and 3, so that the series combination is equal to 1. Similarly, for the P-channel chain between the power supply and the output I must have 3 gamma, 3 gamma and 3 gamma, so that it gives the equivalent of gamma. With these sizes, the logical effort per input in a chain is (3 + 3 gamma) / (1 + gamma), which is 3. There are three such inputs, so 3 times 3, which is 9, is the logical effort of one chain.
Now, the generalized parity gate shown here is what I was just talking about. From this figure you can see that for 3 inputs a, b, c you require 2 to the power (n minus 1) chains; with n equal to 3 that is 2 to the power 2, so you need 4 chains. That is the important point. If you now calculate for the individual chains, a bar b bar c bar, a bar b c, and so on, the logical effort of each chain is (9 + 9 gamma)/(1 + gamma), which is 9, and multiplied by 4 chains it is 36. The effort per bundle, for each input a and a bar together, is obviously one third of that since there are 3 inputs, so it is 12. So, one can evaluate the logical effort g for the 3-input parity gate shown here, and one can see that it is extremely high: 36. Now, what does a larger logical effort mean? It means the delay is going to be larger, because we already said the delay, not the absolute delay but the normalized delay, is g h plus p. So, obviously, if g is larger the delay is larger. If you have a large chain of XOR gates for parity generation or parity detection, the effort required is very, very high. However, in all VLSI design, one of the things we keep looking at is how to minimize the delay, that is, how to improve the speed. To improve the speed, h is many a time not in your hands because it is set by the external output load; p is decided by the number of inputs we use, so in this case p may be 3 since you have 3 inputs. However, g, which is 36 here, please remember, decides how much delay I am going to have. So, can I reconfigure such gates? These are called symmetric gates. Why symmetric? You can see from the circuit that the pull-up part is symmetric, and the pull-down chains are exactly identical too.
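The counting for the symmetric parity gate can be sketched in Python as below. The function name parity_gate_effort is hypothetical, and the sizing rule (n transistors of width n per pull-down chain, n gamma per pull-up device) is the one described in the lecture.

```python
def parity_gate_effort(n_inputs):
    """Logical effort of the symmetric n-input parity gate.

    Returns (number of chains, total logical effort, effort per bundle).
    """
    chains = 2 ** (n_inputs - 1)
    # Each input in a chain sees (n + n*gamma)/(1 + gamma) = n.
    effort_per_input_per_chain = n_inputs
    effort_per_chain = n_inputs * effort_per_input_per_chain
    total = chains * effort_per_chain
    per_bundle = total // n_inputs   # one bundle = an input and its complement
    return chains, total, per_bundle

print(parity_gate_effort(2))  # (2, 8, 4)
print(parity_gate_effort(3))  # (4, 36, 12)
```

The n = 2 case reproduces the XOR figures (total 8, 4 per bundle), and n = 3 gives the 36 and 12 worked out above.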
Now, I want to see whether I can introduce a little asymmetry, so that I make asymmetric connections and still implement the same logic. I will come back to it a little later; for now I just ask whether I can reduce g itself, for the same function, by reconfiguring the hardware.
Another gate of interest, used in many combinational blocks, is the majority gate. Now, what is a majority gate? If the majority among the inputs is high the output is high, and if the majority among the inputs is 0 the output is 0. A typical symmetric majority gate is shown here: you have 2 n-channel devices here and the corresponding 2 p-channel devices here, 2 n-channel here and 2 p-channel here, and 2 n-channel here and 2 p-channel here. For any series combination to be equivalent to 1, each n-channel should be 2, and for any series combination to be equivalent to gamma, both p-channels in series should be 2 gamma. I give 3 inputs a, b, c, of which I have to test the majority. You can quickly verify this. Say a and b are 1 and c is 0. The p-channel devices receiving a and b are switched off. The p-channel receiving c is on, but that does not matter because the device in series with it is off. If you look at the pull-down transistors, the ones receiving c are off since c is 0, so ground cannot be provided through them. However, since a and b are 1, those two n-channel devices are on and the output is pulled down to 0. So, the circuit is essentially acting as an inverter: if a and b are 1, the output is the complement of what we are expecting. Similarly, if b and c are 0, or a and c are 0, the output is pulled high through the corresponding pull-up chain. You can check the cases where all 3 are 1 or all 3 are 0 in the same way.
If you look at the logical effort, each place where an input connects contributes (2 + 2 gamma)/(1 + gamma), which is 2. There are 12 transistors in this gate, so the total is (12 + 12 gamma)/(1 + gamma), which is 12, and per input it is 12 by 3, which is 4. So, a typical majority gate has a logical effort which is not very large, but relatively large.
Another circuit which is used in data paths is the adder carry chain. This is one stage of the ripple-carry chain of an adder. If you have seen any ripple-carry adder circuit, the carry C in comes in here, and the other inputs, g and k, are the generate- and propagate-type control functions for the stage. Since the carry input drives a series pair, it sees 2 and 2 gamma, so the logical effort for input C in is (2 + 2 gamma)/(1 + gamma), which is 2. The logical effort for input g, and please remember this is the catch, is different: the n-channel is 1 and the p-channel is 2 gamma, because the n-channel device is by itself, as in a plain inverter. So, the logical effort due to g is (1 + 2 gamma)/(1 + gamma). By a similar argument, the logical effort due to the input k is (2 + gamma)/(1 + gamma). So, the net logical effort, which is the sum of the three, is (5 + 5 gamma)/(1 + gamma), and therefore the logical effort of the carry part of such an adder is 5.
Another important circuit in the analysis or design of any sequential block is the latch. This is a dynamic latch, which is clocked by phi and phi bar; phi is the clock and phi bar is the clock bar. If D is the data, then when phi is 1 and phi bar is 0 both clocked transistors are on, as if they were shorted, and therefore only D decides the output: if D is 1 the output goes to 0, and if D is 0 the output goes to 1, as in a normal inverter.
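The carry-chain sum a little earlier can be checked for several values of gamma. The function name carry_chain_efforts is hypothetical; the widths (2 and 2 gamma for the carry input, 1 and 2 gamma for g, 2 and gamma for k) are the ones quoted in the lecture.

```python
from fractions import Fraction

def carry_chain_efforts(gamma):
    """Per-input and total logical effort of one carry-chain stage."""
    template = 1 + gamma                    # input capacitance of the template inverter
    g_cin = (2 + 2 * gamma) / template      # carry input
    g_g = (1 + 2 * gamma) / template        # input g
    g_k = (2 + gamma) / template            # input k
    return g_cin, g_g, g_k, g_cin + g_g + g_k

# gamma = 2 is an assumed value; the total is 5 whatever gamma is.
print(*carry_chain_efforts(Fraction(2)))  # 2 5/3 4/3 5
```

Only the split between g and k changes with gamma; their sum with the carry input always comes to (5 + 5 gamma)/(1 + gamma) = 5.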
However, if phi is 0 and phi bar is 1, that is the clock is 0, then this transistor is switched off because its input is 1 and this transistor is switched off because its input is 0. Both the power supply and the ground lines are disconnected from the output, and therefore it retains whatever the last state was, which is what a latch wants. So, this is an interesting dynamic latch. One can see the logical effort from here: since there are series combinations of n and p, 2 and 2 gamma is the requirement. So, the logical effort is (2 + 2 gamma)/(1 + gamma), which is 2, for the data input; the logical effort of the clock bundle is of course also 2, and therefore the total logical effort is 2 plus 2, which is 4.
There is another element which is very important in asynchronous blocks, the Muller C-element. Let me look at this figure again and state exactly what I am saying. The Muller element's function can be written like this: if a is equal to b equal to 1, then the output, which in our case is called c, is 0; if a is equal to b equal to 0, then c is 1. I repeat: if a and b are 1 the output c is 0, and if a and b are 0 the output is 1. However, if a is 1 and b is 0, or a is 0 and b is 1, then the present state of c will be the same as its last state. This is essentially what a latch wants. Please remember there is no clock; the data itself decides whether to latch or not to latch. If the inputs are 0 it latches to 1; if the inputs are 1 it goes to 0. I can set and reset the output of the latch, and it retains the data if a is 0 and b is 1, or a is 1 and b is 0. So, this is what is called the Muller element, and on your right I have shown its equivalent circuit.
It has three 2-input AND gates and one 3-input OR gate, which together create the latch circuit I am talking about, the Muller C. I am sorry, I am just repeating it once again. Another possible circuit is this equivalent one, which I have shown going from the schematic to the transistor level: you have 2 n-channel and 2 p-channel devices, there is one inverter going out, and this part retains the data, which is like a latch action. So, a typical Muller element needs 4 transistors in series, again like the phi, phi bar latch, but here one input is connected to a and the other input is connected to b. Now you can see that if a is 1 this device conducts, and if b is also 1 the whole pull-down chain conducts and the node is pulled down to 0. If a is 1 and b is 0, then a is 1 means this n-channel is conducting, but b is 0 means the other n-channel is not conducting while its p-channel is, so neither chain conducts fully and the stored value is retained. So, this is essentially the Muller C.
So, one can see that the Muller C is a latch which is used without a clock, and it is therefore used extensively in asynchronous circuits, which is another area of great research in digital design. Please remember that an asynchronous circuit is not always purely data driven; it can also be clock driven but not synchronized to the local clock, which could therefore be a slower clock, or a and b could be slower. In that case, since the relative frequency is smaller than that of the synchronous clocks, the power will be lower in most cases. So, asynchronous circuits are normally preferred where power has to be reduced.
Therefore, a larger system nowadays can be designed in two parts, or with two kinds of circuits, synchronous and asynchronous, and the general trend is globally asynchronous and locally synchronous, or GALS as the term goes. GALS will require latches in the asynchronous part. So, I just showed you a dynamic Muller C-element, for which we can evaluate the logical effort: if there are n inputs, each input sees n + n gamma, so the logical effort per input is n and the total is n squared.
Now, this is what I was telling you about earlier; I have brought it in a little late, but it does not matter. The earlier 3-input XOR had a logical effort of 36, and now I can reorganize the 3-input XOR to give a lower logical effort. Please note, the symmetric 3-input XOR has a logical effort of 36. The way I have organized it is this: anything that is not needed by a particular path can be removed from the common points. One can see here, of course, this is c bar; please check, this is c bar and not c. One of the chains in the XOR is a bar, b bar, c bar in series, brought to the output, and correspondingly a bar, b bar, c bar comes from the top. The second chain is a bar, b, c going to the output, with a bar, b and c coming from the other side to the output. The other chain is a, b bar, c, and again a, b bar, c. The fourth chain is a, b, c, and a, b, c. In the symmetric design you had a logical effort of 36: 4 chains, each of them 9, since each input is (3 + 3 gamma)/(1 + gamma), which is 3, and there are 3 such inputs, giving 9, and there are 4 vertical chains. So, 36 was the logical effort required there.
Now, by separating the two, you know, this point is common only to these two circuits: c bar coming from here, sorry, c bar and c; c bar coming from here and c coming from here. For a single transistor pair you have (3 + 3 gamma)/(1 + gamma), which is 3; please remember, 3 here and 3 here. Now look at the effort of each bundle. a is not required everywhere: for a bar only this chain is required, and you do not have to go through a and a bar on every chain unnecessarily. The idea is that if some parts do not contribute to the output for a given input, we may separate them out as a parallel combination connected slightly separately. With this, the effort for the bundle a, meaning a bar and a together, comes through one of these and one of these, so it is 3 plus 3, only 6. Whereas b appears everywhere, you can see 1, 2, 3, 4 times, so the logical effort of b is 12. c appears only twice and therefore has a logical effort of 6. So, the total is 6 plus 12 plus 6, which is 24. So, we have reduced the logical effort from 36 to 24 by proper connection, and since this is no longer symmetric, it is called an asymmetric design. So, the asymmetric design has an advantage over the symmetric one in the sense that it has a lower logical effort.
Similarly, I have done it for the majority gate: with proper connection the logical effort has been reduced from 12 to 10, and the larger the number of inputs, the more it will be reduced. Whatever is not connected in a chain does not appear; here there are 4, here there are only 3, and here there are 4. So, not every input reaches every transistor, and the logical effort per input becomes, as you can see from the figure, only 2 for a, 4 for b and 4 for c.
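The per-bundle figures quoted for the symmetric and asymmetric designs can be tallied in a few lines. This is just bookkeeping of the numbers given in the lecture; the dictionary names are made up for the illustration.

```python
# Logical effort per input bundle, as quoted in the lecture.
symmetric_parity = {"a": 12, "b": 12, "c": 12}
asymmetric_parity = {"a": 6, "b": 12, "c": 6}
symmetric_majority = {"a": 4, "b": 4, "c": 4}
asymmetric_majority = {"a": 2, "b": 4, "c": 4}

def total_effort(per_bundle):
    """Total logical effort is the sum over all input bundles."""
    return sum(per_bundle.values())

print(total_effort(symmetric_parity), total_effort(asymmetric_parity))      # 36 24
print(total_effort(symmetric_majority), total_effort(asymmetric_majority))  # 12 10
```

Note that the saving is not uniform: the input b pays the same in both parity designs, so the asymmetric form helps most when the critical signal can be placed on a or c.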
So, you have a net logical effort of 10. Having seen this logical effort evaluation, please note that at the end of the day our ultimate aim for any circuit is to evaluate the net delay. Let us say these are blocks: data is flowing through a number of logic blocks, block 1, block 2, block 3, and there is a final load, C load, sitting here. Here is an input capacitance, C in. Now I want to see the time taken, the delay, for an input starting here to reach the output here. One can see that C in and this first block cannot be changed at the input, and the external output capacitance C L cannot be varied much either. However, one can play with the sizing in these blocks and with the paths in them; for example, you may have a smaller number of gates in a particular path. The delay is decided by the slowest path, which we call the critical path. So, we will be able to find the critical path, or improve it, by sizing the transistors in these two blocks; this one may not be resized because it is an input driver. Using this, we will be able to design a circuit which has the least path delay. However, before we do this, we must figure out how to estimate the delay from here to here. If I can estimate this delay for given blocks, then I will be able to optimize the path delay, and that is what logical effort is essentially about: how to minimize the delay. We know the delay, as I already said, is g, the logical effort, times h, the electrical effort, which is C L by C in, plus of course the parasitic delay. So, I would like to find the g product over the blocks, then find the net h, add the parasitic delay p, and evaluate the delay. So, let us see this technique to evaluate delay.
So, let us take a simple ring oscillator example: estimating the frequency of an N-stage ring oscillator is my problem. Each inverter in my chain has a logical effort of 1; we have said an inverter has logical effort g equal to 1. For the electrical effort, since every inverter sees the same inverter as its load, C out for each of them is the same as C in, and therefore the electrical effort h, which is C out by C in, is 1. Each inverter also gives a parasitic delay p of 1. So, the stage delay d is g h plus p, which is 1 into 1 plus 1, which is 2. Then the oscillator frequency will be 1 upon 2 N d, in units of 1 over tau, where N is the number of stages. The factor of 2 is there because a 1 has to go around and then a 0 has to go around, so you need twice the time, that is 2N stage delays for one period. Of course, it also depends on whether N is even or odd; a ring of inverters oscillates only for an odd number of stages.
Then, in the design of digital blocks it is common practice to quote the delay for a load equivalent to 4 times the inverter's own input, and such loads are called fan-out-of-4 inverter loads, or FO4. Here is an example: this is our driving inverter and these are our load inverters, 4 of them together. For this inverter I want to know the delay; we know its logical effort g is 1. Now the load is 1C, 2C, 3C, 4C, and the input is 1C, so C out by C in is 4. Since you have only one inverter here, the parasitic delay is 1. So, the stage delay is g h plus p, 1 into 4 plus 1, which is 5. So, an inverter driving an FO4 load has a delay of 5. Now, here is an interesting example which will illustrate what I have been talking about all this time: how to get the total path delay.
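The two stage-delay calculations above, the ring oscillator and the FO4 inverter, can be sketched as follows. The function names and the value of tau (12 ps) are assumptions chosen only to turn the normalized result into hertz; the lecture itself works in units of tau.

```python
def stage_delay(g, h, p):
    """Normalized stage delay d = g*h + p (in units of tau)."""
    return g * h + p

def ring_oscillator_frequency(n_stages, tau_seconds):
    """Frequency of an N-stage inverter ring: 1 / (2 * N * d * tau)."""
    if n_stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages")
    d = stage_delay(g=1, h=1, p=1)   # each inverter drives an identical inverter
    return 1.0 / (2 * n_stages * d * tau_seconds)

fo4_delay = stage_delay(g=1, h=4, p=1)
print(fo4_delay)  # 5

# Hypothetical process with tau = 12 ps and a 31-stage ring.
frequency_hz = ring_oscillator_frequency(31, 12e-12)
print(f"{frequency_hz / 1e6:.0f} MHz")
```

In practice the relation is often used the other way round: the ring frequency is measured and tau is extracted from it.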
This circuit, of course, is a solved example in the book by Sutherland, Sproull and Harris, so I take no credit for it. Please remember that most of the examples solved today are from that book; I will give some unsolved ones and may solve a few of them next time, but that data is also taken from the Sutherland, Sproull and Harris book. You have an inverter driving a 2-input NOR gate, whose output is one of the inputs of a 2-input NAND gate; the output of this NAND gate is inverted by an inverter, which drives a capacitive load of 20 units. Let us say the first inverter has an input capacitance of 10 units. So, now let us calculate g and h for each of them. The first method would be to calculate C out by C in, that is h, for each stage. However, if you want the net electrical effort, all we are interested in is C L divided by C in, which is 2 in our case. Why am I saying this? Because if you look at h1, say the capacitance here is x, then the ratio is x by 10; if it is y here, then y by x; if it is z here, z by y; and at the end 20 by z. So, the product x by 10 into y by x into z by y into 20 by z gives me H, which is equal to 20 by 10. That is what it essentially means, but I keep telling you: do not define H as the product of h1, h2, h3, h4. H, the net path electrical effort, is the output capacitance divided by the input capacitance, C out by C in across the path; do not go piecewise. However, if you look at the logical effort part, this stage has a logical effort g1, this one g2, this one g3 and this one g4.
The inverter has a logical effort of 1, the 2-input NOR is 5 by 3, this is a 2-input NAND, so it has a logical effort of 4 by 3, and then an inverter again, which is 1. Now, about the way we define the net logical effort: I already said I started with the inverter as my template. For the NOR this is fair enough, it is 5 by 3 with reference to the inverter. But the NAND is not being driven by an inverter, so to come to this point from the input you must take 4 by 3 into 5 by 3 into 1 as the net logical effort, and if you go further, multiply by g4. So, we define the total path logical effort, capital G, as the product of the g i, that is g1 g2 g3 g4 along the path. Then we define the path effort F as capital G into capital H. As I keep saying, F is of course also f1 into f2 into f3 into f4, but instead of writing all of them you write the net path effort F as capital G into capital H, since I do not want to define H as the product of h1, h2, h3. Now, let us do the calculation for the values we gave. For this circuit, capital G is g1 g2 g3 g4, which is 1 into 5 by 3 into 4 by 3 into 1, which is equal to 20 by 9. Capital H, which is C out by C in, is 20 by 10, which is 2. So, the path effort F, which is G times H, is 20 by 9 into 2, which is 40 by 9. Why do we want to do it this way? Because I have no idea what the values of x, y and z are, so I could never evaluate the product h1 h2 h3 h4 to find H.
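The path-effort calculation for this inverter, NOR, NAND, inverter chain can be sketched as below. The variable names are illustrative, and the intermediate capacitances passed to the helper are arbitrary made-up values, used only to show that the stage-by-stage product collapses to C out by C in on a path without branching.

```python
from fractions import Fraction
from math import prod

# Logical effort of each stage on the path: inverter, NOR2, NAND2, inverter.
stage_g = [Fraction(1), Fraction(5, 3), Fraction(4, 3), Fraction(1)]

c_in, c_load = 10, 20
G = prod(stage_g)            # path logical effort
H = Fraction(c_load, c_in)   # path electrical effort
F = G * H                    # path effort

print(G, H, F)  # 20/9 2 40/9

def product_of_stage_h(c_in, x, y, z, c_load):
    """Multiply the per-stage electrical efforts h1*h2*h3*h4."""
    return Fraction(x, c_in) * Fraction(y, x) * Fraction(z, y) * Fraction(c_load, z)

# Whatever x, y, z are, the product is still c_load / c_in.
print(product_of_stage_h(10, 7, 13, 9, 20))  # 2
```

This is why the unknown x, y and z never need to be found in order to get F.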
So, the method I suggest is: do not evaluate h1, h2, h3 initially; just take capital H, which is the output capacitance by the input capacitance, and evaluate the g's for the gates in the path. The red line shows the path; please remember the other inputs also affect the output values, but the path is the one shown as the red line. So, I can calculate the path effort as 40 by 9 for the above circuit. Now, there is another interesting case which we will see later: a branch, where one inverter drives two such inverters. How do we calculate G and H? In this example, H for this path is 90 by 5 and for the other path it is also 90 by 5, even though the intermediate inverter capacitances are known to me, 15 and 15. If I want the g for each path, it is 1 into 1, that is 1, and H is 90 by 5, which is 18. So, G H is 18 on either path, because capital G is 1 into 1, which is 1, and capital H is 90 by 5, which is 18. So, each branch has a G H value of 18.
I think before we go further we will stop here today and continue from here next time, and then go a little faster through the rest of the circuits, now that I have explained the basic principle of getting the delay across a path: how to evaluate G and how to evaluate H. Once we do this we will be able to quickly figure out the net delay; alternatively, if we know the minimum path delay, we will be able to calculate the individual h values and therefore the individual stage delays, and once we get that we can design the input capacitance of each block in the path. That is essentially what logical effort is going to do.