So, I start with, as he said, logical effort. What exactly am I talking about? All of you are aware that these days there are three kinds of circuits which most industries fabricate and use, and the design differs depending on the kind of circuit. The first kind is called high-performance circuits. Performance essentially relates to speed, and speed means the delay time along the whole logic path. These circuits require very high frequency system clocks, typically a few gigahertz and above, and they are specific designs which require high speed. Then there are designs which require extremely low power. In those designs speed may be given up, and at the cost of speed you can reduce the power consumption. These are mostly mobile and hand-held systems: PDAs, mobiles; all of these require extremely low power. There is a third class as well, called low-standby-power circuits: a mobile phone, for example, should not consume power when it is not in the on mode. These are the three different classes.

This part of the course which I am teaching here is essentially for high-performance circuits: I am improving the speed. So I am not bothered about power as far as this part is concerned. In real life I may have to keep optimizing power, area and speed together anyway, but here I assume that I have enough power with me and that area is not that constrained; I am only trying to play the game on the logic paths where I can improve the speed.

This lecture will start with a few things. I will give some introduction, part of which I have already given. Then I will talk about logical effort estimation for gates, forks, amplifier chains of stages and branches, and, time permitting, I will also cover logical effort for the different circuit families which Professor Sharma has just talked about.
I do not think I will have that much time, but in case I do I will cover it; otherwise I may refer you to the book, where you can find more details. Finally I will give some concluding remarks.

This is a very interesting figure which I picked up from Harris's talk. Chip designers face a bewildering array of choices, some of which I have already mentioned. What is the best circuit topology for a function? How large should the transistors be, that is, how much should W/L be? And how many stages of logic will give the least delay? These three are bewildering in the sense that they should bug you before you actually start designing. Logical effort is the method which answers some of these questions.

The best part of logical effort is that it is, to use a phrase I always borrow from my colleague Professor Sharma, a back-of-the-envelope calculation. What do you mean by back of the envelope? The calculation should not take more than a few lines, and you can quickly do it anywhere. People always believe that, in a course or in real life, a circuit can always be designed using the standard simulators available, like SPICE or whatever it is. But it is not proper to start using simulators without knowing what you actually want to design and to what specs. If you are given those specs and asked to design something, you have to start with some basic design of your own, which can then be modified or optimized for the actual performance you are looking for. So a basic understanding of the logic design itself helps you design much better, and you get optimal results in a shorter time. That is why I am making this kind of effort. When I first heard about this course I happened to be at Stanford, in those days, maybe the 90s, 92 or something, and I met, by accident, Dr.
Harris, who at that time was teaching this course part time at Stanford. I was very much impressed when we had lunch together with another colleague of mine who had brought me there, and I realized that this is a very powerful method, at least for those who want to do circuit design or logic design. For people who only want to do what we call IP-based design this may not be that relevant, but for people who are creating IPs themselves this is a relevant talk.

So what is the most important part of this logical effort method which we are propagating? It uses a very simple model of delay, the back-of-the-envelope calculation I just mentioned, and it makes a very simple optimization possible, one which is tractable: one can find which parameter one is optimizing and by how much. It is not that we are doing something new. If you use Rabaey's book, or Neil Weste and Eshraghian, or any other book, the methods are the same; it is old wine in a new bottle, kind of thing. But it is interesting, and you will see what new and interesting things can be done with simple ideas. That is what I said when I talked to Harris, and he said it is a method suggested by Bob Sproull and Ivan Sutherland. They were both earlier at Cadence and also at Sun Microsystems, and Sutherland is still at Sun, as a vice president. 
So this method appealed to me very much, and I discussed it with my colleagues. There was a website which was available for 15 days, and I am probably the only one who downloaded the material in those 15 days. Of course, now the book is available in the market. It is a book of 100 to 200 pages or so, and not very costly either. Cost is of course relative, and some of you may find it costly; it is about 300 rupees, which I feel is very cheap.

So who cares about logical effort? This is what I took from Harris. He said circuit designers waste too much time simulating and tweaking circuits; high-speed logic designers need to know where the time is going in their logic, that is, which part actually has the maximum delay; and CAD engineers need to understand circuits better in order to build better tools. All in all, this is an introduction to say why logical effort.

This is also taken from his slides, which he very nicely sent me. There is a person he names Ben Bitdiddle, the memory designer for the Motoroil 68W86, an embedded processor for automotive applications. He says: help Ben design the decoder for a register file. A register file is like a FIFO or a stack. Here it has a word length of 32 bits and stores 16 words, so to address it you need a 4-to-16 decoder, which is what is to be designed. The specification he gave is as follows. Each bit presents a load of 3 unit-size transistors; we will come back to what exactly 3 unit-size means. Assume that true and complementary inputs of the address bits are available. Each input may drive 10 unit-size transistors. 
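The numbers in this specification can be tabulated in a few lines. This is only a sketch of the arithmetic; it assumes, beyond what is said above, that each decoder output (word line) drives all 32 bits of one word, and the variable names are made up for illustration.

```python
import math

# Register-file decoder specification as quoted in the lecture.
WORDS = 16            # words stored in the register file
BITS_PER_WORD = 32    # word length
LOAD_PER_BIT = 3      # unit-size transistors presented by each bit
INPUT_DRIVE = 10      # unit-size transistors each address input may drive

# Number of address bits needed to select one of 16 words.
address_bits = int(math.log2(WORDS))

# Assumption: one decoder output drives every bit of its word.
word_line_load = BITS_PER_WORD * LOAD_PER_BIT

# Ratio of the load on one output to what one input is allowed to drive.
load_to_input_ratio = word_line_load / INPUT_DRIVE

print(f"{address_bits}:{WORDS} decoder")
print(f"load per word line = {word_line_load} unit transistors")
print(f"output load / input drive = {load_to_input_ratio}")
```

The last ratio is the kind of quantity that the lecture later calls electrical effort.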
So Ben needs to decide how many stages to use, how large each gate should be, and how fast the decoder can operate.

Before I go to the speed part: when I am looking for speed in CMOS, or in any MOS logic per se, I am in most cases looking at capacitive loads. The load normally comes from the input capacitance of the second stage, but apart from that there is the interconnect line, and there is the output capacitance of the driving stage itself. There are capacitances associated with gate to drain, gate to source, bulk to drain and bulk to source; all of these are shown here. Source to bulk and drain to bulk are essentially diode capacitances, while gate to drain and gate to source are essentially through the oxide, plus some overlap. These capacitances can be evaluated from the MOS device parameters, as listed here. These capacitances together constitute the load at this node, which is what we call the total load.

So one can estimate the capacitances, but one has to understand that most of them, CG4, CG3 and so on, are area dependent: the larger the transistor, the larger W times L, and so the larger the capacitance. So what a driver can drive is decided by how much load it is going to face and by the size of its own transistors. Now one can see where the problem starts. If I am an inverter driving another identical inverter, the load is the same: the input capacitance of the next stage is about the same as my own. But if the load is higher, is the current supplied by the driver sufficient to charge or discharge those capacitances in the same time as it would for a single inverter of the same size? Now this is essentially a
timing problem. If I want to retain the timing I must provide more current, since a capacitor charges with a current of C dV/dt. So if I want a smaller time, or a higher speed, I must supply larger currents. Larger currents come from a larger W/L, a larger W/L means a larger area, and so the capacitance increases. Here is the catch: I improve the drive for the next stage, but my own input capacitance increases, and the successive stages find it very difficult, because every time I push more current I increase the size, and if I increase the size I increase the capacitance that the stage driving it has to charge. So this is an issue which we have to solve. After all, one cannot say that the speed can never be improved because it will always be counterproductive; that is not true.

I think this is a very popular figure. We are mostly interested in the delay time, which relates to both the rise time and the fall time, and from the 50% points we calculate tPHL and tPLH. As I said, the whole game is simple: we are charging or discharging a capacitor through an equivalent resistor. We all know the response is V(1 - e^(-t/tau)), where V is the voltage up to which you want to charge, and this gives a delay of roughly 0.69 RC. This is called the lumped model. So if I know roughly the values of the capacitance and the equivalent resistor, I know what kind of timing I am looking at. This idea has been carried into logical effort.

Just to give an idea again, the capacitances are computed from the sizes. W is the width; the length does not appear because an overlap term such as CGDO is defined per unit width. So the larger the width, the larger the capacitance; you can see all of these values are specified. Of course, in the gate capacitance both width and length appear, as the total area. So I can evaluate CL as the sum of all the relevant capacitances. Please note one simple point in the calculation of capacitance: those capacitances at any node whose other terminal is
grounded are the only ones to be added, because otherwise they are not in parallel. A floating capacitance does not load the node even if one of its terminals is connected to it. So always check that the other terminal is at least at AC ground, if not DC ground; only then should those capacitances be added at the node as parallel capacitances, all of them together. This fact has to be understood. Many times you see a capacitance, but if it does not get a ground connection through any path it does not get charged anyway, so it does not actually load you.

This morning you must have done some CMOS inverters. CMOS people always say that the transfer characteristic is size independent, or that W/L does not play a role. In fact it does. We say CMOS is ratioless, and it is ratioless as far as VOH and VOL are concerned: they have no dependence on the sizes, which is how ratioed and ratioless circuits were defined. But as far as the transition part is concerned, if I change W/L one or the other of the P-channel and N-channel devices becomes stronger, and therefore the speed varies with the size. That is very crucial for us. So, the typical inverter characteristics, which Mr.
Sharma must have done; I will just quickly run through them. With a P-channel load, say the output is making the low-to-high transition. This is normally done from the load side: the P-channel transistor is on and the N-channel is off. The N-channel is switched off because the input is at 0, or anywhere from 0 to less than VT. The P-channel is fully on, initially in saturation and then in non-saturation as the output builds up. This is the path for charging the capacitance. In the case of discharge, when the output is going from high to low, the P-channel is switched off, the N-channel is now a resistor, and the capacitor discharges through it. This is essentially a simple RC time constant calculation. All the calculations we did, and the figure I am showing, essentially tell you that logical effort uses just this much electronics, no more and no less. So it is not that it cannot be taught in the second year or at a higher level; it is true for everyone, even a designer in a company.

Again coming back to basics: the switching speed is limited by the time taken to charge and discharge the capacitance. This is the total response. The rise time is measured on the waveform from 10 to 90 percent of the steady-state value, and the fall time from 90 to 10 percent. The delay is the time difference between the 50 percent point of the input transition and the 50 percent point of the output transition. We know the propagation delay is the average of the low-to-high and high-to-low delays; low-to-high will differ from high-to-low depending on the sizes of the P-channel and N-channel devices. And one upon delay is, of course, the speed. This is the standard formulation which all books give for calculating the delay for a given input pulse from 0 to VDD or from VDD down to 0. VT0p is the threshold of the P-channel device at zero bias; right now we are assuming no bias dependence. In the normal CMOS case, if the substrates are properly connected there are no body-bias effects, so VT can be
assumed constant. So we can calculate tPLH, we can calculate tPHL, and from them the propagation delay.

So what exactly is delay optimization? If I want to optimize the delay in a circuit, I break the problem into two categories. One is gate size selection. What is gate size? If you look at standard cells, they are labelled 1x, 2x, 4x according to the amount of current they can drive; choosing among them is called gate sizing. The other is transistor sizing. So we look at two things: which gate to choose, and what the sizes of the transistors inside it are. Cells of different driving strengths are available for almost all blocks, and the current synthesis tools do a good job on this part. Gate size selection is actually very simple: you can always pick the cell you want. But if you are doing a custom design this is not available; you are designing from transistors, and then the size of the transistor depends on the process you are working with. Whether you are working at 90 nanometers or 65 nanometers, the values are different in each case, so you need to optimize the delays. One can appreciate that if I scale down the technology the speed will improve; the scaling laws tell you that speed improves, at some cost, which we will see, time permitting.

So if you are looking at custom design, the quality depends on the individual designer. That is where my work is. I prefer to be an individual designer rather than be in the standard-cell or IP business, in which you just pick up a few things, connect them, and hopefully it will work. I say hopefully because not everything works. Of course, there are synthesis tools available; very strong tools are available
these days. Cadence, Synopsys, you name the company, have all produced fantastic tools, but to produce those tools they must have studied what I am now teaching. Look at the way we normally teach our students and what a student wants to do: without even giving a thought to the design, he will go and put in some value and then start iterating on it. This iteration sometimes helps; sometimes it works because you hit the correct value on day one and you say you got it in a second. But that may be luck. You will not hit the correct values every time, so it may take a hell of a lot of time to iterate, and if the circuit is very complex it will take much more time than you anticipate.

Of course, many algorithms for gate size selection exist. One iterative approach which is very famous in the literature is the TILOS algorithm. It assumes that we can compute the delay along a path of gates and that we have multiple gate sizes to choose from, and it yields good results for any path delay. TILOS is what Sproull, Sutherland and Harris built on. Harris, of course, is the teacher; he picked up Sproull and Sutherland's method of logical effort, and TILOS forms the basis of logical effort.

So what is the TILOS algorithm? Say we have four inverters. The last inverter you may call the current gate, and the one prior to it is its driving gate. Initially choose all inverters of one size, 1x strength each, with the chain driving a load capacitance CL. With all of them at 1x, measure the delay from input to output, that is, the time taken to charge the output load capacitance. We call this the last delay. Then we start at the current gate: (a) increase its size to 2x and measure the delay again by the same technique; (b) go back, reduce the size of the current gate, increase the size of the driving gate instead, measure the delay again, and compare both (a) and (b)
with the last delay, the one with all gates at uniform size, and see which one is better. If you find (b) is better, use (b); if you find (a) is better, use (a). Basically, you keep repeating this process from the output side towards the input side, keep changing the sizes, and see where you finally get the minimum overall delay. This is always possible; this is what an iterative technique is all about. This is TILOS. The logical effort people actually use it. They did not specifically say so; I figured out that they are using TILOS. They did not say much about it in their book, initially at least; now I do not know. I figured out that this is the same method, employed under a different name.

TILOS also makes a very interesting point. Do we really have to calculate the total path delay every time? No, we do not. We can fix a window in which we want to optimize, keep the first and last gates as usual, optimize only within the window, and probably still get close to the optimal delay. But if the chain is too long or the sizes are too big, doing simulations every now and then can become very difficult. The optimum gate size step was found to be about 2.5x, and a choice of 2x or 3x normally leads to good results; this is what was presented in their first paper. So as a rule of thumb, even before you go to logical effort, you should always start with TILOS as a starter. And keep fan-in low: do not have gates with too many inputs; fan-in should preferably be less than 3. Keep fan-out less than 5, preferably 4. That is why most circuits are designed for what is called a fan-out of 4, FO4, and that is the reason delays can be minimized along the critical path. The definition of a critical path: whichever path has a larger
delay compared to the others is called the critical path, because that is what decides your final delay. Along a critical path the minimum delay is achieved if the delay of each stage is roughly equal; then the overall delay will also be lowest. And as far as possible keep the rise and fall times equal, which is much better for your design. Essentially I am saying that beta R, the ratio of the N-channel beta to the P-channel beta, should be close to 1; that is, the W of the P-channel transistor should be double or two and a half times that of the N-channel, depending on the mobility ratio.

Now, after giving you a lot of introduction (I think I spent too much time on it, but I thought it was necessary), you should not feel that what these people did, or what I am telling you, is something great. It is actually derived from very simple thinking. It would be nice to have a back-of-the-envelope method of sizing gates and transistors that is easy to use and yields reasonable circuits. The book by Sutherland, Sproull and Harris, Logical Effort: Designing Fast CMOS Circuits, introduces a method which they define as logical effort. We will attempt to apply this method to the circuits we look at during this course. We look at the static CMOS application first. Professor Sharma said many other logic families are possible: dynamic, domino, zipper, NORA, you name one, CVSL, complementary logic and many others. Given the time, I cannot complete them all, but I will at least show you the method on static CMOS, and it is doable for any other logic family as well. So do not think it is restricted to static CMOS; it is just easier to show.

The first thing we start with is a model which we call the gate delay model. Delay will always be normalized to a particular unit of delay, tau. So we say the absolute delay d_abs is d times tau, where d is now the delay of a block, essentially a number
which, multiplied by tau, gives the delay in seconds. Otherwise d is dimensionless: d_abs divided by tau is d, and we use d itself as the unit. If someone wants the actual delay, tell me how much tau you have and I will multiply by it. So what we calculate in all our calculations is the delay per unit tau; if you need the actual delay in nanoseconds or seconds, you must multiply by tau. Tau is essentially set by the delay of a minimum-size inverter.

As we said, the delay of a logic gate is composed of two kinds of delay: one we will call f and the other p. Let me talk about f first. f is called the stage effort, or the effort delay; effort is the word we are now bringing in. And p is the parasitic delay, which takes care of all the other capacitances in the gate. The stage effort f is the delay due to the load: the driver sees a load, and I am looking at the effort required to charge that capacitance in the same time. We say it is a product of two terms, g and h. Then the absolute delay is d_abs = (f + p) tau, and since f = g h, this is d_abs = (g h + p) tau.

So what are g and h? As Sproull and the others define them, g captures the properties of the gate and h captures the properties of the load. So we have broken the effort into two parts. What kind of gate you are using gives g, the logical effort. How much load I am going to drive, relative to the input load, is what I call the electrical effort, and that I define as h. We will see examples soon. If you use the standard RC model, the 0.69 RC kind of lumped model, you also find that the gate delay is some constant times the load plus a no-load delay, which is
the parasitic delay. The function comes out exactly the same way: in 0.69 RC, the 0.69 R part is some constant k and C is the load, plus a parasitic delay which may not be a function of k directly, so it can be separated. So we see it is the same as what we would have got from the lumped model. By integration you might get more accurate results, but if you have a long chain of circuits, integration methods are very difficult because you keep integrating over time. The constant k essentially depends on the pull-up to pull-down strength, PMOS to NMOS.

It helps to see how the RC model and the logical effort model correspond. This is essentially the RC model. This is my pull-up resistor, the P-channel, and this is my pull-down resistor, the N-channel; each can be switched on and off. This is my input capacitance, this is C_pi, the parasitic capacitance of the gate, and this is the actual load I am driving. For the inverter we assume equal pull-up and pull-down resistance, R_inv, and an input capacitance C_inv. Then tau is kappa times R_inv times C_inv, where kappa is some constant which is fabrication limited; it is like the 0.69 RC kind of thing. Please note that tau is not the no-load delay of the inverter, and it is not the delay of a 1x inverter driving a 1x inverter, since that includes the parasitic delay and tau does not include parasitics. This means that tau cannot be determined from a single delay measurement. 
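The delay model d_abs = (g h + p) tau, and the remark that one measurement is not enough to find tau, can be sketched as follows. This is a minimal illustration, not the lecturer's procedure: the function names are made up, and the 12 ps value of tau and the parasitic delay of 1 are hypothetical numbers chosen only to exercise the code.

```python
def gate_delay(g, h, p, tau=1.0):
    """Absolute delay of one stage: d_abs = (g*h + p) * tau.

    g: logical effort, h: electrical effort (C_out / C_in),
    p: parasitic delay, tau: delay unit (with tau=1 the result is normalized).
    """
    return (g * h + p) * tau


def fit_tau_and_p(h1, d1, h2, d2, g=1.0):
    """Recover tau and p from two absolute delays measured at h1 != h2.

    The delay is linear in h with slope g*tau and intercept p*tau,
    so two points on the line are needed to separate tau from p.
    """
    if h1 == h2:
        raise ValueError("the two measurements need different electrical efforts")
    tau = (d2 - d1) / (g * (h2 - h1))
    p = d1 / tau - g * h1
    return tau, p


# Hypothetical inverter (g = 1): tau = 12 ps, p = 1.
d_fanout_1 = gate_delay(g=1, h=1, p=1, tau=12.0)   # inverter driving one copy
d_fanout_4 = gate_delay(g=1, h=4, p=1, tau=12.0)   # inverter driving four copies
tau_est, p_est = fit_tau_and_p(1, d_fanout_1, 4, d_fanout_4)
print(d_fanout_1, d_fanout_4, tau_est, p_est)
```

With only the fan-out-of-1 number there is one equation and two unknowns, which is why the second measurement is needed.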
So we say we will now choose a template circuit, one standard block, and compare any other block with reference to it; how much extra is needed is the effort I am talking about. My standard block is an inverter with the P-channel width double that of the N-channel. This is my template. Compared to it, if I use a NAND gate or any other gate or block, how much must I change the sizing to get the same timing as if the load had been driven by the inverter? That is the effort I will use.

So C_t is the input capacitance of the template, and we scale the gate by a factor alpha, in most cases an increase. R_t is the pull-up (and pull-down) resistance of the template, and C_pt is the parasitic capacitance of the template. Then C_in is alpha times C_t; R_i, with the pull-up R_ui and pull-down R_di taken to be the same, is R_t divided by alpha; and C_pi is alpha times C_pt. You can do these simple calculations from what Professor Sharma has talked about. We can then calculate the absolute delay as a process constant kappa times R_i times (C_out + C_pi). Substitute these values and you get a very interesting expression: d_abs = kappa R_t C_t (C_out / C_in) + kappa R_t C_pt. If you compare it with the expression I had for logical effort, the second term is the parasitic part and the first is the effort part. So I am essentially using the same formulation; I am not doing anything very different from the RC model. 
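The substitution described above can be written out step by step. The symbols are the ones used in the lecture; only the intermediate lines are filled in.

```latex
\begin{align*}
C_{in} &= \alpha C_t, \qquad R_i = \frac{R_t}{\alpha}, \qquad C_{pi} = \alpha C_{pt} \\[4pt]
d_{abs} &= \kappa R_i \left( C_{out} + C_{pi} \right) \\
        &= \kappa \frac{R_t}{\alpha} C_{out} + \kappa \frac{R_t}{\alpha} \, \alpha C_{pt} \\
        &= \kappa R_t C_t \, \frac{C_{out}}{\alpha C_t} + \kappa R_t C_{pt} \\
        &= \kappa R_t C_t \, \frac{C_{out}}{C_{in}} + \kappa R_t C_{pt}
\end{align*}
```

The scale factor alpha cancels out of the parasitic term and survives in the first term only through the ratio C_out / C_in.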
So we say, if I compare this with tau (g h + p), then tau is kappa R_inv C_inv, as defined previously; g is R_t C_t divided by R_inv C_inv; h is C_out divided by C_in; and the parasitic delay p is R_t C_pt divided by R_inv C_inv. So you can see that what we could have done with the lumped model given in Rabaey's book is the same as what the logical effort people are thinking. We are not stepping outside that thinking; we are still following it. The only reason we switched over is that calculating RCs every time becomes very difficult and cumbersome, so we asked whether we could do something equivalent, and that is exactly what the effort was in this case.

For example, look at the delay plot. This is the normalized delay, not the absolute delay, plotted against the electrical effort, which is nothing but the output load capacitance divided by the input capacitance, C_out by C_in. I have two separate blocks, an inverter and a two-input NAND gate. This green line is what we call one unit of delay for an inverter, the parasitic part, and that p part is fixed for both. The inverter follows this line: as I increase C_out, the inverter delay rises, which is obvious, since it takes longer to charge a larger capacitance. However, if you use a NAND gate with similar capacitances, you find the delay is even more. So what I am trying to say is that the delay increases not only with the electrical effort but also with the kind of gate I use, and that is what we call the gate effort, or logical effort. 
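The two lines of the delay plot can be tabulated with a short sketch. It assumes a logical effort of 4/3 for the two-input NAND (the value derived later in this lecture) and, following the remark above that the p part is the same for both curves, a placeholder parasitic delay of 1 for both gates; substitute your own values as needed.

```python
from fractions import Fraction

G_INV = Fraction(1)        # template inverter
G_NAND2 = Fraction(4, 3)   # two-input NAND, derived later in the lecture
P_ASSUMED = Fraction(1)    # placeholder parasitic delay, an assumption


def normalized_delay(g, h, p=P_ASSUMED):
    """Normalized stage delay d = g*h + p, in units of tau."""
    return g * h + p


# One row per electrical effort h = C_out / C_in.
rows = [
    (h, normalized_delay(G_INV, h), normalized_delay(G_NAND2, h))
    for h in range(0, 6)
]

for h, d_inv, d_nand in rows:
    print(f"h={h}  inverter d={float(d_inv):.2f}  NAND2 d={float(d_nand):.2f}")
```

Both lines start from the same intercept, and the NAND line rises faster because its slope is g = 4/3 instead of 1.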
So having introduced a lot of it, we can now start looking at the logical effort of different gates. The calculations are very simple. As I said, I have taken an example in which the P-channel width is double that of the N-channel. The lengths are the same; for all devices in the same technology the lengths are normally the minimum length in the digital case. So L is not shown; L is always fixed. Of course, this 2 assumes that the mobility ratio of electrons to holes is 2; if it is different, use that number instead. So sometimes I use gamma; gamma here is 2, but it can be anything else as well. So in the first simple case the P-channel is twice the N-channel in width, and we say the logical effort of the inverter is unity. This is our template device.

Now, with reference to this, let us say I have a NOR gate, to be built with the same equivalent resistance. In a NOR function the two N-channel transistors are in parallel, but I cannot reduce their size because I am already at the minimum. So both are 1 and 1, for the two inputs A and B. However, the two P-channel transistors are in series. To make them equivalent to a width of 2, I need 1/4 plus 1/4 equal to 1/2, so I must double the size of each from the single-inverter case, from 2 to 4. So the N-channel stays at 1, because that is the minimum I have, but the P-channel, which was 2 here, has to be 4 to make the series combination the same equivalent.

Now look at the total input capacitance of the inverter: it is something like 2 plus 1, W1 plus W2. The capacitance seen is C_ox times W times L for each device, so it is C_ox W1 L plus C_ox W2 L. 
So it adds up as W1 plus W2, and the unit here is 3. And what is it per input of the gate? You can see from here, for input A, the capacitance it is going to show is 1 plus 4: one W is 4 and the other is 1. The same holds for B, 1 plus 4. So per input the capacitance is now 5, whereas the equivalent inverter has a capacitance of 3. The ratio of the two, 5/3, is the additional effort I have to make for the same timing; that is what I mean when I say I have made extra logical effort to get the same timing. So I calculated the sizes which are equivalent to my template inverter, added the sizes for each input, and divided by the template value; the ratio of the two is the additional effort. So for a two-input NOR gate the logical effort is 5/3.

Here is a NAND gate. You can see the same thing. In the case of the NAND function, as Professor Sharma has discussed, to have the equivalent of 1 I need each N-channel to be doubled, because otherwise the resistance will not be the same and the charging time will not be the same. So I want both of these to be double size, so that two in series are equivalent to 1. In parallel, however, I do not need to double any of the P-channels, because alternate paths are available anyway: if A is 0 or B is 0, one of the transistors will be on. So I do not change the size of the P-channels; I keep them at 2. I do increase the size of the N-channels, to 2 and 2. So each input sees 2 plus 2: for A, 2 plus 2, and for B, 2 plus 2, and the effort is 4/3. So for a NAND gate the logical effort is 4/3, and for a NOR gate it is 5/3. 
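The sizing argument above generalizes to n inputs and to a mobility ratio gamma other than 2. The sketch below is one way to write it down; the function names are made up for illustration, and it assumes the same rule as the lecture (series devices are widened by the number in series, parallel devices keep the template width).

```python
from fractions import Fraction


def inverter_input_cap(gamma=2):
    """Template inverter: NMOS width 1 plus PMOS width gamma."""
    return 1 + Fraction(gamma)


def nand_logical_effort(n, gamma=2):
    """n-input NAND: n NMOS in series (each widened to n),
    PMOS in parallel (each kept at gamma)."""
    gamma = Fraction(gamma)
    return (n + gamma) / inverter_input_cap(gamma)


def nor_logical_effort(n, gamma=2):
    """n-input NOR: NMOS in parallel (each kept at 1),
    n PMOS in series (each widened to n * gamma)."""
    gamma = Fraction(gamma)
    return (1 + n * gamma) / inverter_input_cap(gamma)


for n in (2, 3, 4):
    print(n, nand_logical_effort(n), nor_logical_effort(n))
```

With gamma = 2 and n = 2 this reproduces the 4/3 and 5/3 worked out above, and the gap between NAND and NOR widens as the number of inputs grows.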
That is why all the design books say NAND gates are superior to NOR gates, and now I am showing you here why NAND gates are superior to NOR gates. Of course, NAND gates are also superior in some other senses: they reduce the leakage paths and have some advantages in low power. So it is not that one ignores the other aspects, but you can see what I am really talking about. I am finding equivalent sizes compared to my template inverter and finding what additional effort is required to keep the same timings.

So suppose I have random logic, a complex gate with inputs A, B and C. From the way the transistors are connected, with A in series with the parallel pair B and C in the pull-down, the function implemented is A dot (B plus C), the whole bar. To find the logical effort of each input, the same game is played: whichever devices have a parallel path keep the same size; whichever are in series are doubled or tripled. If there are three in series, make each of them 3 times; if there are four in series, make each of them 4 times; whichever are in parallel retain the same size. So for the P-channel I retain 2 for A, but B and C are in series in static CMOS, so I double each of them to 4; 4 and 4 in series is equivalent to 2. For the N-channel, B and C are in parallel, so by themselves I would not increase their size; they could have been 1 and 1. But since A is in series with them, the path as a whole has to be equivalent to 1.
A is in series with B or C, so I must make B and C 2 as well, because then 2 and 2 in series make 1. So these have to be 2 even though they are in parallel with each other, because they sit in a series combination with A. Now for each input: for A it is 2 plus 2, which is 4, so 4/3 is the logical effort of the A input. For B and C it is 2 plus 4, which is 6, and 6 divided by 3 is 2. So for the B and C inputs the logical effort is 2. Obviously the net logical effort is much higher for a complex gate: if I bundle A, B and C together, the total G is 4 plus 4/3, which is 16/3, which is very high.

So you can see that as I increase the complexity of the gates, my logical effort keeps increasing. This is the most important part to understand. People ask, why not use one complex gate and get rid of many simple ones? But if you do so, you are actually delaying the circuit, or, if you do not want the delay, you have to increase the sizes, and therefore you pay a penalty there.
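The bookkeeping for this complex gate can be written down in a few lines. This is only a sketch: the width table is copied from the sizing just described (N-channel 2 for every input, P-channel 2 for A and 4 for B and C), gamma is taken as 2 so the template inverter is 3, and the names `widths` and `input_effort` are made up for the example.

```python
from fractions import Fraction

GAMMA = 2
TEMPLATE = 1 + GAMMA  # input width of the template inverter: NMOS 1 + PMOS gamma

# (NMOS width, PMOS width) seen by each input, as sized in the lecture
widths = {"A": (2, 2), "B": (2, 4), "C": (2, 4)}

def input_effort(nmos, pmos):
    """Logical effort of one input: its total gate width over the template's."""
    return Fraction(nmos + pmos, TEMPLATE)

efforts = {name: input_effort(*w) for name, w in widths.items()}
total = sum(efforts.values())

for name, g in efforts.items():
    print(f"g({name}) = {g}")
print("total G =", total)
```

Keeping the per-input values in a table like this makes it easy to see which input carries most of the effort when comparing alternative topologies.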
The value of the logical effort g depends on which gate is chosen as the template. The template here is the inverter, with g equal to 1. Choosing a different template gate will alter the g values, so it is not that the inverter has to be permanently fixed as your template. If you decide to take the 2-input NAND gate as your basic block, you can find the logical effort with reference to that template. We are not forcing you to always choose the inverter, but since it is much easier to appreciate and to teach, one normally chooses the inverter, which is the basic block of any digital design.

The g values therefore capture the effect of the number of inputs and the transistor topology of a gate more complex than your template gate. A more complex gate requires more logical effort to produce the same output current as the template gate, and it also presents a higher input load: because you are increasing the sizes, the input capacitance increases proportionately. The logical effort of a 1x NAND gate, a 2x NAND gate and a 4x NAND gate are all the same. The effect of the extra load presented by the larger transistors is captured by the other parameter, which is called electrical effort.

This electrical effort parameter h captures the driving capability of the gate via transistor sizing: when you increase the sizes, you change how much current you can drive. To get the g values we changed the sizes of the transistors, so C in has increased, and I now want to know the ratio of the output load to that input capacitance. That ratio is what I call the electrical effort: h is equal to C out by C in.
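To make the split between g and h concrete, here is a small sketch for a 2-input NAND at 1x, 2x and 4x. The load of 48 unit widths is a made-up number, and the helper name is hypothetical; the only point is that g stays fixed while h moves with the sizing.

```python
from fractions import Fraction

GAMMA = 2  # assumed P-to-N width ratio, as in the lecture's example

def nand2_efforts(scale, c_load):
    """Return (g, h) for a 2-input NAND whose transistors are `scale` times unit size."""
    c_in = scale * (2 + GAMMA)        # one input: NMOS 2 + PMOS gamma, scaled
    c_template = scale * (1 + GAMMA)  # inverter delivering the same output current
    g = Fraction(c_in, c_template)    # logical effort: independent of scale
    h = Fraction(c_load, c_in)        # electrical effort: C_out / C_in
    return g, h

C_LOAD = 48  # hypothetical fixed external load, in unit widths
results = {scale: nand2_efforts(scale, C_LOAD) for scale in (1, 2, 4)}

for scale, (g, h) in results.items():
    print(f"{scale}x NAND2: g = {g}, h = {h}")
```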
Note that h for a gate reduces as the transistors become wider. C out is the outside load, which is fixed, so as C in rises, C out by C in reduces. This is very interesting.

The third part of the delay is the parasitic. The parasitic delay p, which is essentially the no-load delay, is constant and independent of the size of the transistors: as you increase the transistor sizes, the gate, source and drain capacitances increase as well, which keeps the no-load delay constant. Once p is known one can compute d, so one would like to measure p. There are different methods. What we do is take an inverter driving an inverter of the same size and measure the delay; next we take a 1x inverter driving a 2x inverter as load and measure the delay again. That gives two equations in two unknowns, so I can get the value of p.

So for a given kind of logic I can always find the additional effort required. If I know my g, my h and my p, I always get my d, and if I know tau for a given technology, that is, for the given inverters, I can always find the absolute delay. I do not need the absolute delay most of the time, because I am only comparing two designs: if d is lower, I say that design is better. That is the way I will actually use it in design.

For example, this is an n-way multiplexer, which is used in many circuits; as another example, you may be driving 64 buffers in a memory block. So here is a one-out-of-n selection. There are n arms in the multiplexer: S1 to Sn are the select signals, and D1, D2, D3, ..., Dn are the data signals which you want to pass on to the common output line, which is C.
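Before going into the multiplexer, the two-measurement procedure for p mentioned above can be sketched numerically. The sketch assumes the delay model t = tau (g h + p) with g = 1 for the inverter, h = 1 for the first measurement and h = 2 for the second; the function names and the delay values of 20 ps and 30 ps are invented purely for illustration.

```python
def calibrate(t_h1, t_h2):
    """Solve t = tau * (h + p) for tau and p from delays measured at h = 1 and h = 2.

    t_h1 = tau * (1 + p)
    t_h2 = tau * (2 + p)
    Subtracting gives tau; substituting back gives p.
    """
    tau = t_h2 - t_h1
    p = t_h1 / tau - 1
    return tau, p

def absolute_delay(g, h, p, tau):
    """Absolute stage delay from the normalized delay d = g*h + p."""
    return tau * (g * h + p)

# Hypothetical measured delays in picoseconds
tau, p = calibrate(20.0, 30.0)
print("tau =", tau, "ps, p =", p)
print("FO4-style delay:", absolute_delay(1, 4, p, tau), "ps")
```

In practice the two delays would come from simulation or from measurement; the subtraction only works cleanly if both measurements use the same driving inverter.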
I have already given the rule: any transistors in series get their size multiplied, doubled if there are two in series, so that the pair is equivalent to the template N-channel or P-channel; any transistors in parallel need not be changed, because either path is available. Now look at one arm. S1 and S1 bar form the select path, and D1 drives the data transistors. If S1 is selected, the data is allowed to pass, and of course it is D1 bar that goes out because of the inverting action. So what I am transferring is the complement: C is the complement of D1. Is that clear?

I repeat. If D1 is 1, the data N-channel is on, and if S1 is selected, the select N-channel is on too. Both being on means the output node is connected to ground, so the output goes to 0; I have transferred D1 bar. If D1 is 0, the data N-channel is off, so there is no path to ground, and the data P-channel is on. If S1 is 1, that is, the arm is selected, S1 bar is 0, so the select P-channel is on as well. The power supply value appears at the output, which is 1. The input was 0, so again what is transferred is D1 bar.

However, if S1 is 0 and S1 bar is 1, both select transistors are switched off. I repeat: if S1 is 0 and S1 bar is 1, both the select P-channel and the select N-channel are off, no path to the power supply or to ground is possible, and the output of this arm floats. That means it is tri-stated, and this input is not transferred. Please remember that S1, S2, ..., Sn must be non-overlapping signals, because contention must not occur at any time. If S1 is on, then S2, S3, S4 and the rest should all be off; only one of them can be on at a given time. That is what select means.

If that is so, I have an n-way multiplexer, and now I look for its effort. First leave the n part and look at one single arm. You can see from here that the widths are 2 and 2 gamma, and this gamma is what
exactly I said earlier: gamma can be 2, or 2.5, or in some cases 3, depending on the technology used. So I have kept gamma as a ratio for now, because that is how it is given in the book; it is not something I have done. So it is 2 plus 2 gamma for one input and 2 plus 2 gamma for the other: 2 plus 2 gamma for S and 2 plus 2 gamma for D, so 4 plus 4 gamma in total for S and D. This is divided by the standard template value, which is 1 plus gamma. Please remember that the gamma I used earlier was 2, which is why the template was 1 plus 2, that is 3; it is the same thing. Gamma, as I said, represents the ratio of mobilities, which can be 2, 2.5 or 3.

So 4 plus 4 gamma upon 1 plus gamma, which is 4, is the effort for one arm. If there are n such arms, the net logical effort of an n-way multiplexer is 4n. So the larger the multiplexer you use, the larger the additional effort you require. If you are looking only at a data input, D1 or D2 or D3, then it is 2 plus 2 gamma divided by 1 plus gamma, which is 2.

So I can find the logical effort of the n-way multiplexer, and I can find the logical effort of any other gate. I will show you some representative gates. Take the XOR gate, for example. This is an interesting circuit. Please verify that it is an XOR if you like; it does not look like the standard XOR, but it is an XOR, and you can try any input combination and check. With A and B as the two inputs, on the P-channel side there is A bar with B, and in parallel A with B bar, and on the N-channel side there is the complement of that. This is essentially derived from pass gate logic. It is very simple once you think about how to do it. You can take any gate; these are given in the books, and I am only using them because if you are
going to see a book someday, you should know what is given there.

So if I want to find the total logical effort, let us first take it per input. For input A it is 2 plus 2 gamma upon 1 plus gamma, which is 2. A and A bar together are called one bundle, and B and B bar are another bundle. A bar is again 2 plus 2 gamma upon 1 plus gamma, which is 2, so for the bundle of A, that is A and A bar, it is 2 plus 2, which is 4, and likewise 4 for the bundle of B and B bar. For all of them together it is 4 plus 4, which is 8. So the logical effort of the XOR gate shown here is 8. I repeat: 2 plus 2 gamma by 1 plus gamma is 2; for the two signals of a bundle it is 4; for the other bundle it is again 4; so the total logical effort of this XOR is 8.

What does this tell us? If the logical effort is larger, the delay is larger. So now you have learned that the complexity of the gate has a direct influence on your speed. That is what we wanted to know. When I finish calculating these, in the last part I will show you how I then design: should I use a NAND gate in between, and which gates should I use so that the overall delay on a path is minimal?

Now for a generalized n-input XOR, or parity gate: there are 2 to the power n minus 1 pull-down chains, each with n transistors in series and each transistor of width n, and 2 to the power n minus 1 pull-up chains with each transistor of width n times gamma. This is just generalizing the sizing I did here. The logical effort per bundle is 2 to the power n minus 1, times n plus n gamma, upon 1 plus gamma, which is n times 2 to the power n minus 1, and the total over n bundles is n squared times 2 to the power n minus 1. You can check the 8 from here: for a 2-input XOR, n is 2, n squared is 4, 2 to the power n minus 1 is 2, and 4 into 2 is 8. So the logical effort of a 2-input XOR is 8, and for an n-input XOR it is n squared times 2 to the power n minus 1.
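The multiplexer and XOR results can be checked with a short sketch. The function names are invented for the example; the widths follow the lecture (2 and 2 gamma per signal in a multiplexer arm, n and n gamma per transistor in the XOR chains), and gamma is left as a parameter to show that it cancels.

```python
from fractions import Fraction

def mux_effort(n, gamma=2):
    """Return (per-signal effort, total effort) of an n-way multiplexer.

    Each arm has a data input D_i and a select pair S_i; each sees widths 2 and 2*gamma.
    """
    gamma = Fraction(gamma)
    per_signal = (2 + 2 * gamma) / (1 + gamma)  # 2 for D_i, 2 for the S_i pair
    per_arm = 2 * per_signal                    # D_i plus S_i
    return per_signal, n * per_arm

def xor_effort(n, gamma=2):
    """Return (per-bundle effort, total effort) of an n-input XOR / parity gate.

    2**(n-1) pull-down chains of width-n devices and as many pull-up chains
    of width n*gamma; every bundle (x, x bar) appears once in each chain.
    """
    gamma = Fraction(gamma)
    chains = 2 ** (n - 1)
    per_bundle = chains * (n + n * gamma) / (1 + gamma)
    return per_bundle, n * per_bundle

print("4-way mux:", mux_effort(4))
print("XOR2:", xor_effort(2))
print("XOR3:", xor_effort(3))
```

The 2-input case gives 4 per bundle and 8 in total, and the 3-input case gives a total of 36, which is the starting point for the asymmetric design discussed next.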
In the same way I can calculate for other parity gates; for 3 inputs you can work out the numbers by the same logic. Now there is another circuit given here, an asymmetric design with reduced logical effort. This is another XOR gate, a 3-input XOR with inputs A, B and C; I think Professor Sharma may talk about it, time permitting. It tries to save some logical effort while still giving the XOR function. If I spend time on it I will not finish, so maybe I will come back and show exactly how it is an XOR, or you can take this circuit and try it yourself. If I do this, the logical effort of the 3-input XOR, which was 36, is reduced to 24. So I can also play a game with the placement of the transistors and reduce my logical effort. It is called asymmetric because the upper side and the lower side are not identical.

Then there is the majority gate, a very famous gate. If two out of the three inputs are high, the output is low; otherwise the output is high. Strictly this is majority bar. You can see it from here: say A and B are 1 and C is 0. The P-channel transistors driven by A and B switch off, and since A is connected here and B is connected here, both these N-channels turn on, this node is pulled down, and the output X is 0. The other paths do not fight this; if they transfer anything at all, it is also 0. So if any two of the inputs are high, the output is low. That is the majority gate, or, as I say, majority bar, in the sense that two highs give a low output. I can play a similar game here: I can rearrange the transistors and get a slightly different circuit, which is called the asymmetric majority gate, and now I see the
logical effort has come down to 10. So the trick is that you should not only calculate the logical effort but also consider redesigning the blocks. That is why the same logic is represented in so many ways: the idea is to see that the logical effort is minimal, or the speed is larger, or, for the same speed, the sizes are not very high. These are the three things we are actually looking for.

Take the carry chain, another gate: how to generate the output carry. Again, as I said, it is the carry-out bar that is made available to you. In this typical carry chain the two control inputs are G and K bar, and this is the carry input. Take G at 0 and K at 0, that is, K bar at 1: the transistors in series with the C in devices are on, so the output is set by C in. If C in is 1, this N-channel is on and the output is pulled down to 0; if C in is 0, this P-channel is on and a 1 is transferred. The single transistors driven by G and K bar play no role here, because K bar is 1 so this one is off, and G is 0 so this one is off. Now look at the other values of G and K: there those transistors take over and set the output.

So I can calculate the logical effort for this carry chain. For C in, the logical effort is 2 plus 2 gamma by 1 plus gamma, which is 2. For G, this transistor is 1 and this one is 2 gamma, so the logical effort is 1 plus 2 gamma divided by 1 plus gamma. Similarly for K bar, this is size 2 and this is size gamma, so it is 2 plus gamma upon 1 plus gamma. Add all of these and the net logical effort is 5 plus 5 gamma divided by 1 plus gamma, so the logical effort of one single carry chain stage is 5. If there are n such stages of carry propagation, it is n times that. So when I am designing an adder circuit or any other block, the
first thing I should do, if I am given some constraint on timing, is to see what maximum effort is available to me, and then choose the architecture itself so that it gives minimal effort. That is what design is all about. What is the other method? You put down a circuit, which is what most people do, give it some sizes, and let SPICE do the whole job; you keep tweaking until at some point you say, good, this is acceptable. But that way you may run a SPICE simulation for 24 hours on a very small block if your initial guess is very wrong. The best design is one where you get a correct initial guess, close to the actual values you are going to end up with. Put in those values and that architecture, and then simulate in SPICE, which has better models and takes the exact values of all the capacitances, so it gives you the exact values of the delays you are looking for.

Student: Sir, I have one doubt. Why do we not restrict ourselves to the total logical effort only? Why do we have to work out the effort per input or per block at each stage?

Yes, but when you are choosing an architecture you cannot see the total logical effort unless you see the individual block efforts. If you can calculate the individual block efforts, then you can optimize: you can see that if I change this block, that by itself will reduce my total logical effort. The optimization is for the total logical effort, but the total is made up of these parts. If you give me a fixed architecture, then calculating only the total is good enough. But if I am also allowed to tweak, as I showed you in going from the symmetric to the asymmetric gate, I would not know where the maximum effort is going unless I had the individual values. Say one block gives the highest value; I will look at whether I can reduce that one, because that by itself will lower its contribution to the net. Is that clear to you? So in particular I am looking for those blocks
which have the larger logical effort, and, not that I can do it every time, I will try to see if I can get an alternate architecture at least for those blocks, which gives me a lower overall logical effort. It is like something out of ten: if one contribution is six, hit the six first rather than the twos. A two I may reduce to one, what else can I do, but a six I can reduce to four, and then I have actually gained two. That is why Sutherland and Sproull suggested that we calculate it this way. It is also pedagogical: I am not just giving you the number 5 without telling you where it comes from. So it works both ways.

Another block which you use very often is the dynamic latch. It is essentially a latch; we call it dynamic, and it is a clocked circuit. Phi and phi bar are the clocks. If phi is 1, phi bar is 0, so you can see that this P-channel is on and this N-channel is on. What happens then depends on the data: if the data is 1, this is off and this is on, so the output is pulled low. Q is D bar, inverted; the latch inverts. If you store it through another such block, you get the opposite polarity back. Then if phi is 0 and phi bar is 1, both of these are off, the output is tri-stated, and the value is therefore latched. That is a standard method of latching, and this latch is normally used. Actually this is a kind of multiplexing: the n-way multiplexer we saw is essentially built from such latches. So you can see how the logic is implemented: in digital design we do not actually use gates as such, we use only transistors in the actual designs. This is the simplest way of implementing a latch, and it is also called a multiplexer latch.

So now, if the effort is to be found for D, it is 2 plus 2 gamma by 1 plus gamma, which is 2, and for the select signal, which is phi, it is again 2 plus 2 gamma by 1 plus gamma, which is 2. So the total logical effort of a dynamic latch is 4. There is another element which I think Professor
Sharma likes very much. In synchronous circuits, we know, everything is timed, and normally latches are used for the timing: delay elements, shift registers, whatever you call them, all essentially use the clock. In asynchronous circuits there is no such clock. The data itself is used as the equivalent of the latch timing, so the timing is derived from the data. Of course I am not saying that asynchronous systems have no clocks at all, but at least they are not synchronized with a main clock. So there are many cases in which data has to be latched in an asynchronous system where there is no clock available to you.

Here is a circuit called the Muller C-element, a very famous element. Whenever I teach a digital course I always show this circuit, because it gives the idea that a latch does not need a clock for its latching action, and that is exactly why I chose it. You can see from here how it works: two N-channel transistors in series and two P-channel transistors in series, with B here and A here. If A is 0 and B is 0, both N-channels are switched off and both P-channels are on, so the output is 1. If both are 1, both N-channels are on and both P-channels are off, so the output is 0. If A is 0 and B is 1, or the opposite, then, like phi and phi bar in the latch, one transistor in each chain is switched off, and the output floats and holds its previous value. So it is essentially like a latch: you can say B is selecting A, or A is selecting B. It is called the Muller C-element after the person who came up with it.

If I calculate the logical effort for n inputs, each input sees n plus n gamma upon 1 plus gamma, which is n, so the total is n squared, where n is the number of inputs. Now, everything we have talked about so far was how to get the logical effort of a block. So now I am going to use these blocks in the calculation of path delays.
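The last three blocks, the carry chain, the dynamic latch and the Muller C-element, all use the same recipe, so their numbers can be collected in one sketch. The helper names are made up for the example, and the transistor widths are the ones quoted in the lecture; gamma is left free to show that the totals do not depend on it.

```python
from fractions import Fraction

def effort(nmos, pmos, gamma):
    """Logical effort of one input: (NMOS width + PMOS width) / (1 + gamma)."""
    return (nmos + pmos) / (1 + gamma)

def carry_chain(gamma):
    """Per-input efforts of one carry chain stage."""
    return {
        "Cin": effort(2, 2 * gamma, gamma),   # series devices in both networks
        "G": effort(1, 2 * gamma, gamma),     # lone NMOS, series PMOS
        "Kbar": effort(2, gamma, gamma),      # series NMOS, lone PMOS
    }

def dynamic_latch(gamma):
    """Per-input efforts of the dynamic latch: data D and the clock pair phi."""
    return {"D": effort(2, 2 * gamma, gamma), "phi": effort(2, 2 * gamma, gamma)}

def c_element(n, gamma):
    """Return (per-input effort, total effort) of an n-input Muller C-element."""
    per_input = effort(n, n * gamma, gamma)
    return per_input, n * per_input

gamma = Fraction(2)
print("carry chain:", carry_chain(gamma), "total", sum(carry_chain(gamma).values()))
print("dynamic latch total:", sum(dynamic_latch(gamma).values()))
print("2-input C-element:", c_element(2, gamma))
```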
That is our major job. So initially I said, let me first know the delay of each block I am going to use, essentially its logical effort, and then I will know how much net delay I am going to get if I put the blocks in a path.

The first thing I estimate is an N-stage ring oscillator. A ring oscillator, we know, is an odd number of inverters connected in series with the output fed back to the input, so it is a chain. We know the logical effort of an inverter: it is the template, so g is 1. The electrical effort h is defined as C out by C in. Each inverter is driving an identical inverter, so there is no change in capacitance from input to output: the load of this stage is the input of the next, which is the same as its own input. So C out is the same as C in, and h, which is C out by C in, is 1. The inverter has a parasitic delay of 1 unit, so p is 1. So the stage delay is g h plus p, which is 1 into 1 plus 1, that is 2. Of course d is not absolute, so it is multiplied by tau: d tau, which is 2 tau, is the delay of one stage, and we know by the formula that the oscillation frequency is 1 upon 2 N d tau.

Now this next one is the standard load you see in all designs, or rather you standardize your design on it: it is called the FO4 load. Here is one inverter driving four inverters, so it has a fan-out of four. One can calculate for the driving inverter: g is 1, because it is an inverter. For h, each load is C in, but there are four of them, so C out by C in is 4. The parasitic delay is 1, so the
stage delay is g h plus p, which is 1 into 4 plus 1, that is 5. I am still not at my main job; here is my first piece of work.
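The two estimates just made, the ring oscillator and the FO4 inverter, are both instances of d = g h + p. A minimal sketch follows; the function names are made up, and the value of tau (5 ps) and the stage count (31) are arbitrary illustration values, not figures from the lecture.

```python
def stage_delay(g, h, p):
    """Normalized stage delay d = g*h + p (in units of tau)."""
    return g * h + p

def ring_oscillator_frequency(n_stages, tau, g=1, h=1, p=1):
    """Oscillation frequency f = 1 / (2 * N * d * tau) of an N-stage inverter ring."""
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    d = stage_delay(g, h, p)
    return 1 / (2 * n_stages * d * tau)

TAU = 5e-12  # hypothetical technology time constant, in seconds

print("ring stage delay d =", stage_delay(1, 1, 1))
print("FO4 stage delay d  =", stage_delay(1, 4, 1))
print("31-stage ring frequency (Hz):", ring_oscillator_frequency(31, TAU))
```

Measuring the frequency of such a ring and inverting the same formula is one practical way to estimate tau for a technology, given p.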