Last time we already started working on adder circuits, so today we will first recapitulate what we did. We said that a full adder is required in almost all arithmetic data paths. The full adder shown here has two inputs A and B, receives an input carry Ci, and produces a sum S and an output carry Cout. The truth table for such an adder is shown on your right: A, B and Ci are the inputs, Ci being the carry input, and S and Cout are the outputs, the sum and the output carry. We also discussed last time that, depending on the inputs and the carry-in, the status of the carry will be either delete, propagate or generate. Delete means the output carry is forced to 0, generate means it is forced to 1, and propagate means Ci is passed through as Cout. So, given the input data and the input carry, we can always tell whether a stage will generate, delete or propagate the carry. We also said that a typical adder can be expressed logically as S = A XOR B XOR Ci, where Ci is the input carry, and this can be expanded into a sum-of-products form with minterms such as A·B̅·Ci and so on. The output carry is Cout = A·B + B·Ci + A·Ci. Instead of evaluating these functions directly every time, the other method is to first form two terms called propagate and generate: the propagate term is P = A XOR B and the generate term is G = A·B. A delete term D = A̅·B̅ is also sometimes required. If we go back to the expressions for sum and carry on the last slide, we can see that Cout, as a function of A and B through G and P, can be written as Cout = G + P·Ci, and the sum can be written as S = P XOR Ci.
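The propagate/generate formulation above can be sketched as a few lines of Python; this is a minimal behavioral model, not a circuit, and the function name is my own for illustration:

```python
def full_adder(a, b, ci):
    """One-bit full adder expressed through propagate/generate terms."""
    p = a ^ b            # propagate: P = A XOR B
    g = a & b            # generate:  G = A AND B
    s = p ^ ci           # sum:       S = P XOR Ci
    cout = g | (p & ci)  # carry:     Cout = G + P*Ci
    return s, cout

# Exhaustive check against the truth table on the slide
for a in (0, 1):
    for b in (0, 1):
        for ci in (0, 1):
            s, cout = full_adder(a, b, ci)
            assert s == (a + b + ci) % 2
            assert cout == (a + b + ci) // 2
```

The two expressions Cout = G + P·Ci and S = P XOR Ci map one-to-one onto the two return values.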
Now, one can see clearly from this expression that if Ci is 0, that is, the input carry is 0, then the output carry is simply G. Similarly, if the inputs A and B are the same, then A XOR B is 0, so P is 0 and Cout will always equal G. On the other hand, if A and B are not the same, P is 1; and since A and B differ, A·B is 0, which means G is 0, so Cout equals Ci. This is exactly what we wrote in the table, and these expressions therefore reproduce the truth table we wrote earlier. Going ahead, using these functions we can build a simple adder. A 4-bit ripple-carry adder is shown here: four full adders, each receiving two bits, A0 B0, A1 B1, A2 B2, A3 B3, from the two words, plus the input carry. But notice that unless the output carry of the first adder is generated, the second adder cannot complete its addition; unless that one generates the next carry, the third cannot proceed; and so on until the final carry and final sums become available. So if you look at the time required to go from input to output for an N-bit result, the adder time is essentially t_adder = (N − 1)·t_carry + t_sum: N − 1 carry delays plus one final sum delay. Now, what is the goal at the end of the day for any data path or arithmetic? We are looking for the fastest possible carry path. We said that such a function can also be implemented in static CMOS: knowing the expressions for C and S in this form, we first create the carry term and then, using that carry term, create the sum term. Implementing these two expressions this way requires about 28 transistors in total.
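The 4-bit ripple-carry structure and its delay model can be sketched as follows; the bit lists are LSB-first and the function names are illustrative, not from the slides:

```python
def ripple_carry_add(a_bits, b_bits, ci=0):
    """N-bit ripple-carry adder: each cell must wait for the previous carry."""
    s_bits = []
    for a, b in zip(a_bits, b_bits):   # LSB first
        p, g = a ^ b, a & b
        s_bits.append(p ^ ci)
        ci = g | (p & ci)              # this dependency is the critical path
    return s_bits, ci

# 4-bit example: 6 + 7 = 13, LSB-first bit lists
s, cout = ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0])
assert s == [1, 0, 1, 1] and cout == 0

# worst-case delay model: (N - 1) carry delays plus one sum delay
def t_adder(n, t_carry, t_sum):
    return (n - 1) * t_carry + t_sum
```

The loop-carried `ci` makes the serial dependency explicit: bit i cannot finish before bit i − 1 has produced its carry.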
And if you add a little more inversion, about 32 transistors are normally required to build a standard static CMOS adder. In any full adder circuit, as we already said, there is a path from input to output called the critical path: the path with the worst delay. We are always trying to reduce the delay along the critical path, and simultaneously trying to reduce power dissipation. That is most important, because reducing power dissipation is a major criterion for today's technologies and circuits. We are working at 45 nm and below, where the power supply voltage is around 0.8 to 1 V, or even lower in some chips. For those reasons, we are really looking for very low-power circuits, and the discussion from now onwards will also look at the low-power side of the same circuits: even if a design is not the fastest possible, we will try to reduce power and increase speed as much as we can. Now, looking at the previous slide, since XOR gates are required to generate the sum function, one can build an adder around XOR gates which adds two bits at a time. Remember, it has two inputs A and B, one input carry, and it generates the sum and the output carry. Typically, as I said, this requires 32 transistors. However, as I showed you earlier, if we expand the XOR functions and instead represent the sum and carry terms using AND-OR gates, one can reduce the transistor count and also speed up the circuit.
Typically, the circuit I showed you last has 28 transistors; as you can see here, this 28-transistor circuit uses only simple gates. Please note there is no XOR function here: we are representing the sum and carry as sum-of-products (minterm) or product-of-sums (maxterm) expressions in a complex-logic style, and in this way we can create the sum and carry using only 28 transistors. We can also look at the transistor-ratio structure of the same circuit, as we discussed. If we arrange things more symmetrically across the carry-propagation path, upper and lower halves identical, the circuit becomes symmetric in nature: it generates A·B, A XOR B and also A̅·B̅, and with this modified form you require only 24 transistors to do the same job. This circuit can of course be analyzed; it is very simple. We are generating the carry from A and B, which are the input bits. Please remember again that when A and B are the same, the generate function G = A·B is available directly; if A and B are different, then G is always 0, and that fact can be exploited in the implementation. The circuit shown here has been laid out to create a chip, and this is called a stick diagram. The red lines are the poly lines and the green lines are the diffusion lines; these are the P-channel devices and these the N-channel devices. Please note the source-drain, source-drain, source-drain sequence across the poly, and similarly on the P-channel side: this is the source, this is the drain, and so on. This stick diagram represents the so-called 24-transistor adder. What are its advantages? As we said last time, the NMOS and PMOS chains are complementary and completely symmetrical, and a maximum of two series transistors appear in the carry-generation circuit, which reduces its delay.
When laying out the cell, the most critical issue is minimizing the capacitance at node C0. At this output node, capacitances from many transistors come together: looking at the circuit once again, there is capacitance from this side, from this side, from the input side, CGD contributions and so on. Too many capacitances sit on the output node C0, and going further down toward the sum, still more capacitances are added. In fact, the capacitance at node C0 is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances from the connecting adder cell. The larger the capacitance at a node, the longer it takes to charge or discharge, and the lower the speed. Therefore, the transistors connected to Ci must be placed closest to the output; this is a standard technique in logic layout which we always observe. The transistors in the carry stage also have to be sized for optimal speed, so that the net propagation delay is minimal. Based on this, an equivalent transmission-gate adder was implemented: it generates A XOR B, and from there the rest is built using four transmission gates, four inverters and two XORs (12 transistors). Essentially it is 16 plus 12, a 28-transistor circuit, the same count as before, which generates the sum and carry for you. Its advantage is that it has roughly equal delay for both sum and carry. That raises an issue I have not discussed earlier.
Most adder problems arise when the bit width grows to 32 or 64 bits or higher: the delay accumulated for the last bits becomes so large that the sum may arrive later than the carry-propagate path, and then you have to equalize the delays. So there is always the question of how to minimize and also equalize delays. A typical transmission-gate full adder block is shown here: this part generates the sum, this part generates the carry, and this setup part creates A·B, A̅·B̅, A·B̅ and so on; using these we create both the sum and the carry. Now, why am I showing you this? Most people believe that a transmission-gate adder will be low power simply because the transmission gates have no direct connection to the power supply; they are driven only from their inputs, essentially acting as a multiplexer. Since it reduces power, this style became very popular for low-power circuits, but in terms of area and also speed it is not really very fast. So, to reduce power while keeping the speed from degrading too much, many structures were tried beyond the 28-transistor adder: there is a 17-transistor full adder which uses one XOR, then a 14-transistor full adder, and finally the one most widely used for low-power, high-speed applications, the 10-transistor full adder, which I will now show you. Here is the 10-transistor full adder. For example, this is an inverter, this creates the carry; these are two pass gates, P-channel and N-channel; this is another pass gate which receives the input, P-channel and N-channel; note there is no connection here. This creates A̅, this is a CMOS inverter, and one can trace the operation through. I do not want to work through all of it now, but let me show you one case. Let us say A is 0.
Let us say A is 0 and B is 0. Then this input is 0; since A is 0, this node is 1. Since B is 0, the P-channel turns on, so this 0 is transmitted. Since this is 0, this N-channel and P-channel pair works, and you can see that whatever the input carry is appears as the sum, exactly as the expression predicts: if G is 0 and P is 1, then Cout is Ci and the sum depends only on Ci. Note that the XOR here is not a full XOR; it is called a pseudo-XOR, a two-transistor arrangement in which this node is left unconnected. So, using only 10 transistors, I can generate both G and P, and using Ci I can create both Cout and the sum. Please remember why we were looking for such a structure. First, more transistors add capacitance and so reduce speed anyway. The second issue is area: in VLSI we always worry about silicon real estate, so if you reduce the number of transistors you should be able to reduce the area. Third, if pass gates or transmission gates are used, one can expect the power itself to be relatively lower. Compared with any other full adder structure shown here, the 17-, 28- or 32-transistor versions, it typically gives a 30 percent speed advantage and about a 50 percent power reduction. So, if you are going to build a full adder using the transmission-gate method, the best full adder available to you right now is this 10-transistor full adder; this is what most present-day circuits use whenever they use static CMOS. Going ahead, let us look at that last expression once more; we will keep coming back to it. I repeat: if P is 1, then G must be 0, so Cout = Ci; and if P is 0, Cout = G anyway.
P will be 0 only when A and B are the same, and in that case G will be either 0 or 1, so Cout can be 0 or 1 accordingly. So I know the output carry; and the sum as well: if P is 1, the sum is 1 XOR Ci, which is 1 when Ci is 0 and 0 when Ci is 1. In other words, as soon as I know G and P for the given data, I know both my sum and my carry. This fact can be used to build a chain of adder stages called the Manchester carry chain. Here is the first cell of a Manchester carry chain. It can be built in static CMOS, but the version most often used to reduce power is the dynamic CMOS one, in which a P-channel and an N-channel transistor are connected to a clock φ. When φ is 0, the node C0 is precharged high; when φ is 1, the stage evaluates. This is the standard dynamic scheme. The carry is brought in through a pass gate; it could be a transmission gate, but here it is shown as a single N-channel device driven by Pi, the propagate term. Now remember: Pi is 0 whenever A and B are the same, and 1 only when they differ. Only when Pi is 1 does Ci pass through; otherwise the path is blocked. This is exactly the logic of the expression: if Pi is 1, then G is 0 and the output carry equals the input carry, so direct transmission is possible; otherwise Pi is 0, this path blocks, and Cout is simply G, supplied by the generate device. So the expression we wrote can be represented by a simple dynamic or static CMOS circuit. Here is a 4-bit version: P0, P1, P2, P3 are the propagate signals driving four pass transistors.
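A behavioral sketch of the Manchester chain's evaluate phase, assuming the standard recurrence Ci+1 = Gi + Pi·Ci (the function name and interface are mine, not from the slides):

```python
def manchester_chain(p_bits, g_bits, ci):
    """Behavioral model of a Manchester carry chain (evaluate phase).

    At each stage the carry either passes through the Pi pass transistor
    (Pi = 1, hence Gi = 0) or is pulled to Gi by the generate device.
    """
    carries = []
    c = ci
    for p, g in zip(p_bits, g_bits):
        c = g | (p & c)    # the recurrence each dynamic stage evaluates
        carries.append(c)
    return carries

# all stages propagating: the input carry rides straight down the chain
assert manchester_chain([1, 1, 1, 1], [0, 0, 0, 0], 1) == [1, 1, 1, 1]
```

Electrically each stage is a pass transistor, so the chain behaves like the RC ladder analyzed next; this model captures only the logic, not the delay.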
This is one cell of the Manchester carry chain I showed you. The output, of course, is inverted; you are getting the complemented carries here. It is actually an advantage to work with the complements: as I explained in an earlier logic lecture, passing 0s is preferred over 1s because of power-supply droop problems in most cases, so passing a 0 is much easier; at the end you can generate the true outputs once for all by inverting them. The method is simple: depending on the values of P0, P1, P2, P3, that is, on the bit pairs A0 B0, A1 B1, A2 B2, A3 B3, either the input carry propagates directly, or, wherever Pi is 0, that block must generate the carry for the next block, and so on down the chain. If you look at the electrical equivalent of this circuit, each pass transistor acts like a resistor R, and at each node there is a capacitance: the output capacitance plus the capacitances of everything hanging on that node. So it is an RC ladder: R C, R C, R C. Using the Elmore time-constant method, the resistance seen at node j is the sum of the upstream resistances, R1 + R2 + ... + Rj; they may all be equal, but if not they are simply summed as ΣRi. From distributed-RC (transmission-line) theory, the delay of an N-section ladder with equal R and C per section is approximately t_p = 0.69 · N(N + 1)/2 · RC. This fact is very important to understand: here we have only 4 bits, but for 16, 32 or 64 bits the delay grows roughly as N², so the larger the word you add, the larger the delay. It is still essentially a ripple-carry system: the carry ripples, and the more stages it has to ripple through, the more delay it creates.
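The Elmore estimate quoted above can be computed directly; this is a small numerical sketch with hypothetical per-stage values of R and C, purely to illustrate the N(N + 1)/2 scaling:

```python
def elmore_delay(rs, cs):
    """Elmore delay of an RC ladder: each node's capacitance times the
    total resistance between that node and the driver, summed and scaled."""
    total = 0.0
    r_acc = 0.0
    for r, c in zip(rs, cs):
        r_acc += r           # resistance accumulated up to this node
        total += r_acc * c
    return 0.69 * total

# For N equal sections this reduces to 0.69 * R * C * N * (N + 1) / 2
n, r, c = 4, 1e3, 10e-15     # hypothetical: 1 kOhm, 10 fF per stage
assert abs(elmore_delay([r] * n, [c] * n)
           - 0.69 * r * c * n * (n + 1) / 2) < 1e-24
```

Doubling N roughly quadruples the delay, which is exactly why a long Manchester chain must be broken up, as the next slides do.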
Of course, please remember that this expression gives only the propagation delay of the RC network. In real life you can do a SPICE analysis, and you will come very close to this value. The reason we do not add a separate time constant for evaluating the Gi terms is that the generate signals are computed while the carry is propagating anyway, so they do not really contribute time; only the final stage adds a little, because the final carry must be evaluated before the result is decided, and that evaluation time is smaller than the delay coming through the chain. Therefore, typically one may say the net propagation delay is roughly 0.69 · N(N + 1)/2 · RC. A typical layout of such a carry chain is shown here. Again and again I emphasize: the capacitance per node on the carry chain equals four diffusion capacitances, one input capacitance from the inverter, and the wire capacitances. So, if you wish to speed up, the first thing to try is a technology in which the net node capacitance is smaller, and then to increase the transistor sizes so that R is minimized. But increasing transistor size costs area, and a larger W/L produces larger currents and therefore larger power. We looked at the expressions for P and G again and again, and we already know that if P is 1, G is always 0, so the output carry equals the input carry. Using this fact, here is a slightly modified ripple-carry adder: a 4-bit adder as before, which may be built from Manchester carry cells or from normal full adders.
In that case, consider the product P0·P1·P2·P3. If in every one of the 4 bit positions A and B differ, for example A = 1001 against B = 0110, then P0, P1, P2 and P3 are all 1, because each Pi is the XOR of Ai and Bi, and the product, which we call BP = P0·P1·P2·P3, will be 1. Now add a multiplexer which receives, at its 0 input, the carry coming through the chain of adders, and at its 1 input, the block's input carry directly. We know very well, as we just said, that if P0 through P3 are all 1, then all the Gi are 0 and the output carry equals the input carry. So we bypass: as soon as the multiplexer's select signal BP is 1, the input carry is passed straight to the output. If instead any one of the Pi is not 1, then the corresponding G must be evaluated and will create the carry; in that case, whichever part of the chain you must go through generates its G and the carry keeps moving stage by stage, and since BP is 0 the multiplexer selects the rippled carry. It is not that this drastically reduces power, but for particular data, or even partial data, it may bypass the carry directly to the output without actually evaluating the chain. This is called a carry-skip adder, because it skips the carry directly to the output without going through the adder stages.
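The bypass logic can be sketched per block; this is a functional model only (the bypass mux changes delay, not the logical result), with illustrative names and LSB-first bit lists:

```python
def carry_skip_add(a_bits, b_bits, ci=0, block=4):
    """Carry-skip (carry-bypass) adder: if every Pi in a block is 1,
    the block's input carry is multiplexed straight to its output."""
    s_bits, c = [], ci
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        c_in = c
        for a, b in zip(a_blk, b_blk):       # ripple inside the block
            p, g = a ^ b, a & b
            s_bits.append(p ^ c)
            c = g | (p & c)
        bp = all(a ^ b for a, b in zip(a_blk, b_blk))  # BP = P0*P1*P2*P3
        c = c_in if bp else c                # bypass mux (same value, faster path)
    return s_bits, c

# 6 + 7 = 13, LSB first; result identical to the plain ripple adder
assert carry_skip_add([0, 1, 1, 0], [1, 1, 1, 0]) == ([1, 0, 1, 1], 0)
```

When `bp` is true the mux selection and the rippled carry agree, which is the whole point: the hardware wins time, not a different answer.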
Now, if you look at the equivalent signal-flow graph, you can see that there is a delay element in the forward path and a feedback path which is non-linear, and that is where the whole issue lies. Why am I showing you this signal path? Many people believe everything in the system is linear; in real life nothing is, since the transistor characteristics themselves are non-linear, and because of that the actual delay between these two points is not as linearly related as one might think. That is why the problem of equalizing delays becomes extreme as you increase the number of bits. This model is essentially to show you that the carry delay model is not linear, and since it is not linear one always has to worry, because one path has one delay and the other path a different delay. Equalizing the delay at the output is therefore a major concern every time. You may need some kind of register which waits for all the data to appear before putting it out, which essentially means you reduce the speed. In a full adder, if the carry has to propagate it takes a longer time; if all the Pi are 1 it goes through directly; so the time through one path is not the same as through the other. For a given data pattern the result is available earlier, for other data it may not be, and the next block has to wait in any case because it does not know when to take the data. So even if you have sped up one path, it may not really work out well, because for other data you still have to wait. This is the model I keep showing: the delay model is very important, and something has to be done about it. You may even end up increasing the multiplexer delay itself, and if you do that, why do this at all?
So there is always a VLSI issue here: as far as the logic is concerned everything looks great, but when you really look at the timing you find you have not achieved as much as you thought. Here is a 16-bit adder using carry bypass. The method is this: if you have an N-bit adder, divide it into blocks of M bits each. With N = 16 I choose M = 4, so each block is a 4-bit adder working on bits 0 to 3, 4 to 7, 8 to 11 and 12 to 15, and we perform the 4-bit operations in parallel. The advantage is that while in some cases a block may pass its carry directly, during that time the other blocks are also completing their own work, because all of them receive their data simultaneously. So the idea is to break the adder into a number of such blocks, expecting that some blocks will produce their result faster, so the overall delay for a large number of bits may be smaller. Each block is the same carry-bypass adder we saw: there is a setup time, taken to compute the Pi and Gi; then a propagation time, which we just defined as the time from the block's input carry to its output carry; and finally the sum time, which depends on the last carry. The first block may be fast since its input carry is available immediately, but depending on whether all its Pi are 1 or not, it can pass the carry directly. When we compute the delay to reach the final sum, note that I do not add a sum time for every block, simply because during the carry propagation the other sums are being computed anyway; only the last block must perform its final sum after its carry arrives.
So we count one sum time; then a setup time, and since all blocks are set up simultaneously, one setup time only; then the carry time within a block: if one bit's carry takes t_carry, a block of M bits needs M·t_carry in the worst case; then, since only the intermediate blocks act as bypass units, there are N/M − 1 of them, each contributing a bypass (multiplexer) delay; and finally the carry ripples through the last block, another M − 1 carry delays, before the sum time. Altogether, for a typical N-bit carry-bypass adder with M-bit blocks, t_adder = t_setup + M·t_carry + (N/M − 1)·t_bypass + (M − 1)·t_carry + t_sum. You can see that if N grows, t_adder still grows, but only through the N/M − 1 bypass term: if N is 64 and M is 4, you have only 15 bypass stages instead of rippling through all 63 bit positions. So the idea behind dividing into blocks is to reduce the overall carry-chain time. If one plots the worst-case carry propagation delay versus the number of bits to be added, it is found that up to around 4 or 5 bits the plain ripple-carry adder is actually better, because its delay is directly proportional to N while the bypass adder always pays the additional block overheads. However, beyond 4 bits (4.8 is the actual crossover shown), the carry-bypass adder's delay rises much more slowly than the ripple carry's, because one is directly proportional to N while the other goes as N/M plus some constant terms. So for larger bit widths it is always advisable not to use simple ripple carry, but carry bypass.
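The two delay formulas can be compared numerically; the unit delays below are illustrative placeholders, not technology numbers:

```python
def t_ripple(n, t_carry, t_sum):
    """Ripple-carry worst case: (N - 1) carry delays plus one sum."""
    return (n - 1) * t_carry + t_sum

def t_skip(n, m, t_setup, t_carry, t_bypass_mux, t_sum):
    """Carry-bypass worst case:
    t_setup + M*t_carry + (N/M - 1)*t_bypass + (M - 1)*t_carry + t_sum."""
    return (t_setup + m * t_carry + (n // m - 1) * t_bypass_mux
            + (m - 1) * t_carry + t_sum)

# With unit delays, the bypass adder overtakes ripple carry as N grows
assert t_ripple(64, 1, 1) == 64
assert t_skip(64, 4, 1, 1, 1, 1) == 24
```

For N = 4 the overheads make the bypass adder slower, matching the crossover near 4-5 bits described in the lecture.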
Another variant of the carry-bypass adder is shown here, called the carry-select adder. Recall our expressions for P and G; I keep repeating them, and you should never let them go: G = A·B, P = A XOR B, Cout = G + P·Cin, and the sum, just check it, S = P XOR Cin. Now, one can see that whatever the inputs, there are only two possibilities for the incoming carry: it can be 0 or it can be 1. So we say: take your P and G and evaluate the carry into the next stage twice, once assuming carry-in 0 and once assuming carry-in 1. Both results are computed simultaneously, so for any P and G the output carry for both cases is already available. Then, once the actual incoming carry is known, we simply select: if it is 1, take this result, otherwise take that one. We are not waiting to find out whether it is 1 or 0 before starting; depending on the carry we receive, we immediately know the next output carry because both cases have been pre-evaluated. The sum, of course, is then generated as P XOR Cin as always. So the idea is similar to carry bypass in some ways, but instead of just bypassing, both possibilities, with carry 0 and with carry 1, are pre-evaluated and offered to a multiplexer, and the actual incoming carry decides which one is passed to the output. This saves time because of the pre-evaluation; of course you add a little extra hardware and multiplexer time here, but that time is much smaller than the propagation you avoid.
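One block of the carry-select scheme can be sketched as follows; the function name and interface are mine, and the bit lists are LSB-first:

```python
def carry_select_block(a_blk, b_blk, c_in):
    """Carry-select: compute both outcomes (carry-in 0 and carry-in 1)
    in parallel, then let the real incoming carry pick one via a mux."""
    def ripple(ci):
        s, c = [], ci
        for a, b in zip(a_blk, b_blk):
            p, g = a ^ b, a & b
            s.append(p ^ c)
            c = g | (p & c)
        return s, c
    s0, c0 = ripple(0)     # pre-evaluated assuming carry-in 0
    s1, c1 = ripple(1)     # pre-evaluated assuming carry-in 1
    return (s1, c1) if c_in else (s0, c0)   # the multiplexer

# 3 + 3 with carry-in 0: sum bits [0, 1] (LSB first), carry-out 1
assert carry_select_block([1, 1], [1, 1], 0) == ([0, 1], 1)
```

In hardware the two `ripple` evaluations run concurrently, so once `c_in` arrives only a mux delay remains; the software model runs them sequentially but captures the selection.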
Here is the typical timing calculation for this adder. The setup time is standard; then, for an N-bit adder divided into M-bit blocks, t_add = t_setup + M·t_carry + (N/M)·t_mux + t_sum (this is of course the worst case, t_max; I forgot to mark that on the slide). N is the total number of bits and M the number of bits per stage. Since the carry evaluations have already been done in parallel, the net adder time is smaller than for the plain carry-bypass adder. For a 16-bit version we follow the same procedure: 4-bit blocks, each pre-computing its carry-0 and carry-1 results from its P and G values, with multiplexers that receive the previous carry and select accordingly. The critical path works out like this: one setup time, since all blocks are set up together; during that time every block also computes its two pre-evaluated results, so the M·t_carry term is paid only once; then the carry passes through the chain of multiplexers, one per block, which gives (N/M)·t_mux; and finally the sum time of the last block. One can see that the M·t_carry term has been reduced to a single occurrence, so this adder will be somewhat faster than the earlier ones; not greatly, but at least to some extent. Now we notice something more. Looking back at the structure, the carry still has to propagate through all the multiplexers before the last block can decide, so the delay seen by each block is not the same. And as I keep saying, if the delays are not equal and the system uses a dynamic style of adder, timing becomes a very crucial issue for you.
So we say: if it is to be done anyway, why keep the blocks equal? We may instead increase the block sizes progressively, a square-root scheme: the first block has 2 bits, the next 3, then 4, 5, 6 and so on. Why? Recalculate the total: the first stage has M bits, the next M + 1, and if P is the number of stages, the last has M + P − 1. This is an arithmetic series, and summing it gives N = M·P + P(P − 1)/2. Please remember, P is the number of stages and N the number of bits; P is no longer N/M, for the reason you can see here: the widths are 2, 3, 4, 5, 6 bits, so M increases at every stage and the number of stages depends on how these widths accumulate to N. Expanding, N = M·P + P²/2 − P/2. Now M is very small, 2, 3, 4 or 5, whereas N is 32, 64, 128 or higher; N is much larger than M in most cases, and that is exactly when you should use this scheme. Then the terms linear in P can be neglected compared with the P² term, so N ≈ P²/2, or P ≈ √(2N). So I now know how many stages to use. Let us say I have a 32-bit addition to do: P ≈ 8 stages, with widths 2, 3, 4, 5, 6, 7, 8, increasing every stage. The total adder time is then t_add = t_setup + M·t_carry + P·t_mux + t_sum ≈ t_setup + M·t_carry + √(2N)·t_mux + t_sum. Since the mux term now grows only as a square root, compare with the earlier linear-select case, which grows as N/M: for N = 64 with M = 4, the linear scheme needs N/M = 16 mux stages, while the square-root scheme needs only about √(2·64) ≈ 11.
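The stage-sizing rule can be sketched as a small helper; the function name is mine, the last stage is truncated so the widths cover exactly N bits, and the final count comes out close to the √(2N) approximation derived above:

```python
import math

def sqrt_select_stages(n, m=2):
    """Square-root carry-select stage widths: m, m+1, m+2, ... until
    N bits are covered. From N = m*P + P*(P-1)/2 = P^2/2 (for small m),
    the stage count P is approximately sqrt(2N)."""
    widths, covered, w = [], 0, m
    while covered < n:
        widths.append(min(w, n - covered))   # truncate the final stage
        covered += widths[-1]
        w += 1
    return widths

widths = sqrt_select_stages(32)
assert sum(widths) == 32            # the widths tile all 32 bits
assert widths[:3] == [2, 3, 4]      # progressively growing blocks
# the approximation P = sqrt(2N) gives math.isqrt(64) = 8 for N = 32
```

Because the last stage is truncated, the exact stage count can be one less than the ideal √(2N); the approximation is what matters for the delay scaling.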
So we have actually reduced the delay by going to a square-root law, which is why we loosely call it a logarithmic scheme. Here is a table, taken from Rabaey's book, of the number of bits versus adder delay, in generic delay units (nanoseconds or picoseconds depending on the technology). If you use a ripple-carry adder and keep increasing n, the delay keeps rising linearly, in fact in proportion to n. If you do a normal linear select, with the 0 and 1 results precomputed for equal-size blocks of 4, 6 or 8 bits (whatever block size m you choose), there is a step in delay at each block boundary: the delay still increases, but it is certainly much smaller than ripple. If you do the square-root select, the delay is flatter still. So one can see that for larger and larger operand widths one should prefer square-root select adders, because they reduce the delay to a great extent. Ultimately, I must tell you when to choose the simple ripple-carry or Manchester-carry style of adder: when minimizing area is the main requirement. Because if you look carefully at the square-root select scheme, the number of stages P can be large, maybe 8, 16 or even 32 for wide operands, which means the area of this circuit will not be small in any case. Now, one feature of adders that I keep telling you again and again: the major worry for adder speed is the availability of the carry before the next add operation can be performed. If that is the bottleneck, what is the way around it? Here is the basic structure: the data are a0 b0, a1 b1, ..., an bn. The first pair a0, b0 creates p0 and g0, and with the input carry ci0 you create the sum s0; this also creates ci1, which you must have before the a1, b1 stage can create s1, and so on.
Now, if I can figure out ci1 without waiting for it to come out of the previous stage, then I am saving time. The whole trick in the word "look-ahead" is that I do not have to wait for the last carry output before I do the next operation. I am not trying to say it is as simple as I am making it sound; there are hidden problems which I will show you. But at least I can say that if I arrange the algorithm so that ci1 is, to a great extent (I will not say 100 percent), not the output coming from the first block, then I can generate it much faster than in any other scheme. Now, this is idealistic thinking; how do I implement it? As you have seen, my first lecture in this course was a historical perspective of VLSI, and from that you can understand why I am so keen on history. There is a trend among the present generation, and I hope it is not true of all of you, to forget history. I am already 65, so I am probably a historical person for your age; but unless you know the history, even if not 200 years of it then at least the last 25 years, you cannot progress in the future, because you need to know what went wrong and what went right so that the path ahead is better. So here is the history, and my feeling about it: Weinberger and Smith suggested this idea of carry look-ahead in 1958, and they received IBM's achievement award, which was a most prestigious award in those days, for this work. Let me therefore give you the reference: A. Weinberger and J. L. Smith, "A Logic for High-Speed Addition", National Bureau of Standards, Circular 591, 1958. This material has been taken from Oklobdzija's 2004 lecture slides on computer arithmetic; he is a computer scientist, whereas Weinberger, by the way, was a mathematician.
So, what is the CLA trying to do? We have all the math available to us: C_{i+1} = G_i + P_i·C_i, where P_i = A_i ⊕ B_i, G_i = A_i·B_i, and S_i = P_i ⊕ C_i. Now implement this function directly in this form: from the inputs A_i and B_i, an XOR obviously gives P_i; XOR that with C_i and you get the sum, S_i = (A_i ⊕ B_i) ⊕ C_i. To generate the carry you need G_i, which is A_i·B_i. Now look at the cases. If P_i = 1 (that is, A_i ⊕ B_i = 1), then G_i must be 0, and C_{i+1} = C_i: whatever carry comes in simply passes through, because P_i = 1 selects the carry-in path of the multiplexer. On the other hand, if P_i = 0, then G_i may be 1 (from A_i = B_i = 1), and if G_i is 1 the input carry does not matter at all, because 1 plus anything is 1: G_i is transferred directly to the carry output. So C_out = G_i when P_i = 0. The thinking is that simple: once I look at these expressions, I see there are methods I can utilize, because I do not have to evaluate the carry at every bit if I organize things this way. Now look at the way it is set up. Here are the definitions: you have a 4-bit adder with carries C_i, C_{i+1}, C_{i+2}, C_{i+3} and operand bits A_i B_i, A_{i+1} B_{i+1}, ..., or call them A_0 B_0, A_1 B_1, A_2 B_2, A_3 B_3. This is your standard addition, nothing unusual yet.
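The bit-level relations just stated can be written down and checked exhaustively. A minimal sketch; the function name is mine, but the expressions are exactly the P/G formulation from the lecture.

```python
def pg_full_adder(a, b, cin):
    """One-bit full adder in propagate/generate form:
    P = A xor B, G = A and B, S = P xor Cin, Cout = G or (P and Cin)."""
    p = a ^ b
    g = a & b
    s = p ^ cin
    cout = g | (p & cin)
    return s, cout

# Exhaustive check against plain addition of three bits: for every input
# combination, 2*Cout + S must equal A + B + Cin.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = pg_full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
print("all 8 input cases match")
```

Note how the carry behaves exactly as described: when P = 1, Cout equals Cin (propagate); when G = 1, Cout is 1 regardless of Cin (generate).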
So there is nothing big about it yet, but look at C_{i+1}, the carry generated by this adder stage. Instead of writing XOR functions you can expand it as logical product terms, and you know C_{i+1} is nothing but G_i + P_i·C_i; essentially I am substituting G_i = A_i·B_i and P_i = A_i ⊕ B_i = A_i'·B_i + A_i·B_i', which is the XOR function. So I am representing those functions by their equivalent minterms (not the full minterms, partial ones). Given this expression, now look at the next one: C_{i+2} = G_{i+1} + P_{i+1}·C_{i+1}. But I already know C_{i+1}, so C_{i+2} = G_{i+1} + P_{i+1}(G_i + P_i·C_i) = G_{i+1} + P_{i+1}G_i + P_{i+1}P_iC_i; just multiply it out. You calculate C_{i+3} = G_{i+2} + P_{i+2}·C_{i+2} the same way, substituting what you know and expanding, and do the same for the fourth. If you look at the final one, for example, it is C_{i+4} = G_{i+3} + P_{i+3}G_{i+2} + P_{i+3}P_{i+2}G_{i+1} + P_{i+3}P_{i+2}P_{i+1}G_i + P_{i+3}P_{i+2}P_{i+1}P_i·C_i. So the leading terms of this fourth carry contain the generate signals G_{i+3}, G_{i+2}, G_{i+1}, G_i together with propagate products, up to three P terms; is that clear? However, the last term, P_{i+3}P_{i+2}P_{i+1}P_i·C_i, has no generate term in it at all, only P terms and the input carry. So we group them: everything containing the G signals we call the block generate for the whole block, capital G_j, in which all the G's appear; the product containing no G, only P terms, we call the block propagate, P_j. This is the definition Weinberger gave. Now, with this, I write G_j as the one expression and P_j as the other, so that C_{i+4} = G_j + P_j·C_i. Is that clear?
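The expanded 4-bit carry expressions above can be checked against an actual addition. This is a sketch with illustrative names (`cla4_carries`, `G_blk`, `P_blk`); bit lists are LSB first, and the example operands are mine.

```python
def cla4_carries(a_bits, b_bits, c0):
    """Carries of a 4-bit carry-lookahead adder from the expanded
    sum-of-products expressions (bit 0 = LSB); no carry ripples."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    # Block signals: G_blk contains all the g terms, P_blk only p terms.
    G_blk = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
    P_blk = p[3] & p[2] & p[1] & p[0]
    assert c4 == (G_blk | (P_blk & c0))  # C4 = G_j + P_j * Cin
    return [c1, c2, c3, c4]

# Example: 1011 (11) + 0110 (6) with cin = 0 should give 17.
a, b = [1, 1, 0, 1], [0, 1, 1, 0]
carries = cla4_carries(a, b, 0)
sums = [x ^ y ^ c for x, y, c in zip(a, b, [0] + carries[:3])]
value = sum(s << i for i, s in enumerate(sums)) + (carries[3] << 4)
print(value)  # 17
```

The internal assertion verifies the Weinberger block identity C4 = G_j + P_j·C_in for this input.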
So now one can see: to evaluate P_j you do not need to see the carry, and to evaluate G_j you do not need it either. So your whole worry about propagating the carry is gone: I do not need to know the carry at all; I can evaluate my P_j's and G_j's irrespective of whatever carry I am going to get. Because of that, the P/G group shown here immediately gives me G_j and P_j, which give me the output carry directly from the input carry, without actually rippling through the stages. If you calculate the delays: where a ripple structure with, say, 8 carry stages costs 8δ (δ being one stage delay) in an RCA, this scheme costs far fewer; roughly 1 gate delay to form all the P_i and G_i, 2 more to form the block G_j and P_j, and 3 to calculate the final carries, so about 3 + 2 + 1 = 6 gate delays in total (the slide quotes around 5δ). So essentially what I am trying to say is: if you would otherwise have done a normal ripple-carry addition, look-ahead saves time to a great extent. Also look at the features: there are no additional adder blocks like the carry-select stages we did, no 16 or 32 extra stages. We keep the normal adder we had; we have not changed it, but just modified the way we evaluate things, and we figured out that because of this we can always improve the speed. So the Weinberger method I have shown: the figure repeats the same adder using Weinberger's look-ahead technique, and I have also written the critical path, around 1δ for the first G/P level, up to 4δ through the block G/P levels, and so on, because each is a block; you repeat so many blocks and you get the net delay. Please remember: compared with any ripple-carry adder, this will be the fastest.
Just to give an idea, look at the implementations; rather than the static representation, maybe it is better if I show you the domino one, for simplicity. For 4 bits: C_1 = G_1 + P_1·C_0, where C_0 is the input carry and C_1, C_2, ... are the output carries; C_2 = G_2 + P_2G_1 + P_2P_1C_0; C_3 = G_3 + P_3G_2 + ...; these are AND terms combined by ORs. In transistor logic, transistors in series create ANDs and in parallel create ORs, and since this is inverting logic you finally get the complements, C-bar instead of C. Start looking at the first stage: we have P_1·C_0 in series as one input, ORed with G_1; the output of this block is C_1, and that C_1 output is transmitted onward. The function generated here, ANDed with P_2 and ORed in parallel with G_2, gives me C_2; I then use the last carry, which already contains all the earlier terms, as one of the inputs to generate C_3, and use that term in turn to create C_4. Counting the devices stage by stage (4, then 3 more, and so on), on the order of 14 transistors are required, with much faster speed than any ripple carry. Why is domino advantageous here? Because it reduces power: at no time do the p- and n-networks conduct together, so at least the static (off-state) power is smaller. So one can see that a multiple-output domino CMOS stage can implement a CLA-kind structure at much lower power and relatively high speed. Now, we also looked at look-ahead a little differently. Instead of doing the standard expansion flat, we do it block-wise: take a_0, a_1 and create something, likewise a_2, a_3; four blocks each generate two signals, and then we use those pairs to generate the next level. This is called a tree, and if you follow it, 8 inputs feed 4 blocks, then 2, then 1; it is like a logarithmic system. So you do partial evaluations and keep calculating ahead, because after all the carry is not required along the way.
So you can keep forming partial sums and combining them pairwise; instead of series connections like before, the depth is then proportional to log n. These tree structures, which are very popular, use a logarithmic algorithm; one such family is the ELM adder. The biggest advantage of the ELM scheme, which I can show you here, is that in my example there are only four levels for an 8-bit adder: take a_0, b_0, do an evaluation, immediately pass a partial sum onward and create the next P's and G's; take a_1, b_1 and work with them to create s_1; keep going like this and you create s_0, s_1, s_2, ..., s_7 in 4 stages instead of 8. So it is a logarithmic system in which one can reduce the delay simply because you are doing parallel operations. This is called tree-structured addition, and it is very popular because all tree structures reduce the average time; they will be much more useful in something called multipliers, which we discuss a little later. Now, a comparison of all the adders shown so far: the most common Manchester-carry or ripple-carry adders, the modified tree adder (ELM), and the CLA, which is the standard Weinberger scheme. The comparison is taken from Roy's papers: a 32-bit addition; for simplicity they used a fairly high supply voltage just to separate the results; they were working on a 0.25 micron process, so λ = 0.25 µm, at 3.3 V and 100 MHz. The columns are area, number of transistors, delay in nanoseconds, and average power dissipation. You can see that for the CLA, the same 32-bit addition costs 2.27 × 10⁶ λ² of area, requires 2132 transistors, has an average delay of 15 ns, and dissipates about 114.6 mW.
If you do a typical RCA, the area is in fact the smallest, 0.80 × 10⁶ λ², with the fewest transistors, 1204, but the delay is 55 ns; the power is also really the smallest, about 87 mW. The ELM sits between these two, which is why I put it in the middle: it has slightly larger area than the CLA because of the tree levels, yet even with the larger area its transistor count is slightly lower than the CLA's; it gives a delay of 10 ns, better than the CLA, and dissipates 104 mW. So you can see: RCA 87 mW, CLA 114.6 mW, ELM 104 mW; delays of 55 ns for RCA, 15 ns for CLA, 10 ns for ELM; area-wise the ELM is much higher than the RCA but comparable to the CLA. So in most binary-tree adder requirements, as in Wallace-tree multipliers, if you were going to use a CLA as your adder, most likely the ELM version of that adder will be used. This is very relevant, because people should know when to use what; I just wanted to give you the numbers so that you know. The same idea is discussed here as the ripple-block carry look-ahead adder, which is essentially what the ELM-style blocking gives you: the idea of ripple-block carry look-ahead is to lessen the fan-in and fan-out difficulty inherent in full carry look-ahead. The block size M is fixed at 4 in these generators; the design is obtained by using multiple levels of carry look-ahead, and if there are 5 or more blocks, 4 blocks are grouped into a single super-block; this is how the logarithmic levels are combined. Now, an equivalent method was suggested for the CLA, called the parallel-prefix adder, which we look into next. What the parallel-prefix adder uses is a little algebra which is different from the normal kind.
Parallel-prefix adders are constructed out of a fundamental carry operator, denoted by a dot (on the slide it is drawn as "C slash"; call it the carry operator). The definition is this: if G's and P's are the generate and propagate signals, then for two pairs (G″, P″) and (G′, P′) the operator gives (G″, P″) • (G′, P′) = (G″ + P″·G′, P″·P′). The fundamental carry operator is represented as shown next, because it is easier to understand as a box: given (G″, P″) and (G′, P′) it creates an output pair (G, P) where G = G″ + P″·G′ and P = P″·P′. Please look at what we are trying to do: we already saw, in the Weinberger expansion of carry look-ahead, terms of the form G_{i+1} + P_{i+1}·G_i and so on; the block G term contains G_i·P_i products while the block P term contains only P's, the block generate and block propagate. We are just packaging those same terms: G = G″ + P″·G′, and P is the same product as in the block case. I am saying the same thing again, but now these operations can be generated directly, and that is what parallel-prefix adders are. There is also the associative law, which the dot operator obeys, so the dots can be regrouped: (G_3, P_3) • (G_2, P_2) can be formed first, and C_{3:0} can be written as the grouped combination of (G_3, P_3), (G_2, P_2), (G_1, P_1), (G_0, P_0) with the input carry folded in. This interesting mathematics can be implemented recursively: the block (G, P) at position i equals (G_0, P_0) if i = 0, and otherwise equals (G_i, P_i) • (block at i − 1), for 0 < i ≤ n − 1. And if you look very carefully, with the input carry folded in, the carry C_i is always just the block G_i: the carry is always part of the prefix G. Now, this is very interesting.
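The carry operator and its associativity can be verified by brute force over all 0/1 pairs. A minimal sketch; `dot` is my name for the operator.

```python
def dot(left, right):
    """Fundamental carry operator on (G, P) pairs:
    (G'', P'') . (G', P') = (G'' + P''*G', P''*P')."""
    g2, p2 = left
    g1, p1 = right
    return (g2 | (p2 & g1), p2 & p1)

# Associativity check over all binary (G, P) pairs: (x.y).z == x.(y.z).
pairs = [(g, p) for g in (0, 1) for p in (0, 1)]
for x in pairs:
    for y in pairs:
        for z in pairs:
            assert dot(dot(x, y), z) == dot(x, dot(y, z))
print("dot operator is associative")
```

Associativity is the whole point: it is what lets a prefix adder regroup the dots into a balanced tree of depth log₂(n) instead of a chain of depth n.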
Now, the first adder that uses this prefix technique is called the Kogge-Stone adder. I will not go into full detail first; let me show you what it does. There are three kinds of blocks shown in the Kogge-Stone adder. Say I want to add a = 1001 and b = 1100. In normal addition I expect: 1 and 0 is 1; 0 and 0 is 0; 0 and 1 is 1; 1 and 1 is 0 with carry 1. So I expect the sum 10101, and please remember the last carry is the fifth bit, which we know anyway in addition. Now the three blocks. The first blocks are the normal-addition blocks, or the P/G-generating blocks, the part of the prefix adder that belongs to a normal adder: each receives a_i, b_i as inputs and forms P_i = a_i ⊕ b_i and G_i = a_i·b_i; then c_i = G_i (of the prefix) and s_i = p_i ⊕ c_{i−1}, using the previous carry. The yellow blocks are the prefix blocks, the carry-operator (dot) blocks: each receives the previous pair (P′, G′) and the current pair (P″, G″), exactly the pairs I just defined, and applies the carry operator, generating P = P″·P′ and G = G″ + P″·G′; the expressions I wrote can be expressed in this form. The third kind are called transmission blocks, which receive (P_i, G_i) and transmit them unchanged. Now trace the example: I feed the bit pairs of a = 1001 and b = 1100, i.e. a_0 b_0, a_1 b_1 and so on, into the first-level blocks. The first column creates, say, P = 1 and G = 0, from the expressions already written, and at the next level this pair is simply transmitted. We also know that c_i equals the prefix G; so if that G is 0, the carry out of the column is 0.
So the carry out of the first column is 0. For the next column's yellow block we need the previous pair to combine with the current P and G; I have already generated the previous P and G, so I apply the operator and evaluate the new block pair, which in this example comes out (0, 0): remember, P = P_i·P_{i−1} (here 0 × 1 = 0) and G = G_i + P_i·G_{i−1} (here still 0). This I propagate down, so the next carry is the block G, which is 0, and the output for that column is 0. For the third column, the current pair combines with what comes from the previous prefix block, and the transmission blocks pass pairs down unchanged. But notice what happens as I go further: for this calculation I need the pair from two positions back, because the expanded carry needs the P_{i+2}·P_{i+1} product; unless that is known I cannot generate c_2, since the third term requires P_1·P_2·... So at one level I combine spans of two terms, at the next I reach spans of three: one from here, one from here, one from here, three terms to evaluate, and I get the output carry. The sums I can get directly from the expression s_i = p_i ⊕ c_{i−1}, which is not all shown here; some come out in parallel right there. So every time I do a level, I create prefix pairs, I generate carries, and then I get the sums. So what is the advantage of this Kogge-Stone method? You can see that you can feed all the data, 16 bits, 32 bits, any number of bits, and the structure creates partial sums, transfers P and G pairs from one level to the next, transmits where needed, and keeps creating new P's and G's all the way down, where the final G's are the carries. So we always know exactly what we are computing. Using this kind of structure there is essentially no net carry-propagation delay, because there is no carry propagation at all: the dot operators allow you to create the parallel carries, and hence the parallel sums, simultaneously.
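The whole Kogge-Stone flow above can be sketched in a few lines: one level of P/G generation, log₂(n) prefix levels of the dot operator, then the sums. This is a behavioural sketch, not a gate netlist; the function name and the way the input carry is folded into bit 0 are my choices.

```python
def kogge_stone_add(a, b, n, cin=0):
    """n-bit Kogge-Stone sketch: log2(n) levels of the (G, P) dot operator."""
    p = [(a >> i & 1) ^ (b >> i & 1) for i in range(n)]
    g = [(a >> i & 1) & (b >> i & 1) for i in range(n)]
    # Fold the input carry into bit 0's generate: G0' = G0 + P0*cin.
    G = [g[0] | (p[0] & cin)] + g[1:]
    P = p[:]
    d = 1
    while d < n:  # each prefix level doubles the span covered
        G = [G[i] if i < d else G[i] | (P[i] & G[i - d]) for i in range(n)]
        P = [P[i] if i < d else P[i] & P[i - d] for i in range(n)]
        d *= 2
    carries = [cin] + G[:-1]      # carry into bit i is the prefix G below it
    s = [p[i] ^ carries[i] for i in range(n)]
    return sum(bit << i for i, bit in enumerate(s)) + (G[-1] << n)

# The lecture's example: 1001 (9) + 1100 (12) = 10101 (21).
print(kogge_stone_add(0b1001, 0b1100, 4))  # 21
```

For 4 bits there are only two prefix levels (spans 1 and 2), and for 64 bits there would be six, matching the 2⁶ = 64 remark coming up next.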
A typical 64-bit version (I do not want to walk through it, it is the same idea) requires 6 prefix levels, since 2⁶ = 64. You can see in the figure: 64 bits enter, and the pairs are passed on level after level; the larger 64-bit block carry look-ahead built from prefix adders looks like this. Another advantage is to use domino circuits, shown here for generating P and G: this is the propagate signal and this the generate signal. I have shown you a circuit immediately because in most cases you will not want the static version: you will want to reduce the transistor count itself, and therefore the power, and P and G generation is always required. So you can use P and G blocks in the domino form shown here, and create domino sums with the transistor counts indicated. So the blocks I showed may well be implemented in domino circuits rather than normal static CMOS or simple dynamic CMOS. Now I come to a different kind of adder, very popular in the case of multipliers, before we quit this area: the carry-save adder, which is very relevant for multipliers, not so much perhaps for stand-alone adders, though it is still important. A carry-save adder is a digital adder used to sum three or more binary numbers, outputting two numbers of the same width as the inputs. This unique dual output consists of one sequence of partial-sum bits and one sequence of carry bits. That is the statement I am making; I will give examples to show you what I mean. The most important application is summing the partial products in integer multiplication, as in the Wallace-tree multiplier I just mentioned: please remember that in a multiplier you need partial products and then you sum them, and since partial products are nothing but AND gates, this kind of tree structure helps generate very fast additions.
Additional applications include adding a large number of inputs: you can have more than three inputs, four or more, simultaneously; an addition of three or more inputs is reduced to two outputs with no carry propagation at all, which is the biggest thing it does. Its advantage is that it produces all of its output bits in parallel, resulting in the same delay as a single full adder: very little propagation delay. When implemented as carry-save adders plus a final ripple adder, one additional adder is needed to finish: if you use CSAs to reduce the n operands, you need one additional conventional adder at the end to form the total sum. Compare: summing three operands with two ripple-carry adders costs two full carry propagations, whereas the CSA route needs only the one final propagation; please remember that count, because that is the trick of the trade, and it allows very high speed. Of course there is a disadvantage: in the intermediate (sum, carry) form we do not know whether the result is positive or negative; there are no sign digits. This is a drawback when performing modular multiplication, where you do not know whether the intermediate result is greater or less than the modulus. Now, before the examples, I must show you how a typical full adder can be converted to a CSA. A CSA has three inputs and two outputs: x, y, z are the inputs and c, s are the outputs. You can see that if the carry-in terminal of the full adder is treated as the third data input, and the carry-out is taken as a separate output rather than chained to the next bit, the full adder becomes a CSA. It is not drawn identically to what I described, but it is worth seeing that a full adder in this form is easily convertible to a CSA. Here are my simple examples, which will prove the mettle.
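The full-adder-as-CSA idea is just a bank of full adders working bitwise: the sum word is the XOR of the three operands, and the carry word is the per-bit majority, shifted one position left. A minimal sketch using the lecture's binary example (45 + 49 + 28); the function name is mine.

```python
def csa_3to2(x, y, z):
    """Carry-save step (3:2 compressor): three operands in, a sum word
    and a shifted carry word out, with no carry propagation between bits."""
    s = x ^ y ^ z                              # bitwise sum of the three bits
    c = ((x & y) | (y & z) | (x & z)) << 1     # majority bits, shifted left
    return s, c

# The lecture's binary example: 45 + 49 + 28 = 122.
s, c = csa_3to2(0b101101, 0b110001, 0b011100)
print(s, c, s + c)  # s = 0, c = 122, and s + c = 122
```

The invariant is that s + c always equals x + y + z; the expensive carry-propagating addition happens only once, when s and c are finally added.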
Now, first, since this is not a very popular adder as adders go, let me do an addition with decimal numbers to prove the principle, and then show you a binary carry-save addition. A typical carry-save addition process goes like this: you have a block called CSA which receives inputs x_0, x_1, x_2 and generates two outputs called c and s; these two, along with the actual input carry, go to a normal adder (ripple or CLA) which generates your final sum and carry. So this final addition block finishes the job, and the CSA reduces your initial inputs to a smaller number of values. Here is an example which will verify what the CSA is doing. Take three numbers x, y, z, in decimal, in the tens of thousands: x = 43201, y = 1672 and z = 28643, and I want to add them. How do I add in normal decimal arithmetic? I say 3 + 2 + 1 = 6; then 0 + 7 + 4 = 11, so I write 1 and carry 1; then 2 + 6 + 6 = 14, plus the carry is 15, write 5 carry 1; then 3 + 1 + 8 = 12, plus the carry is 13, write 3 carry 1; then 4 + 0 + 2 = 6, plus the carry is 7, and no further carry at the last digit. So the sum of these three numbers is 73516; this is what decimal addition gives. Now, how will a CSA do the same operation? It does it in two steps. First it evaluates the S term by just adding x, y, z column by column with no idea what carry it is generating. For example: 1 + 2 + 3 = 6; but 0 + 7 + 4 = 11, and I do not care about the carry, I just keep the 1 and forget the carry; 2 + 6 + 6 = 14, I keep 4 and drop the 1; 3 + 1 + 8 = 12, keep 2, drop 1; and 4 + 0 + 2 = 6. So the first addition is performed without bothering about the carries at all, and the S term I get is 62416. Please remember: the first step evaluates the sum of the numbers digit by digit, strictly down each vertical column, retaining only the least significant digit of whatever comes. Then we calculate the second term, C: as I already said, from three inputs x, y, z the CSA creates the two outputs C and S.
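The decimal walk-through above can be mechanised directly: keep each column's low digit for S, and collect each column's carry one position to the left for C. A minimal sketch with the lecture's numbers; the function name and the `width` parameter are mine.

```python
def decimal_csa(x, y, z, width=5):
    """Decimal carry-save step: per-column digit sums, with the carries
    collected into a separate word shifted one column left."""
    s_digits, c_digits = [], []
    for i in range(width):
        col = (x // 10**i % 10) + (y // 10**i % 10) + (z // 10**i % 10)
        s_digits.append(col % 10)    # keep the column's low digit
        c_digits.append(col // 10)   # its carry belongs one column left
    s = sum(d * 10**i for i, d in enumerate(s_digits))
    c = sum(d * 10**(i + 1) for i, d in enumerate(c_digits))
    return s, c

s, c = decimal_csa(43201, 1672, 28643)
print(s, c, s + c)  # 62416, 11100, and 62416 + 11100 = 73516
```

The two outputs are exactly the lecture's S = 62416 and shifted carry word 11100, and one final ordinary addition recovers 73516.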
So I have calculated S; now I calculate C. This time, instead of looking at what the sum part gave us, I look only at the carries. The only thing I do is this: since the carry generated in one column is transferred to the next column on the left, whenever a carry is available I write it below that next column. For example: the first column produces no carry, so I write 0; the second produces a carry of 1, which I put here; the third a 1; the fourth a 1; and the last no carry, 0. So I have effectively written the carry word already shifted: 0, 1, 1, 1, 0 placed one column to the left, i.e. 11100. Two steps: first I calculated S without worrying about C; then C without worrying about S. Then, as the third step, I actually add C and S: 62416 + 11100 = 73516, the same sum, no extra carry, the number is identical. So now I have the idea: if I have a larger number of operands, I can keep repeating this operation, and only at the end add in the initial carry and see what happened. This is the trick that carry-save uses. And here is the same example in binary: I have three numbers, x = 101101, y = 110001, z = 011100, which represent 45, 49 and 28; their sum should be 122, and indeed if I did a normal binary addition I would get 1111010, which equals 122, with the final 1 appearing as the output carry. Now what do we do in the case of the CSA? We first calculate S column by column: 1 and 1 and 0 give 0, 0 and 0 and 0 give 0, 1 and 0 and 1 give 0, and so on; in this particular example every column XORs to 0, so S is all 0's. Then calculate C, and please remember that in calculating C we do not worry about what S was: a column with two 1's generates a carry (1 and 1 generate a carry; 0 and 0 do not), each carry is written under the next column to the left, and the carry generated by the last column is kept as the new most significant bit (1 and 1 produce it, and it transfers across). So this is C, namely 1111010, and as I said, my third step is to add S and C.
So: 000000 plus C, and please remember the shift by one is already built in, so keep C as it is and add; you get 1111010, which is 122 in decimal. So in the operation I did, please remember, there were two carry-free steps, generating C and S, and one final addition to get my final sum and output carry. Now, why does this save time, and why is it so important? It becomes visible in this tree adder, an 8-operand CSA tree. You can see it has operands x_0 to x_7, eight of them. The first CSA takes x_5, x_6, x_7 and generates S_1, C_1; the next CSA takes S_1, C_1 and x_4 and generates S_2, C_2; the next takes S_2, C_2 and x_3 to generate S_3, C_3; then x_2 gives S_4, C_4; then it takes x_1; and finally it takes x_0 and you get the last pair S, C. Then you add that S and C (together with your actual input carry, if your normal addition has one) in a conventional adder and you have the sum at the output. Now, based on this tree idea, here is what a typical Wallace tree does: the Wallace tree, as I said, is like the ELM scheme, a tree of additions performed simultaneously. I operate on three operands here and three operands there in parallel, generating S_1, C_1 and S_2, C_2; then x_1 goes into one group and x_0 into the next; I generate S_3, C_3 and S_4, C_4; at the next level S_3, C_3 and one of the S_4, C_4 outputs are brought together, and the C_4 into the other, and I generate the final pair; then you add with the normal CLA. So how many stages have I gone through? Only about 5 stages are required for 8 operands; any serial arrangement would give a larger timing than this. It is like the ELM idea, which I already said is parallel processing, and it is allowed like this precisely because we are not using any carry in between: whatever C and S we are getting, we just go ahead with the numbers; we never wait for a carry. And please remember, there are of course other methods too: you can actually have a look-up table of all possible inputs stored in a ROM; you give the address and you have the output.
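The 8-operand reduction just described can be sketched as a loop of 3:2 compressions followed by one conventional addition at the end. A minimal behavioural sketch; the function names and the example operand values are mine.

```python
def csa(x, y, z):
    """3:2 compressor: sum word and shifted carry word (s + c == x + y + z)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1

def csa_tree_add(operands):
    """Reduce many operands to two with a chain/tree of CSAs, then do a
    single carry-propagating addition only at the very end."""
    ops = list(operands)
    while len(ops) > 2:
        x, y, z = ops.pop(), ops.pop(), ops.pop()
        ops.extend(csa(x, y, z))   # three operands in, two out
    return ops[0] + ops[1] if len(ops) == 2 else ops[0]

# Eight operands, as in the 8-input tree above; one final carry add.
values = [3, 14, 15, 9, 26, 5, 35, 8]
print(csa_tree_add(values), sum(values))  # both 115
```

Every CSA step removes exactly one operand from the pile, and the only carry propagation in the whole computation is the single `+` at the end; that is the property a Wallace-tree multiplier exploits.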
So this is a fantastic method of doing addition, and it can be utilized in the Wallace tree, which as I say is the most famous integer multiplier. Last but not least, before we go to the final topic, another problem that many of us do not find obviously worrying: if you have a large chain of operations, as in FIR filters, comb filters, or a DSP system generally, one finds that a large number of adders and multipliers is required in any DSP. The problem one sees is, as I keep saying, that the first bit of a result appears after one bit-delay; the next after two bit-delays, then three, then four; but at the output you cannot produce the word until all four bits are available. This means their delays are not equal, so the throughput of the whole cycle is wasted in waiting. So what we do is equalize the delays for each of them: for the earliest output, say, I put three register delays (three latches); for the next one, which already appears one stage later, I delay the output and then insert the remaining registers to match; and I keep doing the same, so that for a given input, S_3, S_2, S_1 and S_0 are all available at the same time. The first word will take the full four bit-delays, but when the next input arrives you are anyway moving ahead: the first result has what we call a latency of four, requiring four clock cycles to reach the output, but once those first four cycles are over, every clock produces output bits that are equal in time. Is that correct? This is essentially the advantage of pipelining. The latency is the same, but you can see it has a larger throughput, in the sense that all four bits of every result keep coming every cycle. It is like putting things into a pipe: it takes some time before the first item comes out, but once it is in, a continuous flow is available.
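The latency-versus-throughput behaviour above can be demonstrated with a toy register-chain model. This is only a conceptual sketch (the function name and the shift-register abstraction are mine), not a model of any particular adder.

```python
def pipeline(stages, inputs):
    """Toy pipeline: each stage is a register holding one value per cycle.
    Latency is `stages` cycles; after that, one result emerges per cycle."""
    regs = [None] * stages
    outputs = []
    for cycle, value in enumerate(inputs + [None] * stages):
        out = regs[-1]                 # value leaving the last register
        regs = [value] + regs[:-1]     # shift everything one stage ahead
        if out is not None:
            outputs.append((cycle, out))
    return outputs

# Four stages: the first input appears after 4 cycles (latency 4),
# then one completed result per clock (throughput 1 per cycle).
print(pipeline(4, ['s0', 's1', 's2', 's3']))
# [(4, 's0'), (5, 's1'), (6, 's2'), (7, 's3')]
```

The output list shows exactly the "pipe" behaviour: a four-cycle fill time, then consecutive cycles each delivering one result.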
So, the frequency — the speed — is much higher, except for the first four cycles. It also allows you to have data available at a given time for the next processing stage. So in many cases, irrespective of which kind of adder you use — please remember the adders can be of any kind — pipelining may still be done to avoid the delay problems and also to generate equal-delay outputs. This is an example of how, using NORA CMOS, a pipelined adder could be done. This is available in Weste and Eshraghian; I will just explain what it is. This is essentially NORA: please remember it has two clocks, phi and phi-bar, running throughout, and it is very interesting that some blocks evaluate in phi and some in phi-bar. So it is called phi/phi-bar, or NORA, or modified domino, as some people call it. And it shows that one should not forget that adder speed cannot be estimated just by the critical-path delay in the logic, the number of transistors in the path, or the logic levels in the path. Actually, estimating adder speed is a much more complex business, and many of the fast schemes may actually be very slow at the end of the day: when you calculate the net delays, including the interconnect and everything, you will figure out that the fastest one was not really the fastest. So please, when you design, always simulate, and only then verify which adder is the fastest for your application. Do not go by "CLA with CSA or something like that is the fastest" — yes, in logic it is, but in reality everything else should be taken care of before you actually decide. The last and most interesting adder is one that, normally, people do not talk about so much, so I thought that in my course on advanced VLSI design I would talk about it.
In present-day systems, if you look at real-life consumer electronic systems, or even other useful systems like microprocessor-based systems, one finds that most of them are multi-mode: they have both analog operations and digital operations, which is called mixed-mode. Even in mixed-mode operation, almost all digital operations are preferred with two-level binary logic, 0 and 1 — that is why they are digital: they use binary numbers. And since binary numbers are very easy to generate with two voltage levels, almost all digital binary logic is based on two-level voltage logic. But if you are anyway going to have mixed-mode signalling, then why not have at least some part of the adder, or some part of the logic or function, done the analog way? One of the major advantages of the analog way is that you do not actually add, subtract, or divide in voltages; you do it in currents, because analog allows you to add currents much more easily. So I give you a hint: current-mode logic is a great candidate for low-power, high-speed VLSI circuits, particularly mixed-mode circuits; for pure digital it may or may not be. Most DSP chips use recursive techniques based on their algorithms, and in these systems there is a very interesting statistic: 70 percent of the chip area is dedicated to interconnects and just 30 percent to the logic. It is therefore also advisable to use multi-valued logic to transmit on a single line: if you transmit only binary, you have a problem, but if you can put multiple levels, the data can be separated by their levels. A single interconnect will then carry more data compared to only one bit at a time.
So, it is called multi-valued logic. Of course, in voltage you could take a 0-to-1-volt range and use 0, 0.4, 0.6, 0.8 as the multiple values, but separating and comparing such closely spaced voltages is very, very difficult compared to separating currents. Therefore, almost all multi-valued logic in use employs current-mode signalling, and current-mode circuits are very easy to implement in analog blocks. So if I am going to add numbers anyway, why should I convert my number into digital binary, add, and then come back to analog? I may directly add decimal numbers in an analog fashion. Here is an example which may show you that these are — maybe the fastest; I never claim that unconditionally — but certainly low power. It uses current-mode multi-valued logic — I should say so very clearly — and it has four cells called SDDA, single-digit decimal adder. Each SDDA cell contains one current comparator and five current mirrors.
Now, what is the principle I am really using in this adder? A very simple principle. If you have two wires and you connect them at a common node — say one carries current I1 and the other carries current I2, where I1 represents one decimal number and I2 represents another — then, by simple circuit theory, the net current at the node will be I1 + I2. That current corresponds to the addition of I1 and I2, which is equivalent to the addition of the two decimal numbers: the wiring above produces the addition of a and b at the common node. This is the simplest principle of addition in the case of currents. Please remember, if I connected two voltages together, do you believe this would be possible? They would actually compete with each other to settle at a common voltage; with currents, you just add. So for adder applications, current mode seems to be a very simple solution. I just said that a typical SDDA cell contains five current mirrors, and there are two kinds of current mirrors in use. Two of the current mirrors use what we call super-MOS transistors — I will show you a little later what they are — and they come in two kinds, one STN and the other STP. Please remember there are two possibilities, source currents and sink currents: P-channel devices are normally used for sourcing, and N-channel devices for sinking. So you have source and sink current mirrors coming from STP and STN; they use super-MOS transistors, and we will show what they are. A typical super-MOS transistor is a composite circuit of about 22 transistors using normal folded circuit structures, which improves the output resistance enormously. Please remember, if the cascode gain is 1000 or 10000, then the output resistance will be boosted by roughly the square of that gain over what is available from a single MOS transistor.
So, it boosts the output resistance. Why are we looking for higher output resistance? Because for a good current source, the output resistance should ideally be infinite; the super-MOS allows you to do as good a current sourcing or sinking as possible. The other two current mirrors are cascoded current mirrors called high-swing mirrors — they allow larger current swings — and they are named HSN and HSP for the N and P channels. Now, the way we define it: a typical decimal digit is represented by current levels — I already said we are talking about multi-valued current logic. So, say there are 10 digits, 0 to 9; then we have 10 currents, from 0 microamps upward in steps of 1 microamp (you can choose any other step, but the example uses 1 microamp). They represent the numbers: 0 microamps represents 0, 1 microamp represents 1, and 9 microamps represents 9 in decimal. Now, each cell of the 4-digit adder receives an input current Iin. Why does it go up to 19? You can see that if I add two digits, 7 plus 8 is 15 and 9 plus 9 is 18, so with an input carry of 1, the highest value you will ever require is 19. Is that correct? So when you add two decimal digits, the largest value is 19 in decimal. We now build a 4-digit adder, with digits a0, a1, a2, a3; if you are only adding two decimal digits, you require only the values 0 to 19 — that is 20 levels, actually. We also know that in decimal digit addition, since you add only two digits, you either have a carry of 1 or no carry: 5 plus 7 has a carry of 1, 3 plus 5 does not. So the carry is either 0 or 1; there is no other possibility — you never go beyond 19 with these numbers, so at no time are two or three carries possible. You have only two carry possibilities, carry 0 or carry 1: 0 stands for 0 microamps, 1 stands for 1 microamp.
So, I have already stated the requirements; now I implement the partial sum and output, and I will show you the circuit of the SDDA. Just look at the expressions representing it: the sum output equals the input current — we will show how this happens — if Iin is less than or equal to 9. Please remember, Iin is nothing but I1 + I2, whatever numbers you are adding. Is that ok? Now, if it is at most 9, the sum will always equal whatever Iin you introduced: the same value should come out as the sum — 5 plus 4 is 9, so the sum is 9. If it is 7 plus 8, then the total is 15, which means you have crossed 10; so Iin minus 10 is the new sum, and you generate a carry out of it — the sum term is then only 5. So Sout = Iin − 10 if Iin is greater than or equal to 10. By the same logic, Cout is always 0 if Iin is at most 9, because 5 plus 4 or 3 plus 5 never produce a carry; anything at 10 or beyond produces a carry. So Cout = 0 if Iin ≤ 9, and Cout = 1 if Iin ≥ 10. Quickly, we show you the circuit and then close this adder part. Here there are the current mirrors I already told you about: this is STP, this is HSP, this is HSN, this is STN, this is HSP2. Now, take the circuit from here: the first branch has a sink current of 1 microamp, the second has 10 microamps; these are what switch over around 10 — whether Iin is above 10 or below 10. So we have two currents created for that, 1 and 10; please remember 1 stands for carry 1 and 0 stands for carry 0. So what we do is the following: if the input is 10 or more, you have to subtract 10 and generate a carry of 1. For that, we create a reference current of 19/2 — the maximum current divided by 2 — which splits the range fifty-fifty, above and below.
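The SDDA decision rule just stated can be written down compactly (a behavioural sketch in Python; the function name `sdda` is mine, and the currents are modelled simply as integers in microamps):

```python
def sdda(i_in):
    """Single-digit decimal adder rule from the lecture.

    i_in is the summed input current in microamps (0..19, i.e. the two
    digit currents plus any 1-microamp carry joined at a node).
    Returns (sum digit, carry out).
    """
    if i_in <= 9:
        return i_in, 0       # at most 9: sum passes through, no carry
    return i_in - 10, 1      # 10 or more: subtract 10, carry a 1
```

So `sdda(5 + 4)` gives `(9, 0)` and `sdda(7 + 8)` gives `(5, 1)`, matching the 9.5-microamp comparator threshold described in the circuit.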
So, the reference is 9.5 microamps, and we have the input current, which comes from the addition I1 + I2 — my input current, which can be any value from 0 to 19 — sourcing into the STN current mirror. Please remember, a current reference can always be created out of standard 2-to-4 transistor circuits: a Widlar current source, a Wilson current source, or even a normal current source or current mirror. Here is the current comparator, and the way it operates is this: if your Iin is less than 9.5, we know Sout is going to be Iin — sorry, not 1, Iin. So I want that value at the output, and at that time you want Cout to be 0. So, for switch 1 and switch 2 — remember, this is the "less than 9.5" case — you switch both of them off; the output stays at 0 initially and nothing passes. But had the input been greater than 10, the comparator would have pulled the node down and this branch would have transmitted. Is that clear? So carry 0 and carry 1 can be created depending on the comparator output, and the sum can be created from it: when the input is above the threshold, the 10-microamp branch closes and 10 is subtracted. Now, please remember, in that case Iin is anything between 10 and 19, and you can see the sum has to be either Iin or Iin − 10, depending on whether the input is above 9.5 or below 9.5; the sum will come either from this path or from that one. So this single-digit adder allows you to do this function; you can go and look at it yourself once again, and then you will verify it. You want the sum to be Iin when the input is below 9.5: in that case this branch is off, so whatever is there is output directly. If the input is greater, this branch switches on, the 10 is subtracted, and the carry of 1 is output. Is that clear? So that is the way the logic goes.
Now, how do we implement the 4-digit adder? Here is the last slide for this adder part. You have four SDDAs shown here: SDDA1, SDDA2, and so on. The first two digits, a0 and b0 — meaning the currents equivalent to them — feed our "adder", which is nothing but a wired connection at a node and nothing more. Please note: the addition takes nothing in the circuit, you just connect the wires; this creates the Iin for this SDDA. We have just seen the operation of the SDDA: it will create the output carry and the first sum, s0. As the output carry is generated, the next two digits add, and now we add this additional carry current as well. Please remember, I can add any number of currents, because these are three parallel current paths joined at a node; the sum of all three will actually appear. The only thing you should check is that all currents flow in the same direction; they should not subtract, otherwise there are current-hogging problems. As soon as I get a1 + b1 plus whatever carry is there, I get the new Iin; it immediately creates Sout, depending on whether the input is below 10 or not, and generates the new carry, 1 or 0, whichever it is. Take the next two digits, use this carry, and generate the next; the last two digits generate the final one. So you get s4, s3, s2, s1, s0 — please remember the last Cout is your s4. So you have created a 4-digit decimal adder without going through any logic per se, as voltage logic does — only adding operations and comparing operations. Now, this is current mode; please remember these are small analog transistors, and the currents could be microamps or even tens of nanoamps, so the power dissipation will be extremely small. Also, since the circuits are current driven, there is no capacitive charging or discharging of the nodes; they just pass the current through the circuit, so there is no question of propagation delays from that. There is a stability issue, which I did not discuss — every analog circuit has one — but otherwise, current-mode multi-valued logic can create a decimal adder which is extremely low power and very high speed, and which can compete with any digital block any time. Is that ok?
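The chaining of the four SDDA cells can be sketched end to end (again a behavioural Python sketch with my own names, `sdda` and `decimal_adder`; in the real circuit the `a + b + carry` below is literally three currents joined at one node):

```python
def sdda(i_in):
    """Single-digit rule: (sum digit, carry) from a summed value 0..19."""
    return (i_in, 0) if i_in <= 9 else (i_in - 10, 1)

def decimal_adder(a_digits, b_digits, c_in=0):
    """Chain SDDA cells over digit lists, least-significant digit first.

    Each cell's carry current feeds the next cell's summing node; the
    final carry-out becomes the top output digit.
    """
    sums, carry = [], c_in
    for a, b in zip(a_digits, b_digits):
        s, carry = sdda(a + b + carry)   # wired current sum at the node
        sums.append(s)
    return sums + [carry]                # last Cout is the top digit
```

For example, adding 4729 and 5873 (digits given least-significant first), `decimal_adder([9, 2, 7, 4], [3, 7, 8, 5])` returns `[2, 0, 6, 0, 1]`, i.e. 10602.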
So, in a multi-mode system which requires both analog and digital operations, there is sometimes even an advantage in converting your binary data to decimal — like BCD converters — actually doing the operations in decimal, and then going back through decimal-to-binary converters; this may still be faster and lower power. So this is the new technique which, in the current technology of 2012, finds more and more use, because, as we say, the effort from 2000 onwards is to reduce power. We are working on 0.6-volt processes now, small-channel devices, all kinds of things, but the current requirements need not be high. In all-digital logic that is the major issue: the current has to be high enough to reduce the charging and discharging time. Here we do not require large currents because we are not using charge-discharge techniques; therefore, these circuits will always be high speed and always low power. But then why don't we use them everywhere? As I said, there are many analog issues which we are probably glossing over here, and we do not want to discuss them in this part of the adder course — some other time, some other day. Thank you for the day.