So, today we talk about adders. Specifically we are looking at arithmetic, but one and a half hours is not sufficient to cover all parts of arithmetic, so I will just give you some flavor and talk in more detail about adders. My talk will have an introduction, then arithmetic as an operation, then types of adders; then I will look at adder circuits from the VLSI point of view, see some CMOS implementations, give some comments, and finally references.

If you look at any digital processor, it has input/output data, it has a memory in it, it has a control block, and a data path. So essentially, when you input something, you load it into some kind of temporary memory, some kind of RAM or ROM; the data is then transferred to an arithmetic block, which is essentially the data path, and then relocated back into memory, with control through your CPU part. In this lecture we are interested only in the data path; on processors there are many who may be more expert than me. Typically, as I repeat again, the arithmetic unit is normally a bit-slice data path which has adders, multipliers, shifters, comparators and things of that kind; your memories are RAM, ROM, buffers, shift registers; your control is typically an FSM based on either random logic or a PLA, and control is also done from timing counters; and then of course your interconnect: switches, arbiters and buses. Our interest, as I say, is right now only in this part; some other day maybe we can discuss the last part, which is very, very important in real life. This is an Intel microprocessor, nothing great to show here; it is an Itanium, it has six integer execution units like this, and you can see there are some muxes, some register units, sum and cache blocks. These are essentially the logical units: carry generation, sum generation. So arithmetic is most important.
In a typical bit-slice design you have a data input which is registered, passes through adders, shifters and multipliers, and you get an output. If you look very carefully, even the multiplier part consists essentially only of adders. So if I am really looking at a design, I am really looking at the design of an adder, whether the implementation is CMOS, any other MOS, or bipolar. This is from a lecture given somewhere else which I copied, so there are many other things there. This is my Itanium integer data path; whatever I showed here is also shown on the block diagram, and this is roughly 4000 microns in length, which is typically a few millimeters.

Why did I bring this slide? Many of you may not be working on microprocessor-based systems; most of you may actually be working on DSPs, maybe for filtering or image processing, and in most applications you see these days, transceivers for example, at the end there is a DSP block. So I thought that when we are designing an adder or arithmetic, we should also look at what a DSP wants, because for microprocessor adders we know what to do and what timings they have, whereas a DSP is slightly different: a microprocessor is more general purpose, a DSP is more application-specific. Therefore we would like to see, for example, the applications of DSP: in communication, 3G has just come and of course now 4G may appear; WCDMA, WLANs and all kinds of wireless blocks; radars; and a lot of consumer applications such as robotics, speech, music, audio, video. Actually, the money right now in VLSI is more in DSP applications than in microprocessors. Intel has almost come to think that they are the only people who will make microprocessors; AMD is trying very hard and lost a lot of money this quarter, so I am not sure what will happen next quarter; Intel has a huge profit this quarter. So this is very happy news for Intel, bad for AMD.
So, what exactly are the issues for a DSP? There are four things we look into in a DSP chip design: one is the battery size, if it is a mobile kind of system, what size of power supply you are getting; how much programmability you want to provide; whether you want to speed up an operation by making parallel architectures; and what clock rates one can use as system clocks. So the design of the arithmetic is essentially governed by what the DSP application is; depending on the application, what rates they need and what power budget they have, one can design a different arithmetic for them. When I implement a VLSI chip of a DSP system, complex signal processing is done; most DSP operations are frequency-sensitive. We do FIR filters, we do IIR filters, we do all kinds of filtering; in most cases they are time-dependent, and time-frequency transformations are required: we do FFTs, we do DCTs. So essentially, the algorithms are either standard algorithms, maybe on ASIC DSPs, or nowadays they can be FPGA-based reconfigurable ones. In all this complex design flow, if you know the behavioral description, one knows what algorithms you are using, and then one can decide the hardware.
So, the idea here, just to give an introduction, is to say: do not go by only saying "I use a carry-save adder" or "carry-bypass adders are better or worse". For a given application, whichever adder will give better performance for what you are looking for should actually be used. There are two sets of concerns to worry about: all VLSI people worry about area, delay (essentially speed) and power, whereas DSP people look more for programmability. So the two are not the same. VLSI designs are normally optimal designs: we try to minimize power, improve speed, and use as small an area as we can. Whereas in a DSP, since you want programmability, some parts may not always be used; some things have to be kept just in case someone needs them, so it is a non-optimal design. So when I am designing a chip for a DSP application, it is not the same as what I would have designed for a microprocessor. This fact has to be kept in mind in real life; otherwise you may say "I will pick the adder from the book". It may or may not be right; I am not saying it will not be, but it depends on the specific application.

A typical system and chip design approach is shown here: you have system specs, then system design, chip design, manufacture. This is the typical flow of any design. We normally go for a design which is more structured, because it reduces complexity; we do a lot of partitioning so that the top-down and bottom-up parts are separated, and we try to meet in the middle of the path. For example: specs, validation, algorithm, scheduling, architecture, floor plan is the upper part, which is more from the system side; and if you see the lower side, gate sizing, modules, cells, layers, these are the VLSI parts.
So, when you are designing, you should come partly from the top and partly up from the bottom and meet in between, so that you try to get as good a complex circuit done with better performance from the VLSI side. That is the design issue, and it applies not only to adders but everywhere: you have to optimize which hardware to use for a specific requirement. Ideally, one should not think "if I use the best adder it will do good everywhere", because there is a money part going on. Many a time, larger or "better" circuits actually cost you hell, and therefore that chip may never be sold, because the cost of the chip itself may be very high; or if you do some non-standard custom design, it may be very, very costly for the specifications, so you still may not be able to actually sell the chip. The design is only as good as the money you hold. So this is the crux for all designers: they should not get carried away by theory. In the course I may say "this is a very good adder" or "this is a very good multiplier", but it may not be ideal for your specification. So never get the idea that if you know one or two kinds you will manage, because different specifications will require different kinds of architectures and different kinds of hardware.
The major worry with DSP, as most people realize, is real time: you are really looking for large throughput. One interesting point we shall see later: in the case of microprocessors, since the system clock is running at maybe 1.7 or 3.4 or whatever gigahertz, most of the hardware has to run around that, and we cannot tolerate much latency there; we have to balance both latency and throughput very strongly. Whereas if you see a DSP structure, maybe one of them I will show, latency is not very dominant, but we are looking for throughput; that is the major worry. So the first thing that differentiates a microprocessor from a DSP processor is that latency is an issue in a microprocessor, whereas in the case of DSP latency is hardly an issue but throughput is. Of course, both need low power; any system needs low power. As a VLSI man I would like a smaller die size, because it will cost me less. I should be able to upgrade my design; that is most important, because once I design it, I do not want to have to redesign everything, so it should be field upgradable at the user end. The user should be able to program it; it should also be customizable, in the sense that if I change any block tomorrow, I must have another standard-cell equivalent which I can replace it with. It should be field testable: not only will I test on my side, but I will also give you the whole block, so you may actually have both hardware and software testability. At the end of the day, the time for making that system design should be very low, because otherwise it costs you hell. And finally, of course, it should be modifiable.
So, all these criteria should be considered. These have nothing to do with what I am going to talk about later; this is just an introduction for the hardware people who come, maybe someone from Wipro or elsewhere, because they normally take IPs from somewhere and fit them in, and sometimes it works, sometimes it does not; then they replace the IP and redo it. But that is not very ideal. I do not say all designers do this, please do not take it that way, but most designers, at least the new ones, think "if I have this IP, I can do anything". It is not true: if you want to optimize something cost-wise, or you want to improve performance, then you have to think twice about what you have to do. These are the issues which any designer of any kind of chip, particularly DSP, has to worry about, and many more.

In Rabaey's book, which you all have, there are three rules he has suggested, which state what I have already said. Rule 1: the right structure has to be chosen for a functional unit before attempting optimizations. Rule 2: the critical path of the circuit should be identified and its length minimized. And the third of Rabaey's rules: the number of transistors alone does not decide circuit size; in most cases the interconnect actually decides the performance as well as the size. Okay, we may skip ahead; just to show you, there are a number of number systems. For example, a number X of word length W is written as X = sum over i of x_i * w_i, where the weight w_i = r^i for a radix-r number. There are different kinds of arithmetic which you can use in the case of DSP: you can use sign-magnitude numbers, one's complement numbers, two's complement numbers, or binary offset numbers.
So, this is again an issue: which arithmetic should be used to actually implement the hardware. Generally, as I say, most systems use two's complement, and you can see from the example here: if you take the addition of two numbers, intermediate overflow does not matter, which is actually very important, because otherwise you would have to keep additional hardware to check for overflows. It so happens that two's complement allows you to do partial sums without much worry about overflow; you can see the example shown here and try it yourself. There is another system which many chip designers use. Why am I showing this? I am repeatedly telling you that the options need not come only from the circuit side; options should also come from the algorithm side and from the number-representation side, which will decide what adders and what kind of multipliers you should use. There are three kinds of redundant digit-code representations known: signed digit code (SDC), canonic signed digit code, and online arithmetic; just to show you that these are available to you.

Another system, other than digit codes, is the residue number system. Most of our problems in arithmetic, particularly in adders, come from the carry, because unless you generate a carry you do not know what to add next, and in all the adders I will show you, carry generation is the major worry. So we are really only looking at carry generation in all adders; the sum part is trivial. In most cases it is the carry part which actually limits the speed. So if you use an arithmetic which does not generate any carry at all, you are faster. The system shown here is based on the very old Chinese remainder theorem; it uses what are called residue numbers, in which carries between digits are never required, and therefore it will be relatively faster. Of course, generating the residue representation itself may be hardware-intensive.
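To make the two's complement point concrete, here is a minimal Python sketch, assuming 8-bit words and illustrative values of my own choosing (the lecture's slide example is not reproduced here): a chain of partial sums can overflow in the middle, yet the final result is still correct as long as it fits in the word.

```python
# Two's-complement addition modulo 2^n: intermediate overflows in a
# chain of partial sums can be ignored as long as the final result
# fits in n bits. Illustrative sketch with assumed helper names.
N = 8
MASK = (1 << N) - 1

def to_twos(x, n=N):
    """Encode a signed integer as an n-bit two's-complement word."""
    return x & ((1 << n) - 1)

def from_twos(w, n=N):
    """Decode an n-bit two's-complement word back to a signed integer."""
    return w - (1 << n) if w & (1 << (n - 1)) else w

# Partial sum 100 + 100 overflows the 8-bit signed range (wraps to -56),
# but adding -90 afterwards brings the final result back in range.
acc = to_twos(100)
acc = (acc + to_twos(100)) & MASK   # intermediate overflow, ignored
acc = (acc + to_twos(-90)) & MASK
print(from_twos(acc))               # 110 == 100 + 100 - 90
```

The masking with `& MASK` plays the role of the hardware simply dropping the carry out of the top bit; no overflow detection logic is needed for the intermediate step.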
So, some power may be lost in generating those, but having done that, I assure you the speed can certainly be improved a lot. Just to give an idea, we start with what is called a modulus set, which is the basis we choose. Say, for example, I choose the base (5, 3, 2). Then the range of data it can represent is the product of the three, that is 5 x 3 x 2 = 30. So the total count of numbers which this RNS modulus set can handle is up to 30. I will give an example of how to calculate this; this is just for the introduction, I am not saying this is adding anything, just to give an idea of how actual designers think. Many of us think "since Rabaey has given this, I will follow it", or some other author has given it, or many papers have appeared; but actual designers see things very differently. That is why the best designer is the one with the most experience, because he has things in mind which others have not documented, and therefore the designer's experience matters a lot.

So take 9: I want to represent 9 in this modulus set (5, 3, 2). All that I do is divide 9 by each of the moduli: divide by 5, divide by 3, and divide by 2. If I divide by 5 the residue is 4; if I divide by 3 the residue is 0; if I divide by 2 the residue is 1. So 9 is represented in RNS as (4, 0, 1). By the same logic I can find the residue representation of any number. Of course, if you choose other moduli it will have different digits, but for a given modulus set these numbers are fixed. So I know what this number is, 27, and what this one is, 26, in residue form.
So, as an example, let us do the addition of 9 and 19. 9 is (4, 0, 1); 19 is (4, 1, 1). To add them, all that I have to do is add the individual residues, 4 + 4, 0 + 1, 1 + 1, each under its own modulus: the first position under modulo 5, the second under modulo 3, and the third under modulo 2. If you do it: 4 + 4 = 8, and 8 mod 5 gives residue 3; 0 + 1 = 1, and 1 mod 3 gives residue 1; 1 + 1 = 2, and 2 mod 2 gives 0. So after adding you have the number (3, 1, 0), which is the equivalent of 28, which is what you are looking for. By the same logic you can see the multiplication of 8 by 3: 8 is (3, 2, 0), 3 is (3, 0, 1); multiply each digit and you will find (4, 0, 0) at the output, which is 24. So this idea was very important and has helped a lot in speed-up, but as I say, it depends on the bit size you are using. For DSP this works very well because you have recursive systems; but for a microprocessor, no one uses RNS.

So now we come to what we are looking for, which is arithmetic. If you see arithmetic, there are two kinds which are most popular: bit-serial and bit-parallel; all of you are aware of this and may be more expert than me. Bit-serial requires less hardware and therefore less silicon area; bit-parallel is much easier to implement and also looks to be faster, because with a parallel processor it is "heigh-ho, heigh-ho, it's off to work we go": if you have a big task to handle and 10 people lift at a time, that task can be handled in one go. That is the idea behind bit-parallel. However, there is again a catch: in reality the ratio of speeds in the two cases is not very different. Everyone is telling the world that parallel processors are faster, and here is an objection, at least from my side: a parallel implementation normally has long carry chains, long propagation paths; the area of a bit-parallel chip is largely governed by the word length; and both are roughly similar in power dissipation.
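The RNS examples above can be sketched in a few lines of Python; the helper names (`to_rns`, `rns_add`, `rns_mul`) are my own, not from the lecture. Each digit is processed independently under its own modulus, which is exactly the carry-free property being claimed.

```python
# Residue Number System sketch with moduli (5, 3, 2), range 0..29.
# Addition and multiplication are done digit-wise per modulus,
# with no carries between digits.
MODULI = (5, 3, 2)

def to_rns(x):
    """Represent x by its residues under each modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Add two RNS numbers digit-wise, each digit mod its own modulus."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Multiply two RNS numbers digit-wise."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

print(to_rns(9))                          # (4, 0, 1)
print(rns_add(to_rns(9), to_rns(19)))     # (3, 1, 0), same as to_rns(28)
print(rns_mul(to_rns(8), to_rns(3)))      # (4, 0, 0), same as to_rns(24)
```

Note that each digit position never produces a carry into its neighbor, so all three positions can be computed fully in parallel; the cost, as the lecture says, is in converting to and from residue form.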
So, therefore, just using a parallel adder or a parallel multiplier everywhere may not be the ideal solution; you should try to use circuit designs which are serial-parallel combinations. That is the ideal situation, though you will not be able to do it every time. In the last part I will show you that the standard carry-save adders and some multipliers do use this bit-serial/bit-parallel view with pipelines, and that kind of structure is essentially ideal for many applications. Again I repeat, this is decided by the specs: if serial bit rates are sufficient for you, better go for it; if parallel is the way, go for it; but if you do not know, you can try the combinations.

A typical bit-serial adder is very simple: you have a full adder circuit, shown here, with two inputs X_i and Y_i, where i can run over any number of bits: 1, 2, 4, 8, 16, any number. It will create a sum and it will create a carry, which is stored in a flip-flop or register, and every clock it is fed back as the last carry for the next operation, and this goes on for the number of bits. So you need a register on S_i and a register on C_i, and you keep doing the operation as many times as input bits appear; finally in the register you have your sum and the final carry. Of course, that final carry itself is the last bit, the MSB, of your sum. If you want a subtractor, X minus Y, all that you do is put Y in as its complement and set the initial carry, and then do the same operations. Mathematically, the full adder says S_i = X_i XOR Y_i XOR C_(i-1), where C_(i-1) is the last carry, and C_i, the new carry generated, is the majority of X_i, Y_i, C_(i-1). I repeat, a majority gate is one in which, if 2 of the 3 inputs are 1, the output is 1.
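The bit-serial adder just described, one full adder plus a carry register, can be modeled as below. This is a behavioral sketch of the structure, not the lecture's circuit: bits arrive LSB first, one per "clock", and the carry variable plays the role of the flip-flop.

```python
# Bit-serial adder sketch: one full adder plus a carry flip-flop.
# Bits arrive LSB first, one per clock; sum = x XOR y XOR c,
# carry-out = majority(x, y, c).
def serial_add(x_bits, y_bits):
    """x_bits, y_bits: lists of 0/1, LSB first. Returns sum bits, LSB first."""
    carry = 0                       # carry register, cleared at start
    out = []
    for x, y in zip(x_bits, y_bits):
        out.append(x ^ y ^ carry)                      # sum output
        carry = (x & y) | (x & carry) | (y & carry)    # majority gate
    out.append(carry)               # final carry is the MSB of the result
    return out

# 6 + 7 = 13: 6 = 0110, 7 = 0111 (LSB first: [0,1,1,0] and [1,1,1,0])
print(serial_add([0, 1, 1, 0], [1, 1, 1, 0]))  # [1, 0, 1, 1, 0] -> 13
```

One full adder serves all the bit positions; the hardware cost is constant in the word length, but n clock cycles are needed, which is exactly the serial-versus-parallel trade the lecture is describing.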
In the case of bit-parallel arithmetic, here is what we do; this will come back again. Essentially we are now saying that, with A and B as inputs and C0 as your initial carry, S = A XOR B XOR C0, and the new carry is A.B + A.C0 + B.C0. This itself can be represented in another format, and I think we will come back to it a little later, because I am going to discuss it anyway. In most applications you may actually use this kind of hardware for fixed-point integer operations, probably using a serial adder. Why serial? Because many operations, you will find, can be done much more easily: there is data coming at a certain rate, and that time is sufficient to go through a bit-by-bit calculation.

So that was my introduction; now we come to brass tacks. Why do I give this table? Many books give it, maybe in the same form. Look very carefully at this table; that is what we want to study, and based on the study of this table we will be able to see that we have different adders to design. Look at the first row: A = 0, B = 0, C_in = 0; we see the sum is 0 and the carry is 0. However, if A = 0, B = 0 and C_in = 1, the sum is 1 but the carry is still 0. So essentially, if A and B are both 0, I am going to have no output carry. This I call "delete": I do not need the carry, because I know that if A is 0 and B is 0, then whether the input carry is 0 or 1, the output carry is going to be 0. That operation I call delete.

In the next rows, four of them, exactly one of the input bits is 1; for example, in the third row B is 1 and A is 0. They come in pairs which are the same except that in the first the input carry is 0 and in the next it is 1. In such cases I figure out that C_out will equal C_in: if either of the bits is 1, but not both, then the output carry simply follows the input carry.
So I have also figured out that C_in propagates. The first case: when A is 0 and B is 0, I say I do not need a carry (delete). The second case: if A and B are different, then whatever input carry comes will transfer to the output (propagate). And if you see the last two rows, where A and B are both 1, the output carry is always 1, whether the input carry is 0 or 1: you are always generating the carry. So now, if I represent this table in functional form, I can implement any addition much faster, because I have figured out the procedure; this is your full adder and this is how it works.

Of course, this is the standard binary adder; I just wrote down what our second-year students keep telling us: Sum = A XOR B XOR C_in, which can be expanded from the XOR functions, and C_out can be written as A.B + B.C_in + A.C_in. But what we now want to express is a different form, and that is the table which I showed; I want to put it into expressions. I can express my sum and carry in terms of three functions: P, G and the delete function. We define the generate function G = A.B; we define the propagate function P = A XOR B; and we define the delete function D = A'.B'. If I use these and go back to that expression, the sum is simply Sum = P XOR C_in, because A XOR B is P; that is very simple. Whereas for the carry C_out, go back and look at the table. If A is 0 and B is 0, then A.B = 0, that is G = 0, and P is also 0, because the XOR of two zeros is 0. So independently of what C_in is, C_out goes to 0; that was the first pair of rows in my table.
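The G, P and delete definitions above can be checked exhaustively in a few lines; this is a sketch with an assumed function name, verifying that Sum = P XOR C_in and C_out = G + P.C_in reproduce ordinary binary addition for every input combination of the truth table.

```python
# Full adder expressed through generate/propagate/delete signals:
# G = A.B, P = A XOR B, D = A'.B'; then Sum = P XOR Cin,
# Cout = G + P.Cin. Exhaustive check against plain addition.
def gpd_adder_bit(a, b, cin):
    g = a & b                # generate: carry out regardless of cin
    p = a ^ b                # propagate: cin passes through to cout
    d = (1 - a) & (1 - b)    # delete (kill): cout is 0 regardless of cin
    s = p ^ cin
    cout = g | (p & cin)
    return s, cout, (g, p, d)

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout, (g, p, d) = gpd_adder_bit(a, b, cin)
            assert 2 * cout + s == a + b + cin   # matches the truth table
            assert g + p + d == 1                # exactly one case holds
print("G/P/D full adder matches the truth table")
```

The second assertion makes the lecture's observation explicit: for any A, B, exactly one of generate, propagate, delete is active, which is why the carry decision can be made per bit.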
So I got the delete function: if A and B are 0, both G and P are 0, and from the expression I see C_out is always 0; fair enough, that is what the table says. Next, say A is 1 and B is 0, or the opposite, 0 1 or 1 0. Now A XOR B is 1, that is P = 1; but G is always 0, because if either A or B is 0, A.B is 0. So if I look at the carry with P = 1 and G = 0, I get C_out = C_in. So C_out is always C_in as long as P is 1; and when P is 1, I know G is 0, so I do not have to worry about it. I can directly transfer my carry to the output. And the last case, of course, was 1 1: if A is 1 and B is 1 then G = 1, and P has to be 0, because the XOR of two ones is 0; and if P is 0, C_out = G. So we have already calculated it; we know the carry is generated.

So essentially, the full adder table is represented by these generate and propagate functions through C_out = G + P.C_in, and that is the crux of all adders. If we understand this, I can have different implementations of an adder: all that I am trying to decide is whether C_in has to propagate, or has to be killed, or a carry has to be generated, and I make that decision in hardware: here I may kill, here I may generate a one, here I pass it through. This is the kind of approach taken in the design of an adder. Of course, in some cases, particularly in carry look-ahead, we define P a little differently: as an OR gate, A + B, rather than XOR; we will come to that.

As I say, this is very simple. So the first adder in my list, shown here, is the standard ripple-carry adder. As the name says, the carry ripples: it propagates from each block to the next. What does that mean? Suppose I have a 4-bit adder.
So I have 4 full adders chosen here; the inputs to these are A0 B0, A1 B1, A2 B2, A3 B3. However, when I do the first addition I need an input carry, and in general, 99.99 percent of the time, unless it is a middle part of a bigger adder system, this will be 0, because the initial carry comes in as 0. (If it is coming from some other block, then it is not really "initial".) So much hardware assumes the initial carry is 0 and saves time that way; that is a trick many systems use, including the carry look-ahead which we will see again.

So this is exactly how it works, and the point I am trying to make is simple: two expressions in three input variables constitute the operation of the adder, and they can be implemented in hardware to improve the speed of the carry; that is all we are doing at the end of the day. Now, unless the carry is generated out of the first adder and fed to the second, the second operation cannot proceed; till the second carry is created, the third cannot proceed, and so forth. So you can say the carries are rippling from one adder to the next, and in such systems almost (n - 1) carry delays will occur, because there is nothing else to wait for; plus some time at the end, because the final sum is to be created. So for 4 bits, the total adder time is the final sum time plus 3 carry times. Please note: the second or third carry time already accounts for the addition time in its stage, since that is how the next carry was generated, so we do not have to count those sum times separately; T_carry takes care of the total time from one stage to the next, which includes all the operations in between.
So we say the adder delay is very simple: (n - 1) T_carry + T_sum. What is our goal? We want a very fast data path, so essentially we want to reduce the adder time. There are two ways: one is to somehow improve the speed of the sum part itself, which reduces T_sum; but in reality that hardware time is not very large, as we will see later. It is the number of bits that decides the speed, because there are (n - 1) carry delays: n = 4 gives 3, n = 8 gives 7, n = 16 gives 15. So as n increases, the adder speed goes down, and that is our major worry in the ripple carry. Why do people still use it? Because it is very simple to implement: whatever functions we have for C and S, it is much easier to put them in CMOS; without much thinking I can use complex gates and I have my adder.

The first ripple-carry adder available is shown here using a CMOS circuit. I rewrote my C_out and S operations in a slightly different form. (I am not able to get bars on this slide; whenever a slash appears before anything, it is the complement of that function; many computer input formats also require complements to be written that way.) So the gate computes /C0, the complement of the carry function. The implementation: first let us implement the carry. The function is A.B + B.C_in + A.C_in, so I build the network with B.C_in, A.B and so on; this gate essentially implements C0-bar, the complement, because any CMOS, NMOS or pseudo-NMOS gate will give an inverted function; this has to be understood, a single gate gives only the complemented output. So if I put A, B, C like this, I actually get the complement at the output.
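The delay formula above can be put into a tiny model; the gate-delay numbers here are illustrative placeholders of my own, not measured values from any technology, so only the linear growth with n should be read from it.

```python
# Delay model for an n-bit ripple-carry adder: the last sum bit is ready
# only after the carry has rippled through n-1 stages, so
# T_add = (n - 1) * T_carry + T_sum.  Unit delays are assumed, not real.
def ripple_delay(n, t_carry=1.0, t_sum=1.5):
    """Total adder delay in arbitrary gate-delay units."""
    return (n - 1) * t_carry + t_sum

for n in (4, 8, 16):
    print(n, ripple_delay(n))   # 4 -> 4.5, 8 -> 8.5, 16 -> 16.5
```

The point the lecture is making falls out directly: doubling the word width roughly doubles the delay, because the (n - 1) T_carry term dominates and T_sum is a small constant at the end.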
So at node x, one can see, there is C0-bar, and if you want C0 you need an inverter. But once you see this, you start worrying from the speed side: at this node x there is too much capacitance. How much? There are six gate inputs at that node: one here, one here, one here, one here, one here, one here. So six gate input capacitances come at node x, plus the drains of a P-channel and an N-channel device, whose drain-to-bulk capacitances also come in, plus the wire, which is drawn small here, but in real life, depending on which line I use, poly or metal, there is significant wire capacitance. So now you see: just to get your C0, not only do you put so many transistors, you can count them, around 12 transistors to generate C0, but you actually reduce the speed. You were trying to improve something, and the simplest implementation shows you it is actually slowing down.

To improve this, what would you do? You would make transistors a, b, all of them, larger in size. But we know from the other day, when I gave you the talk on logical effort: larger size means larger effort here; both g and h will go up, the capacitance will increase. Your drain capacitance is higher, your input capacitance is also higher, and your current has to drive all of it; so you increased the size and lost a lot of the effort. The time which you thought you would gain by increasing the size you actually did not. So that is my major worry: does that mean I cannot improve the speed? It shows that just increasing the N-channel and P-channel sizes does not actually lower your logical effort; so do not get the feeling that just by increasing sizes you will improve speed.
So you must change something more. Similarly, if you look at the sum implementation, and in fact both the carry and the sum parts: if you draw a horizontal half-line through it, the upper part is not the same as the lower part. It is called non-dual: something is different above and something is different below. In ideal layouts, if you are a CMOS person, I would prefer every layout to be symmetric, and the more symmetric it is, the better the chance that you get accurate values of the W's and L's. In analog, as you must have heard from someone today, the layout should be not only symmetric but also common-centroid; if you do not keep symmetry on all sides, it is unlikely that your current mirror will work like a mirror: it may show some other ratio, and other gm's for your op-amps or diff-amps. So layout becomes very, very difficult if the two sides are of different sizes. This is also called the loading effect; you are probably not aware of it, so just to give an idea: I am a person who, unfortunately, keeps flipping into different areas in IIT. Today I am talking about the digital part, but this afternoon I will be teaching a course in technology where I am going to talk about oxidation and other process models, and we will also talk about etching. The problem there, we realize, is that if the sizes of the transistor layouts are different, etching becomes very difficult; these are called layout effects or layout loading. So huge variability comes in, and in smaller technology nodes that is now a major worry for us.
So essentially, do not go only by what is shown on a circuit diagram; when it actually gets implemented you may find it does not work even half as well as you wanted. The more symmetric you can make it, the better your chance of success. I am not saying success is guaranteed, there are many other problems, but at least there is a better chance of getting correct results. Also, you need an additional inverter, as you can see here, to get S, because the circuit produces S bar; so you end up putting in inverters just to undo the inversion. And being non-symmetric, the logical effort is larger. About this static CMOS adder: remember, static CMOS is the ideal CMOS compared to any other style, whatever anyone says. No dynamic style, DCVSL, NORA, Zipper, whatever people propose, can beat static CMOS as far as robustness is concerned. Its only drawbacks are that it consumes power and is not very fast, but otherwise it is a very stable, very robust design. So if you are just starting out, you should always work with static designs, because they will always work; even if the design does not reach your exact specs, it will still function in the worst case. With dynamic logic it may never work the first time; dynamic circuits are a little touchy and tricky and should be handled very carefully.
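Since the design discussed next hinges on algebraic properties of the full adder, they are worth checking exhaustively: the basic equations S = A ⊕ B ⊕ Ci and C0 = A·B + Ci·(A + B), the inversion property (complementing all three inputs complements both outputs), and the kill / propagate / generate behaviour of C0 = G + P·Ci. A minimal Python sketch (the function names are mine, for illustration only):

```python
# Full adder: S = A ^ B ^ Ci, C0 = A*B + Ci*(A + B)  (majority function).
def full_adder(a, b, ci):
    s = a ^ b ^ ci
    co = (a & b) | (ci & (a | b))
    return s, co

for a in (0, 1):
    for b in (0, 1):
        for ci in (0, 1):
            s, co = full_adder(a, b, ci)

            # Inversion property: inverting all inputs inverts both outputs,
            # so alternate adder stages can omit the output inverters.
            s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
            assert (s_inv, co_inv) == (1 - s, 1 - co)

            # Kill / propagate / generate behaviour of C0 = G + P*Ci:
            g, p = a & b, a ^ b
            if g:            # generate: C0 = 1 regardless of Ci
                assert co == 1
            elif p:          # propagate: C0 follows Ci
                assert co == ci
            else:            # kill (A = B = 0): C0 = 0 regardless of Ci
                assert co == 0

print("full-adder identities hold for all 8 input combinations")
```

Note also that G and P are never 1 simultaneously, which is exactly the fact the mirror-adder structure below exploits.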
So, I thought I should do a static design; I tried, and I said no, I am not getting what I want. One thing I figured out is that I am getting inverted C0's and S's anyway, so why not utilize that? I found that if I take a full adder and give it inverted inputs, I get inverted outputs: S bar (A, B, Ci) equals S (A bar, B bar, Ci bar), and the same holds for C0. You can do a simple Boolean expansion and see that it is correct. So I said, fine, I can now exploit this inversion property: since I am getting an inverted output anyway, I will use it directly in the next stage instead of putting inverters back to undo it. So I have alternate stages, called odd and even cells: in the odd cells I feed the inverted signals, in the even cells I do not. Exploiting the inversion property is essentially what the next adder does; architecturally, the first cell takes A0 and B0, the second A1 and B1, the third A2 and B2, and so on, with the inverters between the carry stages removed. Now I use this as an adder. I know C0 = G + P·Ci, and I also know Sum = P ⊕ Ci; this is what we started with, the expressions we defined for the full adder, so I need not rewrite them, I show them just for reference. So what did I do? Here is the carry block, the red one, and there is the sum block. The first thing you see is that, the way I have drawn it, it is very symmetric across this line: what is above is exactly what is below. So my first worry, the asymmetry of the sum part, I got rid of. I also tried to improve the speed a little by using a fact we have seen in the full-adder table: if G is 1 then P is 0, and if P is 1 then G is 0. So I made a structure in which, for the internal device, you have A and B in parallel and Ci in series to
that, and the complementary network is again A and B in parallel with Ci in series to it. Please remember this is not static CMOS: in a static gate, where A and B are in parallel in one network, they would have to be in series in the other. The logic is static, but it is not implemented as a complementary static structure; that is the trick we used. Then, shunting across, I put A and B in series, and for symmetry I put the same series pair in both the P-channel and the N-channel networks: whatever is in series above is in series below. Now I see that this can generate all the possible conditions: kill, propagate, and generate. You can quickly check this from the expressions. If A and B are both 0, we know P is 0 and G is 0, so no carry should be produced. You can see from the circuit that with A = 0 and B = 0 both P-channel devices turn on and VDD appears at the output node; since that node is C0 bar, a 1 there means C0 = 0, so whatever the previous carry was, I have killed the carry. If A is 1 and B is 0, the XOR case, then P is 1 and G is 0, and I have to transfer Ci. Say A is 0 and B is 1: one transistor of the series chain is off, so the whole series chain is off, but in the parallel pair either A = 0 or B = 1 turns one device on, so the Ci branch is enabled, and the same happens in the other network. Now Ci decides: if Ci is 0, the P-channel conducts and the N-channel blocks, VDD appears at the node, which in the C0-bar form means 0, so a 0 is propagated. If instead it had been
Ci equal to 1, the P-channel would have switched off and the N-channel switched on, so a path from the output node to ground becomes available, the node is pulled down to 0, and since the node carries the complement, a 1 is transferred: the carry is propagated. If A is 1 and B is 1, both series transistors turn on, independent of the other branches, and by themselves pull the node to 0; 0 on C0 bar means C0 = 1, so the carry is generated. So I now have a case where the loading on the C0 line is reduced: counting the gate capacitances at that node, I have gone from six down to four. The wire length I cannot change, but the advantage I get is that I generate all the conditions, kill, propagate, generate, without any additional hardware. Excellent. You can do the same for the sum, which is nothing but P ⊕ Ci; I use the C0 just generated to produce the sum output for the next stage. This requires 24 transistors, which is not a small number; however, the logical effort has gone down, and that is the trick, because the net load capacitance is not very high. Also, what about the sizes of the A and B transistors on the right? They can be minimum, so I have 2 plus 4, that is 6, divided by 3: a logical effort of 2 coming from the series arm. That is much smaller than the 8/3, roughly 3, that I had earlier. I have reduced my logical effort, reduced my h, and therefore improved my delay, so this will be a faster circuit with fewer transistors than the last one. You can see how a VLSI person thinks, starting from the same logic: your design ideas should not come only from the Boolean expressions; you should start thinking about what logical effort you are getting, and correspondingly choose your implementation so that the logical
effort is also minimized. By the way, before I move on: many of these slides, in fact most of them, have been taken from Rabaey's course website. So here is the layout implementation, and you can see it is very symmetric across; nowhere are there bends, nowhere are there additional lengths. Of course this C0 line that runs across is normally routed on second metal to reduce the wire capacitance; if you have a single-metal technology you must still manage with that metal, but in a multi-metal technology, and six or seven metals are common these days, this is very easy to implement. In a nutshell, the adder I showed is called the mirror adder: mirror in the sense that if you flip the top half down it exactly mirrors the bottom; whatever is P-channel becomes N-channel, and it mirrors. The NMOS and PMOS chains are completely symmetrical, with a maximum of two series transistors everywhere except the last part, as I said. Normally the logical effort here comes from only two series transistors in the carry path, so the logical effort goes down. Please take this point: as we said the other day, the more series transistors you have, the more the logical effort increases; as the number of inputs n goes up, it grows like (2n + 1)/3 or (n + 2)/3, depending on the gate. So we reduced that number, and by doing this we not only reduced the count, we also provided all the conditions, kill, propagate, generate, with a time saving: the circuit does not have to evaluate the full expression; depending on the inputs it resolves through whichever branch applies, so it automatically saves the time of separate calculations. That is the trick of the trade. When laying out the cell, the most critical issue is the minimization of the capacitance at node C0, and in particular the reduction of the diffusion capacitances, which is very important: the capacitance at node C0 is composed of four diffusion capacitances, two internal gate capacitances, and six
gate capacitances in the connecting adder cell, which is less than what we showed earlier for the ripple-carry implementation. The transistors connected to Ci are placed closest to the output. This is another interesting point, which many of you may already be aware of, but I will stress it anyway; I think Professor Sharma must have talked about it. It is very important in every design to look for the critical path. The critical path is defined by the input that arrives last, because that is what limits the final speed: if one input out of the three or four arrives last, the output will be decided only when it appears; that is the critical one. So it is always suggested that, in most cases, the critical-path transistors should be placed close to the output. The reason you can see here: Ci may come from the previous stage and arrive last, whereas A and B are available from day one. P0, G0, P1, P2 can all be calculated immediately, independent of the carry, because they are functions of A and B only; for every stage all the A and B bits are known to me, but Ci I do not know, because it has to ripple in. Take the simple case where A and B are both 1: that part of the network has already settled, but the output cannot resolve until Ci arrives. If the Ci transistor had been placed lower in the stack, the A and B transistors above it could not evaluate either, because there would be no ground path for them: even though their inputs are 1 and they want to discharge, they cannot, since the Ci transistor below blocks the path to ground. In this specific case it is not strictly true, but otherwise, here is what we
normally do: the transistor for the input closest to critical should be placed nearest the output. Say A and B are 1: that part has already settled, the ground path through them is ready, so the moment Ci arrives, the discharge path completes immediately. So in all cases the critical-input transistor should be closest to the output, which reduces the average delay for you. And that is a trick that applies not just here but everywhere, so we build it in; with this layout it comes automatically. Also, only the transistors in the carry stage have to be optimized for speed; all the transistors in the sum stage can be minimum size, because once one carry has arrived, there is enough time before the next carry comes for the sum to settle. In that sense the limit does not come from the sum part, so those transistors can be the smallest; it is the carry that takes time, and so it is the carry path on which you apply the better techniques.
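The critical-path argument, that P and G are ready immediately while the carry ripples in, can be illustrated with a toy arrival-time model of an n-bit ripple-carry adder. The unit delays below are arbitrary assumptions of mine, not values from the lecture:

```python
# Toy arrival-time model of an n-bit ripple-carry adder.
# P_i and G_i depend only on A_i and B_i, so they are ready after one gate
# delay; each carry C_{i+1} = G_i + P_i*C_i must wait for C_i, so the
# carry chain dominates the critical path.
T_PG = T_CARRY = T_SUM = 1.0   # assumed unit gate delays

def arrival_times(n):
    carry = [0.0]                        # C_0 (Cin) available at t = 0
    for i in range(n):
        # C_{i+1} is ready one carry delay after the later of (P_i, G_i) and C_i
        carry.append(max(T_PG, carry[i]) + T_CARRY)
    # S_i = P_i ^ C_i: ready one delay after the later of P_i and C_i
    sums = [max(T_PG, carry[i]) + T_SUM for i in range(n)]
    return carry, sums

carries, sums = arrival_times(8)
print("carry arrival times:", carries)
print("last sum bit settles at t =", sums[-1])   # grows linearly with n
```

The last sum bit settles only after the carry has rippled through every stage, which is exactly why the carry-stage transistors are the ones to size for speed while the sum stage can use minimum devices.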