Good morning. We will continue with our effort to understand the arithmetic systems used in most digital systems. So far we have seen the general building blocks that let you perform arithmetic operations: adders, subtractors, multipliers and dividers. Having completed the kinds of adders we can implement, today we shall start on the next topic. Please remember that the choice of any adder circuit or adder system is decided by three important features, which are the VLSI requirements: the speed of operation you want, the power you can tolerate consuming, and finally, of course, the area the system will occupy on silicon. Based on these three parameters we discussed which kind of adder can be used when, and we have also seen that many of the circuits used in adders have advantages in particular cases. Some are better when the number of bits is large, some are better when it is small, some are best when speed is the only criterion, and some, like current-mode circuits, are very good when very low power is required. Now we will go to the next and final part of our arithmetic, namely multipliers. Multiplication is a most important operation in many real-life systems, and the way a multiplication is performed basically requires generating partial products and then using adders to add them up. So let us see what exactly multipliers are and what options are available to us for implementing them. In this talk on multipliers I will speak briefly about the introduction, which I have probably just done. Then we will talk about the arithmetic operations, the types of multipliers, individual multiplier circuits and their performance, and finally we will look into Booth's algorithm, which is the most important multiplication method these days in most digital hardware.
Since any multiplier operation requires addition and shifting, we will also quickly see one or two circuits of a barrel shifter, which allows data bits to shift towards either the left or the right; these are called barrel shifters. Finally I will make some comments and provide a list of references. Now, the types of multipliers required in digital hardware are of two kinds. One, of course, we all know: fixed-point multipliers. Among fixed-point multipliers the most popular is integer multiplication; it can also handle decimals and fractions, but any fractional number can be handled the same way as an integer, so fixed-point multiplication needs only addition and shift operations. The other possibility is the floating-point multiplier, in which numbers are represented as powers of 2 (or powers of 10 in the decimal case). These use fixed numbers of bits for sign, exponent and mantissa; for example, a single-precision floating-point number is generally represented in 32 bits, with 1 sign bit, 8 exponent bits and 23 mantissa bits. So it can be written as (-1)^s, where s is the sign bit, times f, which is essentially your mantissa, times 2^e, where e stands for the exponent bits. So let us look at the easiest multiplication, which we normally see in decimal, except that here a binary multiplication is shown to you. Let us say I want to multiply two numbers; the upper one is always called the multiplicand and the lower one the multiplier, but it does not matter, it is only a matter of definition. So 1 0 0 1 is 9, and 1 1 0 1 is 8 plus 4 plus 1, which is 13.
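To make the sign/exponent/mantissa split concrete, here is a small Python sketch (my own illustration, not from the lecture slides) that unpacks a 32-bit single-precision word into its three fields and rebuilds the value as (-1)^s · (1 + f/2^23) · 2^(e-127) for normal numbers:

```python
import struct

def decompose_float32(x):
    """Split an IEEE-754 single-precision float into its 1 sign bit,
    8 exponent bits, and 23 mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

# Reconstruct a normal number: (-1)^s * (1 + f/2^23) * 2^(e - 127)
s, e, f = decompose_float32(-6.5)
value = (-1) ** s * (1 + f / 2 ** 23) * 2 ** (e - 127)
```

Running this on -6.5 gives s = 1 and e = 129 (true exponent 2, biased by 127), and the reconstruction returns -6.5 exactly.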
So we would like to see: if I multiply 9 by 13, what number am I going to get? With that in mind let us start; 117 should be our answer. The way we do it: we take the multiplier's first bit, multiply it with each bit of the multiplicand and write the result down, then shift for the next bit's multiplication and write that down too; these rows are all called partial products. We keep writing partial products for each bit of the multiplier and then finally add vertically to get the number 1 1 1 0 1 0 1, and this number is indeed 117. So we know basically what we did: we first figured out how many bits the multiplier has and whether it is signed or unsigned; then how many bits the multiplicand has; and then we start multiplying from the multiplier LSB against the multiplicand bits, keep generating partial products, and finally, remembering that each successive partial product is shifted one position to the left, we add the columns vertically to get the sum. This is the standard multiplication procedure even in decimal, so we use the same thing. Traditionally, then, there are two things we are doing in a multiplication: first the evaluation of the partial products, and then the accumulation of the shifted partial products, which creates what we call the sum. Now another example: 1 1 0 0 is 12 in decimal, this is the multiplicand, and 0 1 0 1 is 5 in decimal; the product in decimal is 60, as we know very well. So if I do the same thing, 1 times the multiplicand, then 0 times, then 1 times, then 0 times, and I start accumulating. Please remember every partial product is shifted, and after completing all partial products we add vertically to create 0 1 1 1 1 0 0, which is nothing but decimal 60. So basically what we did in binary multiplication is equivalent to a logical AND operation: 1 · 0 or 0 · 1 is exactly what we computed.
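The whole partial-product procedure above fits in a few lines; here is a minimal sketch (function name is my own) of unsigned shift-and-add multiplication, where each multiplier bit ANDs the multiplicand and the result is accumulated at the shifted position:

```python
def multiply_unsigned(multiplicand, multiplier, bits=4):
    """Shift-and-add: each multiplier bit y_j ANDs the multiplicand
    (a partial product), which is shifted left by j and accumulated."""
    product = 0
    for j in range(bits):
        if (multiplier >> j) & 1:          # multiplier bit y_j
            product += multiplicand << j   # partial product, weight 2^j
    return product
```

Both worked examples check out: `multiply_unsigned(9, 13)` gives 117 and `multiply_unsigned(12, 5)` gives 60.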
So binary multiplication of a pair of bits is essentially a logical AND operation. Step one consists of logically ANDing the multiplicand with the multiplier bit at the relevant position. Then each column of the partial products must be added, and any carry generated must be passed on to the next column. This is what all of us are typically aware of, in the case of decimal as well as binary numbers. Now, before I go ahead, let me give you the classes of multipliers which are popular in digital hardware systems. The choice for any system, as I keep saying all the time, depends on speed, throughput and area; these are of course the VLSI or system requirements. Finally, one major requirement in many digital hardware systems is how many bits you must carry through the computation, which is essentially the numerical accuracy: how accurate should the final number be? For example, is 0.00000099901 what you need, or is 0.0001 good enough? Depending on the accuracy required, one may have to decide which kind of system to use. Typically there are three basic kinds of multiplier operation possible: one is called serial, another of course is parallel, and the third, and perhaps the most important, is serial-parallel. The three multipliers are shown to you here. Here is a circuit for serial multiplication: you have an adder, and a shift register (or register) which gives the desired one-bit delay; in this case it may be a flip-flop which runs on a clock, and you have a clear signal as well, so that on reset the flip-flop is cleared. Now, the way it operates: you have two numbers x and y to be multiplied.
So you AND them, and this output, together with the carry from the previous cycle (initially the carry will be 0, since the flip-flop output is cleared), is added, 0 plus this, and you generate the first partial product. This partial product is passed on to the register; in the next bit period, please remember, I am feeding this output back to the input of the adder, which adds in the partial sum. The last partial product should be added to the next one with a one-bit shift, and this is exactly what is done: after every clock cycle, whatever is in the LSB of this register is transferred back to the adder. Now the next bit appears along with the last partial bit, the new carry that was generated is fed back as the last carry, and the process runs through. The advantage of the serial multiplier is obvious: you need only one adder, even for a 16-bit, 32-bit or 64-bit operation. However, the disadvantage is equally obvious: if you have a large number of bits to multiply, in each clock cycle only one bit operation is performed, so for n-bit numbers on the order of n clock cycles will be required to generate the full multiplier output. So it may be a little slower, but it is much less hardware-intensive than the others. Here is another one, which shows a serial-parallel multiplier: essentially you apply all the bits of one operand simultaneously and create the partial products in parallel. All that we do between two adders, since you need a carry out, is provide a one-bit delay, which is essentially a register; that supplies the last carry, and the process keeps continuing. This does not mean it will be very, very fast, although the AND operations are done simultaneously, so only one AND-gate delay is required for them.
However, the bit delays of the register chain will certainly be incurred before an output appears; so though it is faster than the serial multiplier, it is certainly not as fast as we would have wished. In general, therefore, one cell of this kind forms what is called a serial pipeline architecture, in which you provide the data, as well as the partial output from each stage, to the next stage after a clock delay. And please remember, if the chain of delays is provided the way it is shown here, it does not really slow things down very much, because once the pipeline is full you get a result every clock cycle, as we said for the earlier pipeline circuits. So here is the first add-and-shift multiplication operation, which is the most common principle of any multiplier circuit. I am showing you a parallel adder, shown here as a block, which receives data from the multiplicand register. For simplicity I have taken only a 3-bit multiplicand, 1 0 1, which is essentially 5, and you have another register here, the multiplier register, which stores the multiplier, 4 here, that is 1 0 0. Now what we do is feed all 3 multiplicand bits in parallel to this parallel adder. Then we have an accumulate register, which needs however many bits you have plus 1, because the last carry has to come in here; so it is a 4-bit register for the 3-bit data, and it receives the output of the parallel adder. Initially it is cleared, and after every clock cycle all 4 of its bits are returned to the input of the parallel adder. So you have 4 bits coming from the accumulator, 3 bits coming from the multiplicand register, and finally the control bit, which comes from the LSB of the multiplier register.
So the idea in this add-shift register multiplication is this: if the mode control signal M here is 1, an addition is performed by the parallel adder; if it receives 0, no add operation is done, and only the shift operation is performed on each clock. Please remember: whether you do an addition or not, on every clock, when the new data appears, the shift has to be performed. So, as the circuit shows, we load the multiplicand into the multiplicand register, here 1 0 1, and load the multiplier number into the multiplier register; the LSB of the multiplier register, please remember, is taken out as the mode value M. Initially this is 0; when the next clock shifts and another 0 comes out, there is again no add operation; and when the 1 comes out there will be an add operation. So every clock this register shifts to the right; when you shift right, the LSB moves out, and that bit acts as the mode control value. The accumulate register, as I said, is initially cleared; whenever it receives an addition it stores the new data, and then the shift operation follows, whether M is 1 or 0. Again, I will give an example a little later: after every add or no-add operation, the accumulator and the multiplier shift register do a right-shift operation under clock control, because the next bit of the multiplier must now come into question. This changes the LSB of the multiplier register, which means one has a new value of the mode control signal M. The new mode signal again decides whether the adder should add the bits of the accumulator register with the multiplicand register or, if M is 0, not.
This process is continued as many times as the number of bits of the multiplier register; in our example one will have 3 shift-and-add (or no-add) operations, because we are using 3 bits. So let us look at the example: initially you have 1 0 0 and 1 0 1 as the operands. We start with the accumulator register at 0 0 0 0 and the multiplier register holding 1 0 0; since the last bit here, the mode signal, is 0, we expect that the parallel adder does not do any add operation. However, as we said, add or no add, a shift of 1 bit to the right is necessary on every clock. So we shift: the accumulator stays 0 0 0 0, a 0 filling in from the left as the old LSB moves out, and 1 0 0 shifts to 0 1 0; the mode is again 0. Since the mode control signal returned to the parallel adder is still 0, there is still no add operation required, but we still have to shift the data, so we do another shift operation. You have four 0s again and 0 0 1, but now the mode is 1. Since the mode is 1, the mode control signal starts the adder operation: the multiplicand, which is 1 0 1, is added to the 4 accumulator bits, 0 0 0 0 plus 0 1 0 1, giving 0 1 0 1. So after the addition the accumulator register holds 0 1 0 1 and the multiplier register holds 0 0 1. However, every add or no-add operation needs a shift, so we shift the data to the right and get 0 0 1 0 1 0 0; the mode bit is now 0, so no add operation would be required, but in any case all 3 multiplier bits are exhausted, so no more shifts are needed. So the result is 0 0 1 0 1 0 0. If you read this number, 1 0 1 0 0: clearly 1 0 1 is 5 and 1 0 0 is 4 in decimal, and 5 times 4 is 20, which is exactly 1 0 1 0 0.
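The register behaviour in this worked example can be simulated directly; the following sketch (function and variable names are mine) mimics the (n+1)-bit accumulator, the n-bit multiplier register, and the mode control M taken from the multiplier LSB:

```python
def add_shift_multiply(multiplicand, multiplier, n=3):
    """Add-and-shift multiplier: an (n+1)-bit accumulator and an n-bit
    multiplier register; the multiplier LSB is the mode control M.
    M = 1 -> add the multiplicand into the accumulator; every clock the
    combined accumulator/multiplier pair shifts right by one bit."""
    acc = 0                        # accumulate register, cleared initially
    mreg = multiplier              # multiplier register
    for _ in range(n):
        if mreg & 1:               # mode control M from multiplier LSB
            acc += multiplicand    # the extra (n+1)th bit holds the carry
        # combined right shift: accumulator LSB moves into multiplier MSB
        mreg = (mreg >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | mreg       # product spans both registers
```

Tracing `add_shift_multiply(5, 4)` reproduces the lecture's sequence exactly and returns 20.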
So this is the easiest multiplication operation, which one can perform with the normal serial kind of registers we have. The advantage, as I keep saying, is that you need only one adder, the two operand registers and one accumulator register. The only thing is that however many bits you have, that many times you have to shift, which means that many clock cycles pass before the final result is available across the accumulator and multiplier registers. So it is much less hardware-intensive, but comparatively slower. The next thing we would like to see is the add-shift operation done in parallel, for multiplying larger numbers of bits. One technique we often use is to write out how the multiplication can be expressed. Let us say I have two numbers, x = Σ_i x_i 2^i and y = Σ_j y_j 2^j, and the product is x · y. I will do this again in a little more detail, but taking the sums over all bits, the product is P = Σ_i Σ_j x_i y_j 2^(i+j); if we collect the terms of equal weight k = i + j and call the bit products p_k, then P = Σ_k p_k 2^k, with k running from 0 to m + n − 1. So 2^k gives the position and p_k is the partial product at that position. An n-by-n multiplier of this kind needs n(n − 2) full adders. So please look at a simple multiplier cell: this is your x and this is your y; the first AND gate gives me the partial product x·y. If this is not the first adder you will also require a carry input; if it is the first, it can be a half adder, because you do not need an initial carry.
So the output of the AND gate, which is x·y, is transferred to this adder. If it is the initial first cell, then the incoming partial product p is 0, so effectively it does not exist; but naturally it will exist for the next cell. So p is also input here, the carry is input here, and whatever is the local partial product x_i y_j we create here; then the cell generates an output carry and an output product. Please remember: this is my x, this is my y, this is my input carry, this is my output carry, and this is the incoming product from the previous cell; the new product then becomes the input product for the next cell, where the next x and y appear, and the process continues. So if you look at it, at the first positions where you receive x and y you have no carry to add in, and in those places you do not need a full adder, you may use a half adder; please remember a half adder is less hardware-intensive, with fewer gates, and is also relatively faster. So you need n(n − 2) full adders, n half adders, and obviously n × n, that is n², AND gates to create all the partial products. For the worst-case delay one can say that if τ_g is the worst-case adder delay of one block, then (2n + 1)τ_g is the worst-case delay of this kind of multiplier. Typically, for a 4-bit multiplier, if you write out the partial products: let us say I have two numbers, the multiplicand x3 x2 x1 x0 and the multiplier y3 y2 y1 y0; then we form the partial products y0·x0, y0·x1, y0·x2, y0·x3, and then repeat with y1: y1·x0 and so on and so forth. Then all these partial products are added column by column: x0·y0 transfers straight down, then x1·y0 + x0·y1 is the sum of the next column, and we keep going. So there is 1 term in the first column, then 2 terms, then 3 terms, then 4 terms, and then back down again.
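The 4-bit partial-product layout just described is simply the AND plane; here is a short sketch (my own helper, not from the slides) that builds the rows and checks that column-wise accumulation with weights 2^(i+j) reproduces the product:

```python
def partial_product_rows(x, y, n=4):
    """Build the n partial-product rows of an n x n array multiplier:
    row j is (y_j AND x_i) for every i, conceptually shifted left by j."""
    xbits = [(x >> i) & 1 for i in range(n)]   # x_i, LSB first
    rows = []
    for j in range(n):
        yj = (y >> j) & 1
        rows.append([yj & xi for xi in xbits])  # AND of y_j with each x_i
    # accumulating every bit with weight 2^(i+j) reproduces the product
    total = sum(bit << (i + j) for j, row in enumerate(rows)
                for i, bit in enumerate(row))
    return rows, total
```

For example, `partial_product_rows(13, 11)` yields a first row of [1, 0, 1, 1] (13 in LSB-first order, since y0 = 1) and a total of 143 = 13 × 11.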
The way we operate is: whenever we get the first term, the next time we add the next one to it, and so on; by the time the later terms arrive we would already have added the earlier ones in the same column, and in the fourth generation we do the fourth operation. So the method is repeated: the product available from the earlier stage can be reused, as I showed, as the product input, and a new x·y term can be added in the same column to obtain the new product sum. The first and foremost multiplier which uses this simple algorithm is credited to Baugh and Wooley: the Baugh-Wooley multiplier, which is an algorithm for 2's-complement multiplication. It adjusts the partial products for maximum regularity of the multiplication array, moves the partial products with negative signs to the last steps, and also adds the negation of a partial product rather than subtracting it. Please remember: no negative numbers and no subtractors is what we want; we can manage by doing what we call negative adds instead of using a subtractor circuit. Now, before we go to the Baugh-Wooley circuit, which is a standard array multiplier, let me again do a little bit of maths, which is not very difficult, but which is used in the Baugh-Wooley multiplier. Let x be the multiplicand and y the multiplier, and let both numbers be represented as 2's-complement numbers. Then x can be written as x = −x_(n−1)·2^(n−1) + Σ_{i=0}^{n−2} x_i·2^i; this is the 2's-complement way of representing numbers. Similarly y can be represented as y = −y_(n−1)·2^(n−1) + Σ_{j=0}^{n−2} y_j·2^j, and we know the product term is x · y.
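The 2's-complement value formula just stated, x = −x_(n−1)·2^(n−1) + Σ x_i·2^i, can be checked with a tiny helper (an illustration of mine, not part of the lecture material):

```python
def twos_complement_value(bits):
    """Evaluate an n-bit 2's-complement word given bits LSB first:
    the MSB carries weight -2^(n-1), every other bit x_i weight +2^i."""
    n = len(bits)
    return -bits[n - 1] * 2 ** (n - 1) + sum(
        bits[i] * 2 ** i for i in range(n - 1))
```

For instance, the word 1101 (bits [1, 0, 1, 1] LSB first) evaluates to −8 + 4 + 1 = −3, while 0110 evaluates to +6.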
Now let us rewrite the product and see what we get. Remember there are 2 terms in x and 2 terms in y, so x·y produces 4 terms. So P = x_(n−1)·y_(n−1)·2^(2n−2) + Σ_{i=0}^{n−2} Σ_{j=0}^{n−2} x_i·y_j·2^(i+j), plus 2 more terms, because the −x_(n−1)·2^(n−1) and −y_(n−1)·2^(n−1) factors also get multiplied across: these cross terms are −Σ_{j=0}^{n−2} x_(n−1)·y_j·2^(n+j−1) and −Σ_{i=0}^{n−2} y_(n−1)·x_i·2^(n+i−1). This essentially means that the first 2 terms are added, but the last 2 are subtracted. We would not like any subtractor to be used; please remember a subtractor requires additional hardware, so we do not want to do any subtraction operations. What we do instead is the negative addition, as I keep saying, and therefore we represent these negative quantities in a suitable format. Please first remember what the powers of 2 mean here; let me tell you what I am trying to say. If I have a number 1 0 1 1 0, what I am essentially saying is 0·2^0 + 1·2^1 + 1·2^2 + 0·2^3 + 1·2^4; every bit position gives me the coefficient of the corresponding power of 2. So if I place a term at a different position, I am weighting it by a different power of 2; shifting a value by k positions is the exact equivalent of multiplying it by 2^k, and this is what lets us rearrange the negative terms.
So this method of handling the subtraction through a rewritten negative term can be expressed as follows: −Σ_{j=0}^{n−2} x_(n−1)·y_j·2^(n+j−1) = x_(n−1)·(−2^(2n−2) + 2^(n−1)) + Σ_{j=0}^{n−2} x_(n−1)·ȳ_j·2^(n+j−1), where ȳ_j is the complement of y_j; please note the two correction terms of weight 2^(2n−2) and 2^(n−1). Similarly I can rewrite the y term in the same form: −Σ_{i=0}^{n−2} y_(n−1)·x_i·2^(n+i−1) = y_(n−1)·(−2^(2n−2) + 2^(n−1)) + Σ_{i=0}^{n−2} y_(n−1)·x̄_i·2^(n+i−1). So I can write these two terms in this format; the other two terms are anyway positive, and therefore we now have an interesting situation: we only need positive operations, that is add operations, and the powers of 2 are essentially shift operations. A typical array multiplier is shown here: these are all the y's, y0, y1, y2, y3, and each vertical line is an x, say this is x0, this is x1, which is not shown; each gate receives the appropriate x0, x1, x2, x3. The x's essentially travel diagonally, whereas the y's are taken as horizontal lines. So what it does is: the first partial product is x0·y0, which is your z0; the next product term is created by x0·y1, but you now need its addition with x1·y0. So x1·y0 is coming from here and x0·y1 from here, and since this is the first time you are doing an addition, there is no carry available.
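The rewriting above can be verified numerically; this sketch (my own, following the identity as reconstructed here) compares the direct negative cross term with the complemented-bit form for all 4-bit operand patterns:

```python
def negative_term_direct(xs, ys, n):
    """-x_(n-1) * sum_{j<n-1} y_j 2^(n+j-1): the subtraction we avoid."""
    return -xs[n - 1] * sum(ys[j] << (n - 1 + j) for j in range(n - 1))

def negative_term_rewritten(xs, ys, n):
    """Same value using only additions of complemented bits y_j-bar plus
    the two correction terms -2^(2n-2) and +2^(n-1), gated by x_(n-1).
    In hardware the constant-weight corrections fold into the top columns."""
    if xs[n - 1] == 0:
        return 0
    return (-(1 << (2 * n - 2)) + (1 << (n - 1))
            + sum((1 - ys[j]) << (n - 1 + j) for j in range(n - 1)))
```

An exhaustive check over all 4-bit x and y words confirms that the two expressions always agree, so the negated partial products can indeed be replaced by complemented bits plus fixed correction terms.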
Therefore a half adder is good enough, and together the half adder creates z1; but it now generates a carry for the next column. You get the x2 terms the same way, and whatever carry was generated is added along with the two incoming numbers; where a term such as x0·y2 appears for the first time there is no incoming carry, so again a half adder suffices. So one can see from here that wherever the first products appear along the edge you actually need only half adders, but once x1·y1, x2·y2 and the deeper terms pile up you require full adders for all those operations. Here again, at the far corner, there is no full-adder requirement, because there is no additional x or y term coming from that side and hence no additional carry input; so you need half adders there too. So typically what you are doing is successive creation of products, and the sum from each full adder is transferred to the next bit position. You can see this is the total addition running through: whatever is added here is then added with this, whatever is added there is added with that; with a half adder you get z3, and by the same logic you get the additions of the remaining vertical lines and obtain z0, z1, z2, z3, z4, z5, z6. The last carry generated, of course, will be your z7. Now, the kinds of operations you may have to perform for the subtracted, or negative, values can be shown through the cells. There are five kinds of cells, or block cells, which you use in a Baugh-Wooley multiplier. The first, of course, is the generation of the x_i·y_j term: this is x_i, this is y_j, a simple AND gate; this is block cell 1. Then you may require x̄_i·y_j; this is the subtraction kind of requirement, for which you need an inverter here; this is block cell 2.
Then another cell you may require: you create an x_i·y_j term, you have the incoming sum from the previous stage, and the cell may receive a carry, generate a carry, and produce the final sum out; that is the full-adder cell you are seeing there. The next one: you may require the complemented inputs, x̄_i·y_j or x_i·ȳ_j, with the same add operation as before. And finally you may require some kind of XOR/XNOR equivalent, for example in the final adders, where x_i·y_j and the complemented inputs x̄_i and ȳ_j combine with the carry; this kind of operation leads to XOR- and XNOR-style cells. So these are the five blocks which you will normally find in a Baugh-Wooley multiplier. If we look at these five figures carefully, you can see that in this multiplier we need complement generators, AND gates to get the partial products, and full adders to do the additions; these are the only three kinds of blocks you will require to do a multiplication. To save area and also to improve speed, an m-by-n multiplier is always arranged in an array, which is what this slide is showing; you can see the x-y array that has been made. However, a better arrangement is also possible, and it is shown in my next slide. You can see that it is actually exactly the same, not different; the only reason I am showing it is to point out the kind of delay you are going to get, say for an m-by-n case. This is the same x0·y0, then x1·y1, x0·y1 kind of arrangement as before. The path runs like this: this, this, this, this and this. Please remember, this is the critical path of the circuit.
So let us look at the delays associated with the critical path; the critical path, I repeat, runs like this. Along it the sum outputs of a column of full adders lie in series, so you have (n − 1) sum delays, (n − 1)·t_sum; please remember that while this column is being computed the other operations have already been done simultaneously in the earlier stages, so only n − 1 sum operations count. Since all the x·y terms are created simultaneously, you have only one AND-gate delay. And then you have the carry path, which is what I was trying to show you: the carry travels (m − 1) positions one way and (n − 2) the other, one vertically down and one horizontally, so the delay associated with it is [(m − 1) + (n − 2)]·t_carry for an m × n multiplier. So the net multiplier delay is t_mult = [(m − 1) + (n − 2)]·t_carry + (n − 1)·t_sum + t_AND; I repeat, the delay is essentially from this path, and this is the time delay I evaluate. Since some of the partial products do not need a carry, the hardware can be reduced; as I keep saying, half adders are sufficient there. A better arrangement for the same is shown here: this is my x travelling vertically down, this is my y, and these are my AND gates which receive x and y. This is my adder, which is mostly a full adder where a carry appears, or a half adder if no carry appears. This is my output carry, and this is my incoming partial product, which is then added to the next partial products to keep generating the new one.
So if I keep an array of 4 by 4 in this fashion, you can see from here how the x's and y's cross diagonally; please remember these are my x's and these are my y's, shown in a slightly better fashion, and instead of the product travelling vertically, in this circuit the partial product going from the AND gate into the adder is routed in the diagonal direction. So this is x·y with the incoming p; initially, of course, p is 0, and it keeps going. So using this kind of arrangement, the multiplier delay is the same: [(m − 1) + (n − 2)]·t_carry + (n − 1)·t_sum + t_AND, as I have already shown, and one can see from here that the delay will obviously increase if the array size is larger. That means if you have 8-bit by 8-bit or 16-bit by 16-bit multiplications, the multiply time will keep increasing. The carry term grows roughly as m + n while the sum term grows only as n; in general the sum time is not that high compared with the carry term, so the carry term essentially dominates, and the AND time of course is very, very small. Now, we have seen earlier, in our adder discussions, that the carry-save adder has the biggest advantage; we know about it, it is a little faster, for the simple reason that the carry-save adder allows you to add 3 bits first and generate 2 terms, c and s: the s is essentially the sum of those bits x, y and z, for example, without taking the carry into consideration, and the carry term c is generated without taking the sum into consideration; then we add c and s, with any initial carry if we have to, using a simple CLA or any normal full adder, to complete the carry-save addition.
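The 3-to-2 carry-save step just recalled is easy to sketch at word level (my own illustration): s is the bitwise XOR of the three inputs, c is the bitwise majority shifted one place left, and a single ordinary addition merges the pair at the end:

```python
def carry_save_add(x, y, z):
    """3:2 compression: reduce three operands to a sum word and a carry
    word with no carry propagation between bit positions."""
    s = x ^ y ^ z                                 # per-bit sum, no carry
    c = ((x & y) | (y & z) | (x & z)) << 1        # per-bit carry, weight 2
    return s, c

s, c = carry_save_add(5, 6, 7)   # s + c equals 5 + 6 + 7
```

Because s and c are produced independently at every bit position, the delay of this stage is one full-adder delay regardless of word length; only the final merge addition propagates a carry.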
Now, since the carry-save adder neither propagates the carry nor looks ahead for it, it is the fastest choice. So, in the same array circuit, instead of normal adders we use carry-save adders — at least for those full adders which lie on the critical path I showed you earlier, since those are the ones that set the time. The final row is called the vector-merge row. Please remember the first adder in a column can always be a half adder if no carry is generated there. The way we operate in the carry-save array is: each cell compresses three bits into two, passing its sum down and its carry diagonally, and we keep doing these 3-to-2 operations to generate the z outputs; the x bits and y bits are fed in as before. For this 4 × 4 array, please remember the critical path is all that matters for the worst-case delay, and so the multiplier delay is only (n − 1)·t_carry — because there is no carry propagation within a row, only the single carry actually passed to the next stage — plus the merge time t_merge, during which all the z's are available in parallel, plus of course the AND-gate delay to generate the x·y terms. So a carry-save multiplier has half-adder multiplier cells and full-adder multiplier cells; among them some are carry-save cells, and the others form what is called the vector-merging row.
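The 3-to-2 compression at the heart of the carry-save idea can be sketched in a few lines of Python (a behavioral model; the function names are mine). The sum bits and the carry bits are formed independently of each other, which is exactly why no carry ripples inside the step.

```python
def carry_save_add(x, y, z):
    """One carry-save step: three operands in, two out (sum, carry).
    Sum and carry are computed independently -- no internal carry ripple."""
    s = x ^ y ^ z                            # bitwise sum, ignoring carries
    c = ((x & y) | (y & z) | (x & z)) << 1   # carries, shifted to their weight
    return s, c
```

A single conventional ("vector merge") addition of the two outputs then gives the true total: `s + c` equals `x + y + z`.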
The same figure is shown here a little more clearly: these are the x's and these are the y's. A cell takes three inputs — for example x₃y₀, x₃y₁ and a neighbour's bit — and creates a carry c and a sum s; the sum is transferred diagonally to the next cell, which receives its own partial-product bit, generates another carry and sum, and passes them on. Since the data passes simultaneously along all these diagonals into the final row, this is called vector merging, and therefore the delay is essentially what I have just discussed. So there are half-adder multiplier cells, full-adder multiplier cells and vector-merging cells. You could use carry-save adders everywhere, but only the cells on the critical path matter — the others complete simultaneously — and the vector-merge cells are the ones which finally transmit the data, so only the critical-path delays are of relevance. The other possibility for generating a multiplication is what is called a Wallace tree. We know that any tree arrangement reduces the depth of the adder chain, and we still use carry-save adders: three inputs a, b, c produce two outputs — c and s, as in the last case — and we have already done these carry-save operations. How do we do a Wallace-tree multiplication? First, for a 4 × 4 product, label the bit positions 0 to 6. The partial-product terms are x₀y₀, then x₀y₁ together with x₁y₀, and so on, with 0s wherever no term exists.
So, instead of writing the partial products in the usual dot-diagram format, we write, for each bit position 0 to 6, how many bits exist there. For a 4 × 4 product the first partial-product row occupies positions 0 to 3, the next 1 to 4, then 2 to 5, then 3 to 6. Counting down each column: position 0 has 1 bit, position 1 has 2, then 3, then 4 in the middle, then 3, 2 and finally 1 — nothing big, just the same information rewritten column by column. Now comes the first stage of building the tree: in each column we circle groups of three bits for full adders, and a leftover pair for a half adder; each group produces one sum bit in the same column and one carry bit in the next column. After the first stage the column heights come down, and we repeat the grouping in the second stage, and so on, until every column has at most 2 bits — and adding two rows with 2 bits per column is very simple.
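The column bookkeeping described above can be modeled directly; this is a simplified sketch (names mine, wiring details ignored) where each stage puts every group of three bits in a column through a full adder and a leftover pair through a half adder.

```python
def wallace_stage(heights):
    """One reduction stage of a Wallace tree.
    heights[i] = number of partial-product bits of weight 2**i.
    Full adder: 3 bits -> 1 sum (same column) + 1 carry (next column).
    Half adder: 2 leftover bits -> 1 sum + 1 carry.  A single bit passes."""
    out = [0] * (len(heights) + 1)
    for col, h in enumerate(heights):
        full, rem = divmod(h, 3)
        out[col] += full + (1 if rem else 0)           # sums + passed bit
        out[col + 1] += full + (1 if rem == 2 else 0)  # carries
    return out

def reduce_to_two(heights):
    """Apply stages until every column holds at most 2 bits."""
    stages = 0
    while max(heights) > 2:
        heights = wallace_stage(heights)
        stages += 1
    return heights, stages
```

Starting from the 4 × 4 column heights 1, 2, 3, 4, 3, 2, 1 discussed above, this model needs two reduction stages before the final two-row addition.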
So you are doing your summing partially inside the tree and bringing only a final two-row addition to the end. By this kind of arrangement a substantial amount of hardware can be saved and the operation is very fast — the number of reduction stages grows only as log base 3/2 of n. Of course, because of the irregular groupings it will be an irregular structure, not a uniform array, and therefore many times the layout becomes very inefficient. Here is the tree multiplier's basic concept: you have carry-save full adders; y₀, y₁ and y₂ enter a full adder, which creates a sum s — nothing but the sum of y₀, y₁, y₂ — and a carry c; that sum is then fed, together with the carry generated by the neighbouring full adder, into the next level, creating a new carry and a new sum; the next carry is fed onward in the same way. So the multiplier time here is of order log n. The y's represent the partial products. Essentially, because these are carry-save operations, no carries ripple within a level — they are transferred only to the next stage, and the sums are simultaneously made available to you. So using the tree-multiplier concept one can save time; that is the high speed.
The number of adders required is now only four for this 6-bit operation, as we see, so the tree is less hardware and high speed — but you can see that if I have to lay out this block, it will be very difficult. Now, look at any multiplier carefully: what is the problem with normal multipliers? Depending on the multiplier and multiplicand, the partial products you create may contain a large number of ones, and every one of those ones costs you addition work. For example, if I multiply 111101 by a multiplier with many ones, I create many non-zero partial products: 111101, then a row of zeros, then 111101 again, and so on. Except for the rows multiplied by a 0 bit, every partial product is non-zero, and the more ones your partial products contain, the larger the addition effort — even if you use carry-save adders you will require more time. However, if I can convert the multiplier into a form with only two non-zero terms, the remaining rows are guaranteed zeros — I need not even write them — and I only have to add two terms, which is very fast. This is what we mean when we say the multiplier is coded, or recoded: you take the two's complement number and recode it into a format that reduces the number of ones, and if that happens the number of additions reduces, and therefore the speed increases enormously.
So, before I go to the actual Booth algorithm, let me show you what a Booth multiplier really does. We recode the two's complement multiplier, using the observation that in binary a run of ones can be rewritten using mostly zeros: a block of j consecutive ones starting at bit position k equals 2^(k+j) − 2^k. Replacing the ones by zeros in this way reduces the number of non-zero partial-product terms — this is what is called recoding. Is that clear? A sequence of ones is converted into a representation with a larger number of zeros, the partial products reduce, and since there are fewer terms the time taken to add them is also smaller. This is the principle, or rather the need, for recoding, and it is what Booth suggested back in 1951: because of this identity, a run of ones can be replaced by one addition and one subtraction, and using this Booth arrived at his multiplication algorithm. Now, what is the statement? A Booth-recoded multiplier examines three bits of the multiplier at a time to determine whether to add 0, +1, −1, +2 or −2 times the multiplicand at the appropriate rank — this is actually the modified, radix-4 form, and we will come back to its table a little later. But first look at how the number x can be written: for an n-bit two's complement number, x = −x_{n−1}·2^{n−1} + Σ_{i=0}^{n−2} x_i·2^i — the sign bit carries a negative weight, and that is the term we are going to manipulate.
So, given this representation, we can split the sum into even- and odd-indexed terms and collect them in pairs. If you collect the terms, then taking x₋₁ = 0, x can be written as x = Σ_{i=0}^{n/2−1} (x_{2i−1} + x_{2i} − 2·x_{2i+1})·4^i. This is essentially what I am going to use in my evaluations: every overlapping group of three bits x_{2i+1} x_{2i} x_{2i−1} is equivalent to the digit (x_{2i−1} + x_{2i} − 2·x_{2i+1}), with the minus-2 weight on the most significant bit of the group. So now we know how to represent the number x; before we go ahead, let me tell you how the recoding works on a concrete case. Consider a positive multiplier consisting of a block of ones surrounded by zeros, 0111110. The product with a multiplicand M is M × (2⁵ + 2⁴ + 2³ + 2² + 2¹) = 62M. The number of operations can be reduced to two by simply rewriting: 2⁶ − 2¹ = 64 − 2 = 62. So where I was performing five add operations, I now perform only two. This is essentially the basic thinking in recoding: please remember, if I recode 0111110 in this format, the result has mostly zeros — 2⁶ is 1000000, and the subtracted 2¹ is just one more non-zero position. You can see from here that the recoded number is almost all zeros.
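The run-of-ones identity just described can be checked in a couple of lines; the variable names here are mine, used purely for illustration. Five additions (one per 1-bit) collapse to one shift-add and one shift-subtract.

```python
M = 13                        # an arbitrary multiplicand
mult = 0b0111110              # a block of five ones: 62
# naive: one addition per 1-bit of the multiplier (five add operations)
naive = sum(M << i for i in range(7) if (mult >> i) & 1)
# recoded: the run of ones from bit 1 to bit 5 equals 2**6 - 2**1
recoded = (M << 6) - (M << 1)
```

Both expressions evaluate to 62·M, but the recoded form touches the adder only twice.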
So you can see that only two positions remain: I have to perform only two operations to carry out the whole multiplication — that is what recoding the number into this format achieves. Continuing with the Booth multiplier: the recoded digits take the values 0, +1, −1, +2 and −2, as we just said, so the number of partial products generated is reduced and each one is a simple multiple of the input operand — 0, ±y or ±2y, where y is the multiplicand. Please remember the table shown here; I do not want to discuss it right now — I will come back to it. In a Booth multiplier the encoding scheme reduces the number of stages: it performs two bits of the multiplication at once and so requires half the stages; each stage is slightly more complex than in the simple multiplier, but the adder/subtractor is almost as small and as fast as a normal adder. The same two's complement identity we discussed gives the representation: by looking at three bits of the multiplier we can determine whether to add x, 2x, and so on, to the partial product, and I will give an example of what I mean. But before going to the modified version, let me first discuss the simple Booth's algorithm. Booth's algorithm involves repeatedly adding one of two predetermined values, A or S, to a product P, and then performing a rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, and let x and y be the number of bits in m and r.
So the algorithm says: determine the values of A and S and the initial value of P, all of length x + y + 1 bits. For A, fill the most significant x bits with m and the remaining y + 1 bits with 0s. For S, fill the most significant x bits with −m in two's complement notation and the remaining y + 1 bits with 0s. For P, fill the most significant x bits with 0s, to the right of this append the value of r, and fill the least significant (rightmost) bit with a 0. I will give an example that will make this very clear, but first, what operation is to be performed? Examine the two least significant bits of P: if they are 01, compute P + A, always ignoring any overflow; if they are 10, compute P + S, again ignoring overflow; if they are 00, do nothing and use P directly in the next step; and if they are 11, again do nothing. So only for 01 or 10 do you perform P + A or P + S; otherwise just move on. After each step, arithmetically shift P right by one position, and repeat the loop y times. Now here is the example: take m = 3 and r = −4, so we are computing 3 × (−4). Then A = 0011 0000 0, S = 1101 0000 0 — the top four bits being −3 in two's complement — and P = 0000 1100 0. Perform the loop four times. First, the last two bits of P are 00, so the only operation is the right shift: P = 0000 0110 0. Again the last two bits are 00, so just shift again: P = 0000 0011 0. Now the last two bits are 10, so we do the operation P + S.
So this is your P; add S to it and you get 1101 0011 0, and again shift right arithmetically to get 1110 1001 1. Now observe the last two bits are 11, so no operation to perform, only a shift: 1111 0100 1. Dropping the appended bit, the result is 1111 0100, which is nothing but −12. So by performing this procedure I can always create the product. This is the basic idea of Booth recoding: instead of having only the digits 1 and 0, you allow the digits +1 and −1, and by doing so the machine can generate a negative product directly. Now, are there advantages or disadvantages? It depends on the architecture. The potential advantage is that it might reduce the number of ones in the multiplier; but in the multipliers we have seen so far it does not by itself save speed — you still have to wait for the critical path — and it increases area through the recoding circuitry and the subtraction. So a new idea was figured out. Before that, let me show you how the coding itself is done, as an example. In two's complement I have 001101, which is 13, and 111010, which is −6; I recode the multiplier x = 111010. The way the recoding is done is the following: leave the 0s alone, but whenever you see a 1, put −1 just below it and +1 in the next higher bit position. So for a 0 you put nothing, and for each 1 you put the pair −1, +1. Then you just add the columns of these signed digits vertically down.
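The A/S/P procedure described above can be written out directly. This is a sketch following the lecture's steps; the function name and the two's complement bookkeeping at the end are my own.

```python
def booth_multiply(m, r, x, y):
    """Radix-2 Booth's algorithm: build A, S, P of width x + y + 1,
    loop y times adding A or S according to the two lowest bits of P,
    arithmetic-shifting right each time, then drop the appended bit."""
    width = x + y + 1
    mask = (1 << width) - 1
    A = (m & ((1 << x) - 1)) << (y + 1)       # m in the top x bits
    S = ((-m) & ((1 << x) - 1)) << (y + 1)    # -m (two's complement) on top
    P = (r & ((1 << y) - 1)) << 1             # r, with a 0 appended below
    for _ in range(y):
        pair = P & 0b11
        if pair == 0b01:
            P = (P + A) & mask                # add A, ignore overflow
        elif pair == 0b10:
            P = (P + S) & mask                # add S, ignore overflow
        sign = P >> (width - 1)               # arithmetic right shift:
        P = (P >> 1) | (sign << (width - 1))  # replicate the sign bit
    P >>= 1                                   # drop the appended 0 bit
    if P >> (x + y - 1):                      # reinterpret as two's complement
        P -= 1 << (x + y)
    return P
```

Running the lecture's example, `booth_multiply(3, -4, 4, 4)` walks through exactly the P values shown above and returns −12; the later 13 × (−6) case gives −78.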
So please remember what I am actually doing: performing this pair-writing operation and then the column addition. Doing it for x = 111010 gives the signed digits 0, 0, −1, +1, −1, 0 from the most significant side: −1·2³ + 1·2² − 1·2¹ = −8 + 4 − 2 = −6. Now Booth encoding, or recoding, essentially says: group these signed digits in pairs, and each pair tells you which operation to perform — here, add −2 times the multiplicand a at the lowest rank, −1 times a at the next rank, and 0 at the top, so no addition there. I will come back to this evaluation once again a little later. So this is how I get the operations: I group the digits into pairs, obtaining digits like −2 and −1, and, as I said, this reduces the number of partial products by half; in general it gets rid of the sequences of ones. Formally, you have x₀ to x_{n−1} as your number in two's complement, and you append x₋₁, which is always 0, on the extreme LSB side. For every 1 you write −1 below it and +1 in the next position — you do not have to write the 0s, because 0s add nothing — and then you add vertically down. From the Booth table we then know what each resulting digit pair is equivalent to. So from here we now come to the real issue: the normal, radix-2 recoding has a difficulty which is not very obvious to many.
In normal, simple Booth recoding, if you just recode as we did earlier, the initial number may have a certain count of 1s but the recoded number may have a larger count of non-zero digits, whether −1 or +1. This happens particularly when the 1s are sparse. Here is an example given in K. Roy's book: 85 is 01010101 in binary, and Booth recoding gives the signed digits 1, −1, 1, −1, 1, −1, 1, −1 — which means there are now more operations to perform, since a non-zero digit means an operation and a 0 means none. You had four operations in the normal case and eight after recoding. So with plain Booth multiplication there is the possibility that you do not save anything — the partial-product count actually increases sometimes, particularly when the 1s are isolated. When there are long runs of 1s, any Booth recoding will reduce them to mostly 0s and the number of partial products will fall; handling both cases is exactly what the modified Booth recoding achieves. So in modified Booth recoding, what I do is this: I have x₀ to x_{n−1}, I append x₋₁ = 0, and I inspect the first 3 bits from the LSB — x₁, x₀ and the appended x₋₁, which is not counted as data. Using this inspection I recode into the digit pair y₁y₀. However, in the normal Booth recoding the windows would not overlap, and with alternating 0s and 1s the sparsity problem would appear. So now I take the last bit of each group once again into the next group: even with alternating bits, the overlap takes care of it, and I generate the next recoded pair y₃y₂.
I then inspect x₃x₂x₁ to create y₃y₂, then x₅x₄x₃ for y₅y₄, and so on — each window of three bits overlaps the previous one by one bit. So x₀ to x_{n−1} is the original number and y₀ to y_{n−1}, taken in pairs, is the recoded number. You can see we are using a radix-4 scheme here: we inspect 3 bits and advance by 2 each time, because one bit is common between windows. This is essentially what Booth encoding, or recoding, is about. The table is as follows: x_{2i+1} x_{2i} x_{2i−1} = 000 recodes to 0; 001 to +1; 010 to +1; 011 to +2; 100 to −2; 101 to −1; 110 to −1; and 111 to 0. The operation to perform is the recoded digit times the multiplicand a: 0 means add nothing, +1 means add a once, +2 means add 2a, −2 means subtract 2a, and −1 means subtract a. This is called the Booth encoding table. Now, to see how this table relates to the signed-digit picture, come back to the slide: for every 1 I write −1 below and +1 in the next position — I have given each pair a colour so you can follow it: the blue 1 gives +1, −1; the black 1 gives +1, −1; the green gives +1; the 0s contribute nothing and can be written as 0 0 — and then you add vertically.
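The overlapping-window recoding just described is easy to model; this is a sketch with names of my own choosing, using the table exactly as stated above.

```python
def booth_radix4_digits(x, n):
    """Modified (radix-4) Booth recoding of an n-bit two's complement
    number: inspect overlapping 3-bit windows x_{2i+1} x_{2i} x_{2i-1}
    (with x_{-1} = 0) and emit a digit in {-2, -1, 0, +1, +2}."""
    table = {0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    bits = (x & ((1 << n) - 1)) << 1      # append x_{-1} = 0 on the right
    digits = []
    for i in range(0, n, 2):              # advance 2 bits per 3-bit window
        digits.append(table[(bits >> i) & 0b111])
    return digits                          # least-significant digit first
```

For the lecture's multiplier −6 (111010 in 6 bits) this yields the digits −2, −1, 0, and indeed −2·4⁰ − 1·4¹ + 0·4² = −6. For the sparse case 85 = 01010101 it gives just four non-zero digits, where radix-2 recoding gave eight.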
So you get the signed digits, and pairing them, the Booth table tells you the operations: a pair (−1, 0) is −2, a pair (0, +1) is +1, and so on. Once I convert the recoded digits in this way I know exactly which operation to perform at each rank. The same thing can be rewritten in terms of the bit positions i − 1, i, i + 1 and the corresponding operations. Since Booth recoding gets rid of the runs of ones, generating the partial products is not that hard — only shifting and negating have to be done. The digit patterns also have an interpretation: an isolated 1, the end of a string of 1s, the beginning of a string of 1s, the end of one string and the beginning of a new one, and the continuation of a string of 1s — each kind of add operation you perform corresponds to one of these. In summary, what do you do? Grouping multiplier bits into pairs is an idea orthogonal to Booth recoding, and it reduces the number of partial products to half; if Booth recoding were not used, pairwise grouping would force us to multiply by 3, which is hard — a shift plus an extra addition for every such partial product. Applying the grouping idea on top of Booth recoding gives what is called modified Booth encoding: we have already got rid of the sequences of 1s, there is no multiplication by 3, we just negate and shift once or twice, and that is the idea. One can use still higher radices to reduce the number of intermediate addition operands — radix 8, radix 16 — but then you must implement ±3a, ±4a and a larger set of such pre-computed multiples. The recoding and partial-product generation become more complex, though it can be much faster, and of course it automatically takes care of signed multiplication.
A typical Booth multiplier is shown here, but before that let me work through the example I was just talking about. The multiplicand a is 13, i.e. 001101, and the multiplier x, in two's complement, is −6, i.e. 111010. I recode the multiplier: writing the −1/+1 pairs under each 1 and adding the columns gives the signed digits, and pairing them against the Booth table gives the operations: −2a at the lowest rank, −1a at the next, and 0a at the top. Having found the operations to perform, I start the actual computation; in decimal, 13 × (−6) should give −78. Let the initial partial-product sum be all zeros. The first operation is −2a: shift a left once to get 2a = 011010, take the two's complement to get −2a = 100110, and add it, with sign extension, to the running sum. Then move over 2 bit positions, because each recoded digit covers two bits of the multiplier, and add −1a: take the two's complement of a and add it shifted left by two, i.e. at weight 4. Any carry out beyond the word length is overflow, so neglect it. Finally, the last digit is 0a, which means no addition at all.
So the result is the two's complement pattern whose sign bit is 1 and whose value is −78. What has Booth encoding done? Since it uses only the positions holding non-zero recoded digits, and recoding reduced the number of such digits, the net partial-product sums required are much smaller: in four or five steps — one recoding step and a handful of add/shift operations — I am able to multiply 13 by −6, a signed multiplication, directly. Before we leave this part, let me show you the circuits Booth encoding requires: you need an XOR gate, two inverters, two AND gates (or an OR gate, depending on how you draw it) and a multiplexer. The inputs are the multiplier bits b_i and b_{i−1}; x_i is the XOR of b_i and b_{i−1}, and either this or its complement is passed as the 2x_i select, while m_i is passed directly — this is what the recoding asks for. The other circuit you need is the partial-product generator: three AND gates and an XOR gate produce each partial-product bit, and there is a modified version which uses no AND gates but only three multiplexers. These circuits are taken from Madrid's paper in the IEEE Transactions on VLSI Systems, 1993. So basically you require only a few multiplexers for the encoding and for passing the partial products, plus shift circuitry — because at every step you are shifting, you need shift registers able to move the data left and right.
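Putting the recoding table and the shift-and-add steps together, the whole radix-4 multiply can be sketched as below (a behavioral model, names mine; the hardware does the same with one adder row per digit pair).

```python
def booth_radix4_multiply(a, x, n):
    """Multiply a by the n-bit two's complement multiplier x using
    modified Booth recoding: each recoded digit selects 0, +-a or +-2a,
    placed at weight 4**i -- one partial product per pair of bits."""
    table = {0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    bits = (x & ((1 << n) - 1)) << 1       # append x_{-1} = 0
    total = 0
    for i in range(0, n, 2):
        d = table[(bits >> i) & 0b111]     # digit for this window
        total += (d * a) << i              # weight 4**(i/2) = 2**i
    return total
```

For the worked example, `booth_radix4_multiply(13, -6, 6)` performs exactly the three operations −2a, −1a·4 and 0a and returns −78.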
Before we leave this part, one comparison from the slide (which we need not dwell on, since the adder part was covered earlier): taking the carry-save array multiplier as the reference of 1 unit, the Wallace-tree version comes out around 0.05 and the Booth-encoded version around 0.001 — essentially, the power–delay product is drastically reduced by Booth encoding. So there are a variety of versions: with a 32-bit carry-save multiplier as the reference, using a tree reduces the power–delay product substantially, and applying Booth encoding on top of that reduces it further still. That is the idea of improving the speed–power product of any multiplier. The last part of our multiplier requirement is floating-point representation: we know integer operations, but we should also be able to do floating-point multiplications. Before we go to that, let us look at the numbers. Typically any number x is represented as a sign, a significand and an exponent: there is a 1-bit field for the sign, an 8-bit field for the exponent — a biased integer from 0 to 255 — and a 23-bit mantissa field. So in total a floating-point number in the typical 32-bit representation, called single precision, has 1 sign bit, 8 exponent bits and 23 mantissa bits; the leading 1 of the significand is not actually stored. Here is the picture of the same thing: the mantissa is the bit string s₁s₂s₃…, interpreted as 1 followed by the stored fraction.
Example: −0.75 in decimal is −(1/2 + 1/4), which in binary is −1.1₂ × 2⁻¹. So in single precision the sign bit s is 1, the biased exponent is 127 − 1 = 126, which in binary is 01111110, and the mantissa is 1000…0 — the stored fraction .1, the leading 1 being implicit. The 32-bit word shown here is therefore: bit 31 is the sign bit, bits 30 down to 23 are the exponent field 01111110, and bits 22 down to 0 are the mantissa field 1000…0. Now, how do we do addition in this system? Take the decimal analogy: add 1.610 × 10⁻¹ to 9.999 × 10¹. In decimal what we do is see to it that the decimal points are aligned — represent both with the same exponent. So 1.610 × 10⁻¹ becomes 0.016 × 10¹ to the available digits, and since the exponents now match we do not have to do anything further, just add the significands: 9.999 + 0.016 = 10.015, giving 10.015 × 10¹. The next operation is normalization: we do not want two digits before the point — only one bit (or digit) should precede it — so this becomes 1.0015 × 10². Then we round to the accuracy we want: 1.002 × 10², since the dropped digit is 5 and 5 or more rounds up. Finally, we repeat the normalization step if rounding has disturbed it.
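The −0.75 example can be checked against a real single-precision encoding using Python's standard `struct` module (the helper name is mine): the three fields come out exactly as derived above.

```python
import struct

def float_bits(x):
    """Split a number's IEEE-754 single-precision encoding into its
    sign bit, biased 8-bit exponent, and 23-bit stored mantissa."""
    (u,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    sign = u >> 31
    exponent = (u >> 23) & 0xFF
    mantissa = u & ((1 << 23) - 1)
    return sign, exponent, mantissa
```

For −0.75 this returns sign 1, exponent 126 (= 01111110), and a mantissa whose only set bit is the top one — the stored fraction .100…0.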
Any single-precision number is represented as (-1)^s x (1 + f) x 2^(e - bias); this is how the IEEE 754 standard writes a floating-point number. Single precision is 32 bits, with 8 bits of exponent and 23 bits of significand, and the range is roughly 10 to the power plus or minus 38; double precision is 64 bits, with an 11-bit exponent and a 52-bit significand, and the range is roughly 10 to the power plus or minus 308. Once we know this, since each field is an individual fixed-point integer, the operations can be performed independently using integer circuits. For doing a multiplication: compute the sign, exponent and significand; normalize, shifting left or right by 1; check for overflow and underflow; round; and normalize again if needed. The sign is a_s XOR b_s. The exponent is a_e + b_e; since each is stored with a bias (excess notation), one bias must be subtracted from the sum. The significand is a_f x b_f, a standard integer multiply, for which one can use a Wallace tree to add the partial products. So, please remember: the three fields of a floating-point number are handled independently, the exponent separately and the sign separately, and because of that we can route the results into the appropriate shift positions to actually get the multiplier operation.

Now, the last but not the least part of this whole circuit discussion: we keep talking of shifting the data. A typical pass-gate-based shift-register cell is shown here, which allows the data to move to the right or the left. One can see that only pass gates have been used, plus a buffer, since you may have to drive a load. The first control is for a right shift, the second is for no operation (hold), and the third is for a left shift. So, if you want to move the data to the right, you make the right-shift control 1; then this data a_i appears here, shifted right.
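The field-by-field multiplication recipe (sign XOR, exponents added minus one bias, significands integer-multiplied, then normalize) can be sketched as follows. This is a simplified illustration of my own with truncating rounding and no handling of zeros, infinities, NaNs, overflow or underflow; the function name `fp32_mul` is invented for the example.

```python
import struct

BIAS = 127  # single-precision exponent bias

def fp32_mul(a, b):
    """Multiply two single-precision numbers field by field."""
    ai, = struct.unpack(">I", struct.pack(">f", a))
    bi, = struct.unpack(">I", struct.pack(">f", b))
    sign = (ai >> 31) ^ (bi >> 31)                            # a_s XOR b_s
    exp  = ((ai >> 23) & 0xFF) + ((bi >> 23) & 0xFF) - BIAS   # subtract one bias
    # Significands with the implicit leading 1 restored (24 bits each).
    sa = (ai & 0x7FFFFF) | 0x800000
    sb = (bi & 0x7FFFFF) | 0x800000
    prod = sa * sb                       # standard integer multiply (48 bits)
    # Normalize: a product of two values in [1, 2) lies in [1, 4),
    # so at most one right shift is needed.
    if prod & (1 << 47):
        prod >>= 1
        exp += 1
    frac = (prod >> 23) & 0x7FFFFF       # drop the implicit 1, truncate
    out = (sign << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", out))[0]

print(fp32_mul(1.5, -2.0))   # → -3.0
```

In hardware the 24x24-bit significand multiply is exactly where a Wallace tree of carry-save adders would be used to sum the partial products.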
So, you can see from here that a_i has gone to position i - 1. If I want a left shift, I must go upward to reach the higher position: when the left-shift control is high, the data is transferred up, to position i + 1. So left and right shifts can both be performed, and a no-operation (hold) is also possible. To show the data wires and the controls together, a 4-bit barrel shifter is shown here. The cells are identical; SH0, SH1 and SH2 are the shift signals. Depending on whether a right or left shift is selected and which of these pass gates are turned on, the data can be transferred here, or here, or here, by the chosen number of positions. This is controlled by a small logic block which creates the shift signals, and those shift signals allow a0 to go to b3, or a0 to go to b1, or vice versa coming down. Each bit can be repositioned left or right using this kind of barrel shifter, and all 4 bits are shifted simultaneously. So, obviously, shift operations to the left and right can easily be performed using a barrel shifter.

We have now seen that in a multiplier you need the adder and you need a shifter. We have already seen all kinds of adders in our earlier implementations; today we have seen all kinds of multiplier possibilities, and we have also looked into floating point and the barrel shifter. Using those two blocks in different combinations, depending on the area, power, speed and of course the accuracy, one can choose different hardware circuits, and different hardware circuits will lead to different performance indices; based on that you can choose and implement any adder or multiplier in your actual hardware. These are the books from which much of my material was taken; for a first level of understanding you can use Rabaey's book.
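The routing behaviour of the barrel shifter described above can be modelled in a few lines of code. This is a behavioural sketch only, under my own simplifying assumptions: the shift amount is a plain integer and vacated positions fill with 0, whereas the hardware drives one-hot shift signals such as SH0, SH1, SH2 onto pass gates.

```python
def barrel_shift(bits, amount, direction):
    """Barrel-shifter model: `bits` is a list [a0, a1, ..., a_{n-1}],
    `amount` the number of positions, `direction` 'L', 'R' or 'N' (hold)."""
    n = len(bits)
    if direction == "N" or amount == 0:
        return bits[:]                   # no-operation: data passes straight through
    out = [0] * n
    for i, b in enumerate(bits):
        # The enabled pass gate routes a_i to b_{i+amount} (left)
        # or b_{i-amount} (right); bits shifted off the end are dropped.
        j = i + amount if direction == "L" else i - amount
        if 0 <= j < n:
            out[j] = b
    return out

print(barrel_shift([1, 0, 1, 1], 1, "R"))   # → [0, 1, 1, 0]
```

Note that all bits move simultaneously, as in the pass-gate array: each output position simply selects one input position according to the active shift signal.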
There are two other books we all know, by Weste and Eshraghian, and I have already given my other references to you. Of course, many thanks to my students, because they are the ones who created many of my old slides; my postgraduate VLSI students over the last 15 to 20 years asked me many things which allowed me to understand better. Some of the slides come from the University of California, Berkeley course sites due to Rabaey and others, with credit to Prentice Hall, the publisher of their book, for allowing that. Then there is the very famous old Addison-Wesley book, Weste and Eshraghian's Principles of CMOS VLSI Design, published in 1993-94, which still seems to be one of the best device-to-system design books; many of the circuits shown here have been taken from there. There is also a book on DSP processors published by Butterworth-Heinemann, from which some slides and data were taken, and a book on computer arithmetic published by John Wiley, one of the oldest books in fact; if you really read the classic books you understand much more, and therefore I recommend that those who are looking for advanced VLSI should study the last two or three books very carefully. The other book which you can see is Lars Wanhammar's DSP Integrated Circuits, published by Academic Press, and last but not least there is the very recent book from McGraw-Hill written by Kiat-Seng Yeo and Kaushik Roy; much of the power and speed data has been taken from their book, with due regards to them.
There are many references on number systems, there are references on adders, and there is a very large number of references available for multipliers. With this part we complete the total set of arithmetic operations for any processor or any hardware in this part of the advanced VLSI course. Some of the problems which I gave during all this I will add at the end; they will be model problems, as we have already solved them, and I will add some more problems later which you can solve. Most of these problems can only be solved using SPICE, so you must have at least a basic version of SPICE, if not an advanced one; if you have the Cadence, Synopsys or Mentor Graphics tools, you have a good SPICE available in them. You can choose any of the hardware shown here, for any given technology, try implementing many of those blocks in your real system design, and verify whether the ones I have given you as hints actually work. Thank you very much for the day.