Welcome to another lecture in the area of VLSI design. Today we start with a major activity of VLSI implementation, namely arithmetic implementation. Please remember that most processors, and most signal-processing applications, require a large amount of arithmetic processing, and therefore these circuits are of great relevance. In this hour, or a little more, I will cover arithmetic in the following order: an introduction, some arithmetic operations, then adders; we will look at individual adder performance, CMOS implementations, comparisons, and some closing comments. In my next talk I will move on to multipliers. Until then, let us start with adder operations, which are the most important. But before adders, let us start with the basic arithmetic used in all computing systems.

A typical digital processor, whether you call it a microprocessor or a DSP processor, has the generic structure shown here. There is an input/output port through which data enters. Then there is a subsystem we call the data path, and this is what today's lecture is about. There is a memory which exchanges data with the data path, and both are controlled by a controller. The controller contains logic which allows results computed in the data path to be stored in memory, or returned to the data path for further, possibly recursive, operations, and finally output. This is the generic system, and in it the data path is essentially the arithmetic part through which all the data in the processor is processed.

So let us look into this data path. A typical digital architecture has the building blocks shown here: an arithmetic unit, possibly organized as a bit-slice data path, which may contain adders, multipliers, shifters, comparators and so on. Then there are memory elements such as RAM, ROM, buffers and shift registers, and the control is some kind of logic, as I said: random logic or a PLA, essentially implementing a finite state machine. You may also require timing, since the data has to flow at a given rate, so there may be counters sitting in the control block. Finally, though shown last in the figure, there are switches, arbiters and buses, which actually constitute not the smaller but the larger part of a whole digital architecture.

A typical Intel microprocessor is shown here. The input data line and the output data line may go to a cache; then there are two blocks which essentially do the mathematics, here is the control, these are the bit slices, and depending on the logical operation you want to perform there is a register here, buffers here, and finally some kind of output. The Itanium has six integer execution units like this, and it is interesting to see why they are so fast.
A typical arithmetic block in a processor is organized as bit slices: bit 0, bit 1, bit 2, bit 3, up to bit n, each processed simultaneously, or at times serially, depending on the architecture. In each slice you have a register block, an adder block, a shifter block and a multiplier block. Data comes in and is first registered; under control it is given to the adder, and the addition or multiplication operation is performed. Sometimes the adder can be helped by a shift operation; an add-and-shift combination may serve as a multiplier as well. So these are the processing elements: in normal arithmetic you require a register, an adder, a shifter and a multiplier. Please note that most of these slides have been taken from Rabaey's book, and I trust there will be no objection, since they are used only to show the figures.

The bit-slice data path shown here again has a number of multiplexers, a shifter, adder stages 1 and 2, the wiring to connect them, and the bits arriving from the logic system; this is how the data flows from input to output, and that is why it is called a data path. A typical Itanium integer data path is shown here; it is roughly 4000 microns across. You can see the integer execution data path, the data-path control, the bypass control, and the register files. So one can see that any processor will require a great deal of data processing, which is essentially arithmetic.

The number of data-path blocks required will be largest in DSPs, digital signal processors, because they are becoming the most important elements in most of the systems we commonly use. For example, in communications, whether 3G or 4G, wireless or radar, you require a lot of DSP processing; it may be data, voice or video processing. In consumer applications, mobiles, TVs and so on, we need DSP systems everywhere. Robotics is another area which is very common and is becoming very important for integrated manufacturing in large industries. And finally, in video, audio, music and speech processing you require a DSP.

So what are the issues when one designs a DSP processor? First, in most cases these may be handheld systems, so there are the issues of battery size, the availability of space, and the weight to be carried. Second, all signal-processing applications require programmability, in contrast to a normal VLSI system where you may have other parameters to control but not necessarily programmability. In signal-processing or microprocessor applications, programmability is the dominant requirement of the design.
There may also be parallel architectures, built to improve speed and reduce power, around which the processor is designed and fabricated. And of course one expects it to run at very high speeds, typically at gigahertz rates. These are the issues one has to take care of when designing a DSP processor.

A signal-processing system may contain a large amount of circuitry, which you would like to implement on silicon. So what are the issues in VLSI-based implementation of such a system? First, complex signal processing involves a huge number of operations: large word lengths as well as complex operations, frequency-selective operations, and time-frequency transformations. This leads to large complexity, and the VLSI circuit will also be very complex. DSP algorithms can be implemented on standard DSPs, on DSP ASICs, or by direct mapping onto reconfigurable hardware. The design flow begins with a behavioural description of the functional algorithm, which can be placed on hardware such as an FPGA; you may not reach very high speeds that way, but functionally it will work at reasonable speed.

Comparing VLSI chip design with processor design: in VLSI chip design the typical specifications are area, speed (or delay, which amounts to the same thing) and power; in asynchronous systems, of course, speed may not be directly tied to the delays you control. In digital processor or signal-processor design, the major additional requirement is programmability. A dedicated VLSI design is optimal and therefore specific: once you say it is optimal, it is specific to one task. A digital processor, on the other hand, must be programmable, because it should allow flexibility; hence it needs a generalized architecture and is therefore non-optimal. With completely dedicated hardware I can design a very good IC which optimally meets the desired specification. With programmability, one does not know in advance exactly what has to be done; the same processor will be used for multiple applications, and different kinds of data may appear. That is why programmability is so relevant for DSPs, why such designs must use generalized architectures, and why such circuits will certainly not be optimal. So how do we go about it? It is said that if you are implementing a high-speed signal-processing chip, you should take a middle path between the two extremes, and this middle path is the ideal. So what are the phases in the design of such signal-processing systems? There is a system design and there is a chip design.
The flow is as follows. You start with a system specification; then you design the system at the behavioural, structural and RTL levels; from there you go down to the physical design of a chip. ASICs can be used, or a programmable chip onto which you put your design; and finally, once manufactured, the parts must be testable. The approach, as I said, is structured design to reduce complexity: capture the design problem, partition it, and decide between a top-down and a bottom-up approach; a meet-in-the-middle approach is what we would basically like to follow. Here the first part is the signal-processing side and the other is the VLSI side. Going top-down you move through algorithm, scheduling, architecture and floorplanning; going bottom-up you move through layers and layout, transistors, gates and modular cells. In between, you take pre-designed blocks or IPs available to you, whose layouts are tested and whose performance you know, and use them to implement the algorithm, with standard scheduling, for the given system specification, designing additional blocks only where needed. This combination of top-down and bottom-up is the approach most system designs actually use.

For DSP designs, real-time performance is essential, and throughput and latency are the two major criteria of any DSP chip design: throughput is the amount of data processed per unit time, and latency is how long after the inputs are applied the result becomes available. These two issues are crucial. The next requirement, these days the foremost, is low power. We also want a small die size and hence lower cost; flexibility, meaning field programmability, so that the chip can be modified on-line for its application; extensibility, so that pre-designed blocks can be added and the design upgraded from earlier versions; customizability, so that for a given application it can be adapted to a customer's needs; and field testability, because if it is testable only at the fab it will be very difficult to maintain. For the obvious reason of saving money, the design time should be comparatively small, and the designs should be modifiable.

There are three of Rabaey's rules which you should follow in any chip design. First, the right structure must be chosen for a function unit before attempting optimization. Second, the critical path through the circuit should be identified and its length minimized; as we all know, the critical path is the path from input to output which has the largest delay.
So first see whether the critical-path length can be reduced, by circuit connection or by algorithm, so that higher speeds become possible. Third, the number of transistors alone does not decide circuit size; it is the interconnect that is decisive for area. For example, you may have 100 million transistors, but you may not believe that the area of those 100 million transistors is often less than 30 percent of the chip, while the remaining 60 to 70 percent is actually taken by interconnect. So in deciding chip size, the transistor count is not the only criterion; the interconnect requirement also decides it.

Now we start with our arithmetic. Today we will look into a variety of number systems. Though all of us know this material in many ways, let us go through the basic arithmetic requirements, if not in great detail. A number system can be defined by the set of values each digit can assume, and by an interpretation rule that defines the mapping between sequences of digits and numerical values. This definition looks odd when you first read it, but in real life you apply it without worrying. A number x with digit sequence x_0, x_1, ..., x_{W_d−1}, where W_d is called the word length, has the value

x = Σ_{i=0}^{W_d−1} w_i x_i,

where w_i is the weight associated with position i and x_i is the digit at that position. Conventionally the weight is the i-th power of a fixed integer r called the radix: w_i = r^i. This is not difficult to understand. Take the decimal number 178. Why do we read it as 178? Because 1 × 10^2 = 100, plus 7 × 10^1 = 70, plus 8 × 10^0 = 8. Here the radix r is 10, i runs over 0, 1, 2, and the word length is 3 digits; since r = 10 it is a decimal number. In binary the radix is 2, in hexadecimal 16, in octal 8. All we are saying is that, for a fixed radix r, the weight of each position is r^i, and adding w_0 x_0 + w_1 x_1 + ... up to the word length gives the value x. Similarly, if I write the 4-bit binary number 1101, it is 1 × 2^0 + 0 × 2^1 + 1 × 2^2 + 1 × 2^3, so in decimal it is 8 + 4 + 1 = 13; that is, 1101 in radix 2 equals 13 in radix 10. What this essentially tells us is that if I know my radix r, I can represent any number in terms of its weights w_i, and the sum of all such w_i x_i up to the word length generates the number.
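To make the weighted-radix rule concrete, here is a minimal Python sketch (the function name digits_to_value is my own, purely for illustration) that evaluates a digit sequence for any radix, checked against the 178 and 1101 examples above:

```python
# A minimal sketch of the weighted (radix) number representation:
# x = sum over i of x_i * r**i, for digits x_i and radix r.
def digits_to_value(digits, radix):
    """digits[0] is the least significant digit x_0."""
    return sum(x_i * radix**i for i, x_i in enumerate(digits))

# 178 in decimal: 8*10^0 + 7*10^1 + 1*10^2
assert digits_to_value([8, 7, 1], 10) == 178
# 1101 in binary: 1*2^0 + 0*2^1 + 1*2^2 + 1*2^3 = 13
assert digits_to_value([1, 0, 1, 1], 2) == 13
```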
Apart from integer numbers there are fractional numbers, and fixed-point arithmetic is used in most processor applications for such calculations. There are a number of ways in which these numbers can be represented; we show them here for binary, though the ideas hold generally. The first is the sign-magnitude number:

x = (1 − 2x_0) Σ_{i=1}^{W_d−1} x_i 2^{−i},

where the fractional bits keep their usual weights and the leading bit x_0, before the binary point, is the sign bit; the handling depends on the signs of the operands, as we will see in some examples. The next commonly used binary representation is one's complement:

x = −x_0 (1 − Q) + Σ_{i=1}^{W_d−1} x_i 2^{−i}, where Q = 2^{−(W_d−1)}.

Multiplication is more difficult in one's complement, so we change over to the two's complement number:

x = −x_0 + Σ_{i=1}^{W_d−1} x_i 2^{−i};

notice that Q is no longer required. We will see why two's complement is so commonly used: it turns out that with this representation almost all arithmetic can be done conveniently. There is of course a joke from Wanhammar's book, that two's complement is company and three's a crowd, and so no more; logically one could define three's complement, four's complement and so on. Finally, one more variant is possible, called offset binary, which we will see a little later:

x = x_0 − 1 + Σ_{i=1}^{W_d−1} x_i 2^{−i}.

So these are the four possible representations, and one normally uses two's complement, sometimes offset binary, rarely sign magnitude, and very rarely one's complement.

Now, why is two's complement so commonly used? First recall how conversion is done; I suppose all of you are aware of it. Take a fractional decimal number, say 0.578215, which we want to represent in binary. Multiply it by 2: we get 1.15643, so the first fractional bit is 1. Take the fractional part 0.15643 and multiply by 2: 0.31286, and since the integer part is 0, the next bit is 0. Multiply again by 2: 0.62572, so the next bit is again 0; the step after that gives 1.25144, so the next bit is 1. Keep doing this until you have the binary equivalent to the required number of bits; this is how we convert a decimal fraction into ordinary binary. Then, to turn any binary number into its two's complement, all we do is complement every bit, 1s to 0s and 0s to 1s, and add 1 to the result.
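The multiply-by-two conversion and the two's complement value formula can both be sketched in a few lines of Python; this is only an illustration, and I use 0.578125 (a nearby value exactly representable in binary, unlike 0.578215) so the check is exact:

```python
def twos_complement_value(bits):
    """bits = [x0, x1, ..., x_{Wd-1}], x0 is the sign bit.
    Value = -x0 + sum_{i>=1} x_i * 2**-i."""
    return -bits[0] + sum(b * 2.0**-i for i, b in enumerate(bits) if i > 0)

def fraction_to_bits(f, n):
    """Convert 0 <= f < 1 to n fractional bits by repeated doubling:
    the integer part produced at each step is the next bit."""
    bits = []
    for _ in range(n):
        f *= 2
        bit = int(f)          # integer part: 0 or 1
        bits.append(bit)
        f -= bit
    return bits

# 0.578125 = 0.100101 in binary
assert fraction_to_bits(0.578125, 6) == [1, 0, 0, 1, 0, 1]
# 1.101 in two's complement = -1 + 1/2 + 1/8 = -3/8
assert twos_complement_value([1, 1, 0, 1]) == -0.375
```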
Or take a simple number, say 37 in decimal. I can divide repeatedly by 2: 37 ÷ 2 gives 18 remainder 1; 18 ÷ 2 gives 9 remainder 0; and so on. Equivalently, 37 = 32 + 4 + 1, which is 1 × 2^5 + 0 × 2^4 + 0 × 2^3 + 1 × 2^2 + 0 × 2^1 + 1 × 2^0, so the number is 100101. To take the one's complement, just complement every bit: 011010. Add 1, and we get 011011 as the two's complement. This is very standard arithmetic, nothing great about it; I show it only so you see that fractional numbers are handled similarly, multiplying by 2 on the fractional side and dividing by 2 on the integer side.

Now take an example with fractional two's complement numbers, say 1.101 (which is −3/8), 0.100 (+4/8) and 1.001 (−7/8), all in binary. Add the first two: 1.101 + 0.100. Going bit by bit from the right, the fractional bits add without trouble, but at the sign position 1 + 1 produces a carry out, and an additional bit appears: the raw result is 10.001, one bit longer than the operands. In two's complement we simply discard that extra carry bit and keep 0.001, which is +1/8, and indeed −3/8 + 4/8 = +1/8; the result is correct. Similarly, adding the last two numbers, 0.100 + 1.001, gives 1.101, which is −3/8, and indeed 4/8 − 7/8 = −3/8. The caution is that if the true sum falls outside the representable range, for instance 1.101 + 1.001, whose true value is −10/8, then the wrapped result is wrong unless you keep the extra integer bit (10.110 read with two integer bits is indeed −10/8). But as long as the final result is in range, two's complement automatically rejects the overflow carry, and even partial sums that temporarily overflow do no harm; that is why, for the partial sums in any adder, and later in multipliers, we will prefer two's complement.
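Here is a minimal Python sketch of the wraparound property just described: two's complement addition is simply addition modulo 2^W, so the overflow carry is discarded for free. The word length W = 4 and the helper names are my own, for illustration only:

```python
# A minimal sketch of why two's complement "automatically rejects" the
# carry-out: addition is addition modulo 2**W, so discarding the extra
# bit gives the right answer whenever the true sum is in range.
W = 4  # word length including the sign bit (fractions scaled by 2**3)

def add_2c(a, b):
    return (a + b) % (1 << W)              # keep only W bits

def to_signed(u):
    return u - (1 << W) if u >= (1 << (W - 1)) else u

# 1.101 (-3/8) + 0.100 (+4/8): carry-out discarded, result 0.001 (+1/8)
a, b = 0b1101, 0b0100                      # fractions scaled by 8
assert to_signed(add_2c(a, b)) == 1        # +1, i.e. +1/8
```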
Zero and sign detection, as well as conversion from conventional to redundant representation, may reduce the speed, because these operations take finite time. Still, a major reason why redundant number systems are used is that they are easy to implement in ASIC-based designs: the arithmetic is simplified, and though you lose some speed in conversion, you gain more because there are no long carry chains, so the net speed may be a little higher than you would otherwise get. It is not free, however: the chip design becomes more complex and the cost of design increases. So, cost-wise it is higher; speed-wise and implementation-wise it is simpler, at that cost.

There are three known digit codes used in redundant number systems: the signed-digit code (SDC), the canonic signed-digit code (CSDC), and the online arithmetic code, which we shall see later. A sign-magnitude number has only one bit representing the sign of the whole word, plus or minus. In the signed-digit code, by contrast, each digit carries its own sign: it can take the value −1, 0 or +1. Since each digit carries its own sign, long carry propagation is avoided: no carry has to travel along the word, because a plus or a minus can appear locally during the operations. So one does not have to evaluate a long carry chain, and the arithmetic may be faster. Signed-digit codes can also reduce the work in many, though not all, multiplication algorithms; I may show you an actual design using digit codes, or at least the mathematics behind it.

A number x in the range −2 + Q ≤ x ≤ 2 − Q, where Q = 2^{−(W_d−1)} and W_d is the word length, can be represented in SDC as

x = Σ_{i=1}^{W_d−1} x_i 2^{−i}, with each digit x_i ∈ {−1, 0, +1}.

Take the example of 15/32. In two's complement this number is 0.01111, with four 1s. In SDC I can represent it as the same string 0.01111, or equivalently as 0.1000(−1), that is, 1/2 − 1/32. Either form represents the same value, and whichever form suits the operation at hand, for example in a subtractor, can be used. Similarly, −15/32 is 1.10001 in two's complement; in SDC it is simply the digit-wise negation of the representation of +15/32, for example 0.(−1)0001, that is, −1/2 + 1/32. So this represents −15/32 and the earlier one +15/32, and negation costs essentially nothing.
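A small Python sketch of evaluating SDC fractions, checking the two representations of 15/32 given above (the digit lists and function name are mine, for illustration):

```python
# A minimal sketch of evaluating a signed-digit code (SDC) fraction:
# x = sum_{i=1}^{Wd-1} x_i * 2**-i, with each digit in {-1, 0, +1}.
def sdc_value(digits):
    """digits[0] corresponds to weight 2**-1."""
    return sum(d * 2.0**-(i + 1) for i, d in enumerate(digits))

# 15/32 two ways: 0.01111 and 0.1000(-1), i.e. 1/2 - 1/32
assert sdc_value([0, 1, 1, 1, 1]) == 15 / 32
assert sdc_value([1, 0, 0, 0, -1]) == 15 / 32
# negation is digit-wise: 0.(-1)0001 = -15/32
assert sdc_value([-1, 0, 0, 0, 1]) == -15 / 32
```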
Now, what is the advantage of SDC? I have already said it to some extent, but the major interest in implementing arithmetic hardware with it is that it can minimize the number of non-zero digits. If the coded data contains more zeros, then in a real multiplication the terms multiplied by zero contribute nothing, so those partial sums need not be evaluated at all. This idea, that SDC reduces the number of non-zero digits relative to zeros, is useful particularly in multiplications.

The second possibility is the canonic signed-digit code. It is an SDC, but each number has a unique representation instead of several, defined by the rule that no two consecutive digits are non-zero, whereas in plain SDC they can be. Because of this canonic rule, two adjacent digits can never both be non-zero, so at most half the digits are non-zero at any cost. The CSDC therefore has at most W_d/2 non-zero digits: normally you may expect around 50 percent ones and 50 percent zeros, whereas here the non-zero fraction is always at most 50 percent and usually much less. Since the number of zeros is larger, in multiplication specifically the time taken will be much smaller, and you get a faster multiplier using canonic signed-digit codes; a small recoding sketch follows at the end of this passage.

The third and an interesting one is the online arithmetic code. This code is defined through latency: the i-th digit of the result is computed using only the first i + δ digits of the operands, where δ is a small positive constant. Once the first δ digits are available, the first digit of the result can be computed, and the following digits are computed successively as each new digit of the operand arrives. This is a very useful way of doing arithmetic: in a typical DSP system, such as an FIR filter or any recursive filter, you have a chain of adders and multipliers, their number depending on the taps you use. With online arithmetic there is an initial delay until the first δ digits are complete, but from then on a new digit comes out every cycle, so the overall speed may improve. It is like putting a pipeline into the coding itself: because of this kind of coding the circuit behaves as if pipelined, which is to say it will be faster.

Why are we doing all this? As I keep saying, I am looking for high-speed arithmetic operation at low power. If the number of operations is smaller, the power is reduced; if the data flows faster and the output appears sooner, the speed is higher. So we are pursuing both specifications, high speed and low power, and we keep looking at a variety of representations and codes through which the delay can be minimized and the speed increased.
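As promised above, here is a minimal Python sketch of one standard CSD recoding for non-negative integers; the method shown (inspecting the number modulo 4 at each odd step) is a common textbook algorithm, not necessarily the exact one used in any particular chip:

```python
# A minimal sketch of canonic signed-digit (CSD) recoding: no two
# adjacent digits are non-zero, so at most half the digits are non-zero,
# which means fewer partial products in a multiplier.
def to_csd(n):
    """Return CSD digits of a non-negative integer, least significant
    first, each digit in {-1, 0, +1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 3 -> digit -1, else +1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

# 7 = 8 - 1 -> digits 1 0 0 -1 (MSB first): only two non-zero digits
assert to_csd(7) == [-1, 0, 0, 1]
# the recoding always evaluates back to the original number
assert sum(d * 2**i for i, d in enumerate(to_csd(45))) == 45
```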
In most adder arithmetic, serial or bit-parallel, you have two numbers to add, and a carry is generated which is needed before the next two digits can be added. A trivial example: add 1 and 1; the sum bit is 0 and a carry of 1 is produced. If there is another pair of bits to the left, this carry has to arrive there before that addition can be performed. So each bit position cannot finish until the carry from the previous position has been generated, which means the successive bits are always delayed. If you have a large number of bits, 8, 16, 32 or 64, the carry into the last bit arrives with so much accumulated delay per bit that the net system output becomes very, very slow. A variety of techniques we are going to work on address exactly this feature: can we avoid the carry propagation itself? We will see circuit techniques, and we have seen codes. Another possibility for avoiding carry propagation in a real system is residue arithmetic based on the Chinese remainder theorem. Incidentally, I am not very sure of the date, but this theorem goes back to the early centuries AD; at the time it was purely a piece of mathematical thinking, which was much later put to use by electrical engineers.

So what does the theorem say? For fixed-point integers, the residue number system (RNS) represents a number by the set of residues obtained after integer division by mutually prime moduli m_i, i = 1, ..., p. Writing x = q_i m_i + r_i, with q_i and r_i integers, the number x is represented by the tuple (r_1, r_2, ..., r_p). After reading this statement one does not immediately get the feeling that it is going to do a great job, but let us see some examples and the method. Choose moduli, say (5, 3, 2); do not worry, these are arbitrarily chosen, any mutually prime bases will do. The product 5 × 3 × 2 = 30 is the range, the number of distinct values over which this representation works. Now any number, 9, 19, 3, 8, whatever, is represented by three residue digits corresponding to the bases 5, 3 and 2. Take the case of 9. Starting with the modulus 5: 9 = 5 × 1 + 4, so the first residue representing 9 is 4. Next take 3: 9 = 3 × 3 + 0, so the residue is 0. Finally take 2: 9 = 2 × 4 + 1, so the residue is 1.
So (4, 0, 1) is the residue representation of the number 9 for the moduli (5, 3, 2). Note that if I choose other moduli, the same number 9 gets different digits; the representation is not unique in itself, but the choice of moduli decides the range you want to work over, and for a given range an RNS representation can be chosen. Doing the same calculation for 19: 19 = 5 × 3 + 4, remainder 4; 19 = 3 × 6 + 1, remainder 1; 19 = 2 × 9 + 1, remainder 1. So (4, 1, 1) represents 19 for this modulus set. For 3: 3 = 5 × 0 + 3, residue 3; 3 = 3 × 1 + 0, residue 0; 3 = 2 × 1 + 1, residue 1; so 3 is represented as (3, 0, 1). Similarly 8 is represented as (3, 2, 0). Again, please remember that if I change the moduli, these digits will also change.

Now to the statement that you do not need a carry, because you are operating on the residues only. Take the addition 9 + 19. 9 is (4, 0, 1) and 19 is (4, 1, 1). I add digit-wise: 4 + 4 under modulo 5, 0 + 1 under modulo 3, 1 + 1 under modulo 2, giving raw sums (8, 1, 2). Reducing each under its modulus: 8 = 5 × 1 + 3, residue 3; 1 mod 3 = 1; 2 = 2 × 1 + 0, residue 0. So the sum is (3, 1, 0). You can check whether (3, 1, 0) represents 28 in (5, 3, 2): 28 = 5 × 5 + 3, residue 3; 28 = 3 × 9 + 1, residue 1; 28 = 2 × 14 + 0, residue 0. So (3, 1, 0) indeed represents 28, and 19 + 9 = 28. You can see that the digits never exchange any carry; no carry is generated at all.

Now look at multiplication with the same numbers: multiply 8 by 3, that is, (3, 2, 0) by (3, 0, 1). I take the digit-wise products: 3 × 3 = 9, 2 × 0 = 0, 0 × 1 = 0, each under its modulus (5, 3, 2). Since 9 mod 5 is 4, this gives (4, 0, 0). You can again check: 24 = 5 × 4 + 4, residue 4; 24 = 3 × 8 + 0, residue 0; 24 = 2 × 12 + 0, residue 0. So (4, 0, 0) represents 24 in the modulus set (5, 3, 2), and 8 × 3 = 24.

This residue number system has been tried successfully. Overall the speed may be higher, because there are no carry chains in additions, and hence none in products either. But there is a cost. First you have to choose the moduli, and some hardware is required to make that choice properly for the data. Then you have to convert all your numbers into RNS form, and back again, even though the operations themselves may be very fast.
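All of these residue calculations can be captured in a short Python sketch; the brute-force from_rns search below stands in for a proper Chinese-remainder reconstruction and is only for illustration at this small range:

```python
# A minimal sketch of residue-number-system (RNS) arithmetic for the
# moduli (5, 3, 2) used above: add and multiply digit-wise with no
# carries between digits, then recover the integer from its residues.
MODULI = (5, 3, 2)
M = 5 * 3 * 2   # 30 distinct representable values

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_op(a, b, op):
    return tuple(op(ai, bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    # brute-force stand-in for Chinese-remainder reconstruction
    return next(x for x in range(M) if to_rns(x) == r)

assert to_rns(9) == (4, 0, 1) and to_rns(19) == (4, 1, 1)
s = rns_op(to_rns(9), to_rns(19), lambda a, b: a + b)
assert s == (3, 1, 0) and from_rns(s) == 28          # 9 + 19 = 28
p = rns_op(to_rns(8), to_rns(3), lambda a, b: a * b)
assert p == (4, 0, 0) and from_rns(p) == 24          # 8 * 3 = 24
```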
One does not know in advance how fast these operations will be for given data, so to say that an RNS system will always be faster on average is claiming a bit too much; but for most digital data one encounters, the statement holds, and we may say that with RNS we are able to do the arithmetic much faster. For your information, my student, one of my colleagues and I designed an 8-bit FIR filter using RNS numbers, and it was actually accepted by Texas Instruments.

Having seen the coding and digit representations, we now look at the ways the arithmetic itself can be organized. There are two basic ways of doing arithmetic in circuits: bit-serial and bit-parallel. In bit-serial arithmetic there is less hardware, hence a saving on silicon real estate; since each bit enters in turn, with some processing done and part of the result fed back before the next bit is processed, it will on average be slower, though not necessarily in every case; we will see some fast serial-bit processing too. It may require only one processing block, because at any moment only a single bit is entering, so the silicon area is small. In bit-parallel arithmetic all the bits are processed simultaneously, so it is obviously faster, but the hardware is comparatively large. We will try to see whether something in between, partly serial and partly parallel, can do better. In reality the ratio of speeds in the two cases is smaller than you might expect, because the parallel implementation has long carry-propagation paths. The area of a bit-parallel chip is larger; the two are roughly similar in power dissipation, so that is not a deciding criterion. Therefore, as I keep saying, the optimum may be a series-parallel combination.

A typical bit-serial arithmetic circuit is shown here; processing begins with the LSB. There is a full adder and a D flip-flop, which is a one-bit register. The data bits x_0 and y_0 appear first at the inputs x_i and y_i. The sum is s_i = x_i ⊕ y_i ⊕ c_{i−1}, and the carry c_i is the majority function of x_i, y_i and c_{i−1}; this is what the full adder computes. It first produces s_0 and c_0; that carry has to be supplied when the next bits, x_1 and y_1, arrive, so a delay element is added: when the second pair of bits appears, the carry is fed back with one bit period of delay, and the adder combines x_i, y_i and c_{i−1} to produce the next s_i and c_i, repeating until the bits are exhausted. Please remember that when the operand bits are over, you still have to run one more cycle, because the last carry has to be fed in; it goes into the MSB of the sum. So there is one additional cycle beyond the number of data bits; this is a standing requirement of bit-serial addition. The subtractor circuit is the same idea: use the complement of one operand, and the same hardware operates as a subtractor.
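A minimal Python sketch of this serial adder, with the carry variable playing the role of the D flip-flop and one extra cycle to flush the final carry into the MSB:

```python
# A minimal sketch of the bit-serial adder: one full adder plus a
# one-bit register (the D flip-flop) that feeds each carry back for the
# next clock cycle. LSB enters first; one extra cycle flushes the carry.
def bit_serial_add(x_bits, y_bits):
    """x_bits, y_bits: equal-length bit lists, least significant first."""
    carry = 0                        # the D flip-flop, cleared at start
    s = []
    for xi, yi in zip(x_bits, y_bits):
        s.append(xi ^ yi ^ carry)                        # sum bit
        carry = (xi & yi) | (xi & carry) | (yi & carry)  # majority gate
    s.append(carry)                  # extra cycle: carry into the MSB
    return s

# 3 + 3 = 6, bits LSB-first: 110 + 110 -> 0110
assert bit_serial_add([1, 1, 0], [1, 1, 0]) == [0, 1, 1, 0]
```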
Now look at bit-parallel arithmetic. Here is the bit-parallel circuit: the full adder is the basic element, and it uses the two's complement number system. From the inputs a_i, b_i and the incoming carry, the full adder produces the sum and carry terms, and these can be rewritten in generate-propagate form. We define two signals: a generate signal g_i = a_i · b_i, and a propagate signal p_i = a_i ⊕ b_i. One can then show, and next time we will verify it with a two-bit truth table, that s_i is the complement of c_{i−1} when p_i = 1. Remember that p_i is an XOR, so p_i = 1 only when a_i = 0 and b_i = 1, or a_i = 1 and b_i = 0; whenever the two bits are the same, p_i = 0 and the sum equals c_{i−1}. For the carry: c_i = c_{i−1} if p_i = 1, and c_i = a_i if p_i = 0. In the bit-parallel adder all bits are applied simultaneously: you have that many full adders, for a_0 b_0, a_1 b_1, up to a_3 b_3, with the carry output of each block connected to the next block. Of course the carry still has to propagate from block to block even in the bit-parallel case, though the net processing may be faster.

The last structure, before we go to actual adders, is distributed arithmetic, which is what many processors use. The input data is stored in a ROM of 2^N words; an adder/subtractor receives the data, and the output either goes out directly, if you wish, or passes through a register acting as a delay element, so that the bits are handled partly in parallel and partly serially; the partial results, with their carries, come back through the ROM path, and together they create the new sum and new carry. This distributed arithmetic is quite likely the arithmetic one will use in most processors.

So we will stop here for the day. Next time we will start with specific adders: the kinds of adders, their circuits, their achievable speeds, which one to use when, and the criteria for the choice of an adder. Please remember, if you are a VLSI person, your choice is based only on reducing the power, increasing the speed, or at times reducing the number of transistors and hence the area; so next time we shall discuss adder designs from the viewpoint of VLSI implementation requirements. There are in fact now a large number of possible ways in which addition can be performed, and the basic idea every time is to see how to reduce the carry part, if possible. Most of this material follows the treatment in Rabaey's book on digital integrated circuits, by Rabaey, Chandrakasan and Nikolic, so you can also look there; however, I will explain the parts one sometimes does not read properly in the book, and then I may also show you some other adders, such as Ling adders, which do a lot of good things. We will also look into possible low-power implementations, and I will show you that in real advanced VLSI chips there are now many possibilities in which adders can be implemented. Thank you for the day.
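As a closing illustration, here is a minimal Python sketch of the generate/propagate relations from the bit-parallel discussion above, wired as a simple ripple-carry adder; it is a behavioural model only, not a circuit, and the names are mine for illustration:

```python
# A minimal sketch of the generate/propagate view of a ripple-carry
# adder: g_i = a_i & b_i, p_i = a_i ^ b_i, s_i = p_i ^ c_{i-1},
# and c_i = c_{i-1} when p_i = 1, c_i = a_i when p_i = 0.
def ripple_add(a_bits, b_bits, c_in=0):
    """a_bits, b_bits: equal-length LSB-first bit lists."""
    c = c_in
    s = []
    for a, b in zip(a_bits, b_bits):
        g, p = a & b, a ^ b
        s.append(p ^ c)               # s_i = c'_{i-1} if p_i else c_{i-1}
        c = c if p else a             # equivalently: c = g | (p & c)
    return s, c                       # sum bits and final carry-out

# 5 + 6 = 11 on 4 bits (LSB first)
s, c_out = ripple_add([1, 0, 1, 0], [0, 1, 1, 0])
assert s == [1, 1, 0, 1] and c_out == 0
```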