Dear learners, welcome to the session. Today I am going to discuss floating point numbers with you: what the format is, and how such numbers are represented in computers. To begin with, let us identify what we mean by floating point representation. A floating point representation consists primarily of two parts: a signed fixed point number, which we call the mantissa, and the binary point position, which we represent in the form of an exponent. Now, that may be surprising to you, isn't it? How can a binary point position be represented as an exponent? Well, you must be used to writing decimal numbers. So let us say I write 2.34 into 10 to the power 2. What does that mean? It means that the actual number is 234, and the decimal point sits just after 234; the number is 234 followed by the point. So can you now relate the point of the mantissa to the position of the decimal? Right. So now let us look at this slide once again. The mantissa can be an integer or a fraction; we can have a fractional mantissa as well as an integer one. Please note that the position of the binary point is assumed; it is not a physical point. This is something you must know. Why? Because when we talk about the binary point position, we only assume it to be at some place. In a computer, no binary point is positioned anywhere; it is just the assumption that this is where the binary point lies. Let us discuss this with the help of an example. The decimal number plus 12.34 is a typical floating point number, and it can be represented in either of the two following formats. In the first, the sign bit comes first, which happens to be zero, followed by 0.1234; here we have placed the assumed point just before the 1.
So what exponent do I need? The exponent will be two, because the number is 12.34: if I shift the point to just before the 1, the exponent becomes two. And that is what you can see: this represents 0.1234 into 10 to the power plus 2, which is 12.34. The second thing we can do for the same number is to represent the mantissa as a whole number. So suppose I simply say the mantissa is 1234. Then the exponent is going to be minus two, because we have shifted the position of the decimal point to after the 4, that is, two places to the right. In that particular case we will have 1234 into 10 to the power minus 2. This is where a mistake happens to be on the slide: it should read 1234 into 10 to the power minus 2, whereas it is written as 0.1234. So you should always watch for such catch points in slides, everywhere, and try to correct them as per the proper requirements. Now let us look at the decimal number plus 10.125. What is the equivalent binary number? First of all, we must understand that when we are talking about computers, we are talking about binary numbers, not decimal numbers. We took decimal numbers as an example only because you are most comfortable with them. So we are looking at the binary conversion of decimal numbers. Now 10, you know, is 1010; you can calculate it. And .125, if you once again calculate it, comes out to be .001. So 1010.001 is the number which we want to represent in the computer. How do we represent this? Floating point numbers are often represented in normalized form, which basically means the fractional mantissa does not contain zero as its most significant digit; such a number is considered to be in normalized form.
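The conversion just described, with 10 becoming 1010 and .125 becoming .001, can be sketched in a few lines of Python. The helper names here are my own illustration, not from the slides:

```python
def int_to_binary(n):
    """Integer part: repeated division by 2, collecting remainders."""
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits
        n //= 2
    return bits or "0"

def frac_to_binary(f, max_bits=10):
    """Fractional part: repeated multiplication by 2, collecting integer parts."""
    bits = ""
    while f > 0 and len(bits) < max_bits:
        f *= 2
        bit = int(f)
        bits += str(bit)
        f -= bit
    return bits

print(int_to_binary(10))      # 1010
print(frac_to_binary(0.125))  # 001
```

Putting the two parts together gives 1010.001, exactly as on the slide.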
Basically, what we are trying to say is that the mantissa is fractional; it is not an integer, as it was in the case of 1234 into 10 to the power minus 2. It is like the case of 0.1234 into 10 to the power 2. That means the mantissa is supposed to be fractional in nature. That simply means we are assuming our point to be before this particular position: the point will be before the whole string of digits we have written. And that is an assumed location, not a placed position; it is a notation we have assumed for this particular place. So what you can see is: we have the sign, which is zero; we have the fractional mantissa, which is 1010001, coming over here; and since we have moved our point four places to this side, our exponent has to be plus 4. That is why this zero is appearing over here, and 4 happens to be 100 in binary form. So you have represented the number 10.125 in the form of a fractional floating point number. But is that all? No. There are many more things as far as floating point numbers are concerned. So let us set up a hypothetical representation and try to present some information on that basis. In this hypothetical 32-bit representation, we assume that the sign is represented with the help of one bit. One bit means exactly one bit, which can hold a zero or a one. Then we have an exponent, which we assume to be eight bits long. Eight bits long means that if you want to represent any number in eight bits, it can be from 0 to 255 at maximum — not 256, mind my words; there may be some mistakes in the slides, so clarify those mistakes with your counsellor. Then we have the significand, or mantissa as we call it, which can be 23 bits, because we have 32 bits in total; the 23 remaining bits go to the mantissa or significand. The leftmost bit is the sign bit of the significand.
So we are assuming that the leftmost bit is the sign bit. The mantissa or significand must be in normalized form, which means the mantissa should always start with a one; it should never start with a zero. The base of the number is two; obviously we are dealing with binary numbers, so the base has to be two. A value of 128 is added to the exponent. Once again, there is a question: why do we want to do that? This is called a bias. Now, as I told you, an eight-bit exponent can represent numbers from 0 to 255. But in a generic representation, you would like to represent exponents from minus 128 to plus 127, including zero in between. If you want to do that, how are you going to do it? You need to add the bias. For example, if you want to represent the exponent minus 128, just add plus 128, and the stored exponent becomes zero. So minus 128 is equivalent to the stored exponent value zero. If you want the exponent to be just zero, then again 128 is added, and the stored representation of that exponent will be 128. There is no plus or minus for the exponents; the exponents are represented in biased form. So whatever the stored exponent value is, just subtract 128 and you will come to know what the actual exponent is. This is the way the representation works. Why? Because you have just one sign bit, and that sign bit is meant for the mantissa only; it is not meant for the exponent. That is why we have this kind of situation. Next thing: a plain exponent of 8 bits can represent exponent values 0 to 255. This is exactly what I told you. However, as we are adding 128 to get the biased exponent from the actual exponent, the actual exponent values represented will be in the range minus 128 to plus 127.
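The biasing rule just described can be captured in a small Python sketch. This assumes the lecture's hypothetical 8-bit exponent with a bias of 128; the function names are my own:

```python
BIAS = 128  # bias for the hypothetical 8-bit exponent field

def encode_exponent(actual):
    """Actual exponent (-128..127) -> stored biased value (0..255)."""
    assert -128 <= actual <= 127, "out of range for 8 bits with bias 128"
    return actual + BIAS

def decode_exponent(stored):
    """Stored biased value (0..255) -> actual exponent."""
    return stored - BIAS

print(encode_exponent(-128))  # 0   (the most negative exponent stores as all zeros)
print(encode_exponent(0))     # 128 (a zero exponent stores as 128)
print(decode_exponent(132))   # 4
```

Note that no sign bit is spent on the exponent at all; the bias alone carries the negative range, exactly as the lecture says.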
Since we are using a normalized mantissa, the leftmost bit cannot be zero; therefore it has to be one. This is one of the requirements. Now, since the leftmost bit has to be one, and that is exactly what we are expecting, why represent it at all? So this is precisely what we are saying here: do not store this first bit; it can be assumed implicitly. What we implicitly assume is that our mantissa starts with 0.1, and the rest of the digits follow. So a 23-bit stored mantissa can effectively represent a 23 plus 1, that is, 24-bit mantissa in this representation. So could you see the importance of utilizing even a single bit with the help of logic? IEEE 754, which we are going to talk about very shortly, also uses an implicit bit, though it normalizes differently, with the 1 assumed before the point rather than after it. From the point of view of representation, you can do many optimizations while representing numbers in the form of binary digits. And this is the whole art of computing, the whole art of computer science, the whole art of digital circuits. Now let us take an example of this situation. The binary number plus 1010.001: how is it going to be represented in our scheme? The sign bit is 0, the mantissa happens to be 1010001, and the exponent happens to be 4, that is, 0100. Now, in the proposed floating point representation, how will this information be translated into the computer? To the exponent we add the bias, which is 128. We added this to this particular exponent; you can see the colour making that whole difference. The result, 132, that is 10000100, becomes the stored exponent for the number which is to be represented in this floating point scheme. The sign comes across directly without any problem. The mantissa is coming over here: 1010001, where the leading one is the assumed, implicit one. So this is what you will find the mantissa to be.
So this is what you see over here. This is the exponent, which is coming over here, and this is where the mantissa actually starts. First comes the exponent, and it is followed by the mantissa. Now please note, our mantissa is 1010001, and the leading one is implicit: it is not to be represented in this particular situation. Everything else will appear as it is. So what you can see is that this red one actually does not appear in the stored number, because this is the way we have defined the format; this is the way we wanted it. You can have different formats of floating point numbers and represent the numbers accordingly. The whole idea is: take the number which you want to represent, find out what the bias is, find out the exponent, then find out the mantissa part of it, and see in what way it is represented — in most cases you will find it is a normalized fraction. So represent it in the form of a normalized fraction and set the sign bit of the mantissa as it is. This basically defines the number representation. But what is the number range this particular floating point format can represent? For that we calculate several things. What do we do? We first find the smallest mantissa value. The smallest mantissa, with the implicit one bit followed by 23 zeros, is 0.1 followed by 23 zeros in binary. This is 1 into 2 to the power minus 1, which is 0.5. So the smallest mantissa magnitude in this kind of representation is 0.5, which means zero itself cannot be represented — and that is not acceptable, because most of the time we do need to represent the number zero. The maximum value of the mantissa in this particular case is 0.1 followed by all ones. Now, what is the value of this? Once again, in computing you do a lot of intelligent things, and that is what we expect from you too.
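The encoding walked through above — sign, biased exponent, then the mantissa with its implicit leading 1 dropped — can be sketched for the lecture's hypothetical 32-bit format. The helper below is my own illustration of that scheme, not code from the slides:

```python
def encode_hypothetical(sign, mantissa_bits, exponent):
    """Hypothetical format: 1 sign bit, 8-bit exponent biased by 128,
    23-bit mantissa normalized as 0.1xxx with the leading 1 implicit.

    mantissa_bits is the normalized fraction, e.g. "1010001" for 0.1010001.
    """
    assert mantissa_bits[0] == "1", "normalized mantissa must start with 1"
    stored = mantissa_bits[1:].ljust(23, "0")  # drop the implicit leading 1, pad to 23
    biased = exponent + 128                    # add the bias
    return f"{sign}{biased:08b}{stored}"

word = encode_hypothetical(0, "1010001", 4)
# sign | biased exponent 132 | stored mantissa (leading 1 dropped)
print(word[0], word[1:9], word[9:])  # 0 10000100 01000100000000000000000
```

So the binary number +1010.001 ends up as sign 0, exponent field 10000100, and mantissa field 0100010...0, matching the slide's layout.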
Rather than summing the series 2 to the power minus 1, 2 to the power minus 2, and so on, you just add 0.000...01 to it, the way it has been shown over here; that addend happens to be 2 to the power minus 24. When you add it, the whole number becomes 1. And therefore the value which is appearing over here is actually 1 minus 2 to the power minus 24. Sometimes you have to do these kinds of calculations too, without loss of generality and all. Now, the smallest negative number, the one with the largest magnitude, has the maximum mantissa and the maximum exponent. Why? Because that is how negative numbers work: the smallest one has the largest magnitude. So this is minus (1 minus 2 to the power minus 24) into 2 to the power 127. This is the smallest negative number. The negative number closest to zero has the minimum mantissa and the minimum exponent: minus 0.5 into 2 to the power minus 128. This is going to be really, really close to zero — not exactly zero, but really close. So you are approaching zero, but not reaching it exactly in this particular format. Similarly, the smallest positive number is 0.5 into 2 to the power minus 128, again very close to zero but not exactly there, and the largest positive number has the largest mantissa and the largest exponent: (1 minus 2 to the power minus 24) into 2 to the power 127. So this is how you are able to calculate the minimum and maximum values. Now, talking about floating point numbers: you may remember the stories floated about IBM and Intel long ago, when the Intel Pentium processor was launched.
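The range arithmetic above can be checked directly in Python; ordinary floats are wide enough to hold all four boundary values of the hypothetical format:

```python
# Mantissa bounds for the hypothetical format (24 effective bits, 0.1xxx form)
smallest_mantissa = 2 ** -1        # 0.100...0 in binary = 0.5
largest_mantissa = 1 - 2 ** -24    # 0.111...1 (24 ones) = 1 - 2^-24

most_negative  = -largest_mantissa  * 2.0 ** 127   # smallest (most negative) number
least_negative = -smallest_mantissa * 2.0 ** -128  # negative number closest to zero
least_positive =  smallest_mantissa * 2.0 ** -128  # positive number closest to zero
most_positive  =  largest_mantissa  * 2.0 ** 127   # largest positive number

print(smallest_mantissa)              # 0.5
print(least_positive == 2.0 ** -129)  # True: 0.5 * 2^-128 is 2^-129, tiny but nonzero
```

Notice that `least_positive` is not zero: exactly as the lecture says, this format only approaches zero without ever representing it.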
People used to say that the precision of the Intel Pentium was somewhat lacking. What does this term precision mean? Let us dwell on this particular term; after all, it is related to floating point numbers. The basic tradeoff is between the range of the numbers and the accuracy. Whenever you are dealing with floating point numbers, the tradeoff is how big the range of the numbers is versus how accurate each number is going to be, because you have a fixed number of bits available, maybe 64 bits or 32 bits, whatever the case may be. The accuracy of numbers will be higher with a bigger mantissa. For example, suppose you are dealing with decimal numbers: 10.23 has a precision of only two decimal digits after the point, whereas 10.2345 has a precision of four digits. Similarly, pi written as 3.14 has just two digits of precision after the point, whereas the digits of pi go on and on, so the number of digits after the decimal point can easily exceed 50; the exact value of pi is very, very difficult to pin down. So when I say 3.14, I am cutting off many digits after the 4. This is the idea: the accuracy of numbers will be higher with a bigger mantissa. For example, with a one-bit stored binary mantissa you can represent only 0.10 and 0.11. Why? Because the leading 0.1 is implicit in both cases, and the stored bit can be 0 or 1 in the normalized form. Whereas with two bits, values such as 0.100 up to 0.111 can be represented exactly. All the remaining numbers can still be represented, but they have to be truncated. That is what we call loss of precision. Loss of precision will create truncation or round-off errors in numeric calculations. And we should be careful about these calculation errors, because in computing such errors are very, very difficult to deal with.
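Here is a quick Python demonstration of round-off error in action. Decimal 0.1 has no finite binary expansion, so the stored value is truncated to the mantissa width, and repeated additions accumulate the error:

```python
# Sum 0.1 ten times; in exact arithmetic this is 1.0, but each stored 0.1
# is already slightly off, and the errors accumulate across the additions.
total = sum(0.1 for _ in range(10))

print(total == 1.0)              # False
print(abs(total - 1.0) < 1e-9)   # True: the error is tiny but nonzero
```

This is exactly the truncation/round-off error the lecture warns about, and why comparing floating point results with exact equality is a bad habit.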
I mean, you can imagine that the whole launch of a satellite can depend on a very minute precision, right? The higher the number of bits in the mantissa, the better the precision. So you want more bits. Keeping all these things in mind, let us look at a very standard floating point representation, which has been developed over a period of time. This is called the IEEE 754 standard for floating point numbers. The IEEE standard for single precision numbers specifies the exponent and significand fields, and what the various values mean. If the exponent is 255 and the significand is not equal to 0, it does not represent a number at all: this is the NaN, not-a-number, case. If the exponent is 255 and the significand is 0, it represents minus or plus infinity, depending on the sign bit. If the exponent e is greater than 0 but less than 255, then any significand n represents plus or minus 1.n into 2 to the power e minus 127, telling us that the bias in this particular case is 127. There is a slight mistake on the slide once again over here: exponent 255 does not represent an ordinary number, and that is what you should note. So here is an example: suppose n was 101 followed by 20 zeros, and the exponent was 207; then the number which comes up is 1.101 followed by zeros, into 2 to the power 207 minus 127, which is 1.101 into 2 to the power 80. So this is the way you can decode what the actual number is. Then, when you have a zero exponent and the significand is not equal to zero, it represents plus or minus 0.n into 2 to the power minus 126, the denormalized case, which takes care of numbers which are very, very close to zero. As I was telling you, we could not approach zero in the format we were designing earlier; here you can do that.
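One way to see these IEEE 754 single-precision fields concretely is to unpack a real float with Python's standard struct module:

```python
import struct

def decode_single(x):
    """Split a value, stored as IEEE 754 single precision, into its fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret 32 bits
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF       # 23-bit significand, implicit 1. dropped
    return sign, exponent, fraction

print(decode_single(1.0))    # (0, 127, 0): +1.0 x 2^(127-127)
print(decode_single(-6.5))   # (1, 129, 5242880): -1.101 x 2^(129-127)
```

In the second case, 6.5 is 110.1 in binary, normalized as 1.101 into 2 to the power 2, so the stored exponent is 2 plus 127 = 129 and the fraction field holds 101 followed by 20 zeros.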
And finally there is a representation for zero: when the significand as well as the exponent are both zero, it represents zero. But remember, the zero is plus zero or minus zero, which normally is not the case when we are talking about integers. A similar standard exists for double precision. What will the case be for double precision? The number of bits will be higher; in particular, the mantissa will be much wider. This is a 64-bit representation, out of which one bit goes to the sign and 11 bits are devoted to the exponent, so the remaining 52 bits are for the significand, or your mantissa — effectively a 53-bit mantissa with the implicit bit, almost all other things being identical. So you can see: whenever you use the term float in programming, that is a single precision, 32-bit number. If you perform calculations in float, the chances of errors are quite high in comparison to double, which takes 64 bits with its 52-bit stored mantissa; there the chances of errors are much less — round-off and truncation errors, that is, not any other kind of error. So this, in a nutshell, is what I have to say. Let me recap. The most important thing in floating point representation is that the point itself is never stored. The second thing you should understand is that in a floating point number, the mantissa determines the precision of your number. The exponent is always represented in a biased form, the bias being half of the range which you are representing. And the mantissa is represented mostly in normalized form. In the IEEE 754 representation there is a representation for 0 also, but it is plus or minus 0, which sometimes is not that optimal; otherwise it is okay — at least we have a representation for 0.
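The float-versus-double comparison can also be demonstrated with the struct module: round-tripping a value through 32-bit storage loses mantissa bits that the 64-bit double keeps.

```python
import struct

x = 0.1  # not exactly representable in binary at any width

# Store x as a 32-bit single ("f"), then read it back into a Python double.
as_single = struct.unpack(">f", struct.pack(">f", x))[0]

print(as_single == x)      # False: single precision dropped mantissa bits
print(abs(as_single - x))  # on the order of 1e-9, the single-precision error
print(struct.calcsize(">f"), struct.calcsize(">d"))  # 4 8 (bytes per value)
```

Both widths truncate 0.1, but the double's 52 stored mantissa bits keep the error far smaller than the single's 23 bits, which is the round-off advantage the lecture describes.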
So floating point numbers are a very important number set as far as the computer is concerned, and I hope that with this particular discussion you are clearer about them. If you are not, you are most welcome to ask questions to your counsellor. Thank you so much.