In this video we're going to look at the basics of floating point arithmetic. We won't go into as much detail as we did with integer arithmetic, but we'll still see how it works, even if we don't see all the details of the hardware itself. We'll begin by considering what our floating point numbers actually tell us. They have a format that looks like this: we have a significand, we have a base of two, and we have some exponent that we also include in the number. Our number can also be positive or negative, and we'll have to worry about each of these parts when we're performing our arithmetic. We'll start by considering a couple of numbers. These numbers aren't too large. This one, for example, is eleven, so we're not going to get huge results out of this, but it will be enough to illustrate how the operations work. So if we start with addition, and I want to add these two numbers together, my algebra tells me I can't just add these two products together: they have no common term in them. So I'm going to have to find a common term, and the way I can do that is by shifting one of the significands. Since I'm interested in keeping the most significant bits of my number, I want to shift whichever one is associated with the smaller exponent. In that case, that's this one. It's the smaller number; it's closer to zero; it's not as large as this one. Even if these were both negative, I would still want to shift this one. So I'm going to shift the binary point in this number so that this exponent equals that one. That means shifting the binary point one place to the left. Now I have something that's in terms of two to the fifth, so the two numbers have a common term and I can add them together. I'll align my two significands at the binary point and just add them. Now, my floating point format tells me I'm only allowed to have one bit in front of the binary point, and it has to be a one.
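The whole align / add / normalize / round sequence can be sketched in a few lines of Python. This is an illustrative sketch, not hardware: `fp_add` is a hypothetical helper working on (significand, exponent) pairs rather than packed bits, and the operand values in the usage example are assumptions (the transcript doesn't show the actual numbers; these are chosen so the sum comes out to the lecture's 1.001 times two to the sixth).

```python
from fractions import Fraction

def fp_add(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Add two binary floating point numbers given as (significand, exponent).

    Significands are exact Fractions like 11/8 (i.e. 1.011 in binary).
    Hypothetical helper mirroring the lecture's steps, not real hardware.
    """
    # Step 1: align -- shift the significand with the SMALLER exponent
    # right until the exponents match (this keeps the most significant bits).
    if exp_a < exp_b:
        sig_a, exp_a = sig_a / 2 ** (exp_b - exp_a), exp_b
    else:
        sig_b, exp_b = sig_b / 2 ** (exp_a - exp_b), exp_a

    # Step 2: add the aligned significands.
    sig, exp = sig_a + sig_b, exp_a

    # Step 3: normalize so exactly one 1-bit sits in front of the binary point.
    while sig >= 2:
        sig, exp = sig / 2, exp + 1
    while 0 < sig < 1:
        sig, exp = sig * 2, exp - 1

    # Step 4: round back to frac_bits bits after the binary point
    # (round half up for simplicity; real hardware rounds to nearest even).
    scale = 2 ** frac_bits
    sig = Fraction(int(sig * scale + Fraction(1, 2)), scale)
    if sig >= 2:                      # rounding can carry out; renormalize
        sig, exp = sig / 2, exp + 1
    return sig, exp
```

With the assumed operands 1.011 times two to the fourth and 1.101 times two to the fifth, `fp_add(Fraction(11, 8), 4, Fraction(13, 8), 5)` aligns, adds, normalizes, and rounds down to 1.001 times two to the sixth.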
So in this case, I'm going to shift the binary point one more place to the left, and that lets me increase my exponent one more time. Now I also have to recognize that I started off with three bits in my significand, so I'm only allowed to have three bits in my significand when I'm done, which means I need to round those extra two bits off. In this case, they're below one-half, so they get dropped, and I'm left with 1.001 times two to the sixth. I'd stick six in as my exponent and 001 as the significand, and that would be the result of my floating point computation. If I were doing subtraction, I'd do pretty much exactly the same thing: make sure that both of your exponents are the same number so that you can line up the two significands and do the arithmetic. At the end, you make sure that you've only got one bit in front of the binary point, and then round off however many bits you need so that you're back to your original number of bits. Multiplication is a little different, because when I multiply these two products together, I can multiply their terms in any order I want. So I can decide to multiply the two to the fifth by the two to the fourth, multiply the two significands together, and then take the product of those two results afterwards, which is great, because then I've got one significand over here and something with the exponent over here, and those fit nicely into my floating point format again. To do the multiplication, I can start with the exponent, and the exponent is really easy: the laws of exponents tell me that two to the fourth times two to the fifth is two to the ninth. I just add the two exponents together, and that goes in as the new exponent. I still have to do ordinary multiplication for the significand, though, so that works the same way we'd expect, and then I need to set my binary point: each significand had three bits after the binary point, so the product has six, and the binary point goes six places in.
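The multiply steps can be sketched the same way: add the exponents, multiply the significands, then normalize and round. `fp_mul` is a hypothetical helper working with true, unbiased exponents, and the operands in the usage example are again assumed values, chosen so the product reproduces the 1.001 times two to the tenth result worked out next.

```python
from fractions import Fraction

def fp_mul(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Multiply two (significand, exponent) pairs -- illustrative sketch."""
    # Laws of exponents: 2^a * 2^b = 2^(a+b).
    exp = exp_a + exp_b
    # Multiply the significands like ordinary binary numbers.
    sig = sig_a * sig_b
    # Normalize: the product of two values in [1, 2) lies in [1, 4),
    # so at most one right shift of the binary point is needed.
    if sig >= 2:
        sig, exp = sig / 2, exp + 1
    # Round back to frac_bits fraction bits (round half up).
    scale = 2 ** frac_bits
    sig = Fraction(int(sig * scale + Fraction(1, 2)), scale)
    if sig >= 2:                      # rounding can carry out; renormalize
        sig, exp = sig / 2, exp + 1
    return sig, exp
```

For example, `fp_mul(Fraction(11, 8), 4, Fraction(13, 8), 5)` multiplies 1.011 times two to the fourth by 1.101 times two to the fifth, shifts once to normalize, and rounds up to 1.001 times two to the tenth.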
So this is the number I'd get after just doing the arithmetic. This time I have two bits in front of my binary point, so I need to shift my binary point over one place, and now I'd have two to the tenth for my exponent. For this example, I only get three bits in my significand, so I have these three zeros in there at the moment. But I've got all of this stuff that came afterwards, so I'd like to round that up, which basically means adding a one in this position. So at the end I would get 1.001 times two to the tenth, and that's what I'd want to store back into my floating point format. I have my exponent, I have my significand, and then I'd drop that leading one again. I needed it for my computation, it was useful, but afterwards I can forget about it again. One other thing to keep in mind here is that I'd like to just add my exponents, but if you recall, there is a bias in both of these exponents: I actually add 127 to each of them. So 127 plus 5 gives me 132, and 127 plus 4 is 131. When I add those together, I get 263, which doesn't look much like 10. Even when I add the bias, 127 plus 10 would give me 137, which is nowhere near 263. So I actually want to go back and subtract off the second instance of my bias. Now I get something that's actually two to the ninth, which is what I expected to get here, because I was then going to shift it over one more bit and get two to the tenth. So when I'm calculating the exponent, I have to remember that there is a bias in there and account for it. Division will work similarly to what we'd expect at this point. If I have my larger number divided by my smaller number, then I can apply the laws of exponents to the exponents again: two to the fifth divided by two to the fourth is two to the five-minus-four, so we'll have a two to the first in it. Then I just have to do division for the significands, so I would have something like this.
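That bias bookkeeping can be captured in a couple of lines. `add_biased_exponents` is a hypothetical helper name, and 127 is the single-precision bias the lecture uses.

```python
BIAS = 127  # single-precision exponent bias, as in the lecture

def add_biased_exponents(stored_a, stored_b):
    """Exponent step of a floating point multiply, on STORED exponents.

    Each stored exponent is (true exponent + BIAS), so a plain sum
    carries the bias twice; subtract one copy to get a result that is
    still correctly biased.
    """
    return stored_a + stored_b - BIAS

# 2^5 * 2^4: the stored exponents are 127 + 5 = 132 and 127 + 4 = 131.
stored = add_biased_exponents(132, 131)   # 136, the biased form of 9
true_exponent = stored - BIAS             # 9, matching the laws of exponents
```

Adding 132 and 131 directly gives 263, which is why the second copy of the bias has to come back off before the result makes sense.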
Now I'm allowed to multiply both of these by 1000 so that I can effectively ignore the binary point. It will still show up in my answer when I'm done, but for most of my computation I can just treat these as integers. So this divisor is not going to go into a three-bit number, but it will go into the four-bit number, and I'd be left with a remainder of 10 there. Then I would need to pull down lots of zeros. 1000 is still smaller than 1011, but 10,000 is larger than 1011. At this point I'd have some remainder; I'd throw that away and just keep this: 1.001 times two to the first is my answer. So this time I subtracted one exponent from the other. Again, if I try that with biased exponents, 132 minus 131 gives me one, which is the right number but no longer has the bias in it. So I need to go back through and add 127 again to get 128 as the actual value that I put into the floating point format. At that point I might also need to shift the bits in my significand so that I have exactly one 1 in front of the binary point. I don't want more bits, I don't want fewer bits, and that would adjust the value of the exponent as well. But after that we're basically done. The features of floating point division aren't much different from what we had with multiplication; we're just doing the division operation instead. We still have a lot of the same concerns as before about making sure our format lines up properly. Actually building hardware for this would be rather more complicated, and most processor developers go to a lot of trouble to make their floating point hardware run very, very quickly, which means they're probably not doing exactly what you'd expect. They look for as many tricks as they can find to implement these algorithms so that they run as fast as possible while using relatively little hardware.
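Division can be sketched the same way. As before, `fp_div` and `sub_biased_exponents` are hypothetical helper names, the operands in the usage example are assumed values chosen so the quotient comes out to 1.001 times two to the first, and the leftover remainder is simply truncated, as the lecture describes.

```python
from fractions import Fraction

BIAS = 127  # single-precision exponent bias

def fp_div(sig_a, exp_a, sig_b, exp_b, frac_bits=3):
    """Divide (sig_a * 2^exp_a) by (sig_b * 2^exp_b) -- sketch on true exponents."""
    exp = exp_a - exp_b          # laws of exponents: 2^a / 2^b = 2^(a-b)
    sig = sig_a / sig_b          # quotient of two values in [1, 2) is in (1/2, 2)
    if sig < 1:                  # normalize with at most one left shift
        sig, exp = sig * 2, exp - 1
    # Keep frac_bits fraction bits and throw the remainder away (truncate).
    scale = 2 ** frac_bits
    sig = Fraction(int(sig * scale), scale)
    return sig, exp

def sub_biased_exponents(stored_a, stored_b):
    # Subtracting stored exponents cancels BOTH copies of the bias,
    # so one copy has to come back: e.g. 132 - 131 + 127 = 128,
    # which is the stored form of a true exponent of 1.
    return stored_a - stored_b + BIAS
```

With the assumed operands, `fp_div(Fraction(13, 8), 5, Fraction(11, 8), 4)` divides 1.101 times two to the fifth by 1.011 times two to the fourth and truncates to 1.001 times two to the first.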
So we won't go into the details of that, but they're still working with the same general concept: we've got two numbers, each a significand times a power of two, and we're just manipulating them.