So in this video, we're going to look in detail at the floating point data type. To do so, let's open a new file and rename it to simply floats, and let's look at a couple of examples. By now you know that any number you write that has a dot in it is interpreted as a floating point number. As before in the video on integers, this way of writing a number is called the literal syntax. So we are basically writing some digits. In the previous video we saw that an integer literal must not start with a zero; for a float literal a leading zero is fine, as in 0.5. And then the numeric literal needs to have a dot in there, some place. The dot is what makes this numeric literal become a floating point object when it is parsed by the Python interpreter. So the dot must be there if we need a float. Say I want to write the number 7.0. If I want to save the 0 after the dot, I can simply write 7 followed by the dot. So sometimes you see people write 7. instead of 7.0 to save one character of typing, but it is the same, of course.
So that's the literal syntax. How else do you get floating point numbers in real life? Let's say you're reading in some data and you're given either textual data from a CSV file, because it is interpreted as text, or you get integer numbers, and you want to convert them into floating point numbers. Then you can simply use the float constructor. If you call the float constructor with the integer 3, you get back 3.0. And if you call it with a text object, a string object, let's say "7.12", you get back the float 7.12. And why is it useful that the float constructor can take a string as its argument? Well, we saw that in the previous video when we did the same with the int constructor.
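The literal syntax and the float constructor described above can be sketched like this. It is only a minimal illustration; the variable names are made up for this example.

```python
# Float literals: the dot is what makes the literal a float.
a = 7.0
b = 7.       # same value, the trailing zero is simply omitted
c = 7        # no dot, so this is an int

print(type(a), type(b), type(c))
print(a == b)            # True

# The float constructor converts integers and strings.
from_int = float(3)
from_text = float("7.12")
print(from_int)          # 3.0
print(from_text)         # 7.12
```

The string version is the one you typically need when the data comes from a text source.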
As I said, some external data sources only provide string objects, just like the input function that asks the user to type something into a text box. Whatever the user types in will become a string in memory. Sometimes we want to ask a user to type in some number, and then we need to convert that string into a number, either an integer or a floating point number, and we can use the constructors to do that. So that's pretty trivial.
So let's look at a couple of other things that are worthwhile to know about floating point numbers. The first one is the so-called scientific notation. Maybe you remember that from high school, or you see these numbers in your college career. Scientific notation is basically Python's way of writing the following. Let's say we write 1.23 times 10 to the power of, let's say, 3. By the way, the language I'm using here in the markdown cell is called LaTeX. It's not important here because this is a Python course, of course, but to make it look nice we use it here. If you want to write a number like this, you can do so in Python as well, using Python's scientific notation, and it simply goes like this. You write the significant digits, in this case 1.23, and now you want to multiply by 10 to the power of 3, and you do that by writing a lowercase e. An uppercase E also works, but the lowercase e is kind of the standard. You don't have a space here, so you simply write e and then you write the exponent, so 1.23e3. That is the shortcut version, and in this case it gives you 1230.0. So sometimes you want to use that, depending on the data you work with.
So, other things that are worthwhile to know about floating point numbers. Let's call them special values. First and foremost, floating point numbers are standardized by an organization called the IEEE, in the standard known as IEEE 754.
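Before we get to the special values, here is a small sketch of the scientific notation and of converting typed text into a number, as described above. The variable typed is a hypothetical stand-in for what the input function would return, so the example runs without user interaction.

```python
# Scientific notation: significant digits, then e, then the exponent.
x = 1.23e3       # 1.23 times 10 to the power of 3
y = 1.23E3       # an uppercase E works as well
print(x)         # 1230.0
print(x == y)    # True

# Negative exponents give very small numbers.
tiny = 1e-12
print(tiny)      # 1e-12

# Text from an external source is a string and must be converted first.
typed = "1.23e3"         # hypothetical stand-in for input("Enter a number: ")
value = float(typed)
print(value + 1)         # 1231.0
```

Note that the float constructor also understands the scientific notation inside a string.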
So basically for all of engineering there is an organization that standardizes things, and the floating point standard is standardized there. So no matter which programming language you use, if it implements the floating point data type according to this standard, the floats will behave the same way. And the standard says that there are also so-called special values. So let's see a couple of them. We use the float constructor, and as a text object I give it the letters nan. I can write that either way in terms of lower and uppercasing. So let's write it like this, NaN, not a number. And if I execute that, I simply get back nan, so not a number.
So what kind of number is that? Well, if you go ahead and divide by zero in Python, you get a ZeroDivisionError. But sometimes operations in such a situation will not give you an error message; they return an object that says, well, I don't know what this is, basically. So a value for which we don't know what the value is. And that is what the float not a number object is. It's kind of similar to the built-in None object, which indicates the absence of a value; float NaN is basically the floating point version of that.
So you must be careful. In real life, and that is why I include it in this video, if you load data, let's say CSV or Excel-like data from some file using Python, you can easily imagine that in an Excel file some cells that should contain numbers are simply left empty. For example, maybe you have a CSV file with different columns for different measurements of things. And maybe for some observation, for some unit that we are trying to observe, there is no measurement for some variable. And then there are two ways people handle that in real life.
So some people put in a dummy value like zero or negative one to indicate, well, this cell has no meaning. But other people are a little bit smarter and they just leave the cell empty. An empty cell is, of course, better than putting in some dummy number to mean nothing. And in this case, when you leave the cell empty and you load it into Python, it turns out that oftentimes the Python libraries you can use for that give you back the not a number object. And the thing with the not a number object is that it does not lead to any arithmetic errors if you, for example, add some number to it. So if I add, as we see here, the number one to the floating point not a number, then I simply get back not a number. If I add the number one to something unknown, I don't know what I get back. And so what happens sometimes, and you have to be aware of that: if you're loading data, let's say CSV data, and you know there are some empty cells in there, then don't just go ahead and use the value of a cell to do some arithmetic, because you will never see any error message. The only thing you see at some point is that you are getting back meaningless results, and that is, of course, a gotcha that you just need to be aware of. This happened to me sometimes in the early days when I did Python programming, and then usually you need to redo the entire analysis. So just be aware of that.
Two close relatives, but values that are less frequently used, are the infinity value, which we get with the float constructor and the text inf, and, of course, if we copy-paste that, the negative infinity. Sometimes the positive infinity is useful if you want to initialize some variable with a number that is sure to be greater than any other number you can use.
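Here is a minimal sketch of how a single not a number value silently spreads through arithmetic, and of infinity as a starting value. The list measurements is made-up data standing in for a loaded CSV column, and math.isnan from the standard library is one way to detect the missing values.

```python
import math

nan = float("nan")
print(nan + 1)                    # nan, and no error is raised

# Hypothetical column with one missing measurement.
measurements = [2.5, nan, 4.0]
total = sum(measurements)
print(total)                      # nan: the whole sum is now meaningless
print(math.isnan(total))          # True

# Filtering the missing values first gives a usable result.
clean = [m for m in measurements if not math.isnan(m)]
print(sum(clean))                 # 6.5

# Positive infinity as a start value when searching for the smallest number.
smallest = float("inf")
for m in clean:
    if m < smallest:
        smallest = m
print(smallest)                   # 2.5
```

Whether you drop missing values or treat them differently depends on the analysis; the point is only that you have to decide explicitly.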
So if you go ahead and take any big number, let's take this number here, almost a billion, and you compare it with smaller than to the floating point infinity, then the answer is True. So infinity is greater than any other number. This is a useful application for that, but as I said, the infinity values are usually a little bit less known. Just be aware of the not a number value. That is one big takeaway, so that you don't run into mistakes that you could easily avoid.
Maybe let's look at one more thing regarding not a number that just comes to my mind. If you compare a number to itself, you should get back True, because a number should be equal in terms of its value to itself, right? But if you compare not a number to itself, you get back False. So that makes the not a number value a bit hard to work with.
Okay, so this is how we create floating point numbers in different ways, and these are some special values. But now let's talk about a bigger thing that you really, really need to understand, and it's not too hard. Let's call this section imprecision. Floating point numbers are what we call inherently imprecise. What do we mean by that? Let's do an example. If I add 0.1, so one tenth, plus 0.2, you all would expect this to be 0.3, three tenths. So let's compare that to 0.3, and we would all guess that this should be True, right? So let's run the cell, and we get back False. Is that a bug? The answer is no, it's not. Any other language that uses the same floating point standard, R, MATLAB, and many, many more, should give you the same result if you type this in their respective syntax into an interpreter.
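The comparisons from this part can be reproduced with a few lines. This is just a sketch; the number big is an arbitrary example value.

```python
import math

big = 999_999_999                 # almost a billion
print(big < float("inf"))         # True
print(float("-inf") < -big)       # True

# Not a number is not even equal to itself.
nan = float("nan")
print(nan == nan)                 # False
print(math.isnan(nan))            # True, so use this check instead of ==

# The classic imprecision example.
total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False
```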
The reason is that the numbers are simply not precise. Inherently imprecise, that's what they are, and there's nothing we can do about that. I will give you another example where you see that. Let's import the math module and use the square root function from it, and let's take the square root of two. And now let's take the square root of two and raise it to the power of two. This should basically give you back two. However, if you take the square root of a number, be it an integer or a float, it doesn't matter, we will definitely get back a floating point number. And if you raise a floating point number to the power of two, we will also get back a floating point number. However, this number happens to be imprecise. So we get 2.0000000000000004, with this four here at the end, which I guess a beginner would not expect.
Okay, so what is the consequence of that? The consequence is that when you work with numerical data, and you know a column or a row of data consists of floating point objects, then you basically must not use the double equals operator. The double equals operator, the comparison operator, is basically taboo for floats. Don't use it. So let's say you wanted to formulate this condition here, or a condition that looks like this, the square root of two squared double equals two, and you want to do that correctly. The solution is the following. We define a threshold, and the threshold is the range within which you are willing to accept equality. So let's define it as, what is a typical precision? Let's simply use one e negative 12, so 1e-12. And then you take the left-hand side, put that in parentheses, and you subtract from that the right-hand side.
So you have the difference of the left-hand side and the right-hand side. You put that in parentheses as well, and use Python's built-in abs function for the absolute value of that. I mean, the difference of the two sides could be negative or positive, but we don't really care about the sign of the difference; we only care about its magnitude. That's why we use the abs function. And now what you get back is some number that usually is very small, like something times 10 to the power of negative 17 here. What you do then is you compare: is that strictly smaller than my threshold, within which I am ready to accept equality? And if so, the two sides count as equal. That is your comparison.
So if I repeat that for the other example, let's take the left-hand side, put it inside parentheses, subtract the right-hand side from that, put everything into parentheses, and use the abs function. Oops, let's do that again, I pushed the wrong key. So use the abs function, and of course compare that to the threshold. And now we have a True. That means we know that the left-hand and the right-hand side are equal within our threshold.
So be aware of that. As a data scientist, you work with floating point numbers most of the time, and you must not make any beginner's mistakes. And of course, many third party packages have built-in methods or functions, usually called something like almost equal, and such a function does all of this thresholding logic behind the scenes so that you don't have to do the comparison on your own. So usually this is a bit easier.
Okay, let's go ahead and look into what the problem is here. So let's look at a little bit of an explanation for that.
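Before the explanation, here is the threshold comparison from above written out as a sketch. The helper roughly_equal is a hypothetical name introduced only for this example; math.isclose is the standard library's ready-made function for the same idea, though it uses a relative tolerance by default rather than the fixed threshold shown in the video.

```python
import math

threshold = 1e-12

# First example: 0.1 + 0.2 compared to 0.3.
difference = abs((0.1 + 0.2) - 0.3)
print(difference < threshold)                 # True

# Second example: the square root of two, squared, compared to 2.
difference = abs((math.sqrt(2) ** 2) - 2)
print(difference < threshold)                 # True


def roughly_equal(left, right, tolerance=1e-12):
    """Hypothetical helper: True if both sides differ by less than tolerance."""
    return abs(left - right) < tolerance


print(roughly_equal(0.1 + 0.2, 0.3))          # True
print(roughly_equal(0.1 + 0.2, 0.4))          # False

# The standard library offers a similar check.
print(math.isclose(0.1 + 0.2, 0.3))           # True
```

A fixed threshold like 1e-12 is a choice that fits numbers of moderate size; for very large or very small values a relative tolerance is usually the safer assumption.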
So let's call that floats in memory. What do they look like? I'm going to use a built-in function called format, which gives me back a text representation of an object. Don't worry about that, it's not the big point here. Let's take the number 0.1. The format function allows me to not only show 0.1 as the output, but to show more digits, so to say, more decimals. So let's use a special formatting syntax, which you don't have to understand for now, and simply print 30 digits to the right of the decimal point. And what we see is that for the first, usually it's 14 to 15 digits, everything seems to be super precise. And then at some point we lose the precision. To do that with another example, let's do 0.2. We get the same picture.
Now you may wonder, can I solve the problem if I round? Let's say I take one over three, one third, and we round this number to five digits. This is obviously going to give us back a zero point and then five threes. So what if I show more digits to the right-hand side of this rounded number? Let's also do 30 digits. This should be enough to see the problem. And we see that the rounding works, but only until roughly 14, 15 digits to the right of the decimal point. So even with rounding we cannot solve the problem here. This problem is inherently unsolvable.
So let's look at what the big problem is here. In memory, let's say we create a floating point number. What happens is Python will simply create a standardized box. The boxes that model floating point objects are always the same size, and they always have the same behavior attached to them. And let's call it x for now. And let's reference that.
And we have a couple of zeros and ones in there. And this is of course in a way similar to the binary representation that I showed you in the previous video on integers. So how can we see here what the problem is? Well, let's say I go here and write one over three. How do we usually write that? Usually like this, 0.3333 and so on, and we could write it with a dot dot dot. And sometimes mathematicians simply say, well, it's 0.3, and put a bar on top of the three to indicate three forever. Now the problem with that is that here we obviously have an infinite number of decimals. And that is true for many, many of the numbers called the real numbers. The real numbers contain in particular many irrational numbers, like pi or Euler's number e, but also fractions like one over three, one third, have an infinite number of digits. So many, many numbers have this property.
Now this box here with all the ones and zeros that models the floating point number in a computer's memory is finite. We can of course make the floating point objects twice as big or four times as big and so on, and this is usually called double precision or quadruple precision. So you hear these terms sometimes. But no matter how big we make this floating point object, at some point the memory in our computer is limited. So no matter what we do, using simply zeros and ones, we cannot come up with a precise representation of all these numbers. There are a couple of workarounds for that, in particular the so-called decimal data type that Python also supports, and also another one. But using the floating point data type, there is basically no way around it, and that is the inherent problem here. So in this video, we are not going to go into detail of how the binary representation works.
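What the finite box means in practice can be sketched with the format function and with the decimal data type mentioned above. This is only an illustration; the format specification ".30f" asks for 30 digits after the decimal point, and Decimal comes from the decimal module in the standard library.

```python
from decimal import Decimal

# 30 digits after the decimal point expose what is really stored.
print(format(0.1, ".30f"))
print(format(0.2, ".30f"))

# Rounding gives a nice short output ...
third = round(1 / 3, 5)
print(third)                                  # 0.33333
# ... but the rounded float is still stored imprecisely.
print(format(third, ".30f"))

# Decimal objects built from strings keep the decimal digits exactly.
exact_sum = Decimal("0.1") + Decimal("0.2")
print(exact_sum == Decimal("0.3"))            # True
```

The decimal type trades speed for exact decimal digits, so it is a workaround for specific cases such as money amounts rather than a general replacement for floats.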
We did that a little bit for integers, and I think it's really worthwhile to understand how this works in its basic form. Understanding binary representations for integers, I think, is a must for someone who seriously wants to become a data scientist. But for floats, I think we can stop here with the CS theory part. For now simply observe: this is finite, this is infinite, and there is no way in the world, in theory, that we can put something infinite into a finite space. This simply does not work. And therefore we are going to lose precision.
One nice thing I want to show you: there are a couple of numbers that are precise, and these are in particular the numbers that are perfect powers of two. For example, let's copy-paste this formula here. If I replace the 0.2 here with 0.25, which is a quarter, so two to the power of negative two, then we have absolute precision. And why is that? Well, as we saw in the previous video with the binary representation for integers, anything in a computer is just ones and zeros. So turn the switch on or turn the switch off. In other words, that is where the power of two comes from. But again, maybe I will do another video in the future where I explain the theory behind it. It's not too hard, but I think for a practitioner it's a bit too much. I don't think you really need that for floating point numbers, but you need at least to know the consequences. Because if you don't know the consequences, you will make errors when you're doing a data science analysis. And the errors people usually make have to do with the imprecision and with the special values. So these are the two things that you, as a practitioner, should really take out of this video.
So this is it for this video. And in the next couple of videos, we will finish off the topic of numbers.
We will in particular classify numbers. So I will see you in the next videos.
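As a small recap of the point about powers of two made in this video, here is a sketch you can try on your own. It uses the float method as_integer_ratio, which returns the exact fraction that is stored in memory.

```python
# A quarter is two to the power of negative two and is stored exactly.
quarter = 0.25
print(format(quarter, ".30f"))       # 0.250000000000000000000000000000
print(quarter.as_integer_ratio())    # (1, 4)

# One tenth is not stored as 1/10 but as a nearby fraction.
tenth_ratio = (0.1).as_integer_ratio()
print(tenth_ratio)
print(tenth_ratio == (1, 10))        # False

# The denominator of the stored fraction is a power of two.
numerator, denominator = tenth_ratio
print(denominator & (denominator - 1) == 0)   # True
```

Every finite float is an integer divided by a power of two, which is why 0.25 fits exactly and 0.1 does not.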