Hi, in this lecture we will start a new chapter, the first chapter of our course, which is about arithmetic errors. We will first introduce what is meant by the floating point representation of a number on a computer, then what is meant by a floating point approximation, and then we will introduce a notion called machine epsilon and define various errors.

Let us start our class with a brief overview of how numerical methods are implemented on computers and how they give rise to errors. We begin from the mathematical problem that we have in hand and want to solve. Obviously, if we have a method that can give us the exact solution, that is the most desirable situation, because there is no error involved in our solution. Such methods are called direct methods. But, as we discussed earlier, not all mathematical problems can be solved to get exact solutions; one therefore needs to go for an approximation. One way to approximate is to use numerical methods. A numerical method can only give an approximate solution to our mathematical problem, so the method will obviously involve an error when compared to the exact solution of the problem. These errors come from the mathematical approximation of the problem, and in this course we call them mathematical errors.

Once we devise the numerical method, we develop an algorithm based on the method, and further we implement that algorithm on a computer. The computer then generates a numerical solution for our problem. At this level one more error comes up, due to the approximation of numbers on a computer. This is because computers cannot store numbers to infinite precision, due to their limited memory; they in some sense truncate a number and store only finitely many digits of its mantissa.
This leads to a new type of error, called the arithmetic error; the numerical solution obtained on a computer will therefore also involve arithmetic error. Remember that the numerical method as such involves the mathematical error. The numerical solution obtained on the computer thus has both mathematical error and arithmetic error, and together these are called the total error. Hence our exact solution is nothing but the numerical solution plus the total error.

The problem is that, in general, we cannot obtain the total error exactly; if we could, it would be equivalent to obtaining the exact solution itself. We do not have that luxury. Of course, if we knew the exact solution, we could take the difference between the exact solution and the numerical solution, and that would give the total error. But if we knew the exact solution, why would we go for a numerical solution at all? Therefore, in practical applications we cannot get the total error; however, we can get some idea of how the total error behaves in certain problems. For that, we first need to understand how the arithmetic error enters our calculation, how it propagates from one step of a calculation to the next, and what kind of impact it has on the total error. This is what we are going to see in this chapter.

Let us start our discussion with what is meant by the floating point representation of a real number. Assume we are given a base beta, which is some natural number greater than or equal to 2. Then any real number can be represented exactly in base beta as

(-1)^s × (0.d1 d2 d3 ...)_beta × beta^e.

Remember, if s = 0 this gives a positive number, and s = 1 gives a negative number.
You can keep in mind either beta = 10, which gives us the decimal representation, or beta = 2, which gives us the binary representation. All this is well known to us; the only new thing is the way we write the number. More importantly, each digit di must be an integer between 0 and beta − 1, with the main assumption that d1 is not 0. The only exception is when d1 and all the other digits are 0, in which case we are representing the number 0 itself. As I said, s is called the sign and takes the value 0 or 1 depending on whether the given number is positive or negative, and e is an appropriate integer chosen so that this is a floating point representation, that is, so that d1 is not 0.

For instance, how can you represent the number −0.045? You cannot write the mantissa as 0.045, because the first digit must not be 0; you have to write the digits as 4 and 5. In decimal form,

−0.045 = (−1)^1 × (0.45)_10 × 10^(−1).

Therefore e has to be chosen appropriately to get the floating point representation of the number you want to work with. The part (0.d1 d2 ...)_beta is called the mantissa, and the representation above is called the floating point representation; beta is often also called the radix. When beta = 2, the floating point representation is called the binary floating point representation, and when beta = 10 it is called the decimal floating point representation. Throughout this course we will mostly consider only beta = 10.

Next is the floating point approximation. As I said earlier, a computer cannot hold a number in its exact form, especially when the number has infinitely many digits in its mantissa.
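The decomposition just described can be sketched in code. Below is a minimal Python illustration for base beta = 10 only; the function name `decompose` is my own, and digits beyond the requested count are simply chopped off:

```python
from decimal import Decimal, ROUND_DOWN

def decompose(x, ndigits=8):
    """Write x as (-1)^s * (0.d1 d2 ...)_10 * 10^e with d1 != 0 and
    return (s, [d1, ..., d_ndigits], e). Digits beyond ndigits are chopped."""
    d = Decimal(repr(x))
    if d == 0:
        return 0, [0] * ndigits, 0
    s = 1 if d < 0 else 0
    d = abs(d)
    # adjusted() gives the power of 10 of the leading digit, so the
    # exponent that places the point just before d1 is adjusted() + 1
    e = d.adjusted() + 1
    # shift so the first ndigits digits sit left of the point, then chop
    shifted = d.scaleb(ndigits - e).to_integral_value(rounding=ROUND_DOWN)
    return s, [int(c) for c in str(int(shifted))], e
```

For instance, `decompose(-0.045, 4)` returns `(1, [4, 5, 0, 0], -1)`, matching s = 1, mantissa 0.45, and e = −1 from the example above.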
So, for instance, if you want to store the number 1/3, you know that it is 0.333 and so on, with infinitely many threes. A computer can only store finitely many digits, and it truncates all the digits after that. The number of digits kept in the mantissa is decided by the capacity of the computer. So we introduce the notion of a floating point approximation, which we call the n-digit floating point number to the base beta. If we say that a number x is represented in n-digit floating point form, it means it has only n digits in its mantissa; all other conditions remain the same.

Let us give a few examples. The real number 6.238 can be written as

6.238 = (−1)^0 × (0.6238)_10 × 10^1,

because it is a positive number; we are representing it in decimal form, that is, base 10, so here s = 0, beta = 10, e = 1, d1 = 6, and so on. Similarly, if you want to represent the number x = −0.0014, then s = 1, the digits d1 and d2 are 1 and 4, and the exponent e is −2.

Note that there are only finitely many digits in the n-digit floating point representation, but a real number in general can have many more digits, as we saw with 1/3, which has infinitely many threes in its mantissa. Therefore, in general, an n-digit representation of a real number is only an approximate representation.

Next we will try to understand what is meant by underflow and overflow of memory. When the value of the exponent in a floating point representation exceeds the maximum limit of the memory, we encounter an overflow of memory. On the other hand, if the exponent e goes below the minimum limit, we encounter an underflow of memory.
Therefore, on any computer there is a minimum number and a maximum number that it can hold, and these depend on the exponent. If the exponent is less than the minimum limit, the number is simply treated as 0, and if the exponent exceeds the maximum limit, say M, the number is treated as infinity. The number of digits n in the mantissa, as given in the definition of the n-digit floating point representation, is called the precision or length of the floating point number.

Now, given that our computer can hold only finitely many digits in the mantissa, how is it going to truncate a number that has more digits than it can hold? Real computers have very sophisticated rounding algorithms, but for the sake of understanding we will consider only two ways of truncating a number, in their simplest form: chopping and rounding.

Let us first introduce what is meant by chopping a number. Suppose you are given a real number with its floating point representation as above. Chopping this number to n digits means you simply discard the digits from the (n+1)-th position onwards and keep only the digits d1 to dn; the resulting n-digit floating point number is called the chopped approximation. We will use the notation fl(x) for the floating point approximation of a number, whether obtained by chopping or by rounding.

Next, what is meant by rounding a number? Again, take a real number written in floating point form and look at the (n+1)-th digit, d(n+1). If it is less than beta/2, we simply follow the chopping idea, whereas if d(n+1) lies between beta/2 and beta, we add 1 to the previous digit dn.
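The chopping and rounding rules just described can be sketched as follows. This is a Python sketch of the simple textbook rule for base 10 (not IEEE rounding); the name `fl` follows the notation fl(x) above:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, n, mode="round"):
    """n-digit decimal floating point approximation of x.
    mode='chop' discards the digits from position n+1 onwards;
    mode='round' also adds 1 to the n-th digit when d_{n+1} >= 5."""
    d = x if isinstance(x, Decimal) else Decimal(repr(x))
    if d == 0:
        return Decimal(0)
    # keep exactly n significant digits: quantize to the place value
    # of the n-th digit, counted from the leading digit
    q = Decimal(1).scaleb(d.adjusted() - n + 1)
    rounding = ROUND_DOWN if mode == "chop" else ROUND_HALF_UP
    return d.quantize(q, rounding=rounding)
```

For example, with n = 4 the number 2/3 = 0.666666... becomes 0.6667 under rounding but 0.6666 under chopping, since the 5th digit 6 is discarded either way but only rounding carries it into the 4th digit.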
So, that is called rounding a number. There are more sophisticated algorithms for this, but this is the simplest idea of rounding, and we will consider it only in this form in our course.

With this, let us try to understand the procedure for performing an arithmetic operation using n-digit rounding. Remember, we have four basic arithmetic operations: addition, subtraction, multiplication and division, and the procedure we give here holds for any of these four. Therefore I will not restrict the notation to any one of them; I will use a symbol ⊙ that represents any of the four operations. Let us see how to perform an arithmetic operation between two numbers x and y, that is, how to compute x ⊙ y.

What we are doing here is simply understanding how a computer performs such an arithmetic operation. Suppose you give x and y to a computer. It first takes x and writes its floating point approximation fl(x) as per its capacity, that is, it truncates the digits it cannot hold in its memory; mostly it is instructed to do a rounding approximation. It likewise takes the number y and writes fl(y). So the computer stores in its memory only the approximate values of x and y as per the rounding algorithm, and then it performs the arithmetic operation. In step two, the computer carries out the arithmetic operation on the approximated values, and then it takes the result fl(x) ⊙ fl(y) and makes a floating point approximation of that number. That final value, fl(fl(x) ⊙ fl(y)), is the answer the computer gives us for this calculation.
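The three-step procedure (approximate x, approximate y, operate, then approximate the result) can be sketched on top of the simple n-digit rule. This is a Python sketch; `fl_op` is my own name, and the textbook `fl` rule is redefined here so the snippet stands alone:

```python
import operator
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, n, mode="round"):
    """n-digit decimal floating point approximation (textbook chop/round)."""
    d = x if isinstance(x, Decimal) else Decimal(repr(x))
    if d == 0:
        return Decimal(0)
    q = Decimal(1).scaleb(d.adjusted() - n + 1)
    return d.quantize(q, rounding=ROUND_DOWN if mode == "chop" else ROUND_HALF_UP)

def fl_op(x, y, op, n, mode="round"):
    """Compute fl( fl(x) op fl(y) ): step 1 approximates both operands,
    step 2 performs the operation exactly, step 3 approximates the result."""
    fx, fy = fl(x, n, mode), fl(y, n, mode)
    return fl(op(fx, fy), n, mode)
```

For example, subtracting two nearby numbers with 6-digit rounding, `fl_op(Decimal('316.229347'), Decimal('316.227766'), operator.sub, 6)` gives 0.001, while the exact difference is 0.001581: subtracting nearly equal numbers after rounding can lose most of the accuracy, which is exactly what the next example exploits.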
Therefore, if a human being performs this calculation exactly, the value may differ from the value given by the computer. Now, if we have a more complicated expression, say with x, y and z, and we want to find (x + y) × z, we can generalize this idea and see how a computer evaluates the expression. How does it do it? First it finds fl(x), fl(y) and fl(z); then it adds fl(x) and fl(y) and makes a floating point approximation of that sum; then it multiplies that approximation of the sum by fl(z); and the final answer is further approximated using the floating point algorithm. That is the idea for evaluating any expression.

Let us take this example: f(x) = x(√(x+1) − √x). Say we want to compute the value of this function at the point x = 100000 using 6-digit rounding. First we have to round this number to 6 digits; anyway, it has only 6 digits. Then we take the square root: √100001 = 316.229347..., which in floating point form is (0.316229347...)_10 × 10^3 (strictly we should also include the factor (−1)^0, but I omit it here). The 6-digit rounding approximation of this number is (0.316229)_10 × 10^3: we keep the 6 digits 3, 1, 6, 2, 2, 9 and truncate the 3, 4, 7 that follow; since the leading discarded digit is 3, which is less than 5, this works out the same as a chopping approximation. Therefore fl(√100001) = 316.229, and similarly fl(√100000) = 316.228. Now we go to the next step: we take the floating point approximation of the difference of these two numbers, and that is fl(316.229 − 316.228) = 0.1 × 10^(−2).
Finally, the floating point approximation of the function value at x = 100000, which is what we wanted to find, is fl(100000 × 0.1 × 10^(−2)), and that happens to be 100. Remember, we have done this calculation with 6-digit rounding at every step, and we got the value 100. You can check that if instead of rounding you do the chopping approximation, you get the value 200. But what is the exact value? We will see that the exact value is entirely different: it is approximately 158.11. You can see that the calculation using 6-digit rounding or chopping has magnified the error to a great extent.

Let us see what is meant by machine epsilon. The machine epsilon of a computer is the smallest positive real number delta such that fl(1 + delta) > 1. Thus, for any positive number delta-hat less than this delta, the floating point approximation of 1 + delta-hat is just 1; that is, 1 + delta-hat and 1 are identical within the computer's arithmetic. Such a number delta is called the machine epsilon.

Let us quickly go through the definitions of error. We all know what is meant by error: the error is nothing but the true value minus the approximate value. By absolute error we simply mean the absolute value of this error. The next important concept is the relative error, which is the error divided by the true value. From it we get the percentage error, obtained by multiplying the absolute value of the relative error by 100.

We also use some notation. E(xa) denotes the error involved in the approximate value xa when compared to the true value x, and it is given by E(xa) = x − xa. The absolute error is often denoted by Ea(xa), defined as Ea(xa) = |x − xa|.
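To check the numbers in this example, here is a Python sketch (the textbook `fl` rule is redefined so the snippet stands alone) that evaluates f(100000) with 6-digit rounding and with chopping, compares both against the exact value, and also estimates the machine epsilon of ordinary double precision floats:

```python
import math
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, n, mode="round"):
    """n-digit decimal floating point approximation (textbook chop/round)."""
    d = x if isinstance(x, Decimal) else Decimal(repr(x))
    if d == 0:
        return Decimal(0)
    q = Decimal(1).scaleb(d.adjusted() - n + 1)
    return d.quantize(q, rounding=ROUND_DOWN if mode == "chop" else ROUND_HALF_UP)

def f_6digit(x, mode):
    """f(x) = x*(sqrt(x+1) - sqrt(x)) with 6-digit arithmetic at every step."""
    a = fl(math.sqrt(x + 1), 6, mode)   # fl(sqrt(100001)) = 316.229 either way
    b = fl(math.sqrt(x), 6, mode)       # rounding: 316.228, chopping: 316.227
    d = fl(a - b, 6, mode)              # rounding: 0.001,   chopping: 0.002
    return fl(Decimal(x) * d, 6, mode)

x = 100000
exact = x * (math.sqrt(x + 1) - math.sqrt(x))   # approximately 158.11
rounded = f_6digit(x, "round")                  # 100
chopped = f_6digit(x, "chop")                   # 200

# relative error of the 6-digit rounded result against the exact value:
rel_err = (exact - float(rounded)) / exact      # roughly 0.37, i.e. about 37 percent

# machine epsilon of double precision floats, by repeated halving:
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2                                    # ends with eps = 2**-52
```

The cancellation in the step a − b is what destroys the accuracy: both square roots are correct to 6 digits, but their difference retains only one meaningful digit.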
For the relative error we often use the notation Er(xa), which is the error in xa when compared to x, divided by the true value x; of course, x must be nonzero. With this, we will next see what is meant by significant digits and what the danger is in losing significant digits in our calculations. Thank you for your attention.