In this lecture we are going to start the discussion of solving inverse problems; we are doing this for the first time in this lecture series. To get started, I am going to consider the simplest version, namely the static deterministic linear inverse problem. We have also attached the qualifier "well posed"; we will describe what a well-posed problem is as we develop the details of the statement of the problem. I would like to start with an example called the straight-line problem. Suppose a particle is moving in a straight line at a constant velocity v, having started at an initial position z0. We do not know z0, and we do not know v. We can only observe the position z_i of the particle at time t_i. Let us assume we measure the position at times t_1 < t_2 < ... < t_m. So the particle passes through position z_1 at time t_1, z_2 at time t_2, z_i at time t_i, and z_m at time t_m. The statement of the problem is the following: we have a set of m observed pairs (t_i, z_i), i = 1, ..., m, of time versus position, where z_i is the position at which the particle appears at time t_i. These pairs are the data; knowing them, our aim is to estimate the unknowns z0 and v. You can picture the particle moving along the line, starting at z0, travelling with velocity v, and appearing at z_1, z_2, ..., z_m at times t_1, t_2, ..., t_m. We can only observe the positions at various times; we do not know the velocity or the starting position, and we would like to estimate both from the m observations. That is the problem. Why is it called an inverse problem? Let us get to that in a moment. In order to formulate the inverse problem, I need a mathematical model, one that relates the knowns to the unknowns. Here the unknowns are z0 and v, and the knowns are the z_i and t_i. The model comes from basic physics: if a particle starts at position z0 and travels at the constant velocity v, its position at time t_i is given by the simple relation z_i = z0 + v t_i. This relation ties the unknowns z0 and v to the knowns z_i and t_i, and I have m equations like this, one per observation. I can rewrite them in matrix-vector notation. Let z = (z_1, z_2, ..., z_m)^T be the vector of all positions at the m different times; z0 and v are the unknowns. If I multiply the matrix whose i-th row is (1, t_i) by the column (z0, v)^T, I get z0 + v t_1, z0 + v t_2, ..., z0 + v t_m, each entry corresponding to the position at the respective time. So I call the vector (z_1, ..., z_m)^T as z, the matrix whose first column is all ones and whose second column is (t_1, t_2, ..., t_m)^T as H, and the vector of unknowns (z0, v)^T as x. The unknown x is a vector of size 2, the known z is a vector of size m, and H is a matrix.
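To make the model concrete, here is a minimal NumPy sketch of the forward problem; the observation times and the "true" x below are illustrative values, not the lecture's:

```python
import numpy as np

# Observation times t_1 < t_2 < ... < t_m (illustrative values).
t = np.array([1.0, 2.0, 3.0, 4.0])
m = len(t)

# Design matrix H: first column all ones (for z0), second column the times (for v).
H = np.column_stack([np.ones(m), t])   # shape (m, 2)

# Forward problem: given x = (z0, v), compute the positions z = H x.
x_true = np.array([-1.0, 1.0])         # hypothetical z0 = -1, v = 1
z = H @ x_true                         # z_i = z0 + v * t_i
print(z)                               # [0. 1. 2. 3.]
```

The inverse problem runs the other way: given z and H, recover x.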
H is an m by 2 matrix: m rows and 2 columns. So the problem can be stated as z = Hx; this single relation simultaneously captures all the positions observed at the m different times. Here z is an m-vector, H is m by 2, and x is a vector in R^2. Now, please recall our definitions of direct and inverse problems. Given A and x, computing b = Ax is the forward problem; given A and b, solving Ax = b for x is the inverse problem. We have already seen this in several classes. Here, in z = Hx, both H and z are known and I need to find x; so this problem is an inverse problem in exactly the sense we have talked about: given z and H, find x. It is an example of a linear inverse problem. The unknown x does not vary in time, because v is constant and the position where the particle started is also a constant; so it is a static problem, and the model equation z = Hx is a static model. Therefore we term this a static deterministic linear inverse problem. It is the simplest problem one could formulate. So what does it tell you? From a bunch of observations I build a model from the data; the rule that helps us build the model is a basic relation of physics, the Newtonian laws. Using the Newtonian law I fit a model; once I have the model I know what is known and what is unknown, and it turns out the resulting problem is an inverse problem. Let us now create some nomenclature. z belongs to R^m, the set of all vectors of size m: z is called the observation vector and R^m the observation space. Likewise, x is called the unknown vector. In the previous case x had 2 components; I can generalize to n components, so the unknown is x = (x_1, ..., x_n)^T and R^n is called the model space. H, an m by n matrix, is the relation between the model space and the observation space: if I have x in the model space, H maps x into z. So H is the known matrix. Generalizing from the particle moving in a straight line, the general linear static deterministic inverse problem is: given z and H, find x such that z = Hx. This is the first statement of the inverse problem, based on a very simple problem in physics. Now, on methods of solving z = Hx. If m = n and H is non-singular, from matrix theory we already know x = H^{-1} z. But in inverse problems it is seldom the case that m = n. In the case of the particle moving in a straight line, n was 2, while m could be less than 2, equal to 2, or greater than 2. So we need to consider the general case: H is a rectangular matrix of size m by n, and m need not equal n.
The standard notion of singularity and non-singularity is an attribute of square matrices; there is no concept of a singular or non-singular rectangular matrix. And when there is no such notion, I cannot even say when a solution exists, and so on. So we must consider a case that is harder than solving a linear system Ax = b, where A is an n by n matrix and b an n by 1 vector: there, when A is non-singular, I simply write x = A^{-1} b. That I cannot do here, because A in this case is H, H is not a square matrix, and I do not even have a concept of non-singularity for a rectangular matrix. So this linear inverse problem, even though it is the simplest of problems, does not fit the standard problems we study in linear algebra, and we need to develop theory beyond what a first course in linear algebra teaches us. To examine the solution concept, it is useful to distinguish two cases: m > n and m < n. Please remember, m is the number of observations and n is the number of unknowns. If m > n, the system is called over-determined; if m < n, it is called under-determined. We are going to show that in the over-determined case the system is typically inconsistent. What does that mean? There is no solution to the problem; quite simply, the system is inconsistent. In the under-determined case there is no single solution: there are infinitely many. By contrast, Ax = b with A non-singular has a single unique solution. So we are now dealing with a problem that may have no solution or infinitely many solutions. These are the two classes of problems that linear inverse problems give rise to, and it is why linear inverse problems are more difficult than solving linear systems. Let us consider an over-determined case to examine why the system can be inconsistent. Take m = 3 and n = 2, and let H have first column (1, 1, 1)^T and second column (1, 2, 3)^T. What does that mean? t_1 = 1, t_2 = 2, t_3 = 3: think of the particle moving in a straight line, observed at times 1, 2, and 3. There are only two columns, and they are linearly independent. Why? Neither column can be expressed as a multiple of the other. Since the columns of H are linearly independent, I can consider the span of the columns of H. Recall from the module on finite-dimensional vector spaces that the span is defined as the set of all linear combinations of a set of vectors; here the vectors are the columns of H. Two linearly independent vectors, each of size 3, define a plane, so the span of the columns of H is a plane embedded within the 3-dimensional space R^3. It is in this setting that we have to perform certain computations. Now let us consider an observation z = (0, 1, 2)^T.
Since this vector z can be expressed as (-1) times the first column plus (+1) times the second column, z is a linear combination of the two columns; that means z belongs to the span of H. If z belongs to the span of H, then z = Hx has a unique solution: z0 = -1, v = +1. So this is a case where I can solve an over-determined system, but seldom does such a case arise in practice. The columns of H are defined by the mathematical model; they come from the basic physics equations. But z is the column of observations that come from real-world measurements. The mathematical model describes the real world, but reality is given by the observations, and observations have noise embedded in them; they are corrupted by noise. So there are two fundamental facts: observations always have noise, and models are only approximations of reality. Hence, more often than not, z does not belong to the span of H; and if z does not belong to the span of H, then there is no vector x that satisfies the equation z = Hx. Therefore, in general, when m > n the equations are inconsistent. Inconsistent means what? There is no x that makes the left-hand side equal the right-hand side. Let us take another look at the inconsistent case with a slightly more specific example. I keep the same H, but instead of the previous observation (0, 1, 2)^T I now pick z = (2, 3.5, 4.2)^T. Such a vector could occur in practice, and we should allow for the possibility. So I ask myself: does there exist an x such that z = Hx, with this H and this z? Let us explore this a little further. The first equation says x_1 + x_2 = 2, the second says x_1 + 2 x_2 = 3.5, and the third says x_1 + 3 x_2 = 4.2. Consider the first two equations: two equations, two unknowns; solving them gives x_1 = 0.5 and x_2 = 1.5. But this solution of the first two clearly does not satisfy the third, since x_1 + 3 x_2 = 5, not 4.2. The same thing happens whichever pair you pick: equations 1 and 2, 1 and 3, or 2 and 3. Verify that the solution of any two of these three equations does not satisfy the remaining equation; that is the important point. In this sense there is no solution to z = Hx when m > n. So in an over-determined system, where the number of knowns m is larger than the number of unknowns n, the system may have no solution at all. That is a difficult situation to be in. Now let us worry about the under-determined case: let m = 2 and n = 3, and assume H is of the form shown.
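A short sketch, using the H from this example, contrasting the consistent observation (0, 1, 2) with the inconsistent one (2, 3.5, 4.2):

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # t = 1, 2, 3; columns linearly independent

# Consistent case: z lies in the span of the columns of H.
z_good = np.array([0.0, 1.0, 2.0])
x, *_ = np.linalg.lstsq(H, z_good, rcond=None)
print(x, np.allclose(H @ x, z_good))     # [-1.  1.] True -> z0 = -1, v = 1

# Inconsistent case: noisy observations fall outside the span.
z_bad = np.array([2.0, 3.5, 4.2])
x12 = np.linalg.solve(H[:2], z_bad[:2])  # solve equations 1 and 2 exactly
print(x12)                               # [0.5 1.5]
print(H[2] @ x12, z_bad[2])              # 5.0 vs 4.2 -> third equation fails
```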
I would like to solve the equation z = Hx, where z_1 is given by the first equation and z_2 by the second. What can I do? I can keep the first two variables on one side and kick the third variable to the other side, rewriting the first equation and the second equation accordingly. The determinant of the resulting 2 by 2 system is not 0, so I can solve these two equations. But look at the right-hand side: z_1 and z_2 are given to us, and x is what we must find; I am expressing x_1 and x_2 in terms of x_3. So x_3 is a free parameter, and there are infinitely many values x_3 can take. For each value we assign to x_3, I can find a corresponding x_1 and x_2; that is, there is a pair (x_1(x_3), x_2(x_3)) — both x_1 and x_2 are functions of x_3, because x_3 occurs on the right-hand side. There are infinitely many choices for x_3, therefore infinitely many solutions; there is no uniqueness. So in one case there is no solution, and in the other there are infinitely many: we are caught between the devil and the deep blue sea. This is the typical nature of inverse problems. Inverse problems are generally harder; that is why, in our training in colleges, we generally learn to solve forward problems first — forward problems are easier to solve. Once you learn how to solve forward problems, then, using the knowledge gained there, we can hope to solve inverse problems efficiently. Now, the summary of the linear inverse problem z = Hx with H of full rank. Please understand: I remind you that full rank means rank(H) = min(m, n). When m > n, the rank of H is n: this is the over-determined case, the system is inconsistent, and there is no solution. When m = n, the rank of H is n and there is a unique solution. When m < n, the rank is m and there are infinitely many solutions — non-uniqueness. A linear algebra course essentially deals only with the m = n case; the other two cases are the difficult ones. We solve the over-determined and under-determined problems using the method of least squares. So what is a least squares solution? A classical solution forces the left-hand side to equal the right-hand side. A least squares solution may not force that equality, yet we still call it a solution: it is a generalized solution. The least squares solution is a generalization of the concept of solution, a special class of solutions that one has to develop to handle the over-determined and under-determined cases. Now that we have seen the formulation — the linear static deterministic least squares problem has two versions, over-determined and under-determined — I am going to move towards a strategy for solving it. The first method is the unweighted least squares solution, and I am going to consider the over-determined case. Define r(x) = z - Hx. Here z is a vector and Hx is a vector; the difference is an m-vector, and that vector is called the residual vector.
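A minimal sketch of this free-parameter family; the lecture's 2 by 3 matrix is on a slide, so the H below is a hypothetical full-rank stand-in:

```python
import numpy as np

# Hypothetical under-determined model: m = 2 observations, n = 3 unknowns.
H = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])
z = np.array([2.0, 3.5])

A = H[:, :2]                            # coefficients of x1, x2; det(A) = 1, nonzero
for x3 in [0.0, 1.0, -2.5]:             # x3 is a free parameter
    x12 = np.linalg.solve(A, z - H[:, 2] * x3)   # x1, x2 as functions of x3
    x = np.append(x12, x3)
    print(x, np.allclose(H @ x, z))     # every choice of x3 yields a solution
```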
If the residual vector is 0, then z = Hx; but we have seen that we often cannot make the residual vector 0 in the over-determined case: when m > n, there is typically no x for which r(x) = 0. So, as a compromise, what do we do? For given z and H, the value of r(x) depends on x. When r(x) = 0 we get the classical solution; the generalization of the classical solution is this: for every x, r(x) is a vector, and every vector has a length. I want to find an x for which the length of this residual vector is a minimum. If the length of the residual vector is minimum, I am forcing the left-hand side to be as close to the right-hand side as possible. We cannot make the two sides exactly equal, but we can bring them as close as possible. This notion of being close instead of being equal is the generalization that underlies least squares. So, as a compromise, we seek a vector x belonging to R^n for which the residual vector r(x) has minimum length. To formulate the problem mathematically, so that I can develop an algorithm, I define a function f(x): the square of the norm of the residual vector — and now you can see the norm of a vector come into play. The square of the norm is simply the inner product of r(x) with itself: f(x) = r(x)^T r(x) = sum of r_i^2 for i = 1 to m, the sum of the squares of the components of the residual. Here r is an m-vector with components r_1, r_2, ..., r_m, and the i-th component is r_i = z_i - H_{i*} x, where H_{i*} denotes the i-th row of H. So the inner product of the i-th row of H with x, subtracted from z_i, gives the i-th component of the residual vector. f(x) is the sum of the squares of these components, and we want to find a vector x that minimizes f(x); that minimizing x is called the least squares solution. I would like to comment on this a little. We have a case where we already know there is no solution; even so, I can look at a generalized concept of solution — the value of x for which the length of the residual vector r(x) is minimum. So we have converted the problem of solving a linear system into an optimization problem; that is where optimization comes into play. And now you can see how the knowledge of finite-dimensional vector spaces, norms of vectors, and minimization all come into play — that is why module 2 on mathematical preliminaries is fundamental to the pursuit of data assimilation problems. So f(x) = r(x)^T r(x), and since r = z - Hx, we have f(x) = (z - Hx)^T (z - Hx). We already know the rule (a + b)^T = a^T + b^T.
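As a sketch, the residual and the objective f(x) can be computed directly from their definitions; the test point below is arbitrary:

```python
import numpy as np

def residual(z, H, x):
    """r(x) = z - H x, an m-vector; r_i = z_i - <i-th row of H, x>."""
    return z - H @ x

def f(z, H, x):
    """Sum of squared residuals: f(x) = ||r(x)||^2 = r(x)^T r(x)."""
    r = residual(z, H, x)
    return r @ r                          # same as np.sum(r**2)

H = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
z = np.array([2.0, 3.5, 4.2])
print(f(z, H, np.array([0.5, 1.5])))      # nonzero: no x makes r(x) = 0 here
```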
We also know (AB)^T = B^T A^T; these are the two formulas I am going to use. First I distribute the transpose, then I use the product rule. There are two terms in each factor, so multiplying out gives four terms, and each of these terms is a scalar. Now look at what f(x) is: a function from R^n to R, a scalar-valued function of a vector — that is, a functional. Expanding, f(x) = z^T z - z^T H x - x^T H^T z + x^T H^T H x. The last term is quadratic in x, the two middle terms are linear in x, and the first is constant with respect to x. Now, z^T H x is a scalar, and the transpose of a scalar is itself; therefore z^T H x = (z^T H x)^T = x^T H^T z, since the transpose of a product is the product of the transposes taken in reverse order. So the second and third terms are equal, and I can reduce the four terms to three: f(x) = z^T z - 2 z^T H x + x^T H^T H x. Now, H^T H is a Gramian — you may remember that A^T A and A A^T are Gramian matrices — and f(x) is a quadratic function in x. So we have converted the problem of estimating the unknown into one of minimizing a quadratic form: the estimation of the unknown is recast as the minimization of a quadratic function, and you can again see the importance of everything we saw in module 2. Now I would like to explore this objective function a little further. Since (H^T H)^T = H^T H, as you can readily see, the matrix H^T H is symmetric — that is the first thing I wanted to show. Next, look at the quadratic term x^T H^T H x: it can be rewritten as (Hx)^T (Hx), which equals the squared norm of Hx. When m > n and the rank of H is n, the columns of H are linearly independent; therefore Hx = 0 exactly when x = 0, and Hx is nonzero whenever x is nonzero — both facts come from the linear independence of the columns of H. Therefore this quadratic form is greater than 0 for all x not equal to 0 and equals 0 only when x = 0; this implies directly that H^T H is not only symmetric, it is also positive definite. So our quadratic function is a positive definite quadratic function. What have we accomplished? f(x) consists of a constant term, a linear term, and a quadratic term whose matrix is symmetric positive definite. If I want to minimize it, I am going to compute the Hessian and the gradient. So compute the gradient: there are three terms, and the gradient of the sum is the sum of the gradients.
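The expansion just described, collected in display form:

```latex
\begin{aligned}
f(x) &= (z - Hx)^{\top}(z - Hx)
      = z^{\top}z - z^{\top}Hx - x^{\top}H^{\top}z + x^{\top}H^{\top}Hx \\
     &= z^{\top}z - 2\,z^{\top}Hx + x^{\top}H^{\top}Hx,
\qquad
x^{\top}H^{\top}Hx = (Hx)^{\top}(Hx) = \lVert Hx \rVert^{2} > 0
\ \ \text{for } x \neq 0 \ \text{when } \operatorname{rank}(H) = n .
\end{aligned}
```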
The gradient of z^T z with respect to x is 0, and its second derivative, the Hessian, is also 0. The gradient of 2 z^T H x is 2 H^T z — I would like everyone to verify this using the formulas we derived in the class on multivariate calculus — and the second derivative of this term is 0. The first derivative of the quadratic form x^T H^T H x is 2 H^T H x, and its second derivative is 2 H^T H. Combining all these results term by term, the gradient of f is -2 H^T z + 2 H^T H x, and the Hessian of f is 2 H^T H. The Hessian is symmetric and positive definite; therefore, if I equate the first derivative to 0 and solve, that solution must be a minimum: at the point where the gradient vanishes, the Hessian is positive definite, so the necessary and sufficient conditions for a minimum are satisfied, and I have found the minimum of the objective function f(x). Equating the gradient to 0, transferring the negative term to the other side, and cancelling the 2, the optimal solution is given by the solution of the linear system H^T H x = H^T z. Now please understand: H^T H is an n by n matrix, H^T z is an n by 1 vector, and x is also an n by 1 vector. So we are called upon to solve a symmetric positive definite system; in least squares methodology such systems are called the normal equations. Since this matrix is symmetric and positive definite, I can solve by taking the inverse: the least squares solution is x_LS = (H^T H)^{-1} H^T z, which I write as H^+ z, where H^+ = (H^T H)^{-1} H^T is called the generalized inverse of H. You remember, when we talked about matrices, we talked about the general notion of generalized inverses of matrices; here, for the first time, in trying to solve a least squares problem, we have come across the notion of a generalized inverse concretely. And this solution defines the minimum because the Hessian is positive definite and f(x) is a convex function, hence the minimum is unique: positive definiteness of the Hessian tells you the minimum is well defined, and convexity guarantees that it exists and is unique. So we have, in principle, solved the linear least squares problem; the solution is given by equation 13 — this is the least squares solution. The definition of the least squares solution is thus intrinsically related to the definition of the generalized inverse of matrices. Now look at what we have done: two types of generalization. One is of the notion of solution itself — the classical notion demands that the left-hand side equal the right-hand side, whereas here the left-hand side is only close to the right-hand side, not equal. The other is of the notion of inverse, from the inverse of a square matrix to the generalized inverse of a rectangular matrix. H is a rectangular matrix, H^+ is called its generalized inverse, and when H is of full rank the generalized inverse has the exact expression given by equation 14, namely (H^T H)^{-1} H^T.
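A sketch of the normal equations in code, reusing the earlier inconsistent example; the last two lines cross-check the answer against NumPy's built-in pseudoinverse and least squares routines:

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
z = np.array([2.0, 3.5, 4.2])

# Normal equations: (H^T H) x = H^T z, a 2x2 symmetric positive definite system.
G = H.T @ H                              # the Gramian
x_ls = np.linalg.solve(G, H.T @ z)
print(x_ls)

# Same answer via the generalized inverse H+ = (H^T H)^{-1} H^T ...
print(np.linalg.pinv(H) @ z)
# ... and via the library least squares routine.
print(np.linalg.lstsq(H, z, rcond=None)[0])
```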
So we have introduced several newer concepts and generalized old ones to accommodate the solution of the over-determined problem, and in this process we have demonstrated that many of the mathematical tools, if not all, are used in the derivation of the least squares solution — and you can see that the least squares solution is the solution of an optimization problem. Now, the least squares solution is not a solution in the classical sense: the left-hand side is not equal to the right-hand side, only close to it, and I want to find out how close. So I substitute x_LS for x: r(x_LS) = z - H x_LS is the residual at the minimum. This residual is a vector, and what does the theory guarantee? It is the residual whose length is the minimum, and we can readily compute that length: the norm of r(x_LS) gives the measure of closeness between the left-hand side and the right-hand side. And how do we show the residual is not 0? Look at this: H x_LS = H (H^T H)^{-1} H^T z. If I substitute this in and simplify, the residual is z - H (H^T H)^{-1} H^T z. Here H (H^T H)^{-1} H^T is a matrix acting on the vector z, and in general this matrix is not equal to the identity; so long as it is not the identity, the residual is not 0. Therefore the least squares solution does not guarantee equality between the left-hand side and the right-hand side; rather, the length of the difference — the length of the residual vector — is the minimum. Herein lies the difference between the classical solution, where r(x) = 0, and the least squares solution, where r(x) is not 0: for the over-determined case, it is the best we could do. Now, if you substitute x_LS into f(x), you get the minimum value of the sum of squares; that minimum value is the measure of the fit between the model and the observations. As an illustration, let us go back to the particle moving in a straight line: H has first column all ones and second column (t_1, t_2, ..., t_m)^T. I can compute H^T H: multiplying H^T by H gives the 2 by 2 matrix with entries m and sum of t_i in the first row, and sum of t_i and sum of t_i^2 in the second; and multiplying H^T by z gives the vector (sum of z_i, sum of z_i t_i)^T. So the normal equations H^T H x = H^T z take this explicit form, with x the column vector (z0, v)^T.
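A sketch of the residual at the minimum; the matrix P below is the projector H (H^T H)^{-1} H^T just discussed, and since P is not the identity the residual is nonzero but of minimum length:

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
z = np.array([2.0, 3.5, 4.2])

P = H @ np.linalg.inv(H.T @ H) @ H.T       # projector onto span(H); P != I here
x_ls = np.linalg.pinv(H) @ z
r = z - H @ x_ls                           # residual at the minimum, r = (I - P) z
print(np.allclose(r, (np.eye(3) - P) @ z)) # True
print(r @ r)                               # minimum SSE: nonzero but minimal
```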
Now divide both sides by m. Let t-bar be the average of the t_i, t^2-bar the average of the t_i^2, z-bar the average of the z_i, and zt-bar the average of the products z_i t_i. Dividing both sides of the normal equations by m, the system reduces to the 2 by 2 system

  z0 + v t-bar = z-bar
  z0 t-bar + v t^2-bar = zt-bar,

which we can solve explicitly: v* = (zt-bar - z-bar t-bar) / (t^2-bar - (t-bar)^2) and z0* = z-bar - v* t-bar. This is an important expression. So I have an estimate: v* is the least squares estimate of the unknown velocity, and z0* is the least squares estimate of the initial position. If I substitute these back into f(x), I get the sum of the squared residuals; this formula tells you that if you replace z0 by z0* and v by v*, you obtain the minimum value possible. Now we define what is called the RMS error. The SSE above is the sum of the squared errors; the sum of squared errors divided by m is the average squared error; and its square root is the root mean square, the RMS error. The RMS error gives a measure of the fit: if the RMS error is large the fit is loose, if it is small the fit is tight, and the looseness or tightness of the fit depends on the goodness and the availability of the data. Now a numerical example: H is the matrix given here and z is the vector given here. I compute t-bar and all the other quantities, the 2 by 2 system takes the form shown, and solving it I get v* and z0*, so the fitted, assimilated model is given by this equation. In this case I would like you to verify that the sum of squared errors is 1.5 and the RMS error is 0.6124. That is the claim; I think it is better to do these calculations and verify them yourself, to get a feel for how least squares computations are done. Because this is a simple case with two unknowns, I can also describe the solution graphically using what are called contours of f(x). What are contours? Contours are loci of points of constant value. Here x = (z0, v), and for the numerical example we talked about, I have actually computed f(x); the quadratic function takes a particular explicit form. This quadratic function is like a bowl, and if you take cross sections of it and project them onto the plane, they are called contours. Using MATLAB I have drawn the contours: you can see the minimum lies at the centre, and if you read off its coordinates, it happens to be z0 = 0.5 and v = 0.5. In this way, for a small problem of two unknowns, you can actually solve the problem graphically, by computing f(x), drawing the contours, and locating the centre of the contours.
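A sketch of the averaged normal equations and the contour plot. The lecture's data are on a slide; the illustrative t and z below are merely chosen so that they reproduce the quoted numbers (SSE = 1.5, RMS = 0.6124, minimum at z0* = v* = 0.5):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.array([1.0, 2.0, 3.0, 4.0])     # illustrative observation times
z = np.array([1.0, 2.0, 1.0, 3.0])     # illustrative observations
m = len(t)

tb, t2b = t.mean(), (t**2).mean()       # t-bar, t^2-bar
zb, ztb = z.mean(), (z * t).mean()      # z-bar, zt-bar
v = (ztb - zb * tb) / (t2b - tb**2)     # v* from the averaged 2x2 system
z0 = zb - v * tb                        # z0* = z-bar - v* t-bar
sse = np.sum((z - (z0 + v * t))**2)
print(z0, v, sse, np.sqrt(sse / m))     # 0.5 0.5 1.5 0.6124

# Contours of f(z0, v): the minimum sits at the centre of the bowl.
Z0, V = np.meshgrid(np.linspace(-1, 2, 200), np.linspace(-0.5, 1.5, 200))
F = sum((z[i] - (Z0 + V * t[i]))**2 for i in range(m))
plt.contour(Z0, V, F, levels=20)
plt.xlabel('z0'); plt.ylabel('v'); plt.show()
```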
So this is the graphical method; the previous one is the analytical method. We can solve simple problems by both graphical and analytical methods, and it is fundamental that we do all these things while we are in the learning process. So far we talked about unweighted least squares; now I am going to talk about the weighted version, and I am still going to be concerned with the over-determined case. Over-determined, unweighted least squares is what we saw; over-determined, weighted least squares is what we are going to see. Let W be a symmetric positive definite matrix of size m by m. Earlier we considered f(x) = (z - Hx)^T (z - Hx); here I interpose the matrix W in between: f_W(x) = (z - Hx)^T W (z - Hx). When W = I, the weighted case becomes the unweighted one — I hope you see the difference between weighted and unweighted. To emphasize the notion of the weight, I now put a subscript on f: f_W(x) is the weighted sum of squares of the residuals. In the special case, W could be a diagonal matrix with different weights along the diagonal, or, in general, any symmetric positive definite matrix. The weighting essentially says: I am going to give different weights to different components of the squared residual error — that is all it means. In the unweighted case, all components of the residual squares carry the same weight — total democracy; that is what the unweighted case is about. In the weighted case, some components have greater weight and some lesser; that is, I consider certain components more important than others. The question will arise: how do I decide which components should be more important? That is outside the scope of this discussion; it is something the designer — the person interested in solving the problem — has to bring to bear from the application and make sense of. Here we are interested in the mathematical setup: whatever the weights and however they are obtained, I am going to tell you how to handle the weighted case. So W, in the simplest case, is diagonal with all ones — the identity — or diagonal with differing elements, or a general symmetric positive definite matrix. Again, we try to minimize f_W as a function of x. This is also a quadratic function of x, so I compute the Hessian and the gradient and equate the gradient to 0, which gives the new version of the normal equations. You can readily see: in the unweighted case I simply got H^T H x = H^T z; here a factor of W is interposed on both the left-hand and right-hand sides, giving H^T W H x = H^T W z. So the two equations have a very similar structure. It can be shown that H^T W H is symmetric and also positive definite when H is of full rank; in that case I can take its inverse, so x_LS = (H^T W H)^{-1} H^T W z. This is the solution for the linear static deterministic weighted least squares problem; equation 17 is the weighted analog of the unweighted least squares solution.
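A sketch of the weighted case; the diagonal weights here are arbitrary illustrative choices, not prescribed by the lecture:

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
z = np.array([2.0, 3.5, 4.2])

# Diagonal weights: trust the second observation more than the others.
W = np.diag([1.0, 4.0, 1.0])             # symmetric positive definite

# Weighted normal equations: (H^T W H) x = H^T W z.
x_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)
print(x_wls)

# W = I recovers the unweighted least squares solution.
print(np.linalg.solve(H.T @ H, H.T @ z))
```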
So far we have talked about the solution of over-determined systems; now we proceed to the under-determined case, where we have fewer observations than unknowns. For the over-determined least squares problem we treated both the weighted and unweighted versions, we talked about the generalized inverse, and we developed the notion of a least squares solution as distinct from the ordinary solution — all within the context of over-determined systems. It turns out the theories of over-determined and under-determined systems are related yet different, and I now want to bring out the primary difference between the under-determined estimation problem and the over-determined one we have already seen. Consider the under-determined case, m < n, where m is the number of observations and n the number of unknowns, and recall that in the under-determined problem there are infinitely many solutions. We had a headache of one kind in the over-determined problem, namely there was no solution; here the headache is of another kind — there is not one solution but infinitely many, and the challenge is how to pick, among the infinitely many, the one that makes sense for us. And why are we interested in uniqueness? If you want to compute the solution using an algorithm, every calculation must have a target: I want to be able to calculate this specific quantity. Since every algorithm seeks a targeted, unique solution, we need to build in the notion of uniqueness before we even start talking about computing the solution. The computational process has to wait until we define what the appropriate unique solution is among the infinitely many possible solutions. Now look at the situation: there are infinitely many solutions, and a solution means the residual is 0. So there are infinitely many x for which r(x) = 0; and if r(x) = 0, then f(x) = r^T r is identically 0 — there is no x to single out, no minimization to perform. So there is no possibility of doing anything similar to what we did in the over-determined case. Therefore we need a new approach to get a unique solution, and to that end we are going to formulate this as a constrained minimization problem — an equality-constrained minimization problem — and solve it by the Lagrangian multiplier technique. You can see that everything we covered in the module on optimization gets applied here. So the pathway to the solution in the under-determined case is to formulate the problem as a Lagrangian multiplier problem with an equality constraint, and this equality-constrained minimization problem is going to help us pick the optimal solution among the infinitely many possible solutions. That is the pathway.
So what is the problem statement? How do I state the new version of the problem? Find the vector x belonging to R^n such that its norm is the minimum. Look at that: I am not interested in just any vector, I am interested in picking a solution with minimum norm — but that x must not only have minimum norm, it must also satisfy z = Hx. So the problem statement is: minimize the square of the norm of x subject to z = Hx. The constraint is that x satisfies z = Hx, and the norm of x must be minimum. We formulate this as a Lagrangian multiplier problem. Let lambda belong to R^m — m because z is a vector in the m-dimensional space and Hx is a vector in the m-dimensional space, so the constraint has m components. Define the Lagrangian L(x, lambda) = x^T x + lambda^T (z - Hx): the first piece is the function to be minimized, the second carries the constraint; we are following the same formulation we described in the module on optimization. Equation 18 is this Lagrangian. There are two independent variables, x and lambda, and the constrained minimization problem is now replaced by an unconstrained Lagrangian minimization problem, solved by standard techniques: compute the gradient of L with respect to x and set it to 0, and compute the gradient of L with respect to lambda and set it to 0. These two equations must be simultaneously satisfied to find the optimal x and the optimal lambda; the lambda and x that satisfy them are called the optimal lambda and the optimal x. For the Lagrangian given in equation 18, computing the gradients with respect to x and lambda gives two equations: 2x = H^T lambda, and z - Hx = 0. We solve these two equations simultaneously: substituting x = H^T lambda / 2 into the constraint gives the optimal lambda as lambda = 2 (H H^T)^{-1} z, and substituting this lambda back gives the optimal least squares solution x_LS = H^T (H H^T)^{-1} z. This is the unique solution in the case of the under-determined problem. When H is of full rank — the rank of H is m — it can be verified that H H^T is symmetric positive definite, so its inverse exists. Therefore x_LS, the least squares solution, can be computed in two steps: solve H H^T y = z, obtaining y = (H H^T)^{-1} z, and then compute x_LS = H^T y; this yields exactly equation 23. So the computation of the least squares solution is done in two steps: first solving a linear symmetric positive definite system, and then substituting that solution to get the least squares solution. Thus, by invoking the Lagrangian multiplier technique for the equality-constrained problem, we have obtained the solution for the under-determined case. Now, knowing the formula for x_LS from equation 23, the residual at the minimum is r = z - H x_LS = z - H H^T (H H^T)^{-1} z; H H^T and its inverse cancel each other, so this is z - z = 0.
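A sketch of the two-step minimum-norm computation, using the same hypothetical 2 by 3 H as before; the last lines compare the minimum-norm solution against another member of the free-parameter family:

```python
import numpy as np

# Hypothetical under-determined system: m = 2 observations, n = 3 unknowns.
H = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])
z = np.array([2.0, 3.5])

# Two-step computation: solve (H H^T) y = z, then x = H^T y.
y = np.linalg.solve(H @ H.T, z)          # H H^T is 2x2 symmetric positive definite
x_mn = H.T @ y                           # minimum-norm solution H^T (H H^T)^{-1} z
print(x_mn)
print(np.allclose(H @ x_mn, z))          # True: the residual is exactly 0

# Any other solution (e.g. from the free-parameter family) has a larger norm.
x_other = np.array([0.5, 1.5, 0.0])      # also satisfies H x = z
print(np.linalg.norm(x_mn), np.linalg.norm(x_other))
```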
So in this case the residual is 0: the optimal solution is one where the residual vanishes, meaning it satisfies the constraint — as is to be expected, since we started from the infinitely many solutions for which r(x) = 0, and the residual at the minimum must also be 0. That is verified. With this we come to the end of the discussion of the linear deterministic static inverse problem, both over-determined and under-determined. In the over-determined, inconsistent case, where we did not have a solution, we tried to bring the right-hand side and the left-hand side as close together as possible. In the under-determined case, with infinitely many solutions, we found, among them, the one of least length — the solution whose norm is minimum. That is how we induce uniqueness into least squares solutions. With this, I would like to encourage you to solve a couple of different problems; they are directly related to the development in the text. In particular, I emphasize that you should do the MATLAB-related computer problem of plotting the contours: once you plot the contours, you can read off the minimum by graphical approximation, by approximating the centre of the contours. I am also giving exercises on expressions for the generalized inverse, on finding the Hessian and gradient of different functions, and on the properties of the Moore-Penrose inverse, which we already discussed when we covered matrices; the defining properties of the generalized inverse are given by the four equations that the Moore-Penrose conditions demand. This development is taken from our book: Lewis, Lakshmivarahan, and Dhall, Dynamic Data Assimilation: A Least Squares Approach, Cambridge University Press, 2006; it largely follows the development in chapter 5. Thank you.