So far we have applied the least squares method to solving static deterministic least squares problems, both well posed and ill posed. Given the importance of least squares since the days of Gauss, it is worthwhile to get a geometric view of the nature of the least squares solution. Thus far we used analytical methods to derive the least squares solution by formulating the problem as a minimization problem, both unconstrained and constrained. The geometric view enables us to look at least squares from the very simple perspective of the notion of projections. So let us consider a vector z = (z1, z2) in the two dimensional plane. If you shine light parallel to the y axis, a shadow of z is cast on the x axis; this arrow segment, the shadow of z, is called z1 hat. The property of this shadow in figure one is that it is an orthogonal projection, in the sense that if I join the tip of the vector z to the tip of z1 hat, the angle between the two vectors is 90 degrees. On the other hand, if you shine light in a direction not parallel to the y axis, the shadow cast by z on the x axis is z2 hat, and if I join the tips of these two vectors the angle is some theta, which in this case is not equal to 90 degrees. This is called an oblique projection. So z1 hat is the orthogonal projection of z onto the x axis, z2 hat is an oblique projection of z onto the x axis, and orthogonal versus oblique is decided simply by the direction in which light is shone on the vector z. Mathematically, this operation of shining light and casting a shadow can be thought of as a matrix: if a matrix P1 operates on z, you get z1 hat. P1 has the form 1 0 in the first row and 0 0 in the second row. In this case P1 z is (z1, 0): the second component is annulled and the first component survives, so z1 hat = (z1, 0).
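As a small numerical sketch of this shadow operation (the projector P1 is the one from the lecture; the particular vector z is my own illustrative choice):

```python
import numpy as np

# A vector z = (z1, z2) in the plane; the numbers are illustrative.
z = np.array([2.0, 3.0])

# Orthogonal shadow on the x-axis: light shone parallel to the y-axis.
P1 = np.array([[1.0, 0.0],
               [0.0, 0.0]])
z1_hat = P1 @ z          # second component annulled -> (2, 0)

# The segment joining the tips of z and z1_hat is z - z1_hat; for an
# orthogonal projection it is perpendicular to the x-axis direction.
x_axis = np.array([1.0, 0.0])
print(np.dot(z - z1_hat, x_axis))   # 0.0 -> the angle is 90 degrees
```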
So that is an orthogonal projection. On the other hand, if you take P2 to be the matrix with 1 and a in the first row and 0 0 in the second row, and apply the operator P2 to z, you get (z1 + a z2, 0): the second component is again zero. Now you can see that when a = 0, P2 becomes equal to P1. When a is not equal to 0, for example a > 0, the first component of z2 hat is z1 + a z2, where z1 is the first component of z and z2 the second; with a positive, the shadow is longer. So you can think of projection as a geometric operation; algebraically, the matrices P1 and P2 generate the orthogonal and oblique projections. This is a very fundamental geometric point of view, and it has a very close and intimate relation to the properties of least squares solutions. Projections as matrices: in the last slide we saw the matrices P1 and P2. P1 is called an orthogonal projection matrix and P2 an oblique projection matrix. Every projection matrix has the fundamental property of being idempotent; by idempotency I mean P1 squared equals P1 and P2 squared equals P2. So here I am looking for a matrix whose square is equal to itself. Let us recall what happens with numbers: if I want a number a to satisfy a squared equals a, I have to solve that equation, which says a(a - 1) = 0, and that gives either a = 0 or a = 1. So there are only two numbers whose square equals themselves, 0 and 1. But in the case of matrices, P squared = P has other solutions: one solution is the matrix P1 from the previous slide, and another is the matrix P2. What is the difference between P1 and P2? P1 is symmetric but P2 is not symmetric.
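A quick check of idempotency for both projectors; only the form of P1 and P2 comes from the lecture, while the value of a and the test vector are arbitrary choices:

```python
import numpy as np

a = 0.5   # any nonzero a gives a genuinely oblique projector
P1 = np.array([[1.0, 0.0],
               [0.0, 0.0]])
P2 = np.array([[1.0, a],
               [0.0, 0.0]])

z = np.array([2.0, 3.0])
print(P2 @ z)                      # -> (z1 + a*z2, 0) = (3.5, 0)

# Idempotency: projecting a shadow a second time changes nothing.
print(np.allclose(P1 @ P1, P1))    # True
print(np.allclose(P2 @ P2, P2))    # True
```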
Now I am going to state a very general property. An orthogonal projection matrix is idempotent and symmetric; an oblique projection matrix is idempotent but not symmetric. So every projection matrix is idempotent, and it is the symmetric or non-symmetric nature of the idempotent operator that decides whether the resulting projection is orthogonal or oblique. It can also be shown that every projection matrix (other than the identity) is singular, that is, rank deficient, and if it is rank deficient its determinant is 0. Please verify that the determinant of P1 in our earlier slide is 0, and that the determinant of P2 is also 0. So in this slide we summarize the general properties of projection matrices: projections are of two types, orthogonal and oblique; every projection matrix must be idempotent; in addition, if the projection matrix is symmetric it is an orthogonal projection, and if it is not symmetric it corresponds to an oblique projection.
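The verification requested here is a one-liner numerically; a sketch, again with an arbitrary a:

```python
import numpy as np

a = 0.5
P1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # orthogonal projector
P2 = np.array([[1.0, a], [0.0, 0.0]])     # oblique projector

# Symmetry decides the type of projection.
print(np.allclose(P1, P1.T))              # True  -> orthogonal
print(np.allclose(P2, P2.T))              # False -> oblique

# Both are rank deficient, hence singular: determinant is 0.
print(np.linalg.det(P1), np.linalg.det(P2))   # 0.0 0.0
```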
Now ordinary least squares solutions can be viewed from an orthogonal projection point of view. Let H be a single column vector h; this is the special case n = 1. Both h and z are vectors in R^m. I am drawing a two dimensional picture, assuming m = 2, but the whole analysis holds for any m. Let z not be a multiple of h. If I draw the vector z and the vector h, I can project z onto h to get z hat. Since z hat is a vector along the direction of h, I should be able to write z hat = h x, where x is a scalar. The least squares question is to find the constant x such that z hat is the orthogonal projection of z onto h; that is the geometric point of view. To do that, consider the difference z minus z hat. I would like z minus z hat to be perpendicular to z hat, and being perpendicular to z hat is the same as being perpendicular to h. So I am requiring that the residual r = z - z hat, the error in the projection (remember the residual we talked about when we derived the least squares solution), be perpendicular, or orthogonal, to h; then z hat is the orthogonal projection of z onto h.

Now I would like to relate this to another geometric fact. It is well known that if I have a line and a point not on the line, the shortest distance from the point to the line is the length of the perpendicular from the point to the line; that is a very well known fact from basic geometry. Think of the line as the line along h and the point as the tip of z, and draw the perpendicular from z to h. When the angle between r and h is 90 degrees, the residual has the shortest length. So, referring to the figure, z hat is the point where the perpendicular from the tip of the vector z meets the line along h, and r = z - z hat is perpendicular to h. That is the simple geometric fact from which the minimization of the residual essentially comes: the distance between a line and a point is shortest along the perpendicular.

Since z hat is a vector in the direction of h, there is a scalar x such that z hat = h x; that is a very well known fact, because any vector along a given direction is obtained by multiplying the direction by a constant. Combining these facts, r = z - h x must be perpendicular to h. The perpendicularity condition means the inner product of the two vectors is 0, so h^T (z - h x) = 0, and that naturally leads to the least squares solution. Expanding, h^T h x = h^T z, or x_ls = (h^T h)^{-1} h^T z. This we have already seen to be h^+ z, where h^+ is the generalized inverse. Once I know x_ls I can get z hat = h x_ls; substituting, z hat = h (h^T h)^{-1} h^T z. The matrix h (h^T h)^{-1} h^T I am going to call P_h; this P_h is the orthogonal projection matrix induced by h. Therefore z hat = h x_ls = h h^+ z = P_h z, so P_h = h h^+, and that is the orthogonal projection matrix we are interested in. You can readily see that I get the same formula I obtained by minimizing f(x), the square of the norm of the residual. The same result we derived analytically is obtained from a very simple geometric fact, which states that the perpendicular from a point not on a line gives the shortest distance from the point to the line; it is a fact we all learn when we are first introduced to geometry in high school.

So what is the generalization? Now let H be an m by n matrix, m rows and n columns, with m > n >= 1; the previous analysis was the special case n = 1. Let z be a vector in R^m. Since H has n columns, I now consider the subspace spanned by the columns of H; that is the subspace onto which I would like to project z. The projection z hat is still H x, but now x is a vector in R^n; when n = 1, x was a scalar, which is what we had earlier. Again r = z hat - z is the residual, and we would like r to be perpendicular, orthogonal, to the span of H. We all know the span of H is the set of linear combinations of the columns of H; that means z - H x must be perpendicular to every column of H. Since z hat belongs to the span of H, there exists a vector x such that z hat = H x, as we have already discussed. The same argument applies: if H has only one column, we projected onto that vector; if H has multiple columns, we project onto the span of H, the set of all linear combinations of its columns. So we enforce the condition H^T (z - H x) = 0, which gives H^T H x = H^T z; this is called the normal equation. The solution of the normal equation is x_ls = (H^T H)^{-1} H^T z; that is the least squares solution, which we have already seen in the previous module. Therefore z hat = H x_ls = H (H^T H)^{-1} H^T z. The operator H (H^T H)^{-1} H^T is called P_H; it equals H H^+ and is called the orthogonal projection matrix induced by the given matrix H, where H^+ is the generalized inverse we have already seen.

So now we see the different types of matrices that come into play: the given matrix H, the generalized inverse H^+, and the projection matrix P_H = H H^+. All three matrices are naturally associated with the notion of least squares. Now I am going to talk about the general properties and verify that P_H is indeed an orthogonal projection; much of this is left as a homework problem. We already know P_H = H H^+ = H (H^T H)^{-1} H^T. First, I would like you to verify that this matrix is idempotent: if you multiply the matrix by itself you get the matrix back, P_H squared = P_H; please verify this. It can also be verified that P_H^T = P_H. Let me quickly illustrate why. P_H^T = (H (H^T H)^{-1} H^T)^T. From matrix algebra we know the transpose of a product is the product of the transposes taken in reverse order, so P_H^T = (H^T)^T ((H^T H)^{-1})^T H^T = H ((H^T H)^{-1})^T H^T. Now H^T H, the Gramian, is a symmetric matrix, and there is a general theorem that says the inverse of a symmetric matrix is also symmetric, that is, its transpose is equal to itself. Therefore ((H^T H)^{-1})^T = (H^T H)^{-1}, and P_H^T = H (H^T H)^{-1} H^T = P_H; the symmetry is verified. So P_H is idempotent and symmetric, and by definition it is an orthogonal projection from R^m onto the span of H. Please note that m > n >= 1, so R^m is the larger space and the span of H is an n dimensional subspace of it; because H has full rank, its columns generate that subspace. You can also verify that the determinant of P_H is 0 and hence P_H is singular; a zero determinant means the matrix must be singular. I would like you to verify that property as well. Now let us go to least squares with a weight present.
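The whole construction above can be checked numerically. Here is a small sketch; the particular full-rank matrix H and the data z are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
H = rng.standard_normal((m, n))   # full column rank (almost surely)
z = rng.standard_normal(m)

# Normal equations H^T H x = H^T z  ->  least squares solution x_ls.
x_ls = np.linalg.solve(H.T @ H, H.T @ z)

# Orthogonal projector induced by H.
P_H = H @ np.linalg.inv(H.T @ H) @ H.T
z_hat = P_H @ z

# z_hat = H x_ls, the residual is orthogonal to every column of H,
# and P_H is idempotent and symmetric.
print(np.allclose(z_hat, H @ x_ls))       # True
print(np.allclose(H.T @ (z - z_hat), 0))  # True
print(np.allclose(P_H @ P_H, P_H))        # True
print(np.allclose(P_H, P_H.T))            # True
```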
In the weighted case, consider z = H x with a weight matrix W, so I am concerned with the residual vector r = z - H x. My objective f(x) is r^T W r, the weighted sum of squared residuals. We have already seen in previous slides and lectures that the least squares solution is x_ls = (H^T W H)^{-1} H^T W z, and z hat = H x_ls. I am providing a summary here: the projection matrix P_HW = H (H^T W H)^{-1} H^T W, which can be thought of as H H^+_W, where H^+_W = (H^T W H)^{-1} H^T W is the generalized inverse in the weighted case, the so called weighted generalized inverse. These are simply a summary of all the quantities we have considered so far. Again, it is a very simple exercise to verify that this matrix is idempotent; but this matrix is not symmetric, and an idempotent matrix that is not symmetric has to be an oblique projection matrix. That is the general conclusion.

What does this mean when you do problems in 3D-Var? There we always consider a weighted sum of squared errors, and in those cases the weight matrix is simply the inverse of a covariance matrix, the observational covariance or the background covariance. So given the observational covariance matrix and the background covariance matrix, as long as they are not identity matrices we are always dealing with weighted least squares within the context of 3D-Var, and hence almost all 3D-Var solutions give rise to the so called oblique projections; only in the unweighted case do we have an orthogonal projection. That is the beautiful geometric view of things one has to remember.

I am now going to quickly illustrate this with an example. Take n = 1, so there is only one unknown, and m = 2. You can readily see that h = (h1, h2) is a column vector, z is a vector in R^2 because m = 2, and x is a real number. I now conjure up a simple symmetric weight matrix; the weight matrix is always symmetric, but even though the weight matrix is symmetric, the projection matrix resulting from it is not symmetric, and that is something one needs to keep in mind. Let w1 and w2 be the two diagonal elements and a the off diagonal element. Then h^T W h, if you do the multiplication, is a real number. We have already seen the expression for P_HW in the previous slide; substituting all of this and simplifying, the projection matrix becomes a 2 by 2 matrix in which each element is a sum of two terms. Now consider the special case h1 = 1, h2 = 0, that is, h = (1, 0). In this case the projection z hat = P_HW z = (z1 + a-bar z2, 0), where a-bar = a / w1, and this is exactly the oblique projection we saw in the very first opening example. So what we have shown is that if there is a weight matrix, the resulting projection is in general not an orthogonal projection; that is the conclusion. Why is this not an orthogonal projection? If a = 0 the shadow of z is (z1, 0), an orthogonal projection. If a is not equal to 0, the first component is z1 + a-bar z2, which is not equal to z1, so the angle is some theta which in general is not 90 degrees, and the projection is an oblique projection. That is the important thing one has to keep in mind.

Continuing the illustration: r(x), the error in the projection, is z - z hat, and the actual error can be computed explicitly; the difference between the two vectors is (-a-bar, 1) multiplied by z2. If I consider r^T h, I get -a-bar z2. Look at the figure: h is the vector onto which we project, z hat is the projection, and the angle between r and h is theta. By the Schwarz inequality the inner product equals the norm of r(x) times the norm of h times the cosine of theta, and I can compute each of these quantities explicitly: the inner product, the norms, and hence the cosine of theta as their ratio. That ratio tells you the angle is not 90 degrees; for positive a-bar and z2, theta is greater than 90 and r(x) makes an obtuse angle with h, as seen in the illustration. When a = 0, the cosine is 0, theta is 90, and the projection becomes orthogonal. So this is a very simple graphical illustration, using a two dimensional case, where we can readily see that for certain weights the projection is orthogonal and for other weights the projection is oblique. That is the important conclusion coming out of this exercise.

In summary, what have we accomplished in this small module? We have seen the importance of the least squares solution within the context of data assimilation and of solving inverse problems, and we have tried to embellish the character of the least squares solution by relating its properties to a very simple geometric fact we all learned in our first course in geometry: that of orthogonal and oblique projections. So what is the conclusion? If you have a static deterministic inverse problem and you formulate it as an unweighted least squares problem, the solution is given by an orthogonal projection; if you formulate it as a weighted least squares problem, the solution is given by an oblique projection. The essential difference between orthogonal and oblique comes precisely from whether or not there is a weight.

I have a couple of exercises. First, recall the formula for the cosine of theta that we have already seen; I would like you to plot the value of theta as a-bar ranges in the interval minus 1 to plus 1. This exercise tells you how the angle theta varies with the choice of a-bar; recall from a couple of slides earlier that a-bar depends on a, so a essentially controls a-bar, and as a-bar ranges within minus 1 to plus 1, theta sweeps through a particular range. Plot this, perhaps using Matlab, and convince yourself of the range of rotation angles theta one gets with a. The second exercise relates to the expression for the weighted generalized inverse. The idea is this: any generalized inverse must satisfy the Moore-Penrose conditions from the module on matrices. So revisit the Moore-Penrose conditions that define a generalized inverse, and check whether this expression for the generalized inverse with the weight satisfies them. I think it will be a very worthwhile exercise to do. With this we come to the end of the discussion relating to the geometric interpretation of least squares solutions. Thank you.
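As a sketch of the first exercise: the formulas for a-bar and the cosine of theta come from the example above, while the grid of a-bar values and the use of numpy (rather than Matlab) are my own choices. Replacing the prints with a plot call reproduces the requested figure.

```python
import numpy as np

# Angle between the residual r = z2 * (-a_bar, 1) and h = (1, 0),
# taking z2 > 0:  cos(theta) = -a_bar / sqrt(1 + a_bar**2).
a_bar = np.linspace(-1.0, 1.0, 201)
cos_theta = -a_bar / np.sqrt(1.0 + a_bar**2)
theta_deg = np.degrees(np.arccos(cos_theta))

# At a_bar = 0 the projection is orthogonal (theta = 90 degrees);
# the sweep shows the range of rotation angles as a_bar varies.
print(round(theta_deg[100], 6))                               # 90.0
print(round(theta_deg.min(), 6), round(theta_deg.max(), 6))   # 45.0 135.0
```

So over a-bar in [-1, 1] the angle sweeps from 45 to 135 degrees, passing through 90 exactly at a-bar = 0.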
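For the second exercise, here is a numerical sketch of checking the four Moore-Penrose conditions for the weighted generalized inverse H^+_W = (H^T W H)^{-1} H^T W. The specific H and W are random illustrative choices; the check suggests which condition to examine by hand, as the lecture asks.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 2
H = rng.standard_normal((m, n))

# A symmetric positive definite weight matrix (e.g. an inverse covariance).
A = rng.standard_normal((m, m))
W = A @ A.T + m * np.eye(m)

# Weighted generalized inverse and the induced (oblique) projector.
H_plus_W = np.linalg.inv(H.T @ W @ H) @ H.T @ W
P_HW = H @ H_plus_W

# The four Moore-Penrose conditions:
print(np.allclose(H @ H_plus_W @ H, H))                # 1) holds
print(np.allclose(H_plus_W @ H @ H_plus_W, H_plus_W))  # 2) holds
print(np.allclose(P_HW, P_HW.T))                       # 3) fails in general
print(np.allclose((H_plus_W @ H).T, H_plus_W @ H))     # 4) holds (it is I)

# Idempotent but not symmetric -> an oblique projection, as claimed.
print(np.allclose(P_HW @ P_HW, P_HW))                  # True
```

The failing condition is exactly the symmetry of H H^+_W, which is the non-symmetry of the weighted projection matrix discussed above.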