Okay, so we are discussing the problem of best approximation. I gave an example last time and stated a theorem; let us look at a proof of the theorem. Let me write down the theorem once again so that we keep it on this side for reference. Let V be an inner product space, let W be a subspace of V (a subspace, not an arbitrary subset), let x be a fixed element of V, and let u be an element of W. We are interested in knowing when this u is a best approximation to x from the subspace W. The theorem has three assertions. First: u is a best approximation to x from W if and only if x - u is orthogonal to the subspace W; this is a necessary and sufficient condition. Second: if a best approximation exists, then it is unique. Third: if W is finite dimensional and u1, u2, ..., un is an ordered orthonormal basis of W, then the unique best approximation is given by the formula u = ⟨x, u1⟩u1 + ⟨x, u2⟩u2 + ... + ⟨x, un⟩un; x is fixed, and u is related to x by means of these coefficients. We have already seen an example where this formula was used: find the point on a plane which is closest to the vector (1, 2, 1).

Proof. Let us prove the first part first. Suppose that x - u is perpendicular to W; we must show that u is a best approximation. Remember that in the definition of best approximation the competing vectors must come from W. Take any w in W; then u - w belongs to W, since W is a subspace, and so the orthogonality condition can be rewritten as ⟨x - u, u - w⟩ = 0 for every w in W. Here u is fixed, just as x is fixed; u is the fixed vector in W that satisfies this equation, and only w is a variable. I will use the Pythagoras theorem: if x is perpendicular to y, then ‖x + y‖² = ‖x‖² + ‖y‖². Consider ‖x - w‖²; subtract and add u to write it as ‖(x - u) + (u - w)‖². Now x - u and u - w are orthogonal because of the previous equation, so by Pythagoras this equals ‖x - u‖² + ‖u - w‖², which is greater than or equal to ‖x - u‖², because ‖u - w‖² is non-negative. Both sides are non-negative, so I can take square roots: ‖x - u‖ ≤ ‖x - w‖ for all w in W, and this is precisely what we mean by saying that u is a best approximation.
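Before the converse, let me make the third part concrete. Here is a minimal numerical sketch in Python with NumPy; the plane and its orthonormal basis below are made up for illustration, echoing the example of finding the point on a plane closest to (1, 2, 1).

```python
import numpy as np

# Orthonormal basis of a plane W in R^3, chosen for illustration:
# W is spanned by (1, 0, -1)/sqrt(2) and (0, 1, 0).
u1 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2.0)
u2 = np.array([0.0, 1.0, 0.0])

x = np.array([1.0, 2.0, 1.0])   # the fixed vector x

# Part three of the theorem: u = <x, u1> u1 + <x, u2> u2.
u = np.dot(x, u1) * u1 + np.dot(x, u2) * u2
print(u)                        # the best approximation to x from W

# Part one: the residual x - u is orthogonal to W, i.e. to each basis vector.
print(np.dot(x - u, u1), np.dot(x - u, u2))   # both are 0
```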
Conversely, suppose that u is a best approximation to x from W; we must show that u satisfies the condition that x - u is perpendicular to W. Now V is an inner product space, so it might be a complex inner product space; for the moment I am restricting my attention to scalars that are real. For any w in W and λ in R, remember that we have assumed that u is a best approximation, so ‖x - u‖ is less than or equal to the norm of x minus any vector that belongs to W, in particular the vector u - λw: ‖x - u‖ ≤ ‖x - (u - λw)‖ for all w in W and all λ in R. Then I square and expand the right hand side using the inner product. Rewrite it as ‖(x - u) + λw‖² and do the usual expansion: one term is ‖x - u‖²; the term in which λ pairs with its conjugate gives λ²‖w‖², since λ is real; and the two cross terms, with λ taken out as a real scalar, combine into 2λ Re⟨x - u, w⟩. Comparing with the left hand side, ‖x - u‖² cancels, so what I get is λ²‖w‖² + 2λ Re⟨x - u, w⟩ ≥ 0 for all λ in R and all w in W. In particular this is true for λ positive, so take λ > 0 and divide by λ: λ‖w‖² + 2 Re⟨x - u, w⟩ ≥ 0, this time for all λ > 0 and all w in W. Why did I assume λ positive before dividing? In order to retain the inequality. Now take the limit as λ goes to 0, for instance along the sequence λ = 1/n; in the limit the first term is gone. λ was a variable, and I have taken it as a sequence converging to 0, so w is the only variable now: Re⟨x - u, w⟩ ≥ 0 for all w in W. W is a subspace, so I can replace w by -w to get Re⟨x - u, w⟩ ≤ 0; combining, Re⟨x - u, w⟩ = 0 for all w in W. Replacing w by -w is a valid operation precisely because W is a subspace. Let us consider two cases. If V is a real inner product space, then we have proved what we wanted: if u is a best approximation, then x - u is perpendicular to the subspace W. If V is a complex space, write ⟨x - u, w⟩ = α + iβ; what we have proved just now is that α = 0. This equation is true for all w, and the scalars can be taken from the complex numbers, so replace w by iw. The i comes in the second argument, so when it goes out it goes out with a bar, and the conjugate of i is -i; thus ⟨x - u, iw⟩ = -i(α + iβ) = β - iα, whose real part is β. But we have shown that this real part is 0, so β = 0 as well. So in any case ⟨x - u, w⟩ = 0 for all w in W; just check these steps, replacing w by iw if necessary to conclude that the imaginary part is also 0. That is the first part: the necessary and sufficient condition for u to be a best approximation is that x - u is perpendicular to W.
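The λ argument can also be seen numerically. Here is a small sketch, again Python with NumPy and the same made-up plane as above: the function f(λ) = ‖x - (u - λw)‖² - ‖x - u‖² stays non-negative for every w in W, and its linear coefficient 2⟨x - u, w⟩ is exactly the quantity the limit argument extracts.

```python
import numpy as np

u1 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2.0)
u2 = np.array([0.0, 1.0, 0.0])
x = np.array([1.0, 2.0, 1.0])
u = np.dot(x, u1) * u1 + np.dot(x, u2) * u2   # best approximation from above

w = 0.3 * u1 - 1.7 * u2                       # an arbitrary vector in W

def f(lam):
    # f(lam) = ||x - (u - lam w)||^2 - ||x - u||^2
    #        = lam^2 ||w||^2 + 2 lam <x - u, w>
    return np.linalg.norm(x - (u - lam * w))**2 - np.linalg.norm(x - u)**2

for lam in [1.0, 0.1, 0.01, -0.01]:
    print(lam, f(lam))        # non-negative for every lam
print(np.dot(x - u, w))       # the coefficient <x - u, w> is 0
```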
The second part is uniqueness. Typically, to prove uniqueness, you assume that there are two best approximations and show that they are the same. So let u and v be best approximations to x from W. By definition they both come from W, so their difference u - v is in W, and from the first part ⟨x - u, u - v⟩ = 0; similarly ⟨x - v, u - v⟩ = 0. I want to show that u = v, so consider ‖u - v‖² and show that it is 0; then u = v follows. Write it using the inner product as ⟨u - v, u - v⟩; keep the second term as it is and modify the first one: this is ⟨(u - x) + (x - v), u - v⟩ = ⟨u - x, u - v⟩ + ⟨x - v, u - v⟩. Both terms are 0, so u = v: if a best approximation exists, then it is unique.

The last part gives you a formula in case W is finite dimensional. To prove that this u is the best approximation to x from W, all you need to do is show that x - u is perpendicular to W; then from the first part it follows that it is the unique best approximation. But that has been done before: you remember that in the very first theorem on best approximation this was shown. So for the third part, refer to that previous result where we have shown that x - u is perpendicular to W, and the proof is over.

This was best approximation from a subspace. There is a related question, which seeks best approximate solutions for linear systems of the form Tx = b; let me pose the problem first. This is a more important problem than best approximation, but using what we have developed just now, it can be handled. T is a linear map between inner product spaces V and W, and b is a fixed vector from the codomain. The question of how to solve Tx = b is first of all an existential question: does this equation have a solution? The system is consistent, that is, it has a solution, if and only if b belongs to the range of T. But often b may not belong to the range of T; typically, in statistical applications, in experiments that one does in a laboratory for example, these equations will not be consistent, but one still wants to solve them. The system is not consistent, but can we at least minimize ‖Tx - b‖? That is the question. I will confine my attention to the case when V and W are real Euclidean spaces, so let V = Rⁿ and W = Rᵐ, in which case the problem reduces to a matrix equation: the equation is replaced by Ax = b, where you can think of A as the matrix of T relative to some given bases for Rⁿ and Rᵐ.
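To see the problem concretely before the analysis, here is a sketch in Python with NumPy; the 3 × 2 matrix and right hand side are made up for the example, and np.linalg.lstsq is NumPy's built-in minimizer of ‖Ax - b‖.

```python
import numpy as np

# An inconsistent system: b does not lie in the range (column space) of A.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# No x satisfies Ax = b exactly (rows 1 and 2 force x = (1, 1),
# but then row 3 gives 2, not 0), yet we can minimize ||Ax - b||.
x, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)                           # the minimizer
print(np.linalg.norm(A @ x - b))   # the minimal residual, strictly positive
```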
You know that Ax = b is consistent if and only if Tx = b is consistent: the range space of T corresponds to the range space of A, so I can think of A as a matrix as well as a linear transformation, which is the usual practice in linear algebra. The question is whether this system has a solution, and I mentioned that there are many situations where it does not, in which case we would like to minimize the norm of Ax - b. So we are looking at the following problem: does there exist an x in Rⁿ such that ‖Ax - b‖ ≤ ‖Az - b‖ for all z in Rⁿ? Here x is fixed, the vector we seek, and z is the variable. A is a linear transformation from V into W, with domain Rⁿ and codomain Rᵐ, so A is an m × n matrix and z must come from Rⁿ. Let us try to reduce this problem to the problem of best approximation: b plays the role of x in the previous instance, Ax plays the role of u, and Az plays the role of a general w from the subspace W. Previously the inequality was ‖x - u‖ ≤ ‖x - w‖, where the small w comes from the subspace capital W; here, instead of x I have b, instead of w I have Az, and in place of the subspace W I have the range of A. Is that clear? b is a fixed vector, like x; among all vectors in the range of A, I pick that particular vector u which is equal to Ax, and the question boils down to finding an x in Rⁿ that satisfies this inequality. Such an x is called a best approximate solution: it minimizes ‖Ax - b‖. The previous theorem says that this problem has a solution if and only if a certain condition is satisfied, namely the analogue of "x - u is orthogonal to W"; in the present instance, instead of x I have b, instead of u I have Ax, which is in the range of A, and the subspace is the range of A. So the condition is ⟨b - Ax, Az⟩ = 0 for all z in Rⁿ. This is the condition coming from the previous theorem: x is a best approximate solution, meaning ‖Ax - b‖ is the least among all the numbers ‖Az - b‖, exactly when this holds.
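The reduction can be checked numerically: the best approximate solution x makes Ax the best approximation of b from the subspace range(A). A sketch under the same made-up data as before, using np.linalg.qr only as a convenient way to produce an orthonormal basis of range(A), so that the projection formula of the theorem applies:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# The columns of Q form an orthonormal basis of range(A), so the
# best approximation of b from range(A) is sum_i <b, q_i> q_i = Q Q^T b.
Q, _ = np.linalg.qr(A)
proj = Q @ (Q.T @ b)

# The least squares minimizer x satisfies A x = (projection of b).
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(A @ x, proj))    # True
```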
Look at this condition once again. We are in Euclidean space with the usual inner product, so I can write the inner product by a formula: for two vectors r and s, ⟨r, s⟩ = Σᵢ rᵢsᵢ, which I can write as rᵀs, where, and let me fix this convention, any vector standing alone is a column vector. So r and s are column vectors, rᵀ is a row vector, and row times column is precisely this sum; by the way, rᵀs is also equal to sᵀr in the real case. So the inner product can be replaced by this formula, and I will use that here: 0 = ⟨b - Ax, Az⟩ = (Az)ᵀ(b - Ax); I am replacing the usual inner product by the formula involving transposes. Now expand: we have seen the operation of transpose earlier, and it satisfies a reverse order law, so this is zᵀAᵀ(b - Ax), and this must hold for all z in Rⁿ. You can again think of this as an inner product: the inner product of z with the vector Aᵀ(b - Ax) is 0 for all z. Please check that this is a vector in Rⁿ: b - Ax is in Rᵐ, and Aᵀ of something in Rᵐ is in Rⁿ, so there is no compatibility problem. So here is a vector that is perpendicular to all vectors in Rⁿ, and such a vector must be 0: Aᵀ(b - Ax) = 0. That is, x is a best approximate solution if and only if x satisfies this equation; the intermediate variable z has been removed. This is the necessary and sufficient condition, coming from the previous theorem, for x to be a best approximate solution of the system Ax = b. If Ax = b, that is, if the system is consistent, the condition is obviously satisfied; but even when Ax - b is not 0, this condition must hold for the best approximate x. Now rewrite the equation as AᵀAx = Aᵀb. x is a best approximate solution if and only if x satisfies this equation, but is this system consistent? We need to consider that question also: given b, I can calculate the right hand side Aᵀb, and I can calculate AᵀA since A is known, so does this system have a solution? Can you reformulate the question? The same question was asked for Ax = b: that system is consistent if and only if b belongs to the range of A. So here the question is whether the vector Aᵀb belongs to the range of AᵀA. The answer is not really straightforward, so I do not expect you to give it; the answer is yes. And look at the full picture: Ax = b in general does not have a solution, there is no x that satisfies that equation, but there is always an x that satisfies AᵀAx = Aᵀb. So from an inconsistent system we have come to a consistent system.
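Assuming for the moment the consistency just asserted, here is what the passage to AᵀAx = Aᵀb looks like in code, a Python/NumPy sketch on the same made-up data (here AᵀA happens to be invertible, so np.linalg.solve applies):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# The normal equations A^T A x = A^T b: consistent even though Ax = b is not.
x = np.linalg.solve(A.T @ A, A.T @ b)

print(x)                                    # [1/3, 1/3] for this data
print(np.allclose(A.T @ (b - A @ x), 0))    # True: A^T (b - Ax) = 0
```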
Remember that when we go from one system to another, we cannot in general go back: from Ax = b we have come to AᵀAx = Aᵀb, but you cannot in general go back, and so these two systems are not the same. There is no contradiction here; if the systems were the same, then there would be one, since the first is inconsistent whereas the second is consistent. Just think it over. But I want to show that the second system is always consistent, irrespective of what b is. The proof comes from an equation connecting subspaces, and it takes a little effort. I want to make the following claim: the range of Aᵀ is equal to the range of AᵀA. We have not seen this equation before. One of the inclusions is straightforward: range(AᵀA) is contained in range(Aᵀ), that is clear; it is the other way that requires a little effort. Suppose I prove this; then go back to the system and see that it is consistent: the right hand side vector Aᵀb belongs to range(Aᵀ), which equals range(AᵀA), the range of the coefficient matrix, and so the system is consistent. So we will prove this equation, and then it follows that the system is consistent, which means that there is always a best approximate solution for linear equations: given an equation Ax = b, if it is consistent there is no problem, and if it is not, we can always find a best approximate solution, where the approximation is with respect to the usual Euclidean norm, since that is the norm in which we minimize ‖Ax - b‖.

So I need to prove the claim, and I told you that one of the inclusions is obvious, so let me record that: obviously range(AᵀA) is contained in range(Aᵀ). By the way, you also need to verify that this equation makes sense, that is, that we are talking about two subspaces of the same vector space. Let us make a quick check: A is from Rⁿ to Rᵐ, Aᵀ is from Rᵐ to Rⁿ, and AᵀA is from Rⁿ to Rⁿ, a square matrix. So range(AᵀA) is in Rⁿ and range(Aᵀ) is in Rⁿ; the equation is well defined, and one can hope for its validity. What I must show is that range(Aᵀ) is contained in range(AᵀA); instead, I will show that the dimensions coincide. I have two subspaces, one contained in the other; if the dimensions are the same, the subspaces must be the same. To prove that the dimensions are the same I will use the rank-nullity dimension theorem, but before that I need another claim: the null space of AᵀA equals, can you guess what, the null space of A; only then do you have compatibility, and again you can check that this makes sense: null(AᵀA) is in Rⁿ, and since A is from Rⁿ to Rᵐ, null(A) is also contained in Rⁿ, so both are subspaces of Rⁿ. This is easy to prove, and again one inclusion is clear, namely this one: if Ax = 0, then AᵀAx = 0, so null(A) is contained in null(AᵀA); that is, if x belongs to the null space of A, then x belongs to the null space of AᵀA.
I need to prove the other way around. So take a vector, call it p to keep it distinct from x, and suppose p belongs to null(AᵀA); then AᵀAp = 0, the zero vector. The zero vector is perpendicular to every vector, in particular to p, so pᵀAᵀAp = 0. But this can be rewritten as (Ap)ᵀ(Ap), which is the same as the inner product of Ap with itself in the standard inner product; an inner product of a vector with itself is 0 only when the vector is 0, so Ap = 0. We started with AᵀAp = 0 and have proved Ap = 0, so p belongs to null(A). Hence null(AᵀA) = null(A).

Now apply the rank-nullity dimension theorem, and also the fact that the row rank and the column rank of a matrix are the same. First, rank-nullity for AᵀA: rank(AᵀA) + nullity(AᵀA) = n, the dimension of the domain space. Then for A: rank(A) + nullity(A) = n, again the dimension of the domain space. What I have shown is that null(A) and null(AᵀA) are the same, so subtract one equation from the other; the nullities cancel, and it follows that rank(A) = rank(AᵀA). You cannot pass to ranges immediately, because range(A) sits in Rᵐ; what you need is that rank(A) is the same as rank(Aᵀ), which is exactly the statement that the row rank equals the column rank. So rank(Aᵀ) = rank(A) = rank(AᵀA). Now, the rank of AᵀA is the dimension of range(AᵀA), and likewise for Aᵀ; go back to the claimed equation: range(AᵀA) and range(Aᵀ) are subspaces of the same vector space, their dimensions coincide, and one is contained in the other, so they must be equal. So range(Aᵀ) = range(AᵀA), and with this little extra effort, using the rank-nullity dimension theorem, one shows that the system AᵀAx = Aᵀb is consistent. You might have come across this equation in numerical analysis: these are what are called the normal equations, for instance in the problem of interpolation, and the normal equations are always consistent. So that is a complete answer, and in principle this gives a method for finding a best approximate solution of the equation.
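Both facts used here, null(AᵀA) = null(A) and rank(A) = rank(AᵀA), can be spot-checked numerically. A short Python/NumPy sketch with a rank-deficient matrix made up for the purpose:

```python
import numpy as np

# A 3 x 3 matrix of rank 2: the third column is the sum of the first two,
# so the null space is non-trivial.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))   # 2 2

# p = (1, 1, -1) lies in null(A), hence also in null(A^T A).
p = np.array([1.0, 1.0, -1.0])
print(np.allclose(A @ p, 0), np.allclose(A.T @ (A @ p), 0))       # True True
```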
By the way, we have come back to the familiar least squares solution, because our norms are Euclidean norms, sums of squares or integrals of squares; a best approximate solution is a least squares solution. If the system Ax = b is inconsistent, then one seeks least squares solutions, and least squares solutions always exist; that is the same as saying that the normal equations are consistent. The remaining question is whether least squares solutions are unique. Let me give you an instance where the least squares solution is unique and stop. I have the following result: suppose that the columns of A are linearly independent; then the least squares solution of the system Ax = b is unique.

I will give two proofs: one uses a principle, the other will be useful in practice. First proof. Look at the equation AᵀAx = Aᵀb. When the columns of A are linearly independent, what is the null space of A? It consists of the single element 0. And we have shown null(A) to be equal to something else, some other subspace; what is that? We just proved it is null(AᵀA). Note the difference between A and AᵀA: even if A is rectangular, AᵀA is square, whatever A is. So AᵀA is a square matrix whose null space consists of the single element 0; as a linear transformation it is injective, but since it is square it must also be surjective, so it must be invertible. So (AᵀA)⁻¹ exists; just pre-multiply by it to get x = (AᵀA)⁻¹Aᵀb. This is unique because it solves a system whose coefficient matrix is invertible, and such a system has a unique solution, given by this formula. We will come back and look at this formula (AᵀA)⁻¹Aᵀ: we will discuss the notion of a generalized inverse in its place a little later. This is a proof which uses the principle that null(AᵀA) = null(A).
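The first proof is itself a formula one can compute with. A sketch in Python/NumPy on the running made-up example (whose columns are independent); as a check, the matrix (AᵀA)⁻¹Aᵀ agrees here with NumPy's pseudoinverse np.linalg.pinv, one realization of the generalized inverse mentioned above:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])    # linearly independent columns
b = np.array([1.0, 1.0, 0.0])

# First proof as a formula: x = (A^T A)^{-1} A^T b.
x = np.linalg.inv(A.T @ A) @ (A.T @ b)
print(x)

# For independent columns, (A^T A)^{-1} A^T coincides with pinv(A).
print(np.allclose(x, np.linalg.pinv(A) @ b))   # True
```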
In practice one could use the QR decomposition, and that is my second proof. But before I give it, let this be clear: if Ax = b is consistent, the question of finding a least squares solution does not arise; the question arises only if Ax = b is inconsistent. If Ax = b is inconsistent and the columns of A are independent, there is a unique least squares solution; if not, there are infinitely many least squares solutions. In such a case, which solution would you be satisfied with? Among infinitely many least squares solutions, a suggestion is to be satisfied with the vector which has the least norm among all of them. This problem we will discuss later. For instance, even in the case of a rectangular system of consistent equations with infinitely many solutions, one settles for the solution of least norm; the same rule applies in the inconsistent case: Ax = b inconsistent has least squares solutions, in general infinitely many, and then one is interested in the least squares solution of minimum norm. Whether such a vector exists we will discuss later; I wanted to make this comment just to give a slightly enlarged picture.

So let me go back to the second proof and tell you how the QR decomposition could be used. I have Ax = b, and from this I get the normal equations AᵀAx = Aᵀb. Write A = QR; this is applicable because the columns of A are independent. Remember the framework in which we derived the QR decomposition: the columns of A must be linearly independent, and that is the situation here. So A = QR is possible, with QᵀQ the identity of order n and R invertible and upper triangular. Go back and substitute into the normal equations; I will write this down quickly: the left hand side becomes RᵀQᵀQRx and the right hand side is RᵀQᵀb. Since QᵀQ is the identity, this reads RᵀRx = RᵀQᵀb. R is square and invertible, so Rᵀ is also invertible, and the equation is the same as Rx = Qᵀb. From this one can write x = R⁻¹Qᵀb; of course that is possible, but one would rather stop with Rx = Qᵀb and look at the structure of R to see what must be done to solve the system. R is upper triangular, so the last equation has only one variable, the last but one equation has two variables, and so on; so we do what is called backward substitution: find the variable xₙ from the last equation, go to the previous one to find xₙ₋₁, and so on, until the first equation gives x₁. So the system is solved by backward substitution. This is something that can be applied in a numerical example, whereas the first proof gives a conceptual argument; a sketch of the computation follows below. So let me stop.
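Here is the QR route in code, a Python/NumPy sketch on the same made-up data, with the backward substitution written out explicitly to mirror the description above:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# QR decomposition (valid since the columns of A are independent):
# Q^T Q = I and R is invertible and upper triangular.
Q, R = np.linalg.qr(A)
c = Q.T @ b                  # the system to solve is R x = Q^T b

# Backward substitution: the last equation has one unknown,
# the one before it two, and so on up to the first.
n = R.shape[0]
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]

print(x)                                                      # [1/3, 1/3]
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```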