A quick review of what we did last time: we derived the formula for least-squares estimation purely through geometric arguments. We said that the error between the values predicted by the model and the known, measured values of the dependent variable should be perpendicular to the subspace spanned by the columns of the matrix A. We had the model u = Aθ + e, where e is the error vector, and we made a geometric argument that you know from school, the argument of projection: the error vector e = u − Aθ should be orthogonal to the column space of A, where the column space of A is nothing but the span of the columns of A. So in least-squares estimation, e is perpendicular to the column space. Using just this simple projection argument we arrived at the basic formula θ_LS = (AᵀA)⁻¹Aᵀu. We could thus reach through geometry the same fundamental result that we had earlier arrived at through algebra.

Today I want to extend this beyond finite-dimensional spaces, and also give you some more geometric insight in the finite-dimensional case. From the next class we will start applying it for different purposes. Let me again stay with the interpretation part: as I said, there is a third way of interpreting all of this, through statistics. Unfortunately we do not have time to get into that, but I am going to upload my notes on the statistical interpretation; those of you who are interested should go and look at them. That is extra reading, and this is a postgraduate course, so you should do extra reading; things do not end here, and what we manage to cover in this course is only the tip of the iceberg.

So, we derived this formula, and I want to generalize it to approximation in any Hilbert space. Before that, let us look at some geometric insights. I just defined the column space of A; in some other course you have probably also looked at the other spaces associated with a matrix. What are they? The null space, and the row space. What is the row space? The span of the rows, or equivalently the span of the columns of Aᵀ; the column space of Aᵀ is nothing but the row space. We just need to know a little bit about these four spaces. Given a matrix A: the span of the columns of A is called the column space of A; the row space of A is the span of the rows, that is, the set of all possible linear combinations of the rows, which is the column space of Aᵀ; the null space of A is the set of all x such that Ax = 0; and there is one more space associated with the matrix, called the left null space, which is nothing but the null space of Aᵀ.
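As a minimal numerical sketch (my own made-up A and u, not from the lecture), the formula and the orthogonality of the residual to the columns of A can be checked directly:

```python
import numpy as np

# Hypothetical tall matrix A (more rows than columns) and measured vector u
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
u = np.array([1.1, 1.9, 3.2, 3.9])

# Least-squares estimate: theta_LS = (A^T A)^{-1} A^T u
theta_ls = np.linalg.solve(A.T @ A, A.T @ u)

# Residual e = u - A theta_LS should be orthogonal to every column of A
e = u - A @ theta_ls
print(theta_ls)   # estimated parameters
print(A.T @ e)    # approximately [0, 0]: e is orthogonal to the column space
```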
So, there are four fundamental subspaces associated with a matrix: the column space, which is all possible linear combinations of the columns of A; the row space, the span of the row vectors; the null space, the set of all x such that Ax = 0; and the left null space, the set of all vectors y such that Aᵀy = 0. What you will realize when I go back to the projection picture is that we split the vector u into two components, and one of the components was the projection p. Geometrically, we are splitting the vector into two components. One component is in the column space, because Aθ is nothing but a linear combination of the columns of A, so that vector has to belong to the column space of A. Let me draw the picture again: if you have a 3×2 matrix with two linearly independent columns, say the first column is A₁ and the second is A₂, and this is my vector u (the diagram I drew earlier), then p is the projection vector, which lies in the column space of A, and then there is an orthogonal component. Where does the orthogonal component lie: in the null space of this matrix, or in the left null space? Just think about it. So we are able to split a vector into two components, one in the column space and one orthogonal to the column space. Will a vector belonging to the null space be orthogonal to the column space? What is the property of a vector that is orthogonal to the column space? Yes: Aᵀ times that vector gives the zero vector. So e is not in the null space; e is in the left null space, and p is in the column space.

Now, there is a fundamental theorem of linear algebra. I am just going to mention it here without getting into the proof; you can refer to Strang's book or any linear algebra book. The column space of A is often written R(A), and the null space N(A), where N stands for null space. One part of the fundamental theorem, a result I think you already know, is that the number of linearly independent columns is equal to the number of linearly independent rows. In any matrix the number of linearly independent columns is always equal to the number of linearly independent rows, and this common number is the rank. This is an algebraic result which is also a geometric result: it relates the rank, an algebraic quantity, to a geometric property, linear independence. So what is the dimension of the column space? The dimension of the column space is always equal to the rank.
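A tiny sketch (arbitrary example matrix, my own) illustrating the statement that the number of independent columns equals the number of independent rows:

```python
import numpy as np

# Arbitrary 3 x 2 example: the second row is twice the first
A = np.array([[1., 2.],
              [2., 4.],
              [1., 0.]])

# Column rank and row rank are always equal; both are "the" rank
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2
```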
So, the fundamental theorem of linear algebra. I am looking at a matrix A which is n×m. Then: the dimension of the column space of A is rank(A); the dimension of the null space of A is m − rank(A); the dimension of the column space of Aᵀ, which is nothing but the row space, is rank(A); and the dimension of the null space of Aᵀ, the left null space, is n − rank(A). (Since the number of columns is m, the null space has dimension m minus the rank, and the left null space has dimension n minus the rank.) This is just background, a well-known result I am stating again.

Now, with this background, I am going to prove some nice algebraic properties of the projection we have looked at. What we have actually achieved is a way of splitting a vector into two components. We found the least-squares estimate, and how did we get it? θ_LS = (AᵀA)⁻¹Aᵀu; that is how we got the least-squares estimate. Now, what is the projection? We are talking of two things: the projection onto the column space, and the error vector, which is perpendicular to the column space. So, given a matrix A, how do I project a vector onto its column space? The projection p is given by A times θ_LS: we found the θ that gives the projection, and the projection vector itself is Aθ_LS, which is nothing but A(AᵀA)⁻¹Aᵀu. Let me call the matrix A(AᵀA)⁻¹Aᵀ the projection matrix, Pr. Where does the vector p lie? It lies in the column space of A: θ_LS is just a vector of numbers, so Aθ_LS is a linear combination of the columns of A and has to lie in the column space of A. So, given a matrix A, this is how I project any vector onto its column space: I take the matrix A and form this matrix Pr, called the projection matrix, and it projects any vector u onto the column space of A. What we have found, then, is that for a non-square matrix A, in fact a tall matrix with more rows than columns, we know how to project any vector u onto the column space; that is the component along the column space. What is the component orthogonal to the column space? That is e. How do you get e? e = u − Aθ_LS, which is u − A(AᵀA)⁻¹Aᵀu, that is, (I − Pr)u, the identity minus the projection matrix applied to u. Is everyone with me on this? You can check that p is orthogonal to e: take the inner product of p and e and you will get 0; these two are orthogonal components.
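A small sketch (my own example matrix and vector, not from the lecture) checking these dimension counts and the fact that the residual e = (I − Pr)u sits in the left null space:

```python
import numpy as np
from scipy.linalg import null_space

# Tall n x m example (n = 4, m = 2) with two independent columns
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])
n, m = A.shape
r = np.linalg.matrix_rank(A)

# Dimensions of the four fundamental subspaces
print(r)                                  # column space: rank
print(m - r, null_space(A).shape[1])      # null space: m - rank
print(r)                                  # row space: rank
print(n - r, null_space(A.T).shape[1])    # left null space: n - rank

# Split an arbitrary u into p (column space) and e (left null space)
u = np.array([1., 2., 2., 4.])
Pr = A @ np.linalg.inv(A.T @ A) @ A.T     # projection matrix onto the column space
p = Pr @ u
e = u - p
print(np.allclose(A.T @ e, 0.0))          # True: A^T e = 0, so e is in the left null space
```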
So now, given a vector u, we know how to split it into two components, one in the column space and the other orthogonal to the column space. This matrix A(AᵀA)⁻¹Aᵀ is called the projection matrix, and there is something very nice about it. Is the projection matrix symmetric? If you take the transpose of this matrix, what will you get? I am not going to do it on the board; just think about it. There is one more very nice property of this projection matrix that I want you to check and understand the meaning of. It is a very funny matrix: it is the square of itself. Pr² = Pr. Can you check this? Just multiply Pr by Pr and you will get back Pr. Mind you, this is not the identity matrix. And you have to explain to me geometrically what Pr·Pr = Pr means. First of all, is it coming out? Just multiply: A(AᵀA)⁻¹Aᵀ times A(AᵀA)⁻¹Aᵀ; the factor (AᵀA)⁻¹AᵀA in the middle becomes the identity, and you are left with A(AᵀA)⁻¹Aᵀ. So this is not the identity matrix, yet it has this very nice property that it is the square of itself. What is the geometric meaning of this? Yes, precisely: if you project the projection again, you get the same vector. Here my vector u is split into two components, the projection and the part orthogonal to the projection; if I project the projection again onto the plane, what should I get? The same vector, because the best approximation of a vector already inside the plane is that vector itself. That is the geometric interpretation of Pr² = Pr, and you can just go on: the cube of the projection matrix is also equal to the projection matrix. A very, very nice matrix. And vice versa: any matrix that has this property, that it is the square of itself, is called a projection matrix. So this is a nice property of the projection matrix; you can go on multiplying it with itself and you will get back the same matrix. What about the transpose? Take the transpose: it is a symmetric matrix. So it is a symmetric matrix that is the square of itself.

What will the projection be if u is perpendicular to the column space? Without computing, you should be able to tell me: the projection should be zero, because the best approximation of a u that is perpendicular to this plane is the zero vector. So associate these least-squares approximations with these geometric ideas; then you will remember them better than just the procedure. What happens if u is already inside the column space? We talked about it: you get back the same vector. And what happens if A is a square matrix? If A is square and invertible, what will the projection matrix be? If A is square and invertible, then (AᵀA)⁻¹ = A⁻¹(Aᵀ)⁻¹, so the projection matrix is A·A⁻¹·(Aᵀ)⁻¹·Aᵀ, which is the identity matrix. So if A is full rank and square, the projection matrix turns out to be the identity; there will be no error, because you are already in the full space.
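A short sketch (again with arbitrary numbers of my own choosing) verifying the properties just discussed: symmetry, Pr squared equal to Pr, orthogonality of the two components, and Pr = I when A is square and invertible:

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
u = np.array([1., 0., 2.])

Pr = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix onto the column space of A
p = Pr @ u                              # component of u in the column space
e = u - p                               # component orthogonal to the column space

print(np.allclose(Pr, Pr.T))            # True: symmetric
print(np.allclose(Pr @ Pr, Pr))         # True: projecting the projection changes nothing
print(np.isclose(p @ e, 0.0))           # True: the two components are orthogonal

# If A is square and invertible, the projection matrix is the identity
B = np.array([[2., 1.],
              [1., 3.]])
print(np.allclose(B @ np.linalg.inv(B.T @ B) @ B.T, np.eye(2)))   # True
```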
Now I am going to generalize this to any Hilbert space. This is a very nice result, and to extract its full potential we should not restrict ourselves to n dimensions. Till now I have been looking only at finite dimensions; why not go ahead and look at any Hilbert space, any infinite-dimensional space? In fact, what you will see is that what soon pops out is the Fourier series, and not just the Fourier series but the generalized Fourier series. Let us look at this quickly, because the Fourier series has probably haunted you since your second year of engineering, and this will explain the basis of how it works: why Fourier series, why we look at orthogonal functions, and so on.

So what I want to do now is generalize this result to any Hilbert space. When you study all these methods, numerical methods for solving partial differential equations, boundary value problems, algebraic equations, whatever, I want you to remember these geometric insights. It is not just the procedure; it is these foundations that help you modify, tweak and twist a method for a particular application. If you really want to concoct a solution for a given problem, you often cannot rely only on the standard tools; you have to come up with some way of constructing a new solution procedure, and that is where you should understand these fundamentals, where it all arises from and how the whole thing was derived. Then you can derive another approach yourself if you need to. That is critical.

So let us look at what we have done till now. I have a way of projecting a vector: in three dimensions we looked at a vector outside a plane, and I have a way of projecting that vector onto the plane, together with a component perpendicular to the plane. In general, I said, why stop at a two-dimensional plane? If I have a vector in Rⁿ and an m-dimensional subspace of Rⁿ, I know how to find the component lying in the subspace and the component orthogonal to the subspace: we looked at n-dimensional vectors being projected onto an m-dimensional subspace, with a component orthogonal to it. So in finite-dimensional vector spaces we were able to extend our school geometry, the fact that the shortest distance of a point from a plane, or from a subspace, is obtained by dropping a perpendicular. I want to do the same thing in any Hilbert space. Why can I do this in a Hilbert space? Because I have an inner product. Since I have an inner product, I have orthogonality; I can talk about projections, I can talk about two vectors being orthogonal to each other. So I can take a subspace of a Hilbert space, or of any space on which an inner product is defined, and, given a vector outside that subspace, I can project the vector onto the subspace: I can find the component of the vector in the subspace and the component perpendicular to the subspace. And what was the trick? The trick was that the approximation error is orthogonal to the subspace. So I find the error vector and require it to be orthogonal to the subspace, and that is what will give me the approximation.
For example, just to motivate this: suppose I have a function f(t) = a + bt, where a and b are some known values, and I want to approximate it by constructing an approximation g(t) = α sin t + β cos t, with t in [0, 2π]. Is this a vector? It is a vector: this function is a continuous function over [0, 2π], and so it is a vector in the space of continuous functions on [0, 2π]. I want to construct the approximation α sin t + β cos t for this vector. What do I mean by that? The error function, the error vector, call it e(t), will be f(t) − g(t). What is the subspace here, the subspace that these two vectors span? It is a two-dimensional subspace of the set of continuous functions on [0, 2π]. Why two-dimensional? Because there are two linearly independent vectors, sin t and cos t. Now we just extend our ideas of projection. What should happen? This error function should be orthogonal to the subspace. Which subspace? The two-dimensional subspace spanned by sin t and cos t, that is, all possible linear combinations of sin t and cos t. So the error should be perpendicular to span{sin t, cos t}: the approximation error between the original function and the approximation is orthogonal to that span. Here a and b are known constants; let us take something specific, say f(t) = 1 + 5t. What are not known are α and β; those we have to find. I want the least-squares approximation of this function. I am taking a very simple function here; in general you may have a very complex function that you need to approximate.

For example, you may have data coming from the daily temperature variation, which is cyclic, and you could approximate it using sine and cosine functions and develop very nice approximations. So I want to find the approximation coefficients α and β. What is the advantage of fitting? Say I collect the temperature every 5 minutes and plot it; for 24 hours that is 12 readings per hour times 24, that many data points. If I have to save this data for years, imagine how much data I would have to store. As against that, suppose I am able to fit sines and cosines: an approximation that does not pass through every point, but nevertheless gives me a best fit, a least-squares fit. For today's data I could find α and β, and then I have to save only two numbers. You must have heard about data compression; this is a data compression algorithm. The 5-minute temperature data for the entire day can be compressed and written in terms of only two numbers, or four numbers if you are not happy with sin t and cos t and add sin 2t and cos 2t, and then you will be able to get an approximation of temperature versus time. Instead of saving all 12 × 24 values for one day, I could just save the two approximation coefficients.
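A small numerical sketch of this example (my own illustration; the 2×2 system below is just the two orthogonality conditions written out, which the lecture derives in general a little later):

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    """Inner product on C[0, 2*pi]: <f, g> = integral of f(t) g(t) dt."""
    return quad(lambda t: f(t) * g(t), 0.0, 2.0 * np.pi)[0]

f  = lambda t: 1.0 + 5.0 * t      # function to approximate (a = 1, b = 5)
a1 = lambda t: np.sin(t)          # basis of the two-dimensional subspace
a2 = lambda t: np.cos(t)

# Orthogonality of the error to a1 and a2 gives a 2 x 2 linear system
G = np.array([[inner(a1, a1), inner(a2, a1)],
              [inner(a1, a2), inner(a2, a2)]])
b = np.array([inner(f, a1), inner(f, a2)])
alpha, beta = np.linalg.solve(G, b)
print(alpha, beta)   # least-squares coefficients of alpha*sin(t) + beta*cos(t)
```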
All I have to know is that these two coefficients correspond to sin t and cos t; I multiply them back in and I can get a very good approximation of what happened on that day. It may not be accurate in the sense of passing through every point, but that does not matter; I can get the trend. So this is very, very important.

Now let us leave that aside for a while and look at the problem itself; we want geometric insight and we want to extend it. So what should happen is that this error should be perpendicular to the span of these two vectors. In general my model could be of the following form. Let me write a general model. I will make one small change just to be consistent with our notation: let us call the approximation p, because it is the projection, and let us call the function being approximated u(t). So in general the projection is p(t) = α₁a₁(t) + α₂a₂(t) + ... + αₘaₘ(t), some linear combination of the known vectors a₁(t), a₂(t), ..., aₘ(t), which are known functions just like sin t and cos t. You want to project u(t) onto some known subspace: you know its basis vectors, and the set of all linear combinations of these basis vectors defines that subspace. What is the set of all possible linear combinations of a₁ to aₘ? If they are linearly independent, it is an m-dimensional subspace. Now the question is: if I give you u(t), can you find the least-squares α₁, α₂, ..., αₘ? How do you find them? This part, the linear combination of the aᵢ, is the projection onto the subspace defined by the span of a₁ to aₘ; I want to project u(t) onto that subspace. Now, just for convenience of notation, I am going to drop the t and work with u, a₁, a₂ and so on; the t is still there, I am dropping it only so the writing does not become cluttered. What I am saying here and what I told you earlier are not different; just remember that the earlier one was a two-dimensional example, with a₁(t) = sin t and a₂(t) = cos t, and the subspace defined by linear combinations of those two vectors is two-dimensional. In general you may have m such vectors, and you are talking about the distance of a point from a subspace: the same problem. The visualization is no different from what you know in three dimensions, but I am extending it to any m-dimensional subspace of an infinite-dimensional space.

So, the projection theorem. The formal statement is given here. The classical projection theorem in Hilbert spaces says that u − p, the error, is orthogonal to the subspace: the point in the subspace which is at least distance from a point outside it is reached by dropping a perpendicular, which you know from three dimensions, and the same result holds here; the error is perpendicular to the subspace. That is the statement of this theorem; the formal statement is given here, and I am just compressing it and writing that the error vector is orthogonal to the span. Let us call this subspace S. Now, how am I going to use this to compute α₁ to αₘ?
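For reference, here is the statement in compact written form (my wording of the standard classical projection theorem, using the notation above):

```latex
% Model: projection of u onto S = span{a_1, ..., a_m}
p(t) = \alpha_1 a_1(t) + \alpha_2 a_2(t) + \cdots + \alpha_m a_m(t)

% Classical projection theorem: for u in a Hilbert space H and a finite-dimensional
% subspace S of H, there is a unique p in S minimizing ||u - p||, and p is that
% minimizer if and only if the error u - p is orthogonal to S:
\langle u - p,\; s \rangle = 0 \quad \text{for all } s \in S
\quad\Longleftrightarrow\quad
\langle u - p,\; a_i \rangle = 0, \qquad i = 1, \dots, m .
```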
What I am going to do is very, very simple. I know that the error is orthogonal to a₁ through aₘ, which means it is orthogonal to a₁, also orthogonal to a₂, also orthogonal to a₃, and so on; so one by one I am going to write these equations. I am just generalizing the earlier result: my equations are ⟨u − p, aᵢ⟩ = 0, for i = 1, 2, ..., m. What is the meaning of this statement? The error vector u − p is orthogonal to each aᵢ. How many equations do I need, and how many unknowns are there? There are m unknowns, α₁, α₂, α₃, ..., αₘ, so I need to write m equations, and here are my m equations; I am just going to use them. Rearranging, ⟨u − p, aᵢ⟩ = 0 means ⟨p, aᵢ⟩ = ⟨u, aᵢ⟩; these are the same equations, I am just expanding them and writing them out. So the first equation, ⟨α₁a₁ + ... + αₘaₘ, a₁⟩ = ⟨u, a₁⟩, can be written as α₁⟨a₁, a₁⟩ + α₂⟨a₂, a₁⟩ + ... + αₘ⟨aₘ, a₁⟩ = ⟨u, a₁⟩; I am just expanding it. What is my second equation? Writing it the same way and expanding, I get α₁⟨a₁, a₂⟩ + α₂⟨a₂, a₂⟩ + ... + αₘ⟨aₘ, a₂⟩ = ⟨u, a₂⟩. How many such equations do I have? m equations, the last one being α₁⟨a₁, aₘ⟩ + ... + αₘ⟨aₘ, aₘ⟩ = ⟨u, aₘ⟩. Is everyone with me on this? You have m equations in m unknowns. Do I know these vectors a₁, a₂, ..., aₘ? I know them; I have chosen them: sin, cos, sin 2t, cos 2t, and so on. So these inner products can be computed, whatever integrals over 0 to 2π they happen to be; we can compute them, and then this system of equations can be solved, because you have m equations in m unknowns. Is it a linear system? It is: it is of the form matrix times vector equals vector. u is known to you, a₁ to aₘ are known to you, so the right-hand side is known; all the coefficients, which are inner products, are known; only α₁ to αₘ are not known.

Now see what the advantage is of choosing these vectors to be orthogonal. Suppose a₁, a₂, a₃, ..., aₘ are orthogonal: which inner products will not be zero? Only ⟨a₁, a₁⟩, ⟨a₂, a₂⟩, ..., ⟨aₘ, aₘ⟩. Yes: if you choose orthogonal vectors, the solution is particularly easy; the first equation becomes just α₁ times ⟨a₁, a₁⟩ equal to ⟨u, a₁⟩, and so on. Actually, what we have derived is nothing but the generalized Fourier expansion; I will come back to this briefly in my next lecture. If you take a₁ to aₘ to be orthogonal, what you get are nothing but Fourier coefficients; this is called the generalized Fourier expansion, and then we get the Fourier coefficients and the Fourier expansion. A Fourier expansion need not be only in sin and cos: tomorrow you may want to approximate some function over some domain, say [0, 1], using shifted Legendre polynomials. Why shifted Legendre polynomials? Because shifted Legendre polynomials are orthogonal on [0, 1].
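A small numerical sketch of the orthogonal-basis case (my own illustration; the function exp(t) and the degree-two truncation are arbitrary choices): with an orthogonal basis each coefficient decouples into one ratio of inner products, so there is no system to solve.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    """Inner product on C[0, 1]: <f, g> = integral of f(t) g(t) dt."""
    return quad(lambda t: f(t) * g(t), 0.0, 1.0)[0]

u = lambda t: np.exp(t)   # some function to approximate on [0, 1]

# First three shifted Legendre polynomials: orthogonal on [0, 1]
basis = [lambda t: 1.0,
         lambda t: 2.0 * t - 1.0,
         lambda t: 6.0 * t**2 - 6.0 * t + 1.0]

# Generalized Fourier coefficients: alpha_i = <u, a_i> / <a_i, a_i>
alphas = [inner(u, a) / inner(a, a) for a in basis]
print(alphas)
```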
Then this approximation becomes very easy; we will continue with this. This particular system of equations is called the normal equations, and it is like a cornerstone of optimization and projections, and it helps us in discretization and so on. Actually, if you go back and look a little carefully, this is not different from (AᵀA)θ = Aᵀu, because the entries of AᵀA are exactly the inner products of the columns of A; it is the same equation written in a generic form for any Hilbert space, the same equation, not different. So let us stop here, and let us continue with its interpretations in our next lecture.