So, you will agree that squaring makes no difference here: the norm is a non-negative number, and wherever that number is minimized, its square is minimized too. So let us take this as the minimization of the squared norm, and the squared norm, we know, is representable as an inner product. What we are trying to do is minimize, over x̂ in R^n, the inner product of (b − A x̂) with itself. Since we are working over the real field, that is the same as minimizing (b − A x̂)^T (b − A x̂) over x̂ in R^n. If you expand this, the transposition of course flips the order of the product, so what comes out is b^T b − b^T A x̂ − x̂^T A^T b + x̂^T A^T A x̂. Now, each of these terms is a scalar, a real number, and I put it to you that there is no difference between b^T A x̂ and x̂^T A^T b: they are scalars, and the transpose of a scalar is the scalar itself. So I can write the problem instead as the minimization over x̂ in R^n of b^T b − 2 b^T A x̂ + x̂^T A^T A x̂; that is the final statement.

Now it is all about our ability to evaluate the gradients of these terms and their Hessians, and to convince ourselves that whatever we arrive at is indeed a minimum. The first term, b^T b, is just a scalar constant: whichever of the n components of x̂ you take the partial with respect to, it gives 0. But what about the middle term? Let us take a closer look at what b^T A x̂ looks like. Writing A column by column, b^T A x̂ = b^T [a_1 a_2 ... a_n] x̂ = [b^T a_1, b^T a_2, ..., b^T a_n] x̂, which is nothing but x̂_1 (b^T a_1) + x̂_2 (b^T a_2) + ... + x̂_n (b^T a_n), where x̂_1 is the first of the n entries of x̂. This is a scalar, so it should not be too difficult to find its gradient. If I take the partial with respect to x̂_1, only the first term survives and I get the scalar b^T a_1; that is the first of the n entries of the gradient. So the gradient, written as a column vector, has b^T a_1 as its first entry. But I am going to do something here: is b^T a_1 any different from a_1^T b? No. So, for ease of representation, I will turn each entry around: the gradient is the column vector with entries a_1^T b, a_2^T b, ..., a_n^T b.
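Before going further, here is a tiny numerical sanity check, not part of the lecture, using NumPy with made-up A and b, that the expanded form b^T b − 2 b^T A x̂ + x̂^T A^T A x̂ really equals the squared norm ||b − A x̂||².

```python
import numpy as np

# Invented small example: m = 4, n = 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
b = rng.standard_normal(4)
x_hat = rng.standard_normal(2)

lhs = np.linalg.norm(b - A @ x_hat) ** 2                   # ||b - A x_hat||^2
rhs = b @ b - 2 * b @ A @ x_hat + x_hat @ A.T @ A @ x_hat  # expanded form

print(np.isclose(lhs, rhs))   # True
```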
So, is there a better way of writing this? Can I not write it as the stack of rows a_1^T, a_2^T, ..., a_n^T times b? And that stack of rows is nothing but A^T, so the gradient of b^T A x̂ is A^T b. So far so good. We have figured out that the constant term has gradient 0, and the middle term contributes only half of what we need, because there is a 2 sitting in front of it.

Now for the last term, x̂^T A^T A x̂, we are not going to differentiate it from scratch; we will argue by the product rule. The term is quadratic in the entries of x̂, so indulge me a bit. For d/dx of x², you can do it in one shot and say it is 2x, but you can also say it is x times d/dx of x plus d/dx of x times x, which is x times 1 plus 1 times x, that is, 2x. Why use this argument? Because I already know how to deal with the linear form from the previous computation, but I have to apply it twice: once treating the left factor x̂^T as a constant and the right x̂ as the variable, and once the other way around. But the expression is symmetric, so whatever I get in one of the two cases only needs to be doubled.

And you already recognize the form: x̂^T takes the role that b^T played before, and A^T A takes the role of A. So would it be too much of a stretch to write the result in one shot? Please ask if there is any doubt. Treating x̂^T as the constant, the gradient of x̂^T (A^T A) x̂ with respect to the remaining x̂ is, by the earlier result, (A^T A)^T x̂; but A^T A is symmetric, so whether you take its transpose or not matters not, it is still A^T A. Doubling, the gradient of x̂^T A^T A x̂ is 2 A^T A x̂. Please take a breather here and see if you agree with this; if this is fine, we will proceed.
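Here is a minimal finite-difference sketch, again with invented numbers rather than anything from the lecture, checking both gradient claims at once: the gradient of b^T A x̂ is A^T b, and the gradient of x̂^T A^T A x̂ is 2 A^T A x̂.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x0 = rng.standard_normal(3)

def num_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

lin = lambda x: b @ A @ x           # b^T A x_hat
quad = lambda x: x @ A.T @ A @ x    # x_hat^T A^T A x_hat

print(np.allclose(num_grad(lin, x0), A.T @ b, atol=1e-5))            # True
print(np.allclose(num_grad(quad, x0), 2 * A.T @ A @ x0, atol=1e-4))  # True
```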
So, if I know the result in terms of b^T and A, I just transplant it here: A^T A plays the role of A (and since A^T A is symmetric, taking its transpose changes nothing), and x̂^T plays the role played by b^T. So, at the end of the day, the gradient of the objective ||A x̂ − b||² is, remembering the factor of 2 that goes with the minus sign, −2 A^T b + 2 A^T A x̂. Any doubts about this? What must this be equated to? Zero; that is the first-order condition. So I need −2 A^T b + 2 A^T A x̂ = 0, but we still have a few questions to answer along the way, because we do not yet know whether certain inversions are possible; we have to verify that. Cancelling the 2, what we are aiming for is A^T A x̂ = A^T b.

Now I am going to make a claim. A^T A, you agree, is a square matrix: A is m × n, so A^T is n × m, and n × m times m × n is n × n. So it makes sense to ask about the invertibility of A^T A, without talking about a left or right inverse, just the inverse. Why am I interested? Because if the inverse exists, then I have a ready-made answer: x̂ = (A^T A)^{-1} A^T b. We have seen that the question of invertibility essentially boils down to checking whether the kernel of the matrix is trivial.

Claim: A^T A is invertible. Proof: suppose not; then A^T A has a non-trivial kernel, its columns are linearly dependent, and there exists v ≠ 0, of size n × 1, such that A^T A v = 0, a non-trivial combination of the columns of A^T A which vanishes. But then, hitting this zero vector with v^T from the left still gives zero: v^T A^T A v = 0. What sort of object is that? We know something even better: it is the inner product of A v with itself, which is the squared norm of A v. If the norm of something is 0, then A v = 0. Is that possible? Remember the assumption we imposed on A: full column rank. If A has full column rank, its columns are linearly independent; but now you are claiming that for a non-zero v there is a non-trivial combination of the columns of A which vanishes. Clearly absurd. I have not written it out fully, but you agree it is absurd. And if this is absurd, it implies that A^T A has only the trivial kernel, and therefore A^T A is invertible.
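As a hedged sketch, with a made-up full-column-rank A and b, one can solve the normal equations A^T A x̂ = A^T b directly and compare against NumPy's built-in least-squares routine; under the full-column-rank assumption the two answers coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))   # m = 6, n = 3; columns independent with prob. 1
b = rng.standard_normal(6)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least squares

print(np.allclose(x_normal, x_lstsq))  # True: both give the minimizer
```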
So this line of reasoning works, and once I have assured myself of the invertibility of A^T A, I can go ahead and say x̂ = (A^T A)^{-1} A^T b, which also means that b̂ = A x̂ = A (A^T A)^{-1} A^T b. What do you think is the projection matrix in all of this? It is staring us in the face. What does a projection matrix do? It takes a vector in a vector space and maps it to a subspace such that the image is the best approximation within that subspace. And what was the arbitrary vector we plucked out and wanted to fit within the subspace? It was b in R^m, whose best approximation we wanted to find in the column span of A, which has dimension n. So a vector in an m-dimensional space had to be approximated in an n-dimensional subspace of it, and b̂ is exactly that, by our whole discussion.

Of course, we have not yet shown the positive definiteness of the Hessian; that still remains to be seen, so let us get it out of the way before we proceed. You already see that the matrix A (A^T A)^{-1} A^T has the makings of a projection; you still need to verify the properties we imposed on orthogonal projection matrices, but we will check them, and orthogonal projection matrices just happen to be a special case. The linearity is quite clear: at the end of the day, with all those products, what you end up with is a matrix, and a matrix is known to be linear, so linearity is already settled. But first we want to address the issue of the Hessian.

So what do we need to prove? This, −2 A^T b + 2 A^T A x̂, is the gradient. Now if you differentiate once more, what do you think happens? You get a matrix. We started with the scalar ||A x̂ − b||²; the first differentiation gave a vector; the second differentiation gives a Jacobian. The constant term −2 A^T b, even without looking closely, is a constant vector: its derivative with respect to anything is zero. What is the way to go about it in general, if you have a vector and want to differentiate it with respect to another vector? You take each entry of the vector in turn. Those of you who have done a course on non-linear control theory will have encountered systems like ẋ_1 = f_1(x_1, ..., x_n), ẋ_2 = f_2(x_1, ..., x_n), and so on up to ẋ_n = f_n(x_1, ..., x_n). A popular way of ascertaining whether such a system represents stable dynamics around some equilibrium is to linearize it, and you linearize it by taking partials: the partial of f_1 with respect to x_1 is the first entry, the partial of f_1 with respect to x_2 is the second entry, and so on; that gives the first row of the Jacobian, and likewise you cook up all the other rows. What we are doing here is exactly that.
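Here is a sketch with invented numbers, not from the lecture, that builds P = A (A^T A)^{-1} A^T, checks that b̂ = P b agrees with A x̂ from the normal equations, and checks that the residual b − b̂ is orthogonal to every column of A, which is what "best approximation in the column span" amounts to.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

P = A @ np.linalg.inv(A.T @ A) @ A.T              # candidate projection matrix
x_hat = np.linalg.solve(A.T @ A, A.T @ b)         # normal-equation solution

print(np.allclose(P @ b, A @ x_hat))              # b_hat = A x_hat = P b
print(np.allclose(A.T @ (b - P @ b), 0))          # residual orthogonal to col(A)
```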
So, it is a function of vectors being differentiated with respect to the vector of the x's; that is what you did there, and here you do the same thing. You can treat 2 A^T A x̂ like the f above; the only difference is that it is linear, so it is even simpler. Take its derivative with respect to each individual entry and check that the Jacobian comes out to be nothing but 2 A^T A. Intuitively it looks very much like scalar differentiation, in that something times x just sheds the x, and what is going on at the heart of it is essentially that operation; but I want you to write it up in some detail. You can replace A^T A there with any matrix R: take R x̂, which is a vector, and take its Jacobian with respect to x̂; you will be left with just R. (Differentiating x̂^T R instead would pick up R^T, but in our case the transposition matters not, because A^T A is symmetric.) The 2 is just a positive number; for our purposes we want to ascertain that this object is positive definite.

So we are asking: is A^T A positive definite? Before that, we have to understand the notion of positive definiteness: when do we say a matrix is positive definite? Let us define it. (This part is clear; we have actually pushed our luck a bit and predicted that A (A^T A)^{-1} A^T is going to be a projection matrix, which we are yet to show.) Here is the definition: a matrix P = P^T in R^{n×n} is said to be positive definite if for all v ≠ 0, the scalar v^T P v is strictly positive. That is the definition of a positive definite matrix. There are several equivalent conditions for checking it, one of which involves eigenvalues and eigenvectors, something we have not yet covered but will in some time; for now this is just the definition. So, given a symmetric matrix, you try it with different vectors v in R^n and check that v^T P v turns out to be positive; of course, if you choose v = 0 it is going to be 0, so that is out of the question and we concentrate only on non-zero vectors.

Yes, you have a question? Yes, exactly. For a non-symmetric matrix, you can massage it into a symmetric one by taking (A + A^T)/2; it is all about the quadratic form that comes out as a result. If you have a non-symmetric matrix to start with and you can come up with an equivalent symmetric matrix, that also does the job for you. For example, if you have x_1² + 2 x_1 x_2 + x_2², you can write this as [x_1 x_2], that is x^T for you, times the matrix with rows (1, 1) and (1, 1), times x. I am not claiming this is positive definite, but it definitely gives you a quadratic form.
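To illustrate the definition only (the matrix below is my own made-up example, and sampling random vectors is a spot check rather than a proof), one can draw a few non-zero vectors v and confirm that v^T P v stays strictly positive; the eigenvalue-based checks come later in the course.

```python
import numpy as np

# Invented symmetric matrix used only to illustrate the definition.
P = np.array([[2.0, 1.0],
              [1.0, 2.0]])

rng = np.random.default_rng(4)
for _ in range(5):
    v = rng.standard_normal(2)
    if np.allclose(v, 0):
        continue                      # the definition only looks at nonzero v
    print(v @ P @ v > 0)              # True for every sampled v
```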
In fact, this one is not positive definite. Can you tell me why? Because if you choose x_1 and x_2 to be negatives of one another, additive inverses, the form vanishes; so even for a non-zero vector it vanishes. You can also look at the matrix itself: another equivalent condition is that a positive definite matrix must be full rank, but this matrix has a non-trivial kernel, and that kernel is exactly what is spoiling the party. This is something we call positive semi-definite, because the quadratic form is non-negative for sure: it can never be negative, being (x_1 + x_2)². Now, someone can write the same quadratic form with the non-symmetric matrix with rows (1, 2) and (0, 1); I hope this works, but this matrix is not symmetric. So the trick is: if you are given a quadratic expression in terms of a matrix that is not symmetric, just convert it to the corresponding symmetric matrix, and then it is easy to check whether the quadratic form is positive definite or not, based on whether the matrix sitting in the middle is positive definite or not, and there are several checks for that. One of the checks is, of course, that for a symmetric matrix all eigenvalues are real (we have not gone there, but I am predicting ahead), and therefore if all your eigenvalues are positive reals then it is a positive definite matrix. We will do all those proofs once we are through with the idea of eigenvalues. So the idea is that checking a quadratic form like this directly may not be so straightforward, but there are equivalent conditions for checking the positive definiteness of the matrix, and if you can ascertain that the matrix is positive definite, then the quadratic form represented through that matrix is always positive. So you check the positive definiteness of a function like this through the verification of the positive definiteness of a matrix.

Here, of course, we are only interested in whether A^T A is positive definite or not. So, with this definition in mind, suppose not (we do not actually need the contradiction, but it is a good way to start any argument). We have not talked about sizes: A^T A is obviously n × n. Then v^T A^T A v is equal to what? We have already seen this just a while back when we proved invertibility: it is nothing but the square of the norm of A v. And can this ever be 0 for a non-zero v? Of course, I must restrict v to be non-zero, because we only check over non-zero vectors. So is this certain to be positive? Yes. So we have a contradiction: we assumed it is not. I could have omitted that step and finished in two or three lines, but because I have written it, the onus is on me to complete it by saying it is a contradiction. Therefore A^T A, and with it the Hessian 2 A^T A, turns out to be positive definite, and therefore the solution we have obtained is indeed a minimizer of the problem at hand. (Yes: no, A^T is not equal to A, but this is just the inner product form, A v with itself, like we wrote a while back.)
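A hedged numerical check with made-up data: for a full-column-rank A, the quadratic form v^T (A^T A) v equals ||A v||², which is strictly positive whenever v ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))       # full column rank with probability 1

for _ in range(5):
    v = rng.standard_normal(3)        # nonzero with probability 1
    q = v @ (A.T @ A) @ v
    # q matches ||A v||^2 and is strictly positive
    print(np.isclose(q, np.linalg.norm(A @ v) ** 2), q > 0)   # True True
```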
So, v^T A^T A v is nothing but the square of the norm of A v, and A v cannot vanish because A has full column rank, so A has a trivial null space, or kernel. Therefore this quantity cannot vanish for any non-zero v, and because it is the square of a norm, it is of course positive. Hence the conclusion.

Now, since we have assured ourselves that we have a solution to the problem we started with, let us look at b̂, given by A (A^T A)^{-1} A^T b, where the inverse is guaranteed to exist. Linearity, I put it to you, is checked; linearity of what exactly? Of this object A (A^T A)^{-1} A^T, which is what I am going to claim as P, the projection matrix. So linearity is verified. What do I need next? Idempotence; just go ahead and check: A (A^T A)^{-1} A^T times A (A^T A)^{-1} A^T, and merge the middle terms together. Whether you cancel A^T A against the inverse on its left or on its right matters not; call A^T A some Q, and Q^{-1} Q collapses. What you are left with is A (A^T A)^{-1} A^T, so idempotence is also verified. What else do we need? That the image of this map equals what it is approximating into, namely the column span of A: column span of A = image of the P that I have defined. All right, we will see that in the upcoming module.
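To close, a small sketch with my own numbers of the remaining checks on P = A (A^T A)^{-1} A^T: idempotence, symmetry, and the fact that P leaves anything already in the column span of A untouched, which is the image property to be taken up in the next module.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 3))
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))              # idempotence: P^2 = P
print(np.allclose(P, P.T))                # symmetry (orthogonal projection)
w = A @ rng.standard_normal(3)            # an arbitrary vector in col(A)
print(np.allclose(P @ w, w))              # P fixes vectors already in col(A)
```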