So, broadly speaking, the problem is something like this: you have an unknown system to which you provide inputs — n inputs at a time — and each time you get some output. What you would like to do is figure out what the system is like, and that is the kind of problem that results in formulae of this kind. Now, when you go about solving this problem, it is important that the matrix A have full column rank. One way that can be ensured is this: if you have provided m different inputs, meaning you have taken measurements of the system m times, then the number of measurements m should be at least n, the length of the input, which is the number of parameters you are trying to fit. In that case, the problem can be solved very easily and has a unique solution.

Now, how is this related to what we have been talking about? The reason I brought this up is that the entire exercise of regression brought us down to this optimization problem. What kind of optimization problem is it? It has a twice differentiable objective, and it is being optimized over the entire space R^n. The full space R^n is obviously an open set, so this is an optimization of a twice differentiable function over an open set. Because of the nature of the function — it is just a simple quadratic — it also admits a closed-form solution, using the pseudo-inverse of A; but that is because of the norm you have chosen. In general, if you take some other type of loss function, you will not necessarily have a closed-form solution. So this is one example of optimization over an open set, that is, an unconstrained optimization.

Let me do another example of a similar flavor. Here the problem is slightly different: you actually have a model for the noise. Suppose you have taken m measurements of a system and obtained the vector b = (b_1, ..., b_m). We know that b should be related to the input as

b = Ax + ε,

where ε is the measurement noise; what we do not know is x. You provided the inputs — that is your matrix A — and you got these outputs or measurements b. Suppose, further, that we know the noise has a Gaussian density: ε is distributed as a Gaussian random variable with mean the zero vector and covariance matrix Σ, that is, ε ~ N(0, Σ). So you get a string of m measurements, all corrupted by noise, but you know the density of that noise. One philosophy for guessing the x that resulted in these measurements is to look at the likelihood, which in this case is simply the probability density of the measurement.
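Here is a minimal numerical sketch of both pieces so far — the measurement model b = Ax + ε and the closed-form least-squares solution via the pseudo-inverse. The sizes, seed, and x_true below are made-up illustrative values, not from the lecture.

```python
import numpy as np

# Illustrative sizes (not from the lecture): m measurements, n parameters,
# with m >= n so that A has full column rank.
m, n = 50, 3
rng = np.random.default_rng(0)

A = rng.standard_normal((m, n))          # the stacked inputs
x_true = np.array([1.0, -2.0, 0.5])      # the unknown parameter
eps = 0.1 * rng.standard_normal(m)       # measurement noise
b = A @ x_true + eps                     # the measurement model b = Ax + eps

# Closed-form least-squares solution via the pseudo-inverse of A;
# unique because A has full column rank.
x_hat = np.linalg.pinv(A) @ b

# The same solution through the standard least-squares routine.
x_hat2, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_hat, x_hat2))        # True
print(x_hat)                             # close to x_true
```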
If these were discrete random variables, the likelihood would be the probability of seeing that particular measurement. There is a hidden, or unknown, parameter here, which is x, and you ask: what parameter would have maximized the chance of seeing this particular measurement? So you look for the parameter x that maximizes the likelihood of the observed measurements. What would that be? It is, in this case,

max over x ∈ R^n of p_x(b),

the probability density of the measurement random variable evaluated at the observed measurements. This density is derived directly from the probability density of the noise itself and the system model you have built, so it implicitly depends on x; I write a subscript x just to indicate that dependence. What we are asking is: what choice of x would have given me the highest likelihood of seeing this particular outcome? Now, there are criticisms of this particular philosophy of finding x, but that is a separate matter, not for our discussion; I want to get to the optimization problem it implies. So let us take this forward.

This p_x(b) can be written in a different way. Since ε = b − Ax, the problem is

max over x ∈ R^n of p_ε(b − Ax),

where p_ε is the probability density of ε: this was the density of b, and that is the density of ε evaluated at b − Ax. You are asking for the density of ε taking the value b − Ax, and finding the x that makes this value largest. Now, it is quite convenient here to take the log of the density, because log is a monotone function and does not affect the optimization. So instead of maximum likelihood you do maximum log-likelihood:

max over x ∈ R^n of log p_ε(b − Ax).

Because ε is a Gaussian random variable in R^m, I can tell you that this log density has the expression

log p_ε(b − Ax) = −(m/2) log(2π (det Σ)^{1/m}) − (1/2) (Ax − b)^T Σ^{-1} (Ax − b),

where, remember, Σ is the covariance matrix of the noise. So what we are effectively doing is maximizing this function over x in R^n.
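As a sanity check on this expression (a sketch with made-up A, b, and Σ), the hand-derived log-likelihood can be compared against a library implementation of the Gaussian log-density; note that −(m/2) log(2π (det Σ)^{1/m}) is just the familiar constant −(m/2) log 2π − (1/2) log det Σ rearranged.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
m, n = 6, 2                              # illustrative sizes
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
S = rng.standard_normal((m, m))
Sigma = S @ S.T + m * np.eye(m)          # a positive definite covariance

def log_likelihood(x):
    """The hand-derived expression: log p_eps(b - Ax) for eps ~ N(0, Sigma)."""
    r = A @ x - b
    const = -0.5 * m * np.log(2 * np.pi * np.linalg.det(Sigma) ** (1.0 / m))
    return const - 0.5 * r @ np.linalg.solve(Sigma, r)

x = rng.standard_normal(n)
# The same value from a library Gaussian density evaluated at b - Ax.
ref = multivariate_normal(mean=np.zeros(m), cov=Sigma).logpdf(b - A @ x)
print(np.isclose(log_likelihood(x), ref))   # True
```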
Now, if you look at this function, you will notice that the first term has nothing to do with x at all; it is simply a constant. It does not affect the optimization problem — it only shifts the function — so it has no bearing on the optimizer. So, as far as finding the optimal solution goes, the optimal solution of the above problem is also the optimal solution of

max over x ∈ R^n of −(1/2) (Ax − b)^T Σ^{-1} (Ax − b),

which is the maximization of the second term alone. That is a maximization of minus something, and maximizing the negative of a quantity is equivalent to minimizing the quantity itself:

min over x ∈ R^n of (1/2) (Ax − b)^T Σ^{-1} (Ax − b).

The x* that maximizes the first optimization problem is the same x* that solves this one. The optimal values will differ, because we have dropped additive constants that were in the objective function — they would need to be adjusted for — but the optimal solution x* is the same.

So this is another optimization problem over an open set, very similar to the problem of minimizing ||Ax − b||^2, except that the norm has now been scaled — or rather, the correct word is skewed — by the matrix Σ^{-1} sitting in between. In general, all these problems tend to have the form

min over x ∈ R^n of ||W(Ax − b)||^2,

where W is a weighting matrix; this is the general form of the problem (a numerical sketch of this weighted form follows shortly below). This is a model that is very commonly used in practice in many different disciplines — for example, in power system state estimation this is the standard model. I wanted to demonstrate it as an application of optimization over an open set.

Now, let us take this forward and see whether there are problems that do not directly look like optimization problems over open sets, but somehow get reduced to them. Surprisingly, this is actually a very general class. We can derive a very general theorem about this, but that is usually not as revealing and illuminating as doing an example. So what I will do is an example, and then I will state the general result; hopefully we can at least complete the example in today's class.

The problem is the following. Suppose you are in two dimensions: you have your x-axis and your y-axis, and suppose we have an ellipse here, given by an equation that I will denote by f_1(x, y):

f_1(x, y) = x²/a² + y²/b² = α.

Any point (x, y) that lies on this ellipse must satisfy this equation, and conversely, any point that satisfies this equation can be located on the ellipse.
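Before working the ellipse example, here is the promised sketch of the weighted form, with illustrative data: choosing W = Σ^{-1/2} makes ||W(Ax − b)||^2 coincide with the skewed quadratic above.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
m, n = 40, 3                             # illustrative sizes
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
S = rng.standard_normal((m, m))
Sigma = S @ S.T + m * np.eye(m)          # noise covariance, positive definite
b = A @ x_true + rng.multivariate_normal(np.zeros(m), Sigma)

# With W = Sigma^{-1/2}, minimizing ||W(Ax - b)||^2 is exactly minimizing
# the skewed quadratic (Ax - b)^T Sigma^{-1} (Ax - b).
W = np.linalg.inv(sqrtm(Sigma)).real
x_wls, *_ = np.linalg.lstsq(W @ A, W @ b, rcond=None)

# Equivalent closed form: x* = (A^T Sigma^{-1} A)^{-1} A^T Sigma^{-1} b.
x_cf = np.linalg.solve(A.T @ np.linalg.solve(Sigma, A),
                       A.T @ np.linalg.solve(Sigma, b))
print(np.allclose(x_wls, x_cf))          # True
```

In statistics this weighted solution is known as generalized least squares.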
The problem for us is simply to find the rectangle with maximum area that can be inscribed in this ellipse. How do we describe such a rectangle? It can be defined very easily: for simplicity, we will make sure that the rectangle aligns with the coordinate axes, which is possible. What I mean by that is: take a point (x, y) here, look at the corresponding point (x, −y), at the point (−x, −y), and at the point (−x, y); take these four points as the corners, and look for the rectangle of this form that has the maximum area. What is the area of the rectangle with these corner points? It is 2x times 2y, which is 4xy. So what we want is to find x and y that maximize the function 4xy, subject to (x, y) lying on the ellipse.

What is the implication of this? If you look at this particular problem, you are maximizing a function that is differentiable, but you are not maximizing it over an open set: you have to maximize it only over those points that are on the ellipse, on the locus of f_1(x, y) = α. Now, the set of points that forms this ellipse is actually a closed set, not an open set, and therefore this becomes a problem that is not in the previous category. However, what I will show you is that it is still possible to reduce it to something we have seen before.
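As a preview of that reduction — one possible route, not necessarily the one the lecture will take — parameterizing the ellipse eliminates the constraint entirely and leaves an unconstrained one-dimensional problem; the values of a, b, α below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, b, alpha = 3.0, 2.0, 1.0   # hypothetical ellipse parameters

# Parameterize the ellipse: x = a*sqrt(alpha)*cos(t), y = b*sqrt(alpha)*sin(t).
# Then x^2/a^2 + y^2/b^2 = alpha holds automatically, and the constrained
# problem becomes an unconstrained maximization of 4xy = 2*a*b*alpha*sin(2t).
def neg_area(t):
    x = a * np.sqrt(alpha) * np.cos(t)
    y = b * np.sqrt(alpha) * np.sin(t)
    return -4.0 * x * y

res = minimize_scalar(neg_area, bounds=(0.0, np.pi / 2), method="bounded")
print(res.x, -res.fun)        # t* ≈ pi/4, maximum area = 2*a*b*alpha = 12.0
```

The optimizer t* = π/4 corresponds to x = a√(α/2), y = b√(α/2), giving maximum area 2abα.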