In the last three modules, sub-modules in fact, we talked about finite-dimensional vector spaces, matrices, and tools for multivariate calculus. There is yet another topic that is fundamental to pursuing data assimilation: optimization, specifically optimization in finite-dimensional vector spaces. I keep referring to finite-dimensional vector spaces because in all the computational problems in numerical analysis, and in all the applications where we use computers to compute solutions, the basic mathematical framework is the finite-dimensional vector space. In practice we can do only finitely many things; we cannot do infinite things. Therefore the finite-dimensional vector space is the appropriate background on which the entire theory of computation has been built, and to emphasize that I keep pointing out its role. Now, why optimization? In module 1 we saw a broad overview of data assimilation. Data assimilation can be thought of as curve fitting, as regression analysis, as identification, or as estimation. So let us take the point of view of estimation. Whenever we estimate, we want to estimate optimally. What does that mean? We want the best estimate. Best means optimum, and optimum leads to optimization. So optimization theory is fundamental to pursuing estimation theory; the two are interrelated in that we use principles of optimization in estimation. Estimation theory is a topic within statistics; optimization theory is a topic within multivariate calculus. It is the interaction between the two that provides the ability to estimate optimally. So whatever we do, we want the best prediction, the best estimate.
We want the best way to tell what the temperature in Bangalore will be tomorrow afternoon, and so on. We are always seeking the best, and best means optimum. So we need a clear understanding of the notion of optimality: when is something optimal, and what are the properties of an optimum? An optimum can be a maximum or a minimum. When we talk about cost functions, we want to minimize the cost. When we talk about profitability, we want to maximize profit; in economics most problems are posed as maximization problems. In engineering we sometimes talk about minimizing the energy needed to accomplish a particular task. In estimation we talk about minimizing some magnitude of the error. So maximization and minimization are both parts of optimization theory, and they are intimately related to each other. In this sub-module we review the principles of maximization and minimization. First, a classification. Let f be a scalar-valued function; you can see that the notion of a scalar-valued function comes into play right away. Let f : Rn → R be at least twice continuously differentiable, that is, f is in C². A scalar-valued function of a vector argument is called a functional. What is the relation between maximization and minimization? The minimum of f with respect to x is attained at the same point as the maximum of −f. Therefore it is enough to study either maximization or minimization; without loss of generality we will consider minimization. That is the general idea to understand at the start: you do not have to treat maximization and minimization separately, because of this intrinsic relation between maximum and minimum it is sufficient to do one of them. We will do minimization. Next, a classification of minimization problems.
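The duality between minimization and maximization is easy to check numerically. A minimal sketch, where the grid and the sample function f(x) = x² − 2x are my own choices for illustration:

```python
# Minimizing f and maximizing -f are attained at the same point.
# Illustrated on a coarse grid with the sample function f(x) = x^2 - 2x.

def f(x):
    return x * x - 2.0 * x

xs = [i / 100 for i in range(-300, 301)]     # grid on [-3, 3]

x_min = min(xs, key=f)                       # argmin of f
x_max = max(xs, key=lambda x: -f(x))         # argmax of -f

print(x_min, x_max)   # both equal 1.0 (to grid resolution)
```

Both searches land on the same point, which is the whole reason it suffices to study minimization alone.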
Minimization problems occur in various shapes and forms. The first classification is with respect to modality: a minimization problem can be unimodal or multimodal. In a unimodal problem the function has only one, unique minimum. For example, a cost function that is parabolic, a quadratic function, has a unique minimum. But in some nonlinear problems the function may have multiple minima. So we have unimodal versus multimodal functions. Who creates the modality? The cost function that we use is what creates unimodality or multimodality. It is easier to do unimodal minimization than multimodal minimization, but at the outset we want to distinguish between these two types of minimization problems. Now let us describe mathematically what a minimum is. Let f be a function and let x* be a point at which the function attains a minimum. What is the property of the function at the minimum? Consider the value of the function at x*, and compare it with the value at any point y in a small neighbourhood of x*: the value f(y) at any such y is always at least f(x*). In this case x* is called a local minimum. So, drawing a little picture with two valleys, both x1* and x2* are local minima, in the sense that if you move in any direction away from the minimum, the value of the function increases; that is the basic idea here. If x* is such that f(x*) ≤ f(y) for all y, then x* is called a global minimum.
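The neighbourhood definition of a local minimum can be checked directly by sampling. A small sketch, using the assumed example f(x) = x⁴ − 2x², which has local minima at x = −1 and x = +1 and a local maximum at x = 0:

```python
# Definition of a local minimum: f(x_star) <= f(y) for every y in a small
# neighbourhood of x_star.  Checked here by sampling the neighbourhood.

def f(x):
    return x**4 - 2.0 * x**2

def is_local_min(f, x_star, radius=0.1, samples=201):
    """Sample points y with |y - x_star| <= radius and test f(x_star) <= f(y)."""
    for k in range(samples):
        y = x_star - radius + 2.0 * radius * k / (samples - 1)
        if f(y) < f(x_star):
            return False
    return True

print(is_local_min(f, 1.0))   # True: x = 1 is a local minimum
print(is_local_min(f, 0.0))   # False: x = 0 is a local maximum
```

This is only a sampled check, not a proof, but it makes the "any point in a small neighbourhood" wording concrete.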
So in the unimodal example x* is the global minimum; in the multimodal example the point x2* is the global minimum, while x1* is only a local minimum. So there can be one minimum or multiple minima. A function with a unique minimum is called a unimodal function; otherwise it is called a multimodal function. The function x(x² − 1) is multimodal; the function x² is unimodal. I would very strongly encourage you to plot these functions and see where the modality arises, where the minima occur, where the maxima occur. If a function is multimodal, there will be multiple minima and multiple maxima too. Multimodality is always a headache. Optimization problems are man-made, so when you formulate a particular problem, do not create too much trouble for yourself: choose the cost function so that it is endowed with a unique global minimum. If you can arrange the problem that way, your headache will be a lot less. But if you formulate the problem so that it happens to have multiple minima, you have to scratch your head: finding the global minimum will be a much more difficult problem. Of course, this is easier said than done; for a given application some problems come endowed with one minimum and some with multiple minima. If the problem has multiple minima, we have a duty to find all of them, to be able to pin down which is the global minimum, the best minimum. So that is unimodal versus multimodal. The next classification is constrained versus unconstrained. Again, those of us who have done a little meteorology or data assimilation should be able to appreciate this. "Minimize f(x)" with no restriction on x is an unconstrained problem, but I would like to be able to insert a constraint. Let C be a subset of Rn defined by a set of equations or inequalities; for example, let C1 be the set of all x such that x1 + x2 = 1.
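Plotting aside, the modality of these two functions can also be located numerically. A small sketch (the grid and the interval [−2, 2] are my own choices): it counts interior local minima and maxima of x(x² − 1) and of x² by comparing each grid value with its neighbours.

```python
# Locate interior local minima and maxima of a sampled function by comparing
# each grid value with its two neighbours.

def local_extrema(f, a, b, n=4001):
    h = (b - a) / (n - 1)
    xs = [a + i * h for i in range(n)]
    mins, maxs = [], []
    for i in range(1, n - 1):
        if f(xs[i]) < f(xs[i - 1]) and f(xs[i]) < f(xs[i + 1]):
            mins.append(xs[i])
        if f(xs[i]) > f(xs[i - 1]) and f(xs[i]) > f(xs[i + 1]):
            maxs.append(xs[i])
    return mins, maxs

m1, M1 = local_extrema(lambda x: x * (x * x - 1.0), -2.0, 2.0)
m2, M2 = local_extrema(lambda x: x * x, -2.0, 2.0)
print(len(m1), len(M1))   # 1 1 : a local minimum near 1/sqrt(3) and a local maximum
print(len(m2), len(M2))   # 1 0 : a single minimum at 0, no interior maximum
```

The first function has both a local minimum and a local maximum in the interior (and is unbounded below, so its local minimum is not global), while x² has exactly one minimum, as the lecture states.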
Let C2 be the set of all x in R2 with x1 ≥ 0, x2 ≥ 0, and x1 + x2 ≤ 1. Let us understand these constraints. The first constraint set C1 is given by a line. In that problem x1 + x2 = 1, so what does this mean? Even though there are very many points in the two-dimensional plane, I am interested only in those points that lie on this line; the line is the constraint. The second set, on the other hand, requires x1 to be nonnegative, x2 to be nonnegative, and x1 + x2 to be at most 1, and that essentially tells you the region is the inside of a triangle. So a constraint could be along a line or a curve, or it could be a sub-region. Now, what is an unconstrained minimization problem? It is stated like this: given f(x), minimize it over all points x in R2. There is no constraint; you can go wherever the search leads you. That is an unconstrained minimization problem. "Minimize f(x), not over all x, but over x belonging to C1" means I am not interested in every x but only in those x that lie along the line; that is a constrained minimization problem. It is a special form of constrained minimization because the constraint set, the line, is given by an equality: x1 + x2 = 1 is an equality relation, and there are infinitely many points that satisfy it. So I am interested in minimizing f(x) not over all of R2 but along that particular line. Minimizing f(x) over C2 is likewise a constrained minimization problem; it is called an inequality-constrained problem, because the constraints x1 ≥ 0, x2 ≥ 0, x1 + x2 ≤ 1 are inequality constraints.
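The two constraint sets can be written directly as membership tests. A minimal sketch (the numeric tolerance for the equality test is my own choice):

```python
# Membership tests for the two constraint sets from the text:
#   C1 = { x in R^2 : x1 + x2 = 1 }                       (a line: equality)
#   C2 = { x in R^2 : x1 >= 0, x2 >= 0, x1 + x2 <= 1 }    (a triangle: inequalities)

def in_C1(x1, x2, tol=1e-12):
    return abs(x1 + x2 - 1.0) <= tol

def in_C2(x1, x2):
    return x1 >= 0.0 and x2 >= 0.0 and x1 + x2 <= 1.0

print(in_C1(0.25, 0.75))   # True: the point lies on the line
print(in_C2(0.25, 0.25))   # True: inside the triangle
print(in_C2(0.75, 0.75))   # False: violates x1 + x2 <= 1
```

Note the practical difference: the equality constraint needs a tolerance in floating point, while the inequality constraints are plain comparisons.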
Most of the problems in meteorology are either unconstrained or constrained with equality constraints; in operations research one often deals with constrained optimization with inequality constraints. Of these three classes of problems, inequality-constrained optimization is the most difficult, but these problems are now very thoroughly understood; this body of literature represents one of the most thoroughly understood disciplines within optimization theory. The theory of linear and nonlinear programming deals with minimization under equality and inequality constraints. You may have heard of linear programming: it deals with minimization of a linear function under linear inequality constraints. In this course we will deal only with unconstrained and equality-constrained minimization problems, and that is why we try to formulate our problems that way: an equality constraint is easier than an inequality constraint.

A further classification. We talked about unimodal versus multimodal and constrained versus unconstrained; now let me talk about uni-objective versus multi-objective problems: is there only one objective, or are there multiple objectives? If f : Rn → R is the only function to be minimized, the problem is known as uni-objective. When f is a vector-valued function with m components, and we want to minimize some components and maximize others, it is called multi-objective optimization. Let me give you an example: automobile design, one of the hot problems. I want to maximize the fuel efficiency; I want to minimize the price, the cost; I want to maximize safety and comfort. If you tell an automobile designer you want a car with maximum efficiency, he will give you one, but with no safety and no comfort; he can give you a car with the best comfort, but it will do very few miles per gallon. Real automobile design involves multiple objectives that are not consistent with each other, and automobile engineers have come up with strategies to solve such multi-objective optimization problems. Fortunately, in meteorology we are always dealing with a single objective. Operations research people deal with multi-objective optimization, on top of inequality constraints; those are some of the toughest minimization problems one can deal with. So conceptually, the optimization problems that occur within the context of meteorology, oceanography, and dynamic data assimilation are the easiest of problems. When we say they are difficult, it is not because they are conceptually difficult but because of the size of the problem, the curse of dimensionality: in meteorology we are interested in solving large but simple problems, whereas in operations research they may be solving small problems that are much more complex than the problems in meteorology. That is the difference I would like to bring out. So if you meet an automobile design engineer, talk to him or her about what kind of problems they face and how they optimize, so that we can learn from the kind of methodology they use. If you stack all of these classifications together, unimodal or multimodal, unconstrained or constrained with equality or inequality constraints, uni-objective or multi-objective, you get a broad overview of the discipline called optimization theory. In operations research there are two groups: one develops the fundamental theory, the other works on applications of the theory to various problems of engineering and industrial interest.

The next notion is convexity. Convexity and optimization are intertwined, and I would like to bring out the beauty of the role of convexity in optimization. It takes a little bit of definition. Let S be a subset of Rn. I call S a convex set if, for every pair of points x and y in S, the line segment joining x and y lies completely in S. For example, take a disk: if I pick any two points in the disk, the entire line segment between them lies within the disk, so the disk is convex. But for a set with a dent or a hole, I can pick two points such that part of the segment lies outside the set, and then the set is not convex. So the disk bounded by a circle is a perfect example of a convex set, a convex object. Next I have to define convex functions, but before talking about convex functions I needed the notion of a convex set; that is its definition. When do I say a function is convex? Let me tell you pictorially; the picture is not perfect, but you get the idea. Take a function f, pick two points x and y, evaluate f(x) and f(y), and join the points (x, f(x)) and (y, f(y)) by a line: that line is a chord. For a convex function, between x and y the graph of the function lies below the chord. What is an example? x². If you take any two points on the parabola and draw the chord, the parabola always lies below the chord. Any function with this property is in general called a convex function. Now, x² has a unique minimum, and convex functions in general have a unique minimum; that is where the notion of convexity comes in. We are interested in quadratic functions: all the cost functions that are interesting in data assimilation are quadratic functions. Why quadratic cost functions? Because a quadratic function, appropriately formulated, is a convex function: it has a unique minimum, and I do not have to pull my hair out to perform the minimization. That is where the notion of convexity comes in.
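The segment definition of a convex set can be tested by sampling points along the chord. A minimal sketch, where the two example sets (a unit disk, and a disk with a hole punched out) are my own choices:

```python
# Sampling test of the convex-set definition: S is convex if, for every pair of
# points x, y in S, each point (1-t)*x + t*y on the segment also lies in S.

def segment_in_set(member, x, y, steps=101):
    """Check that the segment from x to y stays inside the set given by `member`."""
    for k in range(steps):
        t = k / (steps - 1)
        px = (1 - t) * x[0] + t * y[0]
        py = (1 - t) * x[1] + t * y[1]
        if not member(px, py):
            return False
    return True

disk = lambda px, py: px * px + py * py <= 1.0              # convex
annulus = lambda px, py: 0.25 <= px * px + py * py <= 1.0   # not convex (hole)

print(segment_in_set(disk, (-0.8, 0.0), (0.8, 0.0)))     # True
print(segment_in_set(annulus, (-0.8, 0.0), (0.8, 0.0)))  # False: crosses the hole
```

The disk passes for any pair of its points; the annulus fails as soon as the segment crosses the hole, exactly the pictorial argument from the lecture.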
So let S be a convex set, let x and y be points in S, and let f be a function from S to R; the underlying set over which the function is defined is a convex set. Are you all in place? There is an underlying set which is convex, and there is a function defined over it. If I take any two points in the convex set, evaluate the function there, and draw the chord, then for a convex function the graph lies below the chord; the example is x². Formally, f : S → R is said to be a convex function if f(αx + (1 − α)y) ≤ α f(x) + (1 − α) f(y) for all α in [0, 1]. Now look at this: when α = 0 the left side becomes f(y); when α = 1 it becomes f(x); and for every point in between, the function lies below the chord. That is the definition: the function lies below the chord. If f(x) is convex, then −f(x) is called concave. How would you picture a convex function? xᵀx is a typical example of a convex function; x² is a typical example of a convex function. Concave functions go with maximization, convex functions with minimization: maximum and minimum are duals of each other, and concave functions and convex functions are duals of each other. So within the context of minimization we are generally interested in convex functions and convex sets. Now, x² is convex, but x³ is not. What is the plot of x³? If I take two points on it, part of the graph lies below the chord and part lies above. So x³ is not convex, while x² is convex; that is the model to carry with your notion of convexity and convex functions. So we have defined convex sets and we have defined convex functions.
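The chord inequality in the definition can be checked numerically. A minimal sketch, where the sample intervals are my own choices and the two test functions (x² convex, x³ not) follow the lecture:

```python
# Numerical check of the chord inequality that defines a convex function:
#   f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)   for all a in [0, 1].

def satisfies_chord_inequality(f, x, y, n=101, tol=1e-12):
    for k in range(n):
        a = k / (n - 1)
        if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + tol:
            return False
    return True

print(satisfies_chord_inequality(lambda t: t * t, -2.0, 3.0))   # True: x^2 is convex
print(satisfies_chord_inequality(lambda t: t ** 3, -2.0, 1.0))  # False: x^3 is not
```

For x³ on the pair (−2, 1), the midpoint of the chord sits at −3.5 while the function value there is −0.125, so the graph rises above the chord and the test fails, matching the picture in the lecture.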
There are very many different ways of characterizing convexity; let me quickly run through them. In the previous definition of convexity I simply assumed that f is a function; I did not assume any differentiability. Now suppose we know a little more. If, in addition, f is C¹, then f is convex if and only if for any two points x and y, f(y) ≥ f(x) + ∇f(x)ᵀ(y − x); that is, the graph lies above every tangent. That is an equivalent definition. If, on the other hand, f is C², twice continuously differentiable, then f is convex if and only if the Hessian is positive semi-definite, and strictly convex if the Hessian is strictly positive definite. Do you see the distinction? So for a C² function: Hessian positive semi-definite means convex; Hessian strictly positive definite means strictly convex. For example, take a straight line: is it convex? Its second derivative is 0, which is positive semi-definite, so the line is convex, but it is not strictly convex. In fact, a straight line is simultaneously convex and concave: it sits on the boundary separating convex and concave functions. So now you have examples of convex functions.
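The C² characterization can be exercised numerically: approximate the Hessian by finite differences and test definiteness. A sketch, assuming my own two sample functions; for a 2×2 symmetric matrix, Sylvester's criterion (positive leading minors) decides positive definiteness.

```python
# C^2 convexity test through the Hessian: positive semi-definite everywhere
# means convex, strictly positive definite means strictly convex.  Here the
# 2x2 Hessian is approximated by central finite differences.

def hessian2(f, x1, x2, h=1e-4):
    """Central-difference approximation of the 2x2 Hessian of f at (x1, x2)."""
    d11 = (f(x1 + h, x2) - 2 * f(x1, x2) + f(x1 - h, x2)) / h**2
    d22 = (f(x1, x2 + h) - 2 * f(x1, x2) + f(x1, x2 - h)) / h**2
    d12 = (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
           - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h**2)
    return d11, d12, d22

def is_positive_definite(d11, d12, d22):
    """Sylvester's criterion for the symmetric 2x2 matrix [[d11, d12], [d12, d22]]."""
    return d11 > 0 and d11 * d22 - d12 * d12 > 0

H = hessian2(lambda a, b: a * a + a * b + 3 * b * b, 0.5, -0.5)
print(is_positive_definite(*H))   # True: Hessian [[2, 1], [1, 6]] is SPD

H = hessian2(lambda a, b: a * a - b * b, 0.5, -0.5)
print(is_positive_definite(*H))   # False: indefinite Hessian [[2, 0], [0, -2]]
```

For quadratic functions the Hessian is constant, so one evaluation point suffices; for a general function one would have to test it over the whole domain.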
So why are we interested in convexity? We talked earlier about unimodality. Unimodality and convexity are close cousins of each other: unimodality refers to functions with a unique, global minimum, and unimodal problems are easier, so unimodality and convexity are intimately associated with each other. So let us state this. Let S be a convex set and let f be a function from S to R, that is, a real-valued function defined over a convex set. If f is convex, then f has a unique minimum. That is the theorem; I am not going to prove it, it is a theorem in convex analysis. If, moreover, f is in C², then at the minimum the first derivative is zero and the second derivative is strictly positive definite; and conversely, a point at which the first derivative is zero and the second derivative is strictly positive definite is a minimum. What is a typical example of such a convex function? f(x) = xᵀAx − bᵀx, which is in C² when A is symmetric and positive definite. Now you can see how all the tools we developed in matrix analysis come into view beautifully: xᵀAx is a quadratic form, bᵀx is a linear function, and together they form a general quadratic function. If you look at all the cost functions that 3D-Var and 4D-Var talk about, they are all functions of this type: they are typically convex, they are typically in C², and they have a symmetric positive definite matrix A as Hessian. So by definition, by design, all these functions are unimodal convex functions; everything is beautiful. The role of convexity in inducing unimodality is one of the fundamental beauties of the underlying mathematics, and one needs an appreciation of it to see why, when we develop objective functions, we always think of quadratic forms or quadratic objective functions. That is where the importance of convexity and unimodality comes into play.

With that as background, I am now going to run through conditions for the existence of a minimum. Please understand: algorithms compute the point where the function attains its minimum, but before you start your computation, somebody has to guarantee that a minimum exists and is unique; unless I know something exists, I cannot go and find it. This is where mathematics helps. The first level of mathematics helps you prove existence and uniqueness of minima; once you establish existence and uniqueness, you develop algorithms to seek the minimum. Characterizing the properties of minima and maxima, and then designing algorithms to find them as fast and as efficiently as possible, are two complementary aspects of the optimization area. Now, conditions for the unconstrained minimum. Again, most of us know these from basic calculus, but in calculus we talk about univariate functions; here I am stating the corresponding results for multivariate functions, that is, functions defined over a vector with scalar values. Let f be in C², twice continuously differentiable. A necessary condition for a minimum is that at the minimum the gradient must vanish. A sufficient condition is that at such a point the second derivative must be positive definite, that is, the Hessian is a symmetric positive definite matrix. The Hessian is in general symmetric, but it need not be positive definite; symmetry does not imply positive definiteness. Conversely, when you consider positive definite matrices you need consider only symmetric matrices, which comes essentially from the theory of quadratic forms. That is why the Hessian, the tool from multivariate calculus that characterizes the second derivative, is so important. What do I mean by saying the Hessian is positive definite? Around the minimum
the function looks like a bowl, a punch bowl: the minimum is a valley. How do I characterize a valley? At the minimum point of the valley the function is convex; it looks like a parabola. So the parabola x² is the model for such a minimization process, which is related to the quadratic function. Having covered the conditions for a minimum, I am now going to talk about equality-constrained problems. The first approach is simply an algorithmic process, which I will illustrate by an example we generally do in univariate calculus; I am sure every one of us has done it. I am given a rope of length L feet and asked to enclose a rectangular area with it. Let a be the length and b the width, so 2a + 2b = L: I am given a rope of fixed length, and I must use it to enclose the area A = ab, where a and b are the two sides of the rectangle. So what is the idea? When 2a + 2b is fixed at L, how do you maximize the area ab? Suppose somebody says, I will give you a rope, and the area you can enclose with it is yours for free. Humans are built greedy, so you want to enclose the maximum area you can with that rope. That is the problem: maximize A = ab subject to 2(a + b) = L. Now you can see it is a constrained problem: a is a variable, b is a variable, and a and b are not independent. If a and b were independent variables, when would the area be maximum? At a = ∞, b = ∞. But a and b cannot be infinite, because a + b = L/2. I have to solve the problem with a rope of fixed length; that is the constraint, an equality constraint, and I have to maximize under it. What is the simple way? I am sure none of us could have gotten our bachelor's degree without having solved this problem at least once in our lives. What do we do? We first eliminate b using the constraint: b = L/2 − a. If you substitute for b in the area, A becomes a quadratic function of a: A(a) = a(L/2 − a). You compute the derivative of the capital A with respect to the little a, and then the second derivative, which is negative; a positive second derivative means a minimum, a negative second derivative means a maximum. Setting dA/da = 0 gives L/2 = 2a, which implies a = L/4; when a = L/4, b is also L/4, and at that point the area is maximum: the maximum area is L²/16. So this is the method of elimination for solving an optimization problem under an equality constraint: use the constraint to eliminate one of the variables, convert the two-variable problem into a univariate problem, and apply the principles of calculus. This is easy for a small number of variables, but in meteorology you have tens of thousands of variables and you cannot do this; it is essentially an illustrative example. So what is the general method for solving equality-constrained problems? The classical Lagrange multiplier method. I am going to quickly run over its framework. Equality-constrained minimization, Lagrange multiplier method: let g be a vector-valued function; I want to minimize f(x) under the constraint g(x) = b. What is b? It is an m-vector, and g(x) is a vector-valued function: g has components g1, g2, …, gm, and b has components b1, b2, …, bm. So the constraint means gi(x) = bi for i running from 1 to m; each of these is a constraint. So I would like
to be able to minimize f(x), a nonlinear function, under the equality constraint g(x) = b, where g is in general a nonlinear function. In the previous problem the equality constraint, a + b = L/2, was a linear function: a and b are variables occurring to the first degree. Here g1(x), g2(x), and so on could be any functions, so in general g is nonlinear. Now define a Lagrangian, which is the sum L(x, λ) = f(x) + λᵀ(b − g(x)), where b − g(x) carries the constraint: λ is a vector, λᵀ(b − g(x)) is a scalar, and we add that scalar to f(x); that is the new Lagrangian function. Now x is a vector in Rn and λ is a vector in Rm; because g is an m-vector and b is an m-vector, λ must be an m-vector. So I have defined a function whose first argument is n long and whose second is m long: the total number of variables is n + m. I am expanding the space over which I need to do the minimization. This is the technique that Lagrange designed a number of years ago. What did he say? The theorem is as follows: the constrained minimum of f(x) subject to g(x) = b equals the unconstrained minimum of L. He converted the problem of constrained minimization into an unconstrained minimization. Unconstrained minimization we know how to solve; constrained minimization is difficult. So the method is: you convert a constrained minimization problem into an unconstrained one. But what is the price we pay? The constrained problem is an n-dimensional problem, because x is an n-vector, but the resulting unconstrained problem has n + m variables; it lives in a larger space. So by expanding the space over which I minimize, I can convert a hard problem into an easy problem; that is the fundamental idea. λ is called the undetermined Lagrange multiplier. So how do we solve the unconstrained minimization problem? You compute the gradient of L with respect to x and equate it to zero, and the gradient of L with respect to λ and equate it to zero (the second one must be with respect to λ; I misspoke on the slide). The gradient computations I have already covered when I talked about multivariate calculus, how to compute gradients of various types of functions; now you can see why we did what we did. These two equations give you a set of necessary conditions for the minimum. What do they essentially say? The gradient with respect to λ gives b − g(x) = 0, so the constraint is satisfied; given that, the term λᵀ(b − g(x)) is trivially zero. And what does the first equation say? It tells you that at the minimum ∇f(x) = Σᵢ λᵢ ∇gᵢ(x), with i running from 1 to m: each gᵢ has a gradient, the λᵢ are constants, and the gradient of f is a linear combination of the gradients of the components of g. So at the minimum, what does Lagrangian theory say? It is one of the most beautiful results in applied mathematics: a necessary condition for the constrained minimum is that at the minimum the gradient of f must be a linear combination of the gradients of the constraint functions. In unconstrained minimization, what must the gradient be? ∇f = 0; that is the unconstrained condition. In constrained minimization the gradient must equal a linear combination of the gradients of the constraint functions, and the coefficients of that linear combination are the λs. So by solving these two sets of equations simultaneously you not only find the minimum x* but also the values of λ used in the linear combination: you kill two birds with one stroke. So you either solve a constrained minimization by elimination, as we did in this
simple example which is feasible only for small dimensional problem if the problem is a large dimensional the only recourse to solving equality constraint minimization problem is Lagrangian multiplier method Lagrangian multiplier method the Taylor series expansion these are very fundamental tools in in in doing many things that we do in in in optimization theory we can also talk about a certain sufficient conditions I am I am not going to go over the details of this but I want to I want you to recognize the following I am going to talk about the need for sufficient condition in an in an unconstrained setup in a in an unconstrained setup what is that we have we have we have talked about the gradient of f must be 0 the Hessian of with respect to x must be spd this is the unconstrained characterization of of of is that I am sorry this is the uncut let me I made a mistake I have to erase this part good that the Hessian must be spd so in the previous slide we only talked about the necessary condition first derivative I have not talked about the second derivative condition are you in place there is a first derivative condition and second derivative condition first derivative is necessary second derivative is sufficient both for constraint both for unconstrained so the sufficient condition for equality constraint problem is that the Hessian of L so that should not be surprising that is the reason I went into this in the case of unconstrained problem the Hessian must be symmetric positive definite what is the analog of that if you consider the Hessian of L with respect to x which is given by this that must be positive definite in an appropriate set of space so with the with the with the with the necessary condition and sufficient condition we have shown the conditions for existence and uniqueness of minima when there is equality constraint that is the fundamental talent of this I am I am not proving the derivation of sufficient condition as I have not proved several of the 
I want you to remember: all the theory we covered in matrix theory is half a course in linear algebra; the topics we covered on finite-dimensional vector spaces are about a third of a course; and the optimization material I am covering is about a third of a course in optimization theory. These are parts of several courses pulled together to develop an appreciation for the underlying mathematical background needed for what we do.

Now I am going to illustrate with a simple example. Let n be 2 and let f be the function given on the slide; you can readily see it is a nonlinear function, which I want to minimize subject to an equality constraint. I form the Lagrangian function and compute the first-order necessary conditions, which give rise to two equations. Solving these equations yields the optimal solution; I leave the method of solving them as a homework problem. I also compute the Hessian and show that, in the appropriate sense, it is positive definite, and hence I demonstrate that x1 = 4 and x2 = -1/2 is the solution of this constrained minimization problem. This is a very typical homework problem. I would like to emphasize the following: the objective function is quadratic (second degree) and the constraint is linear (first degree). A quadratic objective with a linear constraint is the simplest possible case; you have to have an "aha" here, because you must be able to do this before you can do anything else in this area. So this is a very nice example that illustrates the power of the Lagrangian multiplier method. Here there is only one lambda because m = 1 (with n = 2); I solve for lambda, x1, and x2, and that is the constrained minimum.
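For a quadratic objective with a linear equality constraint, the Lagrangian's first-order conditions reduce to a small linear (KKT) system. Below is a minimal Python sketch with a hypothetical instance; the lecture's exact f and g, with answer x1 = 4, x2 = -1/2, are not reproduced here:

```python
import numpy as np

# Hypothetical instance:
#   minimize f(x) = x1^2 + x2^2   subject to   g(x) = x1 + 2*x2 - 5 = 0.
# Lagrangian: L(x, lam) = f(x) + lam * g(x).
# The first-order necessary conditions are linear:
#   2*x1        + lam   = 0
#        2*x2 + 2*lam   = 0
#   x1 + 2*x2           = 5
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 2.0],
              [1.0, 2.0, 0.0]])
rhs = np.array([0.0, 0.0, 5.0])

x1, x2, lam = np.linalg.solve(K, rhs)
print(x1, x2, lam)   # constrained minimizer (1, 2) with multiplier -2
```

Since the objective's Hessian (2I) is SPD on the whole space, it is in particular positive definite on the constraint's tangent space, so the second-order sufficient condition holds and this is the constrained minimum.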
Now I am going to talk about another class of functions, called penalty functions, which are very much in vogue in the data assimilation literature. In that literature, the Lagrangian multiplier technique is called the strong constraint formulation. Let me go back and tell you why: it is called the strong constraint formulation because I want to satisfy the constraint at any and every cost. The constraint is sacred; I cannot afford not to satisfy it. The constraint is very strong, and there is no way out of it. But in some cases I have a constraint that I do not want to enforce strictly: you can deviate from the constraint, but not by too much. That is what is called the weak constraint formulation, and it is realized by a class of methods called penalty function methods. People in the geosciences are clever: rather than acknowledging that penalty functions already exist in the operations research literature, they give the idea a different name. But by whatever name you call it, the rose smells the same; the same idea appears in different areas under different names, and it is very easy to get lost. That is why I am trying to build this bridge between the terminologies used in different disciplines: strong constraint means Lagrangian, weak constraint means penalty. Weak constraint means I want to respect the constraint, but not to the letter of the law. It is like a speed limit. If the limit is 60 and you go 60.1, are they going to give you a ticket? If you go 70, they definitely will. So the effective limit is 60 plus or minus some tolerance, say 5%.
If you go too slow, they will come and ask why you are going slow; if you go too fast, they will give you a ticket. So you are allowed to break the rule, but only within certain limits: the speed limit is an example of a weak constraint.

Now let us look at the setup. f is a function and g is another function; I want to minimize f under the equality constraint, so I am again solving the equality-constrained problem. But instead of the Lagrangian function, I am now going to use a penalty function. What is the penalty function? It is P_alpha(x) = f(x) + (alpha/2) g(x)^T g(x), where alpha is called the penalty parameter and is a fixed, large number. (Note the factor of 1/2: if there is a 2 in one place there must be a matching 2 in the other, so the constants stay consistent when we differentiate.) And what is g(x)^T g(x)? It is simply the sum of squares of the g_i. Alpha is called the penalty constant or penalty parameter; alpha is chosen and fixed, it is not a free variable. There is only one free variable, which is x. What is the difference between this and the Lagrangian multiplier method? There, lambda was a free variable and x was a free variable; here, it looks as though alpha plays the role of lambda, but the difference is that alpha is fixed while x is free. So there I had two free variables; here I have one free variable and one parameter that I must choose and fix. That is the difference. Penalty parameters are generally supposed to be large. So we solve the constrained minimization problem by solving the unconstrained minimization problem for P_alpha.
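The penalty construction above can be sketched directly. This is a minimal illustration with a hypothetical f and g (not the lecture's example); note that alpha is fixed in advance and only x varies:

```python
import numpy as np

# Sketch of the penalty function P_alpha(x) = f(x) + (alpha/2) * g(x)^T g(x).
# alpha is a FIXED large constant chosen beforehand; x is the only free variable.
def f(x):
    return float(x @ x)                    # hypothetical objective x1^2 + x2^2

def g(x):
    return np.array([x[0] + x[1] - 1.0])   # hypothetical equality constraint g(x) = 0

def penalty(x, alpha):
    gx = g(x)
    # g^T g is simply the sum of squared constraint residuals, sum_i g_i(x)^2
    return f(x) + 0.5 * alpha * float(gx @ gx)

x = np.array([0.3, 0.3])
print(penalty(x, 10.0))   # 0.18 + 0.5*10*(-0.4)^2 = 0.98
```

The larger alpha is, the more any constraint violation inflates P_alpha, which is exactly the "fine for exceeding the speed limit" idea.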
What is the gradient of P_alpha? Remember, when we talked about multivariate calculus, the gradient of a function of the form g^T g: it involves the Jacobian transpose times g. To do this you must have the multivariate calculus we covered in the last lecture; that is why we tried to make it as complete as possible. So the gradient of P_alpha is the gradient of f plus alpha times D^T g, where D is the Jacobian of g. The optimal solution x* is obtained by setting this gradient to zero, and this optimal solution always depends on alpha: the optimal solution is a function of the penalty parameter.

So, given a problem, I can solve it in the strong constraint fashion or in the weak constraint fashion, which raises the following question: how are the two solutions related? It can be shown, and I am going to show in a minute, that if I take x*(alpha) and let alpha grow to infinity, it reduces to the x* of the Lagrangian multiplier technique. In other words, the weak constraint solution converges to the strong constraint solution as the penalty parameter grows unbounded. It is the same problem formulated in two different ways, so which solution do I take? I want to understand the relation between the two solutions, and the answer is that the weak solution converges to the strong solution as the penalty parameter goes to infinity. What does this tell you? If the strong constraint formulation is difficult, you can solve the weak constraint problem instead, and by pushing alpha to a large value you obtain a solution as close as you like to the strong solution. What is alpha in terms of the speed limit? Some police departments are very strict and will not allow more than 2 miles per hour above the limit; others will allow 5% above it. That is the value of alpha in different departments.
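The gradient formula just stated, grad P_alpha = grad f + alpha D^T g with D the Jacobian of g, can be checked against finite differences. A sketch with hypothetical f and g:

```python
import numpy as np

# Assuming P_alpha(x) = f(x) + (alpha/2) g(x)^T g(x), its gradient is
#   grad P_alpha(x) = grad f(x) + alpha * D(x)^T g(x),  where D = Jacobian of g.
def grad_f(x):
    return 2.0 * x                           # for the hypothetical f(x) = x1^2 + x2^2

def g(x):
    return np.array([x[0] + x[1] - 1.0])     # hypothetical constraint

def jac_g(x):
    return np.array([[1.0, 1.0]])            # Jacobian D of g (constant here)

def grad_penalty(x, alpha):
    return grad_f(x) + alpha * jac_g(x).T @ g(x)

def penalty(x, alpha):
    return float(x @ x) + 0.5 * alpha * float(g(x) @ g(x))

# Verify the analytic gradient against a central finite difference.
x, alpha, h = np.array([0.2, 0.7]), 50.0, 1e-6
fd = np.array([(penalty(x + h*e, alpha) - penalty(x - h*e, alpha)) / (2*h)
               for e in np.eye(2)])
print(np.allclose(grad_penalty(x, alpha), fd, atol=1e-4))   # True
```

Setting grad P_alpha(x) = 0 and solving for x gives the weak-constraint solution x*(alpha) discussed above.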
And why do they set such a limit? They want to make money: if they give you a ticket, the city gets revenue from you. One cheap way for cities to raise money from the public is to post a speed limit; some people will test it, and when they are caught they are fined. So alpha is the parameter by which you vary the allowance above the speed limit; that is the role the parameter alpha plays. It can be shown that the weak solution tends to the strong solution as alpha becomes large. There are other properties, which I am sure you can follow on your own, but the main result of this page is this convergence result; it is the fundamental result.

I am going to illustrate the weak and strong solutions again by example. I have an f(x) and a g(x): a quadratic objective with a linear constraint, as you can easily see. Look at the picture: with axes x1 and x2, f(x) is a paraboloid, a bowl sitting over the plane, and I am interested in its values over the constraint line. So instead of the paraboloid being minimized at the origin, I must now minimize it restricted to the line. Where will the minimum over the line be? A little reflection tells you it must be at (1/2, 1/2). So the constrained solution is (1/2, 1/2), while the unconstrained solution is (0, 0). We then formulate this problem as a penalty function problem P_alpha and compute the optimal penalty solution, whose components come out of the form 1/(2 + 1/alpha). Now, as alpha goes to infinity, 1/alpha goes to 0, and therefore 1/(2 + 1/alpha) tends to 1/2. Is everybody with me? So you can see the penalty solution tends to the solution we have already found: (1/2, 1/2) is the strong solution.
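The convergence just described can be checked numerically. Assuming the objective x1^2 + x2^2 and the constraint x1 + x2 = 1 (consistent with the stated unconstrained solution (0, 0) and constrained solution (1/2, 1/2)), symmetry gives x1 = x2 = t, and setting the penalty gradient to zero yields t(alpha) = 1/(2 + 1/alpha); the exact constants depend on how the penalty term is scaled. A sketch:

```python
import numpy as np

# Weak (penalty) solution of: minimize x1^2 + x2^2 subject to x1 + x2 = 1,
# using the assumed closed form t(alpha) = 1/(2 + 1/alpha) for x1 = x2 = t.
def x_weak(alpha):
    t = 1.0 / (2.0 + 1.0 / alpha)
    return np.array([t, t])

for alpha in [1.0, 10.0, 100.0, 1000.0]:
    print(alpha, x_weak(alpha))
# As alpha grows, the iterates approach the strong-constraint solution (0.5, 0.5).
```

For alpha = 1 the weak solution is (1/3, 1/3), already well inside the feasible region's neighborhood; by alpha = 1000 it agrees with the strong solution to about three decimal places.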
The weak solution converges to the strong solution as the penalty parameter grows; it is a beautiful illustration of the relation between the two.

So, strong versus weak constraint formulation: mathematicians have provided us two choices, and the question is which one to use in our analysis. We minimize f(x) subject to g(x) = b. What is one standard constraint that arises in meteorology? If the atmosphere is barotropic, there are geostrophic constraints. When you recover the u and v velocity components, they cannot be arbitrary; they must satisfy geostrophy. Geostrophy is a constraint that comes from physics. u and v are velocities you may recover by other means, and if the recovered values do not satisfy the geostrophic constraint, they are of no use. So you would like to recover the velocity subject to the constraint that it satisfy geostrophic balance. Do you want exact geostrophy or approximate geostrophy? If you want exact geostrophy, use the strong constraint; if approximate geostrophy suffices, use the weak constraint. In meteorology all the governing equations are approximations, from the Navier-Stokes equations down to the primitive equations; when your model is itself an approximation, there is no point in requiring the constraint to be stronger than the model. Everything has to be consistent. The weak constraint formulation is therefore a beautiful formulation in which you allow variations from the equilibrium conditions, but only by a small percentage. This mathematical concept of strong versus weak fits well with our physical picture of geostrophy. Is the atmosphere ever perfectly geostrophic? No. Is the atmosphere always barotropic? No. Always baroclinic? No. In some cases it is barotropic and in some cases baroclinic; there are different situations, so we would like to make different approximations.
Depending on the nature of the approximations, we can impose different kinds of constraints to handle the problem at hand. Mathematics provides the facility to handle different kinds of analysis and different assumptions you may want to make, and that is where the fit between physics and mathematics comes together beautifully; that is what I would like you to appreciate. To summarize: the Lagrangian multiplier method is a strong constraint formulation, which forces the equality constraint to hold exactly; the penalty function method is a weak constraint formulation, which enforces the equality only approximately. Depending on the value of alpha, the solution comes closer to satisfying the constraint, and as alpha goes to infinity the constraint is satisfied exactly. We will use both formulations in most data assimilation problems.

With this we come to a set of exercise problems. I want you to go over them; they are very simple. I also recommend that you do these exercises both with pencil and paper and on a computer. My favorite medium is MATLAB. For example, where an exercise says to plot the function, it should read f(x), not f(alpha); I would like you to plot this function, and the quick way is MATLAB, in two lines. So I recommend using either MATLAB or Mathematica: if you have good facility in programming in either, you can do the exercises with pencil and paper and then verify them on a computer, which will make your understanding complete. I have given exercises covering most of the topics in this arena.

Some standard textbooks on optimization, my favorites, of which I have copies in my personal library: Luenberger's Optimization by Vector Space Methods (1969) is a classic.
Luenberger's Introduction to Linear and Nonlinear Programming (1973) is another classic on constrained minimization, and Nash and Sofer's Linear and Nonlinear Programming (1996) is again a classic. All these books are very similar; you can read any of them alongside the notes to expand on the proofs and deepen your understanding.

I hope with this you have come to realize the extent of the mathematical ability one needs to pursue and do good work in data assimilation. It involves concepts from finite-dimensional vector spaces, from matrices, from multivariate calculus, and from basic optimization theory. One part I have not included in this discussion is probability theory: when you go to the stochastic aspects of estimation, you need a good background in probability theory and basic statistics. We will try to fill in some of these things as we go through the lectures. With this we conclude our overview of the mathematical preliminaries for doing data analysis. Thank you.