In the previous modules we talked about static, deterministic linear least squares problems: unweighted and weighted versions, well-posed and ill-posed versions. We also saw the natural relation between least squares solutions and the geometric interpretation of the least squares solution, and that theory was very beautiful in itself. But very seldom are the problems we come across in geosciences linear; some are linear, but many of them are nonlinear. So in this module our aim is to extend the concept of least squares solutions to solve deterministic, static, nonlinear inverse problems.

So let H be a map, that is, a vector-valued function of a vector: x is a vector in R^n that goes in, and H(x) that comes out belongs to R^m. In the meteorological and geophysical context this map is also called the forward operator; it is a map from the model space R^n to the observation space R^m. H has m components, H = (H_1, H_2, ..., H_m)^T, where x is in R^n. Given z in R^m and the nature of the function H, our problem is to estimate x such that z = H(x). That is the nonlinear version of the linear least squares problem. In the linear problem we said there is a matrix H that goes from the model space to the observation space, and we solved the problem z = Hx; now it takes the form z = H(x). Our question is how to solve these problems, and that is our goal in this module.

We are going to characterize this inverse problem again as an unconstrained minimization problem; please remember that is exactly what we did for the linear least squares problem, so we are going to follow the same track of ideas. The residual is z − H(x); z is a vector, x is a vector, so the residual lies in R^m. We can now construct a function f(x) which is the square of the norm of the residual,

f(x) = (z − H(x))^T (z − H(x)),     (1)

the only difference being that instead of a linear function I am using a nonlinear function; that is all the difference is. If you multiply this out and simplify, you get

f(x) = z^T z − 2 z^T H(x) + H(x)^T H(x),

where each term is a scalar function of the vector x. We again seek to minimize f(x) with respect to x, and there is no constraint on x; x ranges over all of R^n, which is why it is an unconstrained minimization problem. The standard way to solve an unconstrained minimization problem is to compute the gradient. From the module on multivariate calculus we have already computed the gradient of terms like z^T H(x), which is D_H(x)^T z, where D_H(x) is the Jacobian of H; and the gradient of H(x)^T H(x), which is 2 D_H(x)^T H(x). So the gradient of f can be succinctly written as

∇f(x) = 2 D_H(x)^T (H(x) − z),

where, please recall, the Jacobian is simply the m × n matrix of partial derivatives. If H(x) = Hx is linear, the Jacobian of H is simply the matrix H, so by specializing this we readily recover the linear least squares counterpart; in that sense this is a generalization.
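To make this concrete, here is a minimal sketch in Python; the forward operator below is my own toy example, not from the lecture, chosen only to show how f(x) and its gradient are assembled from H and its Jacobian:

```python
import numpy as np

# Toy forward operator H: R^2 -> R^3 (illustrative only).
def H(x):
    return np.array([x[0]**2 + x[1],
                     np.exp(x[0]) * x[1],
                     np.sin(x[1])])

def D_H(x):
    # m x n Jacobian: row i holds the partial derivatives of H_i.
    return np.array([[2.0 * x[0],          1.0],
                     [np.exp(x[0]) * x[1], np.exp(x[0])],
                     [0.0,                 np.cos(x[1])]])

def f(x, z):
    r = z - H(x)              # residual in R^m
    return r @ r              # f(x) = ||z - H(x)||^2

def grad_f(x, z):
    # gradient of f: 2 * D_H(x)^T (H(x) - z)
    return 2.0 * D_H(x).T @ (H(x) - z)
```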
So we have computed the gradient. In order to minimize f(x), I have to equate the gradient to zero, and if I do that, as you can readily see, I get a nonlinear equation. Where does the nonlinearity come from? The solution is essentially given by

D_H(x)^T (H(x) − z) = 0.     (3)

Now H is nonlinear and D_H is also nonlinear, and the product of two nonlinear functions is nonlinear; therefore the gradient is a nonlinear function, and we have to solve a nonlinear system of equations. The only way to solve a general nonlinear system of equations is to solve it numerically. So one way to solve the nonlinear least squares problem is to compute the gradient and use the well-established procedures from numerical analysis to solve the resulting system of equations. I would like to summarize by saying there are a number of packages available, and these packages can be used to solve equations of type (3) and thereby find the minimum of f(x). A solution of equation (3) only satisfies the necessary condition for a minimum; to guarantee a minimum we have to compute the Hessian, evaluate it at each root, and test whether it is positive definite. Once the Hessian is positive definite, we have a minimum. In general a nonlinear function may have many roots of this equation, and therefore there could be multiple minima, so this is going to be a computationally very challenging problem.

To get around that, we are going to look at an alternate method: we are going to seek good ways to approximate the nonlinear least squares problem, and that is what we now describe. The first approximation we are going to talk about is a first order approximation to the function f(x). What is the basic idea? Let us pretend I know where to start: the current operating point. Generally engineers and scientists know the range within which the solution lies; they may not know the exact solution, but it is supposed to be in some box or sphere. So the current operating point x_c is some point that we already know, which is not too far from the solution; that is contingent on our prior knowledge of the problem. Now what do we do? We expand H(x) in a first order Taylor series in a small neighborhood around the point x_c. Again going back to our module on multivariate calculus: x_c is the current operating point, x is a point in a small neighborhood around x_c, and I would like to move from x_c to x. If x is close to x_c, I can express H(x) by the first order Taylor series we have already seen:

H(x) ≈ H(x_c) + D_H(x_c)(x − x_c),     (4)

where D_H(x_c) is the Jacobian of H at the point x_c. Given H, I can always compute the Jacobian; if you give me x_c, I can evaluate the Jacobian matrix numerically.
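For instance, when analytic derivatives are not available, the Jacobian at x_c can be approximated by finite differences. A minimal sketch; the step size and helper names are my choices, not the lecture's:

```python
import numpy as np

def numerical_jacobian(H, xc, eps=1e-6):
    # Forward-difference approximation of the m x n Jacobian at xc.
    Hc = H(xc)
    J = np.zeros((Hc.size, xc.size))
    for j in range(xc.size):
        dx = np.zeros_like(xc, dtype=float)
        dx[j] = eps
        J[:, j] = (H(xc + dx) - Hc) / eps
    return J

def linearized_H(H, xc):
    # Local linear model of equation (4): H(x) ~ H(xc) + D_H(xc)(x - xc).
    Hc, Jc = H(xc), numerical_jacobian(H, xc)
    return lambda x: Hc + Jc @ (x - xc)
```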
So the numerical value of this Jacobian matrix is known, H(x_c) is known, and x_c is known, so you can see the right hand side of (4) is simply a function of x; it is a vector-valued function, and not only is it a function of x, it is linear in x. So it is a linear approximation to H(x) around x_c. What have we done? H(x) is defined globally, but I have replaced the global H(x) by a local linear approximation using a first order Taylor series in a small neighborhood around the current operating point. That is the key to this argument. Instead of solving the problem globally, we are going to solve it locally and keep making local improvements, with the hope that these local solutions and local improvements will eventually lead to the global solution. So here we have converted a nonlinear problem into an associated linear problem by invoking the first order Taylor series using the Jacobian.

Now, if you go back to our previous slide, f(x) is built from H(x), so we replace H(x) in equation (1) by the right hand side of (4), which is simply its linear approximation. If I do that, I get a function q_1(x), which is an approximation to f(x) in a small neighborhood: f(x) is a global function, q_1(x) is a local approximation to that global function. You can readily see that q_1(x) is a linear part multiplied with itself,

q_1(x) = (g − D_H(x_c)(x − x_c))^T (g − D_H(x_c)(x − x_c)),

so it is a quadratic approximation, where g = z − H(x_c); g is simply a change of variable for z − H(x_c). This is a quadratic approximation of f(x) in a small neighborhood around x_c.

So now we look at minimizing q_1(x). If I want to minimize q_1(x), I can readily compute its gradient and its Hessian. Again by the results in the module on multivariate calculus, the gradient of q_1 is

∇q_1(x) = 2 D_H(x_c)^T D_H(x_c)(x − x_c) − 2 D_H(x_c)^T g,     (7)

and the Hessian is 2 D_H(x_c)^T D_H(x_c), the transpose of the Jacobian times the Jacobian evaluated at x_c. Please recognize this looks like H^T H: there, the inverse exists if H is of full rank; here, the inverse exists if the Jacobian is of full rank. If the Jacobian is of full rank, then the Hessian is positive definite, and the solution obtained by setting the gradient to zero gives the minimum. By equating the right hand side of (7) to zero, we get the minimizer of q_1 as the solution of the normal equation

D_H(x_c)^T D_H(x_c)(x − x_c) = D_H(x_c)^T g.

Now I would like you to look at this structure. It is very similar to H^T H x = H^T z, which we saw in the linear case; there we got the global solution, here I am getting the local solution. The matrix is symmetric positive definite, so it is nonsingular, and I can express x − x_c as the inverse of this matrix times the right hand side.
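In the optimization literature this local solve is known as the Gauss-Newton step. A minimal sketch, assuming H and its Jacobian D_H are available as functions (as in the earlier toy example):

```python
import numpy as np

def gauss_newton_step(H, D_H, xc, z):
    # Solve the normal equation
    #   D_H(xc)^T D_H(xc) (x - xc) = D_H(xc)^T g,   g = z - H(xc),
    # assuming the Jacobian at xc has full column rank.
    Jc = D_H(xc)
    g = z - H(xc)
    delta = np.linalg.solve(Jc.T @ Jc, Jc.T @ g)
    return xc + delta      # the new operating point x
```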
So by solving this I get x − x_c. I originally started with the point x_c and I went to a point x, the optimal x that minimizes the linear approximation. So my new operating point is equal to the old operating point plus the increment. Then I repeat the same procedure around the new operating point: I consider another small neighborhood around it, find another x, and move again. So I went from here to here, then here to here, then there to there; by moving sequentially from operating point to operating point, I am moving towards the global solution. This whole process is repeated from each new operating point until suitable convergence is obtained.

So what is the key here? I am converting the difficult problem of solving a nonlinear least squares problem globally into a sequence of simple linear least squares problems. We already know how to solve linear least squares problems, so the trick is this: if you know how to solve one problem very well, convert other problems into the one you know how to solve. Using the algorithms for the linear problem, I can solve the nonlinear problem iteratively. You can see the difficulty of nonlinear problems essentially comes from our inability to get at the global solution in one shot; I am trying to build the global solution out of a sequence of local solutions.

Thus far we saw the first order approximation; now I am going to go to the second order approximation. Why? If my function H(x) is strongly nonlinear, that is, if it involves logarithmic functions, exponential functions, trigonometric functions, or fractional powers of the quantities of interest, the linear approximation will not be very handy; it has a lot of error. For example, one typical nonlinear function in satellite meteorology: the energy radiated is E = αT^4, proportional to the fourth power of the temperature. So in this case z = H(T), where the temperature T is the state variable; T plays the role of x, and H is very strongly nonlinear.
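To see how the linearization degrades for such an operator, here is a tiny numerical check; the value of α and the temperatures are illustrative choices of mine, not from the lecture:

```python
# Fourth-power radiation operator: strongly nonlinear in T.
alpha = 5.67e-8
H  = lambda T: alpha * T**4         # forward operator
dH = lambda T: 4.0 * alpha * T**3   # its derivative (1x1 Jacobian)

Tc = 280.0                          # current operating point
for T in (281.0, 290.0, 300.0):
    exact  = H(T)
    linear = H(Tc) + dH(Tc) * (T - Tc)   # first order Taylor around Tc
    print(T, exact, linear, abs(exact - linear) / exact)
    # relative error grows rapidly as T moves away from Tc
```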
To improve the quality of the approximation, I now go from the first order to the second order Taylor series; again, first and second order Taylor series for vector-valued functions of a vector are things we have already covered in the module on multivariate calculus. In addition to the first order term, I am going to have a second order term, and this additional term improves the accuracy of the Taylor series approximation. The second order term depends on the Hessians of the components of H. Please understand: H(x_c) is a vector, D_H(x_c) is the Jacobian matrix evaluated at x_c, and the second order term is again a vector. You already know H = (h_1, h_2, ..., h_m)^T; you take h_1 and compute its Hessian, and the first entry of the second order term is a quadratic form with the Hessian of h_1, the next is a quadratic form with the Hessian of h_2, and so on up to the Hessian of h_m, each multiplied by one half, where the half comes from the second order Taylor series coefficient. How do we express this? Think of it like this: x_c is the current operating point, x is any other point in its neighborhood, and the difference between them is

y = x − x_c,

a vector in R^n; y is the distance between the current operating point and the point x. Writing ∇²h_i(x_c) for the Hessian of the ith component h_i, I get a reasonably good expression for the second order Taylor series of H(x):

H(x) ≈ H(x_c) + D_H(x_c) y + (1/2) [ y^T ∇²h_1(x_c) y, ..., y^T ∇²h_m(x_c) y ]^T.     (11)

The notation may look a little complicated at first for those of us not familiar with Taylor series expansions, so it is imperative that we understand the Taylor series expansion for vector-valued functions of vectors in order to get a complete understanding of what is happening around the current operating point x_c.

Now what do we do? The same thing: I have an approximation for H(x) in (11), so I substitute (11) into (1) to obtain a new approximation for f(x). What is y? Please remind yourself y = x − x_c; x_c is known, so I can recover x if I know y; I am simply talking about the increment y. So f is now expressed in terms of y, and again g has its previous value, g = z − H(x_c). Expanding the right hand side, each factor has three terms: g is a constant term in y, then there is a first order term, then a second order term. So each factor is a quadratic function of y, and if you multiply two quadratic functions you get terms up to the fourth degree in the components of y. What do we do? We expand but keep only terms up to the second degree in the components of y; there will be third degree terms and fourth degree terms, and we neglect them. Why are we allowed to do that? If x is close to x_c, y is small; if y is small, y squared is smaller, y cubed is even smaller, and y to the fourth power is much smaller still. We are simply invoking an order-of-magnitude scaling argument: we keep only the dominant terms up to second order, believing the third and fourth order terms are essentially very small. By keeping only terms up to the second order, I get an approximation of f(x) as q_2(y).
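Before analyzing q_2, here is a sketch of the second order expansion (11) in code, using finite differences for the component Hessians; the helper names and step sizes are mine, for illustration:

```python
import numpy as np

def numerical_hessian(h_i, xc, eps=1e-5):
    # Forward-difference Hessian of a scalar component h_i at xc.
    n = xc.size
    Hess = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            ej = np.zeros(n); ej[j] = eps
            ek = np.zeros(n); ek[k] = eps
            Hess[j, k] = (h_i(xc + ej + ek) - h_i(xc + ej)
                          - h_i(xc + ek) + h_i(xc)) / eps**2
    return Hess

def second_order_H(H, D_H, xc, y):
    # Equation (11): H(xc + y) ~ H(xc) + D_H(xc) y
    #                + (1/2) [y^T Hess(h_i)(xc) y] stacked over i.
    m = H(xc).size
    quad = np.array([y @ numerical_hessian(lambda x, i=i: H(x)[i], xc) @ y
                     for i in range(m)])
    return H(xc) + D_H(xc) @ y + 0.5 * quad
```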
Look at the terms of q_2(y) for a minute: the first term is of degree zero, the second is a linear term, and the third is a quadratic term in y; the added second order term is also quadratic in y, and the sum of two quadratic terms is a quadratic term, so q_2(y) is quadratic. q_1 was also quadratic, but in the case of q_1 I did not have this extra term; it is the new term that comes into play when I use the second order approximation. That is why we call q_2 the full quadratic approximation and q_1 only a partial quadratic approximation; that is simply a mathematical fact that comes out of this analysis. If you drop the second order term, q_2 becomes equal to q_1; that is the important nesting we have to note. The full quadratic approximation is obtained by simply adding the second order term, which is the last term in equation (15):

q_2(y) = g^T g − 2 g^T D_H(x_c) y + y^T D_H(x_c)^T D_H(x_c) y − Σ_i g_i y^T ∇²h_i(x_c) y.     (15)

That is summarized in the following statement: q_2 differs from q_1 by the addition of the fourth term, the second order term, on the right hand side of (15). I approximated H(x) by a second order Taylor series, substituted it into f(x), obtained second, third, and fourth degree terms, and dropped every term of degree larger than two; q_2(y) is the total or full quadratic approximation of f(x).

Now the problem becomes very simple: I have a quadratic function, and I compute its gradient in stages. Consider the new term, g^T times the vector of quadratic forms; this is the term we added to q_2 that was not in q_1. It is the inner product of two vectors, and the inner product is the sum of g_i times the quadratic form in the Hessian of h_i; that is how the last term of (15) arises. So q_2(y) consists of a constant term, a first degree term, and two second degree terms. Therefore I can compute the gradient of q_2, again invoking the module on multivariate calculus: each term contributes its own gradient, and I equate the total gradient to zero to get my solution; I also get the Hessian of q_2 along the way. Setting the gradient to zero gives rise to a linear system of the form A y = b:

[ D_H(x_c)^T D_H(x_c) − Σ_i g_i ∇²h_i(x_c) ] y = D_H(x_c)^T g.     (20)

Now I would like to look at this matrix. The first part is the Gramian that comes from the Jacobian; the second part comes from the Hessians of the components. The g_i are constants: recall g = z − H(x_c), so the g_i are known, and the second part is a linear combination of Hessian matrices weighted by the g_i. I can solve this matrix equation, and its solution gives me a y; that y will indeed be a least squares (minimizing) solution provided the matrix on the left, the Hessian of q_2, is positive definite. So this is essentially another way of looking at approximations to the nonlinear problem, using the second order approximation.
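A sketch of the second order step (20) in code, reusing the finite-difference Hessian idea from above; again the function names are my own illustrative choices:

```python
import numpy as np

def second_order_step(H, D_H, xc, z, hessians):
    # Solve equation (20):
    #   [J^T J - sum_i g_i Hess(h_i)(xc)] y = J^T g,   g = z - H(xc).
    # `hessians` is a list of the m component Hessians evaluated at xc.
    Jc = D_H(xc)
    g = z - H(xc)
    A = Jc.T @ Jc - sum(gi * Hi for gi, Hi in zip(g, hessians))
    y = np.linalg.solve(A, Jc.T @ g)   # valid when A is positive definite
    return xc + y                      # new operating point
```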
We are going to solve (20) for y. Please remember y = x − x_c, so once we have y we add it to x_c, and that gives a new operating point. Here again I proceed sequentially: I have x_c, I go to a new operating point, from there to another new operating point, and the iteration goes on. So I am solving a sequence of local minimization problems by using clever partial quadratic approximations or full quadratic approximations to the function f(x). The entire process of the second order approximation is repeated until suitable convergence is obtained. When do I say convergence has occurred? When the norm of the computed y is less than a pre-specified epsilon. What is that epsilon? It could be 10^(-d); you can set the criterion any way you like, and 0.1, 0.01, 0.001, or 0.0001 are all typical values one could use. If d is large your approximation is better; if d is small the approximation is cruder. In some problems, if the model is not a perfect model, it is not worth worrying about exact solutions: you can afford to land in a reasonably good neighborhood rather than nearly exactly at the solution, and d need not be too large in such cases. It all depends on how good you believe your model is; your method need not be more accurate than the model, and more accurate solutions are needed only when the model is more precise. I want you to think about this consistency between the goodness of the model and the goodness of the solution that you obtain by data assimilation.

So the idea behind these two approximations is that the solution can be obtained by solving linear systems with symmetric positive definite matrices using the normal equation approach. What is the basic idea here? We developed expertise in solving linear least squares problems, and we are extending that expertise to solve nonlinear least squares problems by approximating the nonlinear function using either the first order or the second order Taylor series. That is the idea. Understanding the linear least squares problem is fundamental: if you understand it very well and have developed programs to solve it, you can readily apply them to nonlinear problems. But nonlinear problems are not solved in one shot; they are solved repeatedly. It is a sequential approach to solving nonlinear problems using a sequence of linear problems.

With this we come to the end of this module. There are a couple of different exercises, and they are very useful; I want to emphasize a couple of them. f(x) is a very simple function: I would like you to consider x_c as a starting point, compute the first order and second order approximations of f(x) around x_c, and find the gradient and the Hessian of f(x) at that point. Draw the contours of f(x) around x_c, and also draw the contours of the first and second order approximations around x_c; you can see how these contours approximate the function. As you progress from one operating point to the next, you can understand and appreciate the progress of the local solutions towards the global solution. With that we come to the end of the discussion of solving nonlinear least squares problems;
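Putting the pieces of this module together, here is a minimal sketch of the full iteration with the stopping rule ||y|| < ε, using the first order (Gauss-Newton) step; the loop structure and default tolerances are my own choices:

```python
import numpy as np

def iterate_nonlinear_lsq(H, D_H, z, x0, eps=1e-4, max_iter=50):
    # Sequence of local linear least squares solves: at each operating
    # point xc, solve the normal equation for the increment y and stop
    # when ||y|| falls below the tolerance eps (= 10**-d).
    xc = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Jc = D_H(xc)
        g = z - H(xc)
        y = np.linalg.solve(Jc.T @ Jc, Jc.T @ g)
        xc = xc + y                      # new operating point
        if np.linalg.norm(y) < eps:      # suitable convergence
            break
    return xc
```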
thank you.