In the previous lecture we introduced the fundamental principles of the 4D-Var algorithm, also called the first-order adjoint method, using a linear deterministic dynamical system as a strong constraint. We solved the constrained minimization problem using the Lagrangian multiplier technique. The whole aim of the exercise is to compute the gradient of the cost functional in the context of the linear equality constraint; once we have computed the gradient numerically, we use it in a minimization algorithm such as the gradient method or the conjugate gradient method. Since we have already seen how the gradient and conjugate gradient methods work, we did not elaborate on that half of the problem; the other half relates to computing the gradient itself. So 4D-Var adjoint methods are essentially methods for computing the gradient of the cost functional with the model as a strong constraint. That is the deal.

Now I am going to provide an extension. In the previous case we considered a linear, perfect, deterministic model and noisy observations that were linear functions of the state. Recall that we considered four cases: the model can be linear or nonlinear, and the observations can be linear or nonlinear. Both being linear is the easy case; both being nonlinear is the more complex case. I am now going over to the nonlinear case, and when I say nonlinear, everything is nonlinear: the model is nonlinear, and the observations are nonlinear, noisy functions of the state. Using nonlinear, noisy observations I have to determine the initial condition based on a given finite set of indirect information about the states.

So I have a nonlinear deterministic model with initial condition x_0. M is the model map, M(x) = (M_1(x), M_2(x), ..., M_n(x))^T, so M is a vector-valued function from R^n to R^n. The observations are nonlinear functions of the state: h is the forward operator, a map from R^n to R^m, with h(x) = (h_1(x), h_2(x), ..., h_m(x))^T. The observation noise v_k is assumed white. What do we mean by white? If I consider two instants in time, k_1 and k_2, then v_{k_1} and v_{k_2} are uncorrelated; white means temporally uncorrelated. At a given instant the covariance matrix of the noise is R_k, so v_k is normally distributed with mean 0 and covariance R_k. What do we mean by mean 0? Mean 0 means the instruments that measure the states do not have any bias; if there is a bias in the instrument, I have already calibrated the instrument against known standards and subtracted the bias. Therefore the observations are bias free, the observation noise has mean 0, the observation covariance at a given time is R_k, and observations at different times are temporally uncorrelated. These are the standard assumptions. I want to make a comment: if the observations are temporally correlated, the analysis becomes much more complex. In trying to solve the problem we would like to make only the assumptions necessary to keep the problem nontrivial yet tractable; if you keep adding conditions, the problem statement becomes so difficult that we may not be able to solve it mathematically in a meaningful way. So the trick in modeling is to make all the assumptions necessary to make the problem interesting, but at the same time not too complex.
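To make the setup concrete, here is a minimal Python sketch of generating noisy observations from a nonlinear model with a nonlinear forward operator. The particular maps M and h, the covariance R, and the numbers are illustrative assumptions, not from the lecture.

```python
import numpy as np

# Sketch of the setup: a nonlinear model M: R^n -> R^n, a nonlinear forward
# operator h: R^n -> R^m, and noisy observations z_k = h(x_k) + v_k with
# v_k ~ N(0, R_k), white in time. M, h, and R below are hypothetical choices.

rng = np.random.default_rng(0)

def M(x):                           # nonlinear model map (hypothetical)
    return np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])

def h(x):                           # nonlinear forward operator (hypothetical)
    return np.array([x[0] ** 2 + x[1]])

R = np.array([[0.5]])               # observation noise covariance R_k

x = np.array([1.0, 0.0])            # true initial state x_0
obs = []
for k in range(20):
    x = M(x)                                      # advance the true state
    v = rng.multivariate_normal(np.zeros(1), R)   # white Gaussian noise, mean 0
    obs.append(h(x) + v)                          # indirect, noisy information
```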
If you analyze the standard versions of the problem first, you get a feel for the solution, based on which we can then tag on other assumptions and conditions to explore their impact, for example, what if the observation noise is actually correlated, and so on.

So the cost functional is again J(x_0), given by the sum over k of (z_k - h(x_k))^T R_k^{-1} (z_k - h(x_k)), where z_k - h(x_k) is the forecast error. This is a weighted sum of squared errors. The problem is to minimize J(x_0), where x_k is related to x_0 through the model dynamics; again it is an implicit minimization problem. In the linear case we could replace x_k explicitly by x_k = M^k x_0; in the nonlinear case it is much more difficult, so we need to handle this case a little delicately. While we could use the Lagrangian multiplier method, we illustrate an alternate strategy called the adjoint method. I would like to distinguish between the adjoint equation and the adjoint method: the adjoint method is a class of algebraic methods by which I can compute the gradient rather explicitly, in a beautiful way.

First, perturbation analysis of the model is key to the adjoint method we are dealing with. Let x̄_0 be a base initial state; from it I can generate a trajectory called the base trajectory. Given the base trajectory I can compute h(x̄_k) as the model-predicted observation, while the actual observations are z_k. If z_k is very close to h(x̄_k), then the trajectory x̄_k arising from x̄_0 is pretty good, and x̄_0 could be a decent initial condition from which to start the forecast. But in general z_k may not be close to h(x̄_k): the forecast error e_k = z_k - h(x̄_k) may be far from small. If e_k is far from small, I have a large forecast error. Our aim is to annul, or at least reduce, the forecast error. The forecast x̄_k was generated from x̄_0; h is known and fixed, z_k is known and fixed, so the only way to reduce the forecast error is to change x̄_k. But how do I change x̄_k? It is the state of the model at time k, and even the model equation is fixed: remember x̄_k = M(x̄_{k-1}), so starting from x̄_0 I calculated x̄_k. So look at this: my aim is to reduce the forecast error e_k; the only way to reduce it is to change x_k, and the only way to change x_k is to change x_0. That is why x_0 is called the control element: by changing the initial condition x̄_0 I can change x_k, and if I change x_k I can hope to change the forecast error. With this in mind, I am now going to change the initial condition by adding a correction δx_0 to x̄_0: x̄_0 is the old initial condition, δx_0 is the correction or perturbation term, and x_0 = x̄_0 + δx_0 is the new initial condition; δx_0 is called the initial perturbation. So I am changing the initial condition from x̄_0 to x_0. Now if I run the model forward in time from x_0, I will get x_1, x_2, ..., x_k. What is the hope? z_k - h(x_k) must be smaller than the e_k I have already computed. If this can be made smaller by moving the initial condition, then I am going in the right direction to reduce the forecast error.
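Before going further, here is a small sketch of evaluating the cost functional J(x_0) defined above. The key point is that x_k enters only implicitly, by running the model forward from x_0 under the strong constraint. The toy M, h, observations, and weights in the usage lines are hypothetical.

```python
import numpy as np

def cost(x0, M, h, obs, Rinv):
    """J(x0) = sum_k (z_k - h(x_k))^T Rinv (z_k - h(x_k)), with x_k obtained
    implicitly by iterating the model forward from x0."""
    J, x = 0.0, np.asarray(x0, dtype=float)
    for z in obs:
        x = M(x)                # strong constraint: x_k = M(x_{k-1})
        e = z - h(x)            # forecast error e_k = z_k - h(x_k)
        J += e @ Rinv @ e       # R_k^{-1}-weighted squared error
    return J

# tiny usage with a hypothetical one-dimensional example
M = lambda x: 0.9 * x
h = lambda x: x ** 2
obs = [np.array([0.8]), np.array([0.6])]
print(cost(np.array([1.0]), M, h, obs, np.array([[2.0]])))
```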
Here is the idea of basic perturbation analysis. Let me summarize: the only way to reduce the forecast error is to change the model solution; the only way to change the model solution, for a given model, is to change the initial condition (I am assuming everything else in the model is fixed); and changing the initial condition means adding a perturbation to it. So there is the base trajectory and there is the perturbed trajectory. I would like to find an optimal perturbation, an optimal increment in the initial state, such that z_k - h(x_k) is small. Small in what sense? In the sense of least squares: I would like to minimize the square of the weighted norm of z_k - h(x_k), where the weighting is by R_k^{-1}. This is the weighted sum of squared errors at a given time, and if I sum it over k = 1 to N I get the total forecast error coming from all the observations. So I would like to decide how to perturb the initial condition so as to annihilate the forecast errors at the specified observation times. That is how we can look at the 4D-Var problem; I hope the statement and the view of the problem are clear by now.

I am now going to talk about first-order perturbation theory. This is also called variational analysis, which is why the name 4D variational method came into being. How does first-order perturbation analysis go? Let x̄_0 be the base initial condition, whose base trajectory is given by equation (20). I am going to perturb the initial condition: the new initial condition is obtained by adding a perturbation to x̄_0. I can then compute the perturbed trajectory, given by equation (21). So the base trajectory is given by (20) and the perturbed trajectory by (21). Now, what is the whole idea? Let me go back to the picture. If δx_0 is the perturbation at time 0, let δx_1 be the perturbation at time 1 and δx_k the perturbation at time k. δx_0 is the perturbation I impose on the initial condition; that induces a perturbation at time 1, which in turn induces a perturbation at time k. Our first goal is to find out how δx_0 transforms into δx_1, and how δx_1 transforms into δx_k; in other words, I am interested in understanding the dynamics of propagation of the initial perturbation through the model. That is the first task.

In trying to write the dynamics of the perturbation, let us consider how δx_1 is related to δx_0. Please understand δx_1 is the perturbation at time 1 induced by the initial perturbation δx_0 at time 0. To compute δx_1 from δx_0 we invoke the first-order Taylor series; we have already talked about the first-order Taylor series for maps in one of the previous modules. By definition, δx_1 equals the perturbed state at time 1 minus the base state at time 1: x_1 = M(x_0) and x̄_1 = M(x̄_0), and we already know x_0 = x̄_0 + δx_0.
So that is what δx_0 is all about. I now apply the first-order Taylor series expansion to the first term: M(x̄_0 + δx_0) ≈ M(x̄_0) + D_M(x̄_0) δx_0, where D_M is the Jacobian of M. The first and third terms cancel, and I get δx_1 = D_M(x̄_0) δx_0, equation (22). Please recall the Jacobian of M is an n-by-n matrix: its (i, j) entry is the partial derivative of the ith component M_i with respect to the jth component x_j of the state, with i and j each running from 1 to n. That is the Jacobian of M at the initial base state. If you know M, you can compute and evaluate the Jacobian. Therefore you can readily see that equation (22), which relates the perturbation at time 1 to the perturbation at time 0, is a linear relation. So even though the model is nonlinear, the perturbation dynamics is linear; that comes out very easily from equation (22).

In general, let δx_{k+1} be the perturbation at time k+1: δx_{k+1} = x_{k+1} - x̄_{k+1} = M(x_k) - M(x̄_k). Now remember that x_k = x̄_k + δx_k; applying the first-order Taylor series to M(x̄_k + δx_k), you get the perturbation dynamics δx_{k+1} = D_M(x̄_k) δx_k. In mathematics this dynamics is called the variational equation; it is also called the perturbation equation, but in meteorological circles it has a very different name, the tangent linear system, or TLS. The TLS is a linear system, and it is a non-autonomous (time-varying) system because the Jacobian matrix varies along the base trajectory: we can write δx_{k+1} = D_k δx_k, where D_k is the Jacobian evaluated at x̄_k, and D_k changes in time. So it is a linear, non-autonomous dynamics that gives the evolution of the perturbation; this is the first-order perturbation equation.
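To see the tangent linear system in action, here is a minimal sketch that propagates a small perturbation with the TLS alongside the full nonlinear model; for a small enough δx_0 the two agree to first order. The two-state model and its hand-coded Jacobian are illustrative assumptions.

```python
import numpy as np

def M(x):                          # hypothetical nonlinear model
    return np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])

def DM(x):                         # its Jacobian, an n-by-n matrix
    return np.array([[1.0, 0.1],
                     [-0.1 * np.cos(x[0]), 1.0]])

xbar = np.array([1.0, 0.0])        # base initial state xbar_0
dx = np.array([1e-4, -2e-4])       # initial perturbation delta_x_0
x = xbar + dx                      # perturbed initial state

for k in range(10):
    dx = DM(xbar) @ dx             # linear perturbation dynamics (TLS)
    xbar = M(xbar)                 # base trajectory
    x = M(x)                       # perturbed nonlinear trajectory

print(x - xbar)                    # nonlinear perturbation at time 10
print(dx)                          # TLS prediction: agrees to first order
```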
By iterating (23) you can readily see that δx_k equals the product of Jacobians along the trajectory times the initial perturbation. I now introduce a new notation: D_{k-1:0} denotes the product of the k Jacobians going from time 0 to time k-1, so δx_k = D_{k-1:0} δx_0, equation (24). This matrix is the product of the Jacobians along the base trajectory, evaluated from x̄_0 to x̄_{k-1}. Notice the order of multiplication: matrix multiplication is not commutative, so whenever matrix-matrix multiplication is involved you have to take great care with the order and be sensitive to it.

Let me quickly summarize what we have done. We stated the problem; we saw that correcting the forecast amounts to changing the initial condition, and changing the initial condition amounts to adding a perturbation to it. We were then interested in quantifying how an initial perturbation propagates through the model in time. So we have the original model equation, and we also have the forward dynamics, the tangent linear system, that describes the evolution of the perturbation forward in time. That is what we have accomplished so far.

Now I would like to come back to J(x_0), the thing to be minimized. Recall that J(x_0) is given by the summation in equation (20); multiplying everything out, equation (20) becomes equation (25), a rewritten version of (20). I would now like to change the notation to make the application of multivariate calculus a lot easier: for the linear term, let a = R_k^{-1} z_k. Recall from one of the earlier modules on multivariate calculus the notion of the first variation: the first variation is simply the inner product of the gradient with the variation δx_k. So the first variation of a^T h(x_k) is the gradient of a^T h(x_k) paired with δx_k; a is given, so the gradient is given by this equation, and by substituting we get the first variation of the linear term, equation (27). What does the first variation mean? Let me talk about that for a moment before we go further. I have a^T h(x_k); if I change x_k to x_k + δx_k and compute the difference a^T h(x_k + δx_k) - a^T h(x_k), then to first order this actual difference equals the gradient of a^T h(x_k) paired with δx_k. That comes from the basic definition of the notion of the first variation.
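Here is a small numerical illustration of the first variation of the linear term: the actual difference a^T h(x + δx) - a^T h(x) is compared with the gradient formula D_h(x)^T a paired with δx. The forward operator h below is a hypothetical example.

```python
import numpy as np

def h(x):                                   # hypothetical forward operator
    return np.array([x[0] ** 2 + x[1], np.sin(x[1])])

def Dh(x):                                  # its Jacobian (m-by-n)
    return np.array([[2 * x[0], 1.0],
                     [0.0, np.cos(x[1])]])

a = np.array([0.3, -1.2])
x = np.array([1.0, 0.5])
dx = np.array([1e-5, 2e-5])

actual = a @ h(x + dx) - a @ h(x)           # actual difference
first_var = (Dh(x).T @ a) @ dx              # gradient paired with perturbation
print(actual, first_var)                    # agree to first order
```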
So the induced variation affects the linear term through D_h, where D_h is the Jacobian of the forward operator h. I hope this idea of the first variation is clear. Why am I talking about the first variation? Look at this: δx_k depends on δx_0; I have already related δx_k to δx_0 through the tangent linear system. So if I can express the first variation of each of these terms with respect to δx_k, I can then relate those variations back to the initial variation δx_0. That is the basic principle at work here.

Now consider the second term, which is quadratic in h. From the module on multivariate calculus and optimization we already know the first variation of this quantity with respect to x_k; it is given by the right-hand side of (28). The first variation of z_k^T R_k^{-1} z_k is 0 because it does not depend on x_k. Substituting (26) through (29) into (25), the first variation in the cost functional induced by δx_0 is given by the expression in equation (30). Now, the right-hand side depends only on δx_k, but I am talking about the variation induced by δx_0. Through the tangent linear system I already know δx_k is related to δx_0, so combining the tangent linear system with (30) I now know how to relate the first variation in J to δx_0. That is the line of argument. Again, to simplify notation, I define f_k to be the normalized forecast error viewed from the model space, the same f_k as we used in the previous case. I can then rewrite equation (30) succinctly: the variation δJ induced by δx_0 is the sum of the variations induced along the path, with respect to δx_k for k running from 1 to N, recognizing that δx_k and δx_0 are related through the tangent linear system; that is (32). This is the first and most fundamental relation in the adjoint approach to 4D-Var.

Let me go back to the picture to illustrate this a little further; I think it is very useful. When I change from x̄_0 to x_0, my J function changes: originally I had J(x̄_0), now I have J(x_0), and the difference between the two is δJ. This δJ is induced by δx_0, so I would like to talk about the change in J induced through the initial condition, and that is what we have accomplished so far through this variational analysis, through the tangent linear system if you wish. I would also like to remind you that δx_k is given by the tangent linear system in (24): equation (24) relates δx_k to δx_0 through the product of the Jacobians. So using (24) I change the dependence from δx_k to δx_0. Let me come back here.
I am going to substitute for δx_k here; if I do that, this is the expression I get. Now you can readily see that δJ, induced by a change in the initial condition, is related directly to the change in the initial condition itself; everything becomes explicit.

Now we go back to matrix algebra to define the notion of an adjoint operator. If A is a matrix, the inner product ⟨Ax, y⟩ by definition is (Ax)^T y, which equals x^T A^T y, which can be written as ⟨x, A^T y⟩. Therefore the inner product of Ax with y is the same as the inner product of x with A^T y, and A^T is called the adjoint of A: the transpose of a matrix is its adjoint. That is the definition of the adjoint of an operator or a matrix.

Using this adjoint property I can now work on (33): you can think of (33) as ⟨f_k, A_k δx_0⟩, where A_k denotes the product of Jacobians D_{k-1:0}. Applying the adjoint property in reverse, this equals ⟨A_k^T f_k, δx_0⟩, and A_k^T f_k = D_{k-1:0}^T f_k. So by applying the adjoint property (34) to (33) I get this relation, one of the most fundamental relations. The method is called a variational method because I am talking about variations induced by the initial variation, and it is also called the adjoint method because I am using the adjoint property of matrices; so it is adjoint and variational schemes all put together, and it uses a number of interesting properties of the adjoint from matrix theory.

Recall some basic facts from linear algebra. First, linearity: ⟨x_1, y⟩ + ⟨x_2, y⟩ = ⟨x_1 + x_2, y⟩, the sum of inner products is the inner product of the sum. Second, if ⟨x, z⟩ = ⟨y, z⟩ for all z, then x must equal y. Applying the first property, the sum of the inner products becomes a single inner product of a sum with δx_0. Also, from first principles of variational calculus, as we have seen in multivariate calculus, δJ equals the gradient paired with δx_0. So I now have two different expressions for δJ, and these two expressions are equal for all δx_0; δx_0 is arbitrary, so by the second property I can readily conclude that the negative of the gradient is given by the sum over k = 1 to N of the transpose of the product of Jacobians along the trajectory times f_k, where f_k is the forecast error viewed from the model space. That is the whole idea: my aim was to find an expression for the gradient, and we have achieved it.
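The adjoint property used above is a one-line fact that is easy to verify numerically; here is a tiny sketch with arbitrary random data.

```python
import numpy as np

# Verify <A x, y> = <x, A^T y>: the transpose is the adjoint of a real
# matrix with respect to the Euclidean inner product.

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(4)

print((A @ x) @ y)        # <Ax, y>
print(x @ (A.T @ y))      # <x, A^T y>, identical up to rounding
```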
So instead of using the Lagrangian multiplier method and equating the gradient to zero, we have walked the path of variational analysis. The variational analysis is very intuitive: the only way to correct the forecast error is to change the initial condition; changing the initial condition amounts to perturbing it; and if I perturb the initial condition, the initial perturbation propagates through the model. So if I can quantify, to first-order accuracy, the propagation of perturbations along the trajectory, I have a handle on one of the issues, via the tangent linear system. The other part of the story is to compute the induced first variation in the cost functional itself. Then we relate these two first variations; in doing so we use several properties: the adjoint property, the linearity property, and the property that if the inner products of x and y with z agree for all z then x equals y. All three basic properties are intimately related, and applying them gives an expression for the gradient, equation (39).

Now look at that: everything on the right-hand side of (39) is computable. f_k is computable, and the Jacobian of the model is computable. But there is one thing we do not like: it involves products of Jacobians, and Jacobians are matrices. Matrix-matrix products are a no-no: you never multiply matrices, and you never invert a matrix, unless that is really what you want. These are some of the golden rules of numerical linear algebra, because both are expensive operations. In theory you may invoke matrix-matrix multiplication, but in practice you should avoid it wherever possible. So even though we have obtained (39) for the gradient, we are in the same boat as in the linear analysis: it still involves matrix-matrix multiplications, which we want to avoid. Matrix-vector multiplication is cheaper than matrix-matrix multiplication; let me talk about that for a moment. If I want to multiply two n-by-n matrices, C = AB, the cost is O(n^3). If I have y = Ax, where A is a matrix and x is a vector, this matrix-vector computation is O(n^2), which is much cheaper than O(n^3). So whenever there is a matrix-matrix operation, avoid it; whenever you can get away with matrix-vector multiplications instead, introduce them.

We are now going to rewrite the computation of (39) without any matrix-matrix multiplications, using only matrix-vector multiplications, very intelligently and efficiently; this is essentially the adjoint equation we already saw. The rest of the problem rests in computing (39) efficiently, involving only matrix-vector multiplications. To bring out the structure of expression (39), let us take N = 4, just to see how it comes up: one term involves a product of two Jacobians, another a product of three Jacobians, another a product of four Jacobians. That is how the terms come in. Please understand, from the computational point of view, if you evaluated these products of Jacobians directly like this, that would be deadly.
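The cost difference just described is easy to demonstrate; here is a small sketch comparing the two ways of applying a product of matrices to a vector. The matrix size n is an arbitrary choice, and the timings will vary by machine, but the ordering of the costs will not.

```python
import numpy as np
import time

# Evaluating (A1 A2 A3) v with matrix-matrix products costs O(n^3) per
# product; right-to-left matrix-vector evaluation A1 (A2 (A3 v)) costs
# O(n^2) per product. Same answer, very different cost for large n.

n = 1500
rng = np.random.default_rng(2)
A1, A2, A3 = (rng.standard_normal((n, n)) for _ in range(3))
v = rng.standard_normal(n)

t = time.perf_counter()
w_slow = (A1 @ A2 @ A3) @ v          # two O(n^3) matrix-matrix products
t_slow = time.perf_counter() - t

t = time.perf_counter()
w_fast = A1 @ (A2 @ (A3 @ v))        # three O(n^2) matrix-vector products
t_fast = time.perf_counter() - t

print(np.allclose(w_slow, w_fast), t_slow, t_fast)
```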
So (41) is simply an expansion: while it is meaningful for interpretation, computing it term by term is computationally disastrous. But to understand the kind of challenge we have, I have expressed (39) in the simple case N = 4 as (40), and you can see how much computation is involved in computing the gradient. Now, how do we avoid matrix-matrix multiplication and introduce matrix-vector multiplication? That is done very cleverly by what is called the adjoint equation. Look at (41): the gradient is given by the sum of these products. Now look at the beauty of the structure: we are going to expand the matrix-matrix multiplications involved here recursively. To compute this efficiently I am going to introduce some new variables; these variables λ are related, somewhat, to the Lagrangian multipliers we have seen earlier. Set λ_4 = f_4; then compute λ_3 = D_3^T λ_4 + f_3, which gives rise to this formula; λ_2 = D_2^T λ_3 + f_2, which gives this expression; and λ_1 = D_1^T λ_2 + f_1, which gives this expression. Therefore, when I have only 4 observations, I can compute the gradient by involving only matrix-vector multiplications. What is the computation? λ_4 = f_4 involves no multiplication; to compute λ_3, D_3^T is a matrix and λ_4 is a vector, so that is one matrix-vector multiplication plus one vector addition; likewise one matrix-vector multiplication and one vector addition at each further step. By iterating backward in time in this way, involving only matrix-vector operations, I obtain the explicit expression for the gradient.

What does this suggest? It suggests that we can avoid matrix-matrix multiplication by introducing a backward recurrence relation, much like the adjoint equation we had earlier, and that is the essence of the story. Generalizing from this specific example with N = 4: for a general N, define λ_N = f_N and λ_k = D_k^T λ_{k+1} + f_k, computing from λ_N down to λ_1. Once you have λ_1, you can verify that the gradient of J with respect to x_0 is given by ∇J(x_0) = -D_0^T λ_1. If M(x) = Mx, that is, if the model is linear, you can see this equation reduces to the one we derived with the Lagrangian multiplier technique. Therefore this relation is a generalization of the backward adjoint dynamics that we considered for the Lagrangian multiplier technique; that is why this is again called the adjoint dynamics. This whole idea, using first the propagation of the perturbation, then the adjoint principles from matrix theory, and then the adjoint dynamics, within the context of a nonlinear system, provides a very simple and elegant means of computing the gradient of the cost functional with respect to the initial condition. Once the gradient is available, we can again use it in a gradient or conjugate gradient method.
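Here is a minimal sketch of this backward recurrence for a general N, together with a check against the explicit sum of products of Jacobians in (39); the random D_k and f_k stand in for Jacobians and normalized forecast errors that would come from a real model.

```python
import numpy as np

def adjoint_gradient(D, f):
    """Backward adjoint sweep: lambda_N = f_N,
    lambda_k = D_k^T lambda_{k+1} + f_k for k = N-1, ..., 1,
    grad J(x_0) = -D_0^T lambda_1.  Only matrix-vector products appear.
    D[k] is the Jacobian at time k; f[k-1] holds f_k."""
    lam = f[-1]                          # lambda_N = f_N
    for k in range(len(f) - 2, -1, -1):  # f[k] is f_{k+1}; Jacobian is D[k+1]
        lam = D[k + 1].T @ lam + f[k]    # one matvec plus one vector addition
    return -(D[0].T @ lam)               # grad J(x_0) = -D_0^T lambda_1

# check against the explicit sum of products of Jacobians (eq. 39)
rng = np.random.default_rng(4)
n, N = 3, 5
D = [rng.standard_normal((n, n)) for _ in range(N)]   # D_0 .. D_{N-1}
f = [rng.standard_normal(n) for _ in range(N)]        # f_1 .. f_N
explicit = np.zeros(n)
for k in range(1, N + 1):
    P = np.eye(n)
    for j in range(k):                   # P = D_{k-1} ... D_0
        P = D[j] @ P
    explicit -= P.T @ f[k - 1]
print(np.allclose(adjoint_gradient(D, f), explicit))  # True
```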
So here is the summary, the 4D-Var algorithm for the nonlinear case. We used the base state, the perturbed state, and so on only for the sake of illustration; now that we have understood everything, I am going to encapsulate all the analysis in the form of an algorithm. Start with the initial condition x_0 and compute the nonlinear trajectory from the model. Then compute and evaluate the Jacobian along the trajectory. Please understand this can be computationally expensive. Why? Each Jacobian is an n-by-n matrix with n^2 elements, and I have to evaluate those n^2 elements at each of the N instants in time, so it is pretty demanding computationally. The question arises: do you compute these matrices and store them? That is a no-no. Why? Storing these matrices requires O(n^2) space each; when n is of the order of a million, there is no space in the world's computers to store all these matrices for all time. So what do you do? You have a general expression for the matrix somewhere, and you evaluate that matrix on demand, whenever you need its value at a given state. So there is a notion of a space-time trade-off involved in implementing these methods, and one has to be very clever not to burn the space budget by storing too many things. I also want to recognize that if you can store a few things you can speed up the computation; but if the demands on space become too large it makes no sense, because it will eat up all the memory, and then some, and you may not have much space left for doing anything else. So great caution must be exercised: you have to evaluate the Jacobians, and the question is whether to store them or not; you probably cannot afford to store them, so you simply compute them on the fly and use them whenever and wherever you need them. That is the basic idea.

Once the Jacobians are computed, I can compute the f_k. The f_k are vectors and can be stored; matrices are expensive, but a vector of length a million is far easier to store than a million-by-million matrix. That is the trade-off between storage and time. Now what I need to do is start the backward dynamics: λ_N = f_N, and λ_k = D_k^T λ_{k+1} + f_k, the transpose of the Jacobian at time k times λ_{k+1} plus f_k. Once I have λ_1, I can compute the gradient by the simple formula in step 5. Once I have the gradient, I can compute the optimal x_0 by simply repeating this until convergence, using a minimization algorithm, the gradient algorithm or the conjugate gradient algorithm.
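Putting the pieces together, here is a sketch of one gradient evaluation and a bare-bones minimization loop, under stated assumptions: M, DM, h, Dh, the observations, and R^{-1} are supplied by the user; Jacobians are evaluated on demand rather than stored; and a fixed-step gradient method stands in for the conjugate gradient or line-search method one would normally use. The constant factor 2 from the quadratic form is absorbed into the step size.

```python
import numpy as np

def gradient_4dvar(x0, M, DM, h, Dh, obs, Rinv):
    """One gradient of J(x0): forward sweep for the trajectory and the
    normalized forecast errors f_k, then the backward adjoint sweep."""
    xs, x = [], np.asarray(x0, dtype=float)
    for z in obs:                     # forward sweep: x_1 .. x_N
        x = M(x)
        xs.append(x)
    # f_k = Dh(x_k)^T R^{-1} (z_k - h(x_k)), the normalized forecast error
    f = [Dh(xk).T @ (Rinv @ (z - h(xk))) for xk, z in zip(xs, obs)]
    traj = [np.asarray(x0, dtype=float)] + xs[:-1]   # x_0 .. x_{N-1}
    lam = f[-1]                                      # lambda_N = f_N
    for k in range(len(obs) - 2, -1, -1):            # backward adjoint sweep
        lam = DM(traj[k + 1]).T @ lam + f[k]         # matvec only
    return -(DM(traj[0]).T @ lam)                    # grad J = -D_0^T lambda_1

def minimize_4dvar(x0, M, DM, h, Dh, obs, Rinv, step=0.05, iters=200):
    """Plain fixed-step gradient descent; illustrative, not tuned."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * gradient_4dvar(x, M, DM, h, Dh, obs, Rinv)
    return x
```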
But one thing I want to remind you: the J function being minimized is not a quadratic function, and I would like to bring that part of the argument in very carefully. I want you to appreciate that J(x_0), given in (20), involves the nonlinear function h(x); so when you multiply everything out, J is not quadratic, it is much more nonlinear than quadratic. If the function J is not quadratic but much more nonlinear than quadratic, what does that mean? The performance of the gradient and conjugate gradient algorithms is guaranteed only on quadratic functions. The function of interest here is not quadratic; therefore the gradient or conjugate gradient algorithm, in the nonlinear case, with a nonlinear model and nonlinear observations, could take longer to minimize, because there are no general results guaranteeing convergence. So when to stop, and how far to go, are questions one has to contend with in deciding what kind of stopping criterion to use for the minimization process. That problem is quite apart from the theory we have developed, but I want you to understand it is an integral part of the overall development of the package if you are interested in programming this; a sketch of a simple stopping rule is given below.
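Here is a sketch of one common style of stopping rule for the non-quadratic case: stop when the gradient norm is small, when J stops decreasing, or when an iteration cap is hit. The tolerances and step size are illustrative assumptions, not prescriptions from the lecture.

```python
import numpy as np

def minimize_with_stopping(x0, grad, cost, step=0.01, gtol=1e-6, max_iter=500):
    """Gradient descent with two stopping tests: small gradient norm, or
    negligible relative decrease in the cost J."""
    x = np.asarray(x0, dtype=float)
    J_old = cost(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < gtol:
            break                      # gradient small: near a stationary point
        x = x - step * g
        J_new = cost(x)
        if abs(J_old - J_new) < 1e-12 * max(1.0, abs(J_old)):
            break                      # no further progress in J
        J_old = J_new
    return x
```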
With this we have come to the end of the introduction to 4D-Var. Let me quickly summarize: 4D-Var, or the adjoint method, is essentially a solution of the inverse problem within the dynamic context. Given a bunch of N noisy observations that contain information about the state, I would like to determine the optimal initial state. We posed this as a constrained minimization problem, which can be solved in one of two ways: either as a strong-constraint Lagrangian multiplier problem, or by variational methods, by understanding and quantifying the propagation of variations through the model together with the properties of adjoint operators. I have illustrated these two complementary methods of analysis, one using a linear model with linear observations, the other using a nonlinear model with nonlinear observations. This covers a pretty large part of the known techniques for solving the deterministic dynamic inverse problem, where the model is deterministic and assumed perfect but the observations are noisy.

We would like to encourage you to perform several computational experiments; let me go over them in some detail and spend a couple of minutes on this. First, consider a very simple scalar model, x_{k+1} = a x_k, with observation z_k = x_k + v_k. Please remember the model is linear and the observation is linear; h is the identity, so this is the simplest possible setup. The model map M is simply the scalar a, and the observation noise variance σ^2 should be taken as 0.5. My suggestion is to pick a = 1 and x_0 = 1, generate a sequence of states with N = 50, generate observations at times 1 to 50 by adding noise, and store this sequence of observations. Once you have chosen this fixed sequence, consider different observation sets: first, assimilate all the observations; second, assimilate only a subset of observations spaced 4 units apart; third, assimilate an even smaller subset, at times 1, 10, 20, 30, 40, and 50. What are we trying to do? We are assimilating different subsets of observations, smaller, larger, and the largest possible. The idea is for you to perform the data assimilation experiment with different observation sets and compare the quality of the recovered initial condition x_0.

To do that, once you have generated the observations, start the forecast mode from the wrong initial condition 0.5, forgetting the states that were used to generate the observations, and still using a = 1. Use each set of observations to estimate the optimal x_0 by the procedure we talked about. In particular, use the gradient method: this being the linear case, the J function is quadratic, and if J is quadratic the gradient method should work reasonably well. Then discuss the impact of varying the number of observations on the quality of the estimate. Do this first with one variance; once you have programmed it, you can change σ^2, making it smaller or larger, and so discuss a variety of cases: the impact of the observation noise on the estimate, and the impact of the number of observations. One can create quite a variety of computational projects this way; this is a very good example to test your understanding of the basic mechanics of 4D-Var. This problem can also be solved using the Lagrangian multiplier technique, and I would encourage you to apply that technique to this linear problem as well; I believe it will be a very educative exercise to do this with pencil and paper, then program it, plot, and discuss the results.

Then I am going to give you a nonlinear dynamics which is again very interesting: the logistic equation. When the parameter takes the stated value, this logistic equation is supposed to exhibit extreme sensitivity to the initial condition; the model actually exhibits chaotic behavior for any initial condition in this range. The previous problem is an easy one; this is slightly more difficult, but I would still like you to get your hands dirty by doing it. Start with x_0 = 0.5 and generate a set of 20 observations, again with observation noise variance σ^2 = 0.5. Then start the model from a wrong initial condition, 0.8, assimilate all 20 observations, and find the optimal estimate. Now look at that: the observations were generated using 0.5, but the forecast was started from 0.8, so the forecast will have an error. We are going to use the 4D-Var method to correct the forecast error in the nonlinear model.
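To help you get started, here is a sketch of the first experiment under stated assumptions: the subset "spaced 4 units apart" is taken as times 4, 8, ..., 48, and the step size and iteration count of the gradient method are illustrative choices. Because the model and observations are linear, the gradient of J has a closed form.

```python
import numpy as np

rng = np.random.default_rng(3)
a, sigma2, x0_true, N = 1.0, 0.5, 1.0, 50

# generate and store one fixed sequence of noisy observations z_1 .. z_50
truth = np.array([x0_true * a ** k for k in range(1, N + 1)])
obs = truth + rng.normal(0.0, np.sqrt(sigma2), size=N)

def grad_J(x0, times):
    """Gradient of J(x0) = sum_k (z_k - a^k x0)^2 / sigma^2 over a subset."""
    return sum(-2 * (a ** k) * (obs[k - 1] - (a ** k) * x0) / sigma2
               for k in times)

# assimilate three nested observation sets and compare the recovered x0
subsets = [list(range(1, 51)),            # all 50 observations
           list(range(4, 51, 4)),         # spaced 4 units apart (assumed)
           [1, 10, 20, 30, 40, 50]]       # the sparsest subset
for times in subsets:
    x0 = 0.5                              # wrong starting guess for the forecast
    for _ in range(500):                  # plain gradient method: J is quadratic
        x0 -= 0.005 * grad_J(x0, times)
    print(f"{len(times):2d} observations -> x0 estimate {x0:.4f}")
```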
I would like you to utilize the adjoint method we have described in solving this problem. I hope you will spend enough time to solve these two problems; if you solve them by applying this method, first with pencil and paper and then by programming, you can say you understand 4D-Var methods. With that we conclude our coverage of an introduction to solving inverse problems within the context of deterministic dynamic models with noisy observations. Thank you.