In this module we are going to talk about another method, introduced back in the early 1970s, called nudging. I will cover the basic principles of nudging and some of the questions that arise in the design of nudging schemes. So what is nudging? A model is always used to create a forecast, and the goal of data assimilation is to make the model fit the observations. In the methods we have seen so far, this fitting was done by minimizing the sum of squared differences between the model-predicted variables and the observations, in order to decide on the optimal values of the parameters and of the initial condition from which the model is run forward. That is the theme underlying 4D-Var data assimilation and forward-sensitivity-based data assimilation. Nudging is an alternative method. In nudging, what do you do? You compute the forecast error, the difference between the model-predicted observation and the actual observation, and this forecast error is used as a forcing: a forcing that makes the model move towards the observation. This ability to drive the model by adding a force that depends on the forecast error is the fundamental idea behind the nudging scheme. To nudge, to force, to coerce the model towards the observation: these words, force, nudge, coerce, capture the fundamental principle underlying nudging algorithms. A bit of early history: Anthes introduced the nudging method in 1974 and used it for the initialization of a hurricane prediction model; the paper was published in the Journal of the Atmospheric Sciences in 1974. Hoke and Anthes in 1976 further explored the use of nudging schemes, again in the context of hurricane forecasting; their joint paper was published in Monthly Weather Review in 1976. The idea is to use the model forecast error to force the model so as to reduce the forecast error.
So it is a kind of feedback principle. The model makes a forecast, observations are available, and there is a forecast error; I am using the forecast error to force the model so as to reduce that forecast error. This is the fundamental principle underlying any feedback control mechanism, so it is really feedback control theory that is brought into focus by the nudging scheme within the context of data assimilation methodologies.

To get a feel for the nudging scheme, let us consider a state x_k at time k, which lies in R^n, and let M be a map from R^n to R^n. The forecast model is assumed to be deterministic, with x_0 as the initial condition. The observations are again a nonlinear function of the state: let x̄_k be the true state of the system, which is not known; I only have information about the true state through the observations z_k = h(x̄_k) + v_k, where the v_k are the observation noise and h is the forward operator. I would like to emphasize the notion of a true state; the observation noise is Gaussian, and it is a standard setup. So what is the difference between this and the Kalman filtering setup? Here the model is deterministic, so this has commonality with 4D-Var. In the early 1970s, within the meteorological literature, the model was considered to be perfect. Under the perfect-model assumption, with noisy observations, they wanted to use the forecast error to force the model, which would in turn make the model move towards the observations.

So let M̄ be the true model dynamics. I am now going to develop a general theory: let x̄_{k+1} = M̄(x̄_k), with x̄_0 as the initial condition, be the true but unknown deterministic system being modeled. If M̄ = M, the model is perfect; if M̄ ≠ M, the model has an error. So I am going to consider a generalization of the nudging scheme in which the model may or may not be perfect.
M̃(x) = M(x) − M̄(x) is the model error, and x̃_0 = x_0 − x̄_0 is the error in the initial condition. If I use M as the model, x_k is the forecast generated by the model M; M may have errors, and z_k is the observation coming from measurements of the real world. So e_k is the forecast error as given in (4).

Now, what is the nudging scheme? Take the otherwise deterministic model x_{k+1} = M(x_k). Please understand: x_k is the forecast starting from x_0 under the model M; M may have errors, and the initial condition may have errors, so the forecast x_k generated by the model equation may have errors. I would like to add a forcing term. Please remember e_k is the forecast error and G is a matrix, so G e_k is a forcing that is artificially added to the model: x_{k+1} = M(x_k) + G e_k. The forcing makes the solution move towards a particular goal, and our aim is to find G such that asymptotically the model state moves towards the observations, which represent the true state of the system. G is called the gain matrix; it is an n × m matrix, and G e_k is a vector, an artificial forcing applied to the forecast model.

The error term e_k represents a state feedback. Why? Please remember e_k = z_k − h(x_k); therefore I am using the state information to force the model, and that is what is called state feedback. The idea of state feedback has been around since the early days of steam engines. In the early days the gain matrix G was empirically designed. I am now going to talk about some of the early approaches to the design of nudging schemes, covering the period from 1974 to 1990. During this period several people applied nudging schemes to build data assimilation schemes; in other words, you are using the forecast error, which involves the data, to force the model towards the observations.
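To make the scheme concrete, here is a small Python sketch of nudging. It is not from the lecture: the rotation model, the gain value G = 0.5 I, and the noise level are made-up illustrative choices. Both runs start from a wrong initial condition and use an imperfect model; only one of them is nudged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup: the "true" dynamics rotates the state, while the
# forecast model rotates at a slightly wrong rate, so a free run drifts.
def rot(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

M_true, M_model = rot(0.10), rot(0.12)   # imperfect model dynamics
H = np.eye(2)                            # forward operator: observe the full state
G = 0.5 * np.eye(2)                      # heuristic gain matrix (illustrative value)

x_true = np.array([1.0, 0.0])
x_free = np.array([0.5, -0.5])           # wrong initial condition, no nudging
x_nudge = x_free.copy()                  # same wrong start, with nudging

for k in range(200):
    z = H @ x_true + 0.01 * rng.standard_normal(2)   # z_k = h(x_true,k) + v_k
    e = z - H @ x_nudge                              # forecast error e_k
    x_free = M_model @ x_free                        # x_{k+1} = M(x_k)
    x_nudge = M_model @ x_nudge + G @ e              # x_{k+1} = M(x_k) + G e_k
    x_true = M_true @ x_true

err_free = np.linalg.norm(x_true - x_free)
err_nudge = np.linalg.norm(x_true - x_nudge)
print(err_free, err_nudge)
```

Even with a wrong model and a wrong start, the forcing G e_k keeps the nudged trajectory close to the observations, while the free run drifts far away.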
The model's response to an external forcing depends on the intrinsic relaxation times of the model. So what is the basic idea? G e_k is the forcing applied to the model. If you apply a force to a dynamical model, how long does it take for the model to respond? That depends on what are called the relaxation times, the intrinsic relaxation times; in engineering we also call them time constants. So the design of G in the early days was essentially based on time-scale considerations: what are the intrinsic time scales of the processes involved, and how long does it take for the model to respond to the external force? The value of the matrix G was decided heuristically, and the considerations were essentially based on the time scales of the processes involved in the model solution.

The nudging scheme in (5) has a strong similarity to the design of observers in control theory. The theory of observers was developed earlier, by Luenberger in 1964. It is not very clear whether Anthes and his group knew about Luenberger's work on observers, but observer-based design and the nudging scheme are structurally very similar. So now you can see: Kalman filtering came from control theory, the notion of observers already existed in the early 1960s in control theory, and maybe nudging was invented independently by Anthes, but I would like to emphasize the very strong similarity between the nudging scheme as used in the geophysical literature and observer design in control theory. You can see there is a great influence of control theory on the design of data assimilation algorithms.

I am now going to talk about the post-1990 era. What is the question? Before 1990, people heuristically chose values of the gain matrix G to force the model state towards the observations so as to reduce the error.
The only consideration they used was based on relaxation times, and in many cases it worked very well. But as the theory of 4D-Var developed and the optimality methodology became well understood, the notion of estimating the optimal state and the theory behind the strong-constraint and weak-constraint formulations were well established. Around 1990 the emphasis shifted towards objectively designing the gain matrix, which dictates the amount of forcing applied to the model equations. Two approaches emerged in this quest for an optimal design of the gain matrix: one is the class of 4D-Var-like methods, the other is the class of two-stage Kalman-filter-like methods. The 4D-Var-like methods sprang up between roughly 1990 and 1994; the two-stage Kalman-like method was announced around 2003. These methods developed algorithms for the optimal estimation of G. What did they do? G is the unknown, and they wanted to estimate its appropriate value optimally, so they brought the full force of least-squares estimation theory within the framework of 4D-Var or within the framework of two-stage Kalman-like methods. Even though these theories were well developed, in 2011 it was pointed out that the claims of optimality made by these researchers were somewhat defective. It turns out that the optimal estimation of G is more subtle and involved than it appears on the surface when you read the papers on 4D-Var-like methods and two-stage Kalman-like methods. We are going to talk about both classes of methods, some of the problems associated with them, and ways to get around some of these challenges. Regarding the 4D-Var-based methods: Stauffer and Seaman, Stauffer and Bao, and Zou, Navon and Le Dimet were among the earliest people working on finding the optimal value of G.
So, referring to equation (4), let me go back. Equation (4), at the bottom of slide 4, is the forecast error of the nonlinear model. The 4D-Var-based methods were introduced by at least three sets of authors, Stauffer and Seaman, Stauffer and Bao, and Zou, Navon and Le Dimet, in the early 1990s. Referring to (4), which gives the expression for the forecast error, let e_1, e_2, ..., e_N be the set of forecast errors based on the model forecast. In other words, you can see the 4D-Var-like scheme here. While the original nudging scheme essentially adds a forcing to the model and lets the model evolve, these people, coming from the 4D-Var methodology, aim to design the optimal G, so they are doing an offline experiment. What is the goal of this offline experiment? Suppose there are N observations; then there are N forecast errors, and I can compute the least-squares cost function J_2(G), given by the sum over k of the inner product of e_k with R_k^{-1} e_k. This is a weighted sum of squared errors. Please understand, this is exactly the cost function that is minimized to find the optimal initial conditions and parameters in 4D-Var as well as in the forward sensitivity method. You can see the approach: 4D-Var is something they knew very well, and they were part of the development of the 4D-Var techniques, so they would like to look at this scheme as though it were a 4D-Var scheme for estimating G. One difference is that in classical 4D-Var this kind of objective function is used to decide the optimal initial condition; in nudging I am not going to worry about where the model starts.
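The cost function J_2(G) just described can be sketched in a few lines of Python. This is a hypothetical linear example, not from the lecture: the rotation model, the gain values, the noise level, and R are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def j2(G, M, H, Rinv, x0, obs):
    """J_2(G) = sum_k <e_k, R^{-1} e_k>, where e_k = z_k - H x_k and the
    trajectory is generated by the nudged model x_{k+1} = M x_k + G e_k."""
    x, cost = x0.copy(), 0.0
    for z in obs:
        e = z - H @ x
        cost += float(e @ Rinv @ e)
        x = M @ x + G @ e
    return cost

# Synthetic observations from a perfect-model run with observation noise.
th = 0.1
M = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
H, Rinv = np.eye(2), np.eye(2) / 0.01
xt, obs = np.array([1.0, 0.0]), []
for _ in range(50):
    obs.append(H @ xt + 0.05 * rng.standard_normal(2))
    xt = M @ xt

x0 = np.array([0.3, -0.4])                               # erroneous initial condition
cost_free = j2(np.zeros((2, 2)), M, H, Rinv, x0, obs)    # G = 0: no nudging at all
cost_nudged = j2(0.5 * np.eye(2), M, H, Rinv, x0, obs)   # an illustrative nonzero gain
print(cost_free, cost_nudged)
```

With G = 0 the initial-condition error never decays and every term of the sum stays large; with a reasonable gain the errors decay geometrically, so the accumulated cost is far smaller. Minimizing J_2 over the entries of G is exactly the offline experiment described above.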
Initially the model may have forecast errors, but as time goes on, adding the forcing G e_k to the model will correct the forecast errors while the model evolves; that is the basic idea. In 4D-Var we wanted to start from the optimal initial condition, so that the forecast generated from that initial condition matches the observations as closely as possible. Here that is not the goal: the initial condition could be anything, and our aim is simply to move the model towards the observations as the model runs. They understood that they may not be able to correct some of the initial forecast errors, but in time the future errors may go to zero or become very small; that is the idea. So you can see that in 4D-Var the objective, if you recall, was a function of the initial condition, J(x_0); here, in nudging, we are calling it J(G). The independent variable with respect to which the minimization was done in 4D-Var is the initial condition; the independent variable with respect to which the minimization is done in the optimal design of nudging is G, the elements of the gain matrix themselves. That is essentially the difference; the rest is very similar, so the mathematics is not too different from what 4D-Var involves.

Again, they wanted to bring in the notion of a background. Why is this notion of a background useful? In the pre-1990 era they had been very successful in demonstrating the usefulness of nudging schemes based on empirical values of G, so that is knowledge of what worked well in many circumstances, and they did not want to throw it out of the window. They wanted to use some of that prior knowledge of where nudging had worked, to give it the benefit of the doubt. So they said: let Ĝ be the prior estimate of G obtained through empirical
considerations relating to the relaxation times of the model. Quite a lot of time-scale analysis of these models had been done, so they knew which time scales respond to what type of forcing, and they did not want that knowledge to go to waste; they wanted to take advantage of the earlier estimates. So let Ĝ be the prior information. You can see that they are now combining a prior with the new information coming from the forecast errors; they would like to combine the two types of objective functions. The prior term gives rise to what is called a penalty term, as follows: J_p(G) = (β/2) ||G − Ĝ||_F², where ||·||_F is the Frobenius norm. You may recall from our module on matrices that the Frobenius norm is, by definition, the square root of the sum of a_ij² over i and j. It is something like the Euclidean norm: I simply take the sum of the squares of all the elements of the object, whether it is a vector or a matrix, and then the square root; that is the Frobenius norm. So the penalty term relates to the difference between the two gains. What does the penalty term tell you? I am interested in designing an optimal G. I know Ĝ has been in use, obtained from heuristic considerations. I am trying to design G optimally based on the sum of squared forecast errors; at the same time I do not want my G to be far removed from Ĝ, because Ĝ has already worked. So I want to find an optimal G that is a compromise between minimizing the sum of forecast errors and not going too far away from Ĝ. We have defined the Frobenius norm here; the constant β is
called the penalty parameter. If β is large, then since I am going to minimize, the product has to be small, so G will be much closer to Ĝ; if β is small, I am relaxing the distance between G and Ĝ. By picking the value of β one can obtain a whole range of G's relative to Ĝ. In other words, I have Ĝ here, and I am asking whether I want my G to lie within a sphere whose radius is dictated by the value of β. If β is small you have more freedom for G; if β is very large, then a large β times a large norm cannot be near the minimum, so the only way to minimize is to force G towards Ĝ, making G − Ĝ small. That is the idea of the penalty term.

So we now have two terms. One says: I do not want my new estimate to be too far away from the old estimate based on heuristic considerations, and I am giving myself some freedom by being able to choose the penalty parameter. The other is the forecast error term, the sum of squared errors. Therefore I am going to consider a new criterion Q_1(G), which is the sum J_2(G) + J_p(G); I call the first term J_2 because it is a weighted two-norm, and J_p has p for penalty. Compute the matrix G that minimizes Q_1(G), where the nudged dynamics is used as a strong constraint. Please understand that the nudged dynamics is what is ultimately used in the forecast, so I cannot choose G or carry out the minimization independently of it; I have to find the optimal G within the context of the dynamical model, and therefore it is formulated as a strong-constraint problem. OK, so we have now formulated the problem. What can we do? We can use the first-order adjoint method that was developed in module 5.1 to decide on the optimal G. So what do
we do? We start with a G, run the model forward in time, compute the forecast errors, compute the objective function, and evaluate its gradient; once you have the gradient numerically, you can use it in a gradient method to minimize the objective function Q_1(G). Recall that an adjoint method gives you the gradient, which is then used in some minimization algorithm until convergence; we have already talked about minimization algorithms in module 4.3. Because we have done a lot relating to adjoint methods and optimization methods, our discussion becomes simpler; we do not want to repeat the entire derivation, and the interested reader can simply apply those methods to derive the expression for the gradient. This was the theme of the people who wanted to estimate G optimally using 4D-Var-like methods.

But there are a couple of philosophical challenges. What is the first one? Getting the prior value Ĝ may not be as simple as one deems. Why is that? If you change the model or the process, from hurricane prediction, where the dynamics and time scales are of one type, to other physical processes, the time-scale analysis will be different. Therefore the choice of Ĝ depends very much on the process that is captured by the model, and in general there are no specific guidelines for choosing Ĝ. The only thing we can fall back on is that for those processes to which nudging methods were applied in the early years, you have a well-defined Ĝ; in general there is no clear-cut algorithm for generating Ĝ. So the difficulty of obtaining the prior value is one problem. To appreciate the second difficulty, and this is a more serious
problem, we want to rewrite (5) using (2), as follows. Let us go back to what (5) and (2) are: (5) is the nudged dynamics and (2) is the observation equation. I am now going to combine the two to come up with the explicit equation for the nudged dynamics. When I use (2) in (5), the explicit equation for the nudged model becomes this. Where is this coming from? Please understand: x_{k+1} = M(x_k) + G e_k, with e_k = z_k − h(x_k) and z_k = h(x̄_k) + v_k, where x̄_k is the unknown true state; these are all the quantities involved here. So I substitute z_k into e_k, and e_k into the nudged model; if you do these substitutions, the resulting equation takes the form x_{k+1} = M(x_k) + G[h(x̄_k) − h(x_k)] + G v_k.

Now let us look at the structure of this. It contains a deterministic component. Why is it deterministic? M(x_k) is the deterministic model forecast; h(x̄_k) is the observation counterpart of the true state; h(x_k) is the observation counterpart of the forecast. So the term h(x̄_k) − h(x_k) is the true-minus-forecast expression in observation space, and G is the multiplier; this whole term is deterministic. But the v_k term now occurs as G v_k, and that is a random term. So the nudging method in fact induces stochastic dynamics: because the observations are stochastic, even though your model is deterministic, the process is stochastic, since observations always carry observation noise. If you look at this carefully, you can see that this is a first-order nonlinear autoregressive process. What does that mean? x_k is not deterministic; x_k is stochastic. In the earlier 4D-Var-like approaches they did not realize that a stochastic term is affecting the evolution; they simply assumed that there is
no such thing as a stochastic part in the forcing. They applied the 4D-Var-like scheme and found an optimum within the framework they had built, but a closer examination of those ideas tells you that a correct formulation has to take this autoregressive process into account. Why is it autoregressive? x_{k+1} depends on the previous x_k: the state at time k+1 depends on the state at time k, plus a random noise. Once you recognize this as an autoregressive process, x_k becomes random; the trajectory is a realization of a random process. This tells you the x_k are serially correlated. Why are they serially correlated? Let us go back: x_3 depends on x_2, x_2 depends on x_1, and x_1 is random, so x_2 is random; x_1 and x_2 are not independent, because x_2 is a realization of a random process that depends on x_1, and so a serial correlation is induced. This serial correlation was neglected in almost all treatments of nudging schemes, and this observation was first made by us. So we would like to amend the optimal estimation of G by taking this serial correlation into account. That is the second problem, and we also propose a solution to get around it.

So let us talk about this now. Now that we have established that the errors are serially correlated, I would like to define a matrix C. What is this matrix C? The matrix C is Nm × Nm. Why Nm? Please recall, m
is the size of the forecast error, and I have N observations. If I consider all the forecast errors from time 1 to N together and stack them into one gigantic vector, that vector lies in R^{Nm}; and if I have a vector in R^{Nm}, its covariance must be Nm × Nm. So C is a gigantic matrix that represents the serial correlation between all the errors. Now you may ask: how do you know C? We have to estimate C, because once we know the errors are serially correlated, we have to take that correlation into account in defining the weight matrix, and that is what J_3(G) is all about. J_3(G) is an alternative to the J_2(G) we saw earlier: J_2(G) neglects the serial correlation, while this new objective function is a weighted sum of squared errors in which the weighting is related to the serial-correlation matrix. So now we minimize a new function: instead of Q_1 we have Q_2, with Q_2(G) = J_3(G) + J_p(G). If you minimize Q_2(G) using the nudged dynamics as a strong constraint, you take the serial correlation into account. But computing the covariance C in (12) is not easy; hence, in principle, finding an optimal G for a nonlinear model is a difficult problem. So in our view, the so-called optimal methods proposed between 1990 and about 2005, covering a period of about 15 years, claimed an optimality that is in fact not optimal. That is the critique of the methodology, and we are also going to suggest a way out of it. In general, what does this mean? You can implement nudging heuristically, and more often than not it works; but if you want to move away from a heuristic methodology to an optimal methodology, where I am trying to design the best G, then you have to take into account all the processes involved. If you look at
it carefully, there is a serial correlation, and trying to minimize the errors without taking that serial correlation into account always leads to results that are not optimal. That is the simple way of looking at what is happening here. In order to define the matrix C, I have to look at the structure of the forecast errors, because if I want to understand the serial correlation, I need to understand the structure of the forecast errors. So let us spend a few minutes on the structure of the forecast errors, viewing the nudging scheme as a first-order nonlinear autoregressive process; that is the next step.

So let this be the true model, and let x̄_0 be the true but unknown initial state. What does that mean? If you use the true model and start from the true initial state, the model forecast will be perfect: it will match the observations modulo noise. Please understand, when we say "match", we mean only in the deterministic sense; we can never match the random process, so whenever we say something matches, it is always modulo noise, and noise is something we have to live with. If I iterate this equation, x̄_k = M̄^k x̄_0; the observation is given by the observation equation, so I can substitute (16) into it to get (17), the expression for z_k, the observation at time k.

I am now going to talk about the nudged forecast model. In the previous discussion we talked about the unknown true model; M may not be equal to M̄, which means there can be a model error. What am I going to do now? I would like to arrange G so that it simultaneously corrects for the model error and for the initial condition error; I am trying to kill two birds with one stone. So let x_1 = M x_0, with x_0 the initial condition, and for any k greater than
1, x_{k+1} = M x_k + G(z_k − H x_k); that is the nudged model. It can be written like this because I am considering a linear model, to do things a little more precisely; the aim of this discussion is to expose the difficulty through the linear model, because if the linear model is already difficult, the nonlinear model is at least one notch more difficult. So my nudged model becomes (19), x_{k+1} = A x_k + G z_k, where the matrix A = M − G H, G is the gain to be determined, H is the forward operator, and M is the one-step transition matrix of the linear model. I am assuming the model is linear and the observations are also a linear function of the state. Iterating equation (19) I get (20), and z_k contains the unknown truth plus noise; the noise is embedded within the z_k term, I hope that is clear. So (20) is obtained by iterating (19); a simple iteration gives you the expression. What is this expression? The nudged state at time k depends on the state x_1 plus everything beyond it. Now substitute (17) into (20): this is (20), and (17) is the expression for the observation based on the model solution; you get an expression for x_k, the forecast state. It is a funny-looking, somewhat complex expression, but I do not think there should be any difficulty in verifying it. Why am I trying to find it? Because I would like to pin down the forecast error, z_k − H x_k. Now, I know z_k: please remember z_k = H x̄_k + v_k, and H x̄_k we have already computed from equation (16); I also have the relation for z_k from (17). So I can substitute for z_k from those equations, and I have already computed x_k on the previous slide, in equation (20). Substituting all of these, what do I get? I get
an explicit expression for the forecast error in the nudged model. Why do I need it? If I want to compute the forecast error covariance, the serial correlation, I need an explicit expression for the forecast error itself; that is the first step, and that is what we have accomplished. Look at the structure of (22): it has several terms, and the fourth term, the summation, is itself the sum of two terms. Among all of these, v_k is a noise term and v_{k−1−j} is a noise term; there are two noise terms, and the rest are deterministic. So the error is noisy. You can also see that the error at time k depends on the noise v_k as well as on the noise at all earlier times: when j = 0 it involves v_{k−1}, when j = 1 it involves v_{k−2}, and when j = k−2 it involves v_1. Therefore the error e_k depends not only on v_k but on the entire noise sequence in the past. It is this dependence of e_k on the whole noise sequence up to and including time k that induces the serial correlation; I hope that part is clear to you. It is this serial correlation that I now have to extract.

For the sake of convenience, I am now going to rewrite the expression for e_k as a deterministic part plus a random part; (24) and (25) correspond to the deterministic part and the random part of the forecast error respectively. So you can see that the deterministic part is just that, deterministic, while e_k as a whole is stochastic. What is the stochastic part? It is again the noise from the present and the noise from the past, where the past noise terms are weighted by powers of A, and you may remember A is equal
to M − G H. So what does this mean? The random part of the forecast error depends on the model dynamics M, on the forward operator H, on the yet-to-be-chosen gain matrix G used in the nudging, and of course on all of the observation errors from the initial time up to time k. Therefore the expected value of e_k equals the deterministic part, and the expected value of η_k, the random part, is zero.

Now, for a general value of N the expressions get more complex, so instead of a general N I am going to assume N = 3, just to get a feel for the expressions in this quick discussion. So let us assume I have three observations, at times k = 1, 2 and 3. The previous expressions run for k from 1 to N, where N is the last observation time; to simplify, to get that aha feeling, I am simply going to take N = 3, without loss of generality. If I substitute N = 3 and simplify, the expression for the random part of the forecast error, η, looks like this: η_1 depends on v_1; η_2 depends on v_1 and v_2; η_3 depends on v_1, v_2 and v_3; and η_4 would depend on v_1, v_2, v_3 and v_4. This means the η's are correlated, and it is this correlation that makes the 4D-Var-like schemes a little defective, in the sense that they did not take the full weight matrix that accounts for the serial correlation.
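This correlation is easy to verify numerically. Here is a small Monte Carlo sketch, not from the lecture: the linear model, the gain, and the noise covariance below are made-up illustrative values, with a perfect model and a perfect initial condition so that the only source of forecast error is the observation noise.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative linear setup; the closed-loop matrix A = M - G H governs
# how past noise propagates through the nudged dynamics.
M = np.array([[0.9, 0.2], [-0.2, 0.9]])
H = np.eye(2)
G = 0.5 * np.eye(2)
R = 0.1 * np.eye(2)          # v_k ~ N(0, R) with R = 0.1 I

n_steps, n_runs = 4, 20000
errs = np.zeros((n_runs, n_steps, 2))
x0 = np.array([1.0, 0.0])
for r in range(n_runs):
    xt = M @ x0              # true state at time 1
    x = M @ x0               # forecast at time 1 (same: perfect model and start)
    for k in range(n_steps):
        z = H @ xt + np.sqrt(0.1) * rng.standard_normal(2)   # z_k = H x_true + v_k
        e = z - H @ x                                        # forecast error e_k
        errs[r, k] = e
        x = M @ x + G @ e                                    # nudged step
        xt = M @ xt                                          # true step

# Empirical cross-covariance between e_2 and e_1.  Since eta_1 = v_1 and
# eta_2 = v_2 - H G v_1, the expected value of e_2 e_1^T is -H G R: nonzero,
# so the forecast errors are serially correlated.
C21 = errs[:, 1].T @ errs[:, 0] / n_runs
print(C21)
```

The estimated cross-covariance sits close to −H G R rather than zero, which is exactly the off-diagonal structure that the J_2-based formulations ignore.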
A simple exercise in statistics, the computation of the correlation, tells you that C_ij, the correlation between the errors at time i and the errors at time j, is given by the expected value of eta_i eta_j^T. In the case N = 3, C_11 is given in terms of R. What is R? Please go back: v_k is distributed as N(0, R_k), and it is generally assumed that R_k is identically equal to R. Why? The v_k's come from instruments, and when we buy instruments we buy them in bulk. So I am going to assume that all the instruments making the measurements are of the same type, which means the error characteristics of the instruments, their covariances, are the same: R_k does not depend on k, and R_k is simply R. That is a very useful assumption; even though the theory can be carried through with a general R_k, I do not want to unnecessarily complicate the expressions, and for our purposes there is no loss of generality in assuming R_k = R. So C is the matrix with blocks C_11, C_12, C_13, C_22, C_23, C_33, together with C_21, C_31, and C_32; since we will eventually work with the symmetric part, I do not have to worry separately about the bottom half, but I can still write out C_21, C_31, and C_32. I am going to compute all of these elements, and you can see I have computed them: C_11 involves R; C_22 involves R plus two additional terms; and look at C_21 and C_12: they are transposes of each other. We will talk about the symmetry of the resulting matrix in a minute, but here I am giving you the exact expressions for the C's.
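These block covariances have a closed form. The sketch below is my own rendering, under the assumption that the random part can be written as eta_k = -sum_{l=1}^{k} A^{k-l} G v_l with E[v_i v_j^T] = R delta_ij; the matrix values are illustrative, not the lecture's.

```python
import numpy as np

# Hedged sketch of the serial covariances: writing
#   eta_k = -sum_{l=1}^{k} A^{k-l} G v_l,   E[v_i v_j^T] = R * delta_ij,
# gives the closed form
#   C_ij = E[eta_i eta_j^T] = sum_{l=1}^{min(i,j)} A^{i-l} (G R G^T) (A^{j-l})^T.
def serial_covariance(A, G, R, i, j):
    GRG = G @ R @ G.T
    return sum(
        np.linalg.matrix_power(A, i - l) @ GRG @ np.linalg.matrix_power(A, j - l).T
        for l in range(1, min(i, j) + 1)
    )

n, m = 2, 2
A = np.array([[0.5, 0.1], [0.0, 0.6]])   # A = M - G H (toy values)
G = 0.3 * np.eye(n, m)
R = np.eye(m)                            # R_k = R: instruments "bought in bulk"

C11 = serial_covariance(A, G, R, 1, 1)   # = G R G^T
C12 = serial_covariance(A, G, R, 1, 2)   # = G R G^T A^T (off-diagonal, nonzero)
C21 = serial_covariance(A, G, R, 2, 1)
print(np.allclose(C12, C21.T))           # True: C_ij and C_ji are transposes
```

The off-diagonal blocks such as C_12 are exactly the terms the early 4D-Var weighting dropped.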
So let me go back and remind you once more. I would like to understand the presence of serial correlation with N observations; to make life simple I assumed N = 3, so I substituted N = 3 into the expression for the forecast error, especially its random part, and using the random part I simply computed the expressions for these covariances: C_13, C_31, C_23, C_32. You can really see that they are all related and that each can be computed from the others. So what did the early methods based on 4D-Var not use? They did not use these off-diagonal terms. These are the new terms that come into the picture, and it is these new terms that we are interested in incorporating. I hope these expressions are clear; they can be evaluated very explicitly from the closed-form expression for the random part. So I have computed the matrix C. Now please understand that, in a quadratic form, only the symmetric part of C matters, so we take C_s = (C + C^T)/2: we compute C as I have done and then consider its symmetric part, and this new matrix is called the symmetric part of C. Now look at the correction term J_3 in (13). In (13), the new J function, it is C^{-1} that is to be used as the weight. It is generally the case that if I am going to consider weighted least squares I need to know the weight matrix, and here the weight matrix is C^{-1}; I have computed C, so in principle I can compute C^{-1}. Therefore the correction term J_3 in (13) is the quadratic form with the symmetric matrix C^{-1}, where C is the symmetric
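The symmetrization step is worth seeing concretely. This is a minimal sketch with illustrative numbers (the matrix C and residual vector d below are stand-ins, not the lecture's computed quantities): in any quadratic form the antisymmetric part of the weight contributes nothing, so only C_s = (C + C^T)/2 matters, and its inverse is the weight in the corrected cost.

```python
import numpy as np

# Minimal sketch: only the symmetric part of C matters in a quadratic form,
# so the corrected weighted least-squares term uses W = C_s^{-1},
# with C_s = (C + C^T)/2.
rng = np.random.default_rng(2)
k = 3
C = rng.standard_normal((k, k)) + 4.0 * np.eye(k)   # a computed, non-symmetric C
Cs = 0.5 * (C + C.T)                                # symmetric part of C
W = np.linalg.inv(Cs)                               # weight matrix C_s^{-1}

d = rng.standard_normal(k)                          # stacked residuals/innovations
J_corrected = 0.5 * d @ W @ d                       # quadratic form with C_s^{-1}

# x^T C x == x^T C_s x for every x: the antisymmetric part drops out.
x = rng.standard_normal(k)
print(np.isclose(x @ C @ x, x @ Cs @ x))            # True
```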
part of the computed C. So, to recap: this is the computed C, and the C used in (13) is its symmetric part; you compute C, form the symmetric part, and it is the inverse of that symmetric matrix that is used in (13). And I would like to re-emphasize a fact I have already mentioned: this matrix C depends on M, H, and R. What is M? The model. What is H? The observation operator. What is R? The noise covariance. So you can readily see that the serial covariance is a function of the model, the forward operator, and the noise: all three players in the game. I have now completed one aspect of the estimation problem for the optimal G, the one based on 4D-Var: we talked about what the early schemes did, what is wrong with them, and what is a meaningful way to correct them. Now I am quickly going to review the second approach, used in the post-1990 era, based on a two-stage Kalman-like scheme. It was introduced by a group of French atmospheric scientists (Vidard et al., 2003), and it combines a Kalman-filter-like predictive part with a conventional nudging scheme. So you can now see the following idea: if you know some of the basic approaches to assimilation that we have covered in this class, you can hybridize these methods to generate newer ones. In this course we are not going to talk about all possible hybridizations; we have described all the methods in their purest form, because before you can hybridize you need to understand the power of each technique. This being a first-level graduate course, we have emphasized the basic tools, which, if well understood, can not only be applied directly but can also be used to devise newer schemes for data assimilation in other problems. That is
the idea here, and this is an example of such hybridization. So what is the scheme? The first step is the following: let x_{k-1} be the state I know; I use the model to create a forecast x_k^f. The second step is an analysis, which is the forecast plus G times (z_k - h(x_k^f)). What is this? This is the nudging part. So you make a forecast and then you create an analysis: the forecast comes from the model, the analysis comes from the nudging. The nudging uses a gain G; G plays the role of the Kalman gain, and they would like to determine G using arguments similar to those in the Kalman filter. You can see how the hybridization comes into play, and that is what I am going to quickly describe. So define d_k = z_k - h(x_k^f), the innovation: the new information that z_k contains beyond what the forecast already gives you. Then d is the vector of such innovations, again a vector of size Nm. Now I am going to construct J_N(x_0, G), called the nudging-induced cost function: a quadratic form in the innovations whose weight matrix involves G^T P_f^{-1} G, where P_f is the forecast error covariance. That is very similar to what arises in a 4D-Var-like scheme; because of the way G appears in equation (30), (31) is a very natural J function to consider. Here P_f is the forecast error covariance, and from the Kalman filter you may recall that even when the model is linear, propagating the forecast covariance involves two matrix-matrix multiplications, so it is computationally much more expensive than the 4D-Var-based idea. They therefore assembled several components for the overall minimization. One is the background term. What is the background term? Until now we did not worry too much about x_0: initially I may allow an error, on the expectation that as the system evolves in time the error will shrink.
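The two-stage forecast/analysis cycle just described can be sketched as follows. The linear model, observation operator, and gain here are toy stand-ins of my own, not the scheme's tuned quantities; the point is the structure: forecast from the model, then nudge with G times the innovation.

```python
import numpy as np

# Sketch of the two-stage scheme: step 1 makes a model forecast, step 2 forms
# the analysis by adding the gain G times the innovation d_k = z_k - h(x_k^f),
# exactly the shape of a Kalman-filter analysis step.
rng = np.random.default_rng(3)
n, m = 3, 2
M = 0.95 * np.eye(n)                     # toy linear forecast model
H = rng.standard_normal((m, n))          # observation operator h(x) = H x
G = 0.2 * rng.standard_normal((n, m))    # nudging gain (plays the Kalman-gain role)

def nudging_step(x_analysis, z):
    x_f = M @ x_analysis                 # step 1: forecast from the model
    d = z - H @ x_f                      # innovation: new information in z
    x_a = x_f + G @ d                    # step 2: analysis = forecast + G * innovation
    return x_f, x_a

x = rng.standard_normal(n)
for _ in range(4):                        # cycle forecast/analysis over 4 observations
    z = rng.standard_normal(m)
    x_f, x = nudging_step(x, z)
print(x.shape)                            # prints (3,)
```

Choosing G optimally, rather than ad hoc, is exactly where the Kalman-like and 4D-Var-like arguments enter.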
That was the basic idea. Now they would like to start with some background information about the model's initial state itself, and that goes to show you the flexibility of the framework: you can add as many such terms to the objective function as you need, to build a solution that takes care of the various pieces of information you want to bring to bear on the problem. So J_2 is essentially the sum-of-squared-errors criterion; please remember that they used R_k. We have now argued that the use of R_k is not correct, because the forecast errors are serially correlated; in that sense the use of R_k in (33) essentially closes one's eyes to the presence of serial correlation. What is the right way to do it? You still need an expression for the forecast error, and you need to compute the serial correlation. Essentially you are treating stochastic dynamic models with stochastic observations in the context of nudging, and when you do everything stochastically you need to call a spade a spade: using R_k does not fit that paradigm. That is one of the observations we have made. So you can construct a new cost function. Please understand that until now we only considered a cost function that depends on G; now the cost function depends on both x_0 and G, so it is a slightly extended formulation. J_N(x_0, G) we have already seen, and J_2(G) we have already seen; J_B(x_0) is the background term. So there is a penalty coming from x_0, a penalty coming from G, and a penalty coming from x_0 and G jointly. Our objective is to minimize Q_3, the sum of these terms, using the adjoint method with the nudged model as a strong constraint. So again they concocted a Kalman-like scheme, but they want to find the initial condition and the optimal G by a method similar to the adjoint method, using the nudged model as a strong constraint. You can see the power of the 4D-Var-like principles, which can be applied repeatedly, whether it
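The shape of the extended cost can be sketched as below. This is emphatically a hedged stand-in: the weightings B, R, P_f and the exact forms of J_B, J_2, J_N here are generic quadratic penalties of my own, not the lecture's equations (31)-(33); the point is only that Q_3 is a sum of a background penalty on x_0, an observation-misfit penalty, and a nudging-induced penalty involving G.

```python
import numpy as np

# Hedged sketch of the extended cost Q_3 = J_B(x_0) + J_2(G) + J_N(x_0, G):
# background penalty + observation-misfit penalty + nudging-induced penalty.
# B, R, Pf and the term forms are illustrative stand-ins, not the lecture's.
rng = np.random.default_rng(4)
n, m = 2, 2
B = np.eye(n)          # background error covariance (stand-in)
R = np.eye(m)          # observation error covariance, R_k = R (stand-in)
Pf = np.eye(n)         # forecast error covariance (stand-in)

def Q3(x0, G, xb, innovations):
    JB = 0.5 * (x0 - xb) @ np.linalg.inv(B) @ (x0 - xb)            # background
    J2 = 0.5 * sum(d @ np.linalg.inv(R) @ d for d in innovations)  # misfit
    JN = 0.5 * sum(d @ G.T @ np.linalg.inv(Pf) @ G @ d             # nudging term
                   for d in innovations)
    return JB + J2 + JN

x0 = rng.standard_normal(n)
xb = np.zeros(n)
G = 0.1 * np.eye(n, m)
ds = [rng.standard_normal(m) for _ in range(3)]
print(Q3(x0, G, xb, ds) >= 0.0)   # True: each term is a nonnegative quadratic
```

In the actual scheme this Q_3 is minimized over both x_0 and G by the adjoint method, with the nudged model imposed as a strong constraint.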
is estimating the initial state, or parameters, or anything else; in this case G is a parameter of the nudging model. Since the forecast errors are correlated, we need to correct J_2(G), which is given by (33): this is the point we made about the use of R_k^{-1}. All the other terms are kosher; the only term that does not fit the bill is (33), because (33) is a weighted sum of squared errors and the weighting is not appropriate, the weighting is incorrect. We can again correct the weight matrix by appropriately computing the serial correlation, so the estimation of the temporal covariance is an important part of this. We have only cited the need for computing this temporal correlation; we have not done it explicitly. I think it would be an interesting exercise for somebody to take up this two-stage nudging scheme, which combines 4D-Var and the Kalman-like scheme, and compute the serially correlated errors; if you use that, you should be able to find out what a good method looks like. That could, in my view, be a good starting point, probably for a master's thesis. With this we have given you all the major ideas relating to the development of nudging schemes, and we have provided several exercises that are extensions of the discussions we have had. This module follows a paper that we wrote in 2013, Lakshmivarahan and Lewis, "Nudging Methods: A Critical Overview", which appeared as Chapter 2 of a book entitled Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications, the second volume, published by Springer-Verlag in a series edited by Seon Ki Park and Liang Xu; that paper contains a lot more information about basic nudging. We also alluded to the relation with observer theory, as developed by Luenberger in the early 1960s (1962, 1963, 1964) in the control-theory framework, and
in our paper, in that critical review, we discussed the intrinsic relation between observer theory and nudging theory, to show how observer theory can help in designing better nudging schemes. Nudging schemes are a very useful class of methods for forcing a model towards the observations by using the notion of state feedback. With that we conclude our introductory discussion of nudging methods. Thank you.