In the previous couple of lectures, we covered two distinct methods for assimilating data into deterministic dynamic models. The first was called the first-order adjoint method; the second was called the forward sensitivity method. A careful analysis of the details of these two methods reveals that both consider the first-order variation, or first variation. That is why the adjoint method is called the first-order adjoint method, and the forward sensitivity method is also called the first-order forward sensitivity method. Where there is a first order, there is a corresponding second order: there is a second-order adjoint method, and there is also a second-order forward sensitivity method. The question may arise: when do you use first order, and when do you use second order? If the model and the observations are very strongly nonlinear, first-order methods may not be as accurate as second-order methods, so second-order methods are generally preferred when the models are highly nonlinear. But second-order methods are computationally more demanding than first-order methods. One way to use a first-order method in the context of a strongly nonlinear system is to apply it repeatedly in an iterative fashion; effectively, that is equivalent to doing a second-order computation. So either you do second order in one shot, or first order repeatedly; mathematically the effects are very similar. Having seen different types of data assimilation algorithms, 4DVAR and FSM, it is time for us to find the intrinsic relation between FSM and 4DVAR. We are now going to demonstrate that both methods describe different aspects of the same fundamental philosophy. The gradient of the cost function J computed using the adjoint method is called the adjoint sensitivity.
We are now going to relate the adjoint sensitivity to the forward sensitivity; that is the topic: examining the relation between FSM and 4DVAR. To start with, 4DVAR needs a backward adjoint model with forecast errors as forcing. Recall f_k, the normalized forecast error viewed from the model space, which provides the forcing for the linear backward dynamics. The structure of that linear dynamics involves matrix-vector multiplication, so we call it a linear vector recurrence. I have to run the model forward and run the adjoint backward; this forward-backward loop is repeated several times and is combined with a minimization procedure. That is the overall structure of 4DVAR. In the case of the forward sensitivity method, we run the model forward, and we also run the forward sensitivity dynamics, the U_k dynamics and the V_k dynamics. The difference is that these are linear matrix recurrences, as opposed to matrix-vector recurrences. A linear vector recurrence is computationally cheaper; a matrix recurrence is much more demanding. But the advantage is that I do not need an adjoint. We considered simple problems, but developing an adjoint is, in general, not an easy task. So the advantage of FSM is that it does not require an adjoint; the disadvantage is that it requires the solution of a matrix recurrence. Another advantage of FSM is that the adjoint sensitivity can be decomposed and expressed as a function of the forward sensitivities and the forecast errors. In the case of 4DVAR, our aim is simply to obtain the adjoint sensitivity. What is the adjoint sensitivity?
The adjoint sensitivity is essentially the gradient of J with respect to x_0, because my aim is to minimize J(x_0). In order to minimize J(x_0) we have to compute the gradient with respect to x_0, and this is the sensitivity we computed using the 4DVAR or adjoint method. In this parlance it is called the adjoint sensitivity, as opposed to the forward sensitivities U_k and V_k. So what is the fine structure of the adjoint sensitivity? We should be able to express the adjoint sensitivity using U_k, V_k, and the forecast errors. It is this ability to express the adjoint sensitivity as a product of the forward sensitivities and the forecast errors that constitutes the fine structure of the adjoint sensitivity, and this has many advantages that we will talk about in a minute. So again, consider the nonlinear problem: I have a deterministic model, x_0 is the initial condition, alpha is the parameter. I have a nonlinear observation with additive white Gaussian noise. To keep things simple, in this lecture I am only going to consider the case of a single observation at time k = N. If I can do the comparison for a single observation, a similar analysis applies for multiple observations; so, without loss of generality, to get to the crux of the matter it is enough to consider one observation. With only one observation, J has only one term: the squared forecast error at time N, weighted by R inverse. We have already seen from the 4DVAR method that the derivative of J with respect to x_N is given by this expression, and we are going to call it eta_N; it is related to f_N, and we could have called it f_N as well. So eta_N is the normalized forecast error viewed from the model space, which we have already seen.
Having computed the derivative of J with respect to x_N, I am now going to consider the first variation. Recall that the first variation delta J is given by the gradient times delta x_N, and the gradient with respect to x_N is eta_N, as we saw in equation (4) on the previous slide. So the first variation is essentially the inner product of eta_N with delta x_N. From the forward sensitivity equations we already know that delta x_N = U_N delta x_0 + V_N delta alpha. Now substitute (6) into (5): this gives delta J as the sum of two terms, because from (6) delta x_N consists of two parts, one arising from the change in the initial condition and another from the change in the parameters. So one term accounts for the change in the initial condition, and the other accounts for the change in the parameters. Now I am going to use the adjoint property. There are several ways of stating it; I would like to state it as follows: the inner product <Ax, y> is equal to <x, A^T y>. That is the basic adjoint relation. Applying this adjoint relation to (6), I get (7): the U_N moves across the inner product as U_N^T, and the V_N as V_N^T. Therefore, from first principles you can readily see that one factor is essentially the gradient of J with respect to x_0 and the other is the gradient of J with respect to alpha: the gradient with respect to x_0, paired with delta x_0, gives you the adjoint sensitivity term.
So this is the adjoint sensitivity we are looking for, together with the sensitivity of the cost function with respect to the parameters; in our 4DVAR discussion we never considered that second part. From (8) we now know how the gradients of the J function with respect to x_0 and alpha are structured: the sensitivity is the product of U_N^T and eta_N. What is eta_N? It is simply the normalized forecast error viewed from the model space. You multiply that by the transpose of the forward sensitivity. So you can readily see that the adjoint sensitivity with respect to the initial condition is the product of the transpose of the forward sensitivity and the forecast error; likewise, the adjoint sensitivity with respect to the parameter is the product of the transpose of the parameter forward sensitivity and the forecast error. Now, please remember from (8) that U_N is simply the product of the model Jacobians along the trajectory, and the transpose of a product is the product of the transposes taken in reverse order. So the adjoint sensitivity with respect to the initial condition is a path property. Why is it a path property? It is the product of the transposes of the model Jacobians evaluated along the trajectory x_0, x_1, ..., x_{N-1}, applied to the normalized forecast error at time N viewed from the model space. This is what we call the fine structure of the adjoint sensitivity.
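This fine structure is easy to verify numerically. Below is a minimal sketch in Python/NumPy using a made-up two-state nonlinear model (an illustrative assumption, not the lecture's example), with one observation at k = N, h the identity, and R = I: the gradient assembled as U_N^T eta_N is checked against a central finite-difference approximation of the gradient of J.

```python
import numpy as np

# Made-up two-state nonlinear model (illustrative, not the lecture's example).
def M(x):
    return np.array([0.9*x[0] + 0.1*x[0]*x[1], 0.8*x[1] + 0.05*x[0]**2])

def jac(x):
    return np.array([[0.9 + 0.1*x[1], 0.1*x[0]],
                     [0.1*x[0],       0.8]])

N = 5
z = np.array([0.5, 0.3])      # synthetic observation at k = N (assumed)
x0 = np.array([1.0, 0.7])

# Forward sensitivity recurrence: U_{k+1} = D_k U_k with U_0 = I.
x, U = x0.copy(), np.eye(2)
for _ in range(N):
    U = jac(x) @ U
    x = M(x)

eta = x - z                   # normalized forecast error eta_N (R = I)
grad_fsm = U.T @ eta          # fine structure: grad_{x0} J = U_N^T eta_N

# Central finite-difference check of J(x0) = 0.5 * ||x_N - z||^2.
def J(x0):
    x = x0.copy()
    for _ in range(N):
        x = M(x)
    e = x - z
    return 0.5 * e @ e

eps = 1e-6
fd = np.array([(J(x0 + eps*np.eye(2)[i]) - J(x0 - eps*np.eye(2)[i])) / (2*eps)
               for i in range(2)])
print(np.allclose(grad_fsm, fd, atol=1e-5))
```

The agreement confirms that the product of the transposed Jacobians along the trajectory, applied to the forecast error, is indeed the gradient of the cost function.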
On the left-hand side is the adjoint sensitivity; on the right-hand side is an expression in terms of the forward sensitivity. We are able to relate the forward sensitivity and the adjoint sensitivity using relation (9); that is an important connection between 4DVAR and FSM. Now, some computational considerations: a matrix-matrix product requires n^3 operations, while a matrix-vector product requires n^2 operations. The adjoint dynamics with respect to the initial condition that we saw earlier can be written for the case of one observation. The backward dynamics is a linear recurrence called the adjoint dynamics. Since there is only one observation, there is no other forcing: I start with lambda_N = eta_N and simply integrate through the loop; lambda is a vector, so each step is a matrix-vector multiplication. When you come to the end, you get the adjoint sensitivity, which is what we are seeking. So this is the adjoint dynamics that gives rise to the evaluation of the adjoint sensitivity; this is a summary of the 4DVAR adjoint. I have already talked about the fine structure with respect to the initial condition, which is given by (9), and about how the adjoint sensitivity is calculated in 4DVAR. Now I am going to explore the adjoint sensitivity with respect to the parameter. From the previous module we already know the solution for V_N; we have already solved the forward sensitivity with respect to the parameters.
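The single-observation adjoint dynamics just described can be sketched as follows. The two-state model and the numbers are illustrative assumptions, not the lecture's; the backward sweep uses only matrix-vector products, and its result lambda_0 agrees with U_N^T eta_N obtained from the forward sensitivity route.

```python
import numpy as np

# Illustrative two-state model (assumed). Backward adjoint sweep:
# lambda_N = eta_N, lambda_k = D_k^T lambda_{k+1}; lambda_0 = grad_{x0} J.
def M(x):
    return np.array([0.9*x[0] + 0.1*x[0]*x[1], 0.8*x[1] + 0.05*x[0]**2])

def jac(x):
    return np.array([[0.9 + 0.1*x[1], 0.1*x[0]],
                     [0.1*x[0],       0.8]])

N, x0, z = 5, np.array([1.0, 0.7]), np.array([0.5, 0.3])

# Forward run, storing the Jacobians D_k along the trajectory.
x, Ds = x0.copy(), []
for _ in range(N):
    Ds.append(jac(x))
    x = M(x)

# Backward adjoint sweep: matrix-vector products only.
lam = x - z                   # lambda_N = eta_N (h = identity, R = I)
for k in range(N - 1, -1, -1):
    lam = Ds[k].T @ lam

# Same gradient via the forward sensitivity U_N (matrix-matrix products).
U = np.eye(2)
for D in Ds:
    U = D @ U
print(np.allclose(lam, U.T @ (x - z)))
```

The two routes have different costs, n^2 per step for the backward sweep versus n^3 per step for the matrix recurrence, but they produce the same gradient; that is exactly relation (9).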
As an illustration, let us take N = 4. When N = 4, V_4 is given by this expression: as we saw, it is a sum of products of the model Jacobians with respect to the state and the model Jacobians with respect to the parameters; the A's are the model Jacobians with respect to the state and the B's are the model Jacobians with respect to the parameters. Substituting this into (11), the adjoint sensitivity with respect to the parameter takes this particular form. Again, you can see it is a product of matrices, and the whole thing is multiplied by eta_4. This gradient is a path property: it is determined by the products of the transposes of the Jacobians along the path, applied to eta_4. That is the important consideration; it reveals the importance of the Jacobians along the forecast trajectory. Now we are going to express the computation of the adjoint sensitivity with respect to the parameter in the form of a pseudo code. I would like to remind you that equation (10) on slide 7 gives the pseudo code representing the backward recurrence in the computation of the adjoint sensitivity with respect to the initial condition.
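The parameter branch can be sketched the same way. Here is a minimal Python example on a scalar relaxation model x_{k+1} = x_k + dt*K*(x_s - x_k) with alpha = K (the observation value, time step, and control values are assumptions for illustration): the forward sensitivity recurrence V_{k+1} = D_k V_k + B_k is integrated along the trajectory, and dJ/dK = V_N * eta_N is checked against finite differences.

```python
import numpy as np

# Scalar relaxation model; dt, N, z are illustrative assumptions.
dt, K, x0, xs, N = 1.0, 0.25, 1.0, 11.0, 4
z = 9.0   # synthetic observation at k = N

def step(x, K):
    return x + dt * K * (xs - x)

x, V = x0, 0.0
for _ in range(N):
    D = 1.0 - dt * K          # D_k = dM/dx
    B = dt * (xs - x)         # B_k = dM/dK
    V = D * V + B             # V_{k+1} = D_k V_k + B_k
    x = step(x, K)

eta = x - z                   # normalized forecast error (R = 1)
grad_K = V * eta              # fine structure: dJ/dK = V_N^T eta_N

# Finite-difference check of J(K) = 0.5 * (x_N(K) - z)^2.
def J(K):
    x = x0
    for _ in range(N):
        x = step(x, K)
    return 0.5 * (x - z)**2

eps = 1e-7
fd = (J(K + eps) - J(K - eps)) / (2 * eps)
print(abs(grad_K - fd) < 1e-6)
```

Here the products of A's and B's collapse to scalars, but the structure is the same as the V_4 expansion above: each B_j is propagated forward by the remaining Jacobians and then paired with the forecast error.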
This represents the backward code for the computation of the adjoint sensitivity with respect to the parameter: initial condition there, parameter here. Because of the complexity of the expression in (14), this program is a little more involved; I am not going to go over it line by line, but you can readily verify its correctness. The end result is the adjoint sensitivity with respect to the parameter alpha. When I have multiple observations, it is simply an extension of what we have shown; I would like to show this for the sake of completeness. There are N different observation times, and each of the eta_{k_i}'s is the normalized forecast error viewed from the model space at one of those time instants. Since the observations do not occur at every time step, there are gaps, so I now define eta_bar_k, which is related to eta_k: eta_k is defined only at the discrete observation instants, while eta_bar_k is defined at every instant, obtained by multiplying by a delta function delta(k, k_i). What does that mean? When k = k_i it is 1, otherwise it is 0; it is a kind of selector function that maps observations given at certain instants onto the whole time axis. With this standard delta function we can now express the cost function: J is the sum of all the weighted squared errors over all the observations. The first variation delta J then has a forward sensitivity part with respect to the state and a forward sensitivity part with respect to the parameters. I am skipping some of the details, but they are already given explicitly; we have already seen that the move from one line to the next uses the
adjoint property: U_k becomes U_bar_k, and so on. So (19) is simply an extension of what we did for one observation; in this case, because there are multiple observations, there is a summation over i. We have utilized the linearity property as well as the adjoint property to get (19), and (19) essentially tells you how the first variation delta J is related to the forward sensitivities with respect to the initial condition and the parameters. Therefore, from the fundamental definition, the sum of the transposes of the forward sensitivities applied to the forecast errors gives rise to the adjoint sensitivity; these two expressions are simple extensions of the one-observation case. U_k and V_k are the forward sensitivities of the solution x_k with respect to x_0 and alpha. The adjoint dynamics with respect to x_0 in the case of multiple observations is given by this: please remember that in the case of a single observation we simply have a final condition and no forcing, while with multiple observations I have a final condition and also a forcing. At the end of the backward sweep I obtain the adjoint sensitivity. Likewise, the backward dynamics for the parameter in the multiple-observation case is given by this; structurally they are not too different from each other. So we have given the pseudo code for computing the adjoint sensitivity in the case of multiple observations, with respect to both x_0 and alpha. Slide 13 gives the adjoint sensitivity
calculation for multiple observations with respect to x_0, and slide 14 gives it with respect to the parameter. Now I am going to talk about an example. One of the standard questions in data assimilation is: how do I distribute my observations temporally, and how do I distribute my observation stations spatially? The fundamental question is how to distribute the observations in the spatio-temporal domain so that the maximum amount of information can be derived from them when they are assimilated into the model. That question continues to be of great interest in data assimilation, because the physical processes going on in nature and the development and analysis of models proceed separately; in order to fit the model to the data I need observations, and data collection is an expensive business. What to measure, how to measure, where to measure: these are all fundamental questions that have to be argued, and decisions have to be made. I am going to make observations at these times and at these spatial locations because I would like to maximize the transfer of information from the observations to the model through the process of data assimilation.
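Before turning to the example, the multiple-observation recurrences above can be sketched in a few lines. This is a hypothetical two-state model with arbitrarily chosen observation times and values (h = identity, R = I): the backward sweep now carries the forcing eta_bar_k, nonzero only at observation times, and its output agrees with the FSM sum of U_{k_i}^T eta_{k_i}.

```python
import numpy as np

# Hypothetical two-state model; observation times and values are assumptions.
def M(x):
    return np.array([0.9*x[0] + 0.1*x[0]*x[1], 0.8*x[1] + 0.05*x[0]**2])

def jac(x):
    return np.array([[0.9 + 0.1*x[1], 0.1*x[0]],
                     [0.1*x[0],       0.8]])

N, obs_times = 8, [3, 6, 8]
z = {k: np.array([0.4, 0.2]) + 0.1 * k for k in obs_times}  # synthetic obs
x0 = np.array([1.0, 0.7])

# Forward run, storing the trajectory and the Jacobians D_k.
traj, Ds = [x0], []
for _ in range(N):
    Ds.append(jac(traj[-1]))
    traj.append(M(traj[-1]))

# eta_bar_k: normalized forecast error at observation times, zero elsewhere.
eta_bar = {k: traj[k] - z[k] for k in obs_times}

# Backward sweep with forcing: lambda_k = D_k^T lambda_{k+1} + eta_bar_k.
lam = eta_bar.get(N, np.zeros(2))
for k in range(N - 1, -1, -1):
    lam = Ds[k].T @ lam + eta_bar.get(k, np.zeros(2))
grad_adj = lam

# FSM route: grad J = sum over obs times of U_{k_i}^T eta_{k_i}.
U, grad_fsm = np.eye(2), np.zeros(2)
for k in range(N):
    U = Ds[k] @ U                     # U_{k+1} = D_k U_k
    if k + 1 in obs_times:
        grad_fsm += U.T @ eta_bar[k + 1]
print(np.allclose(grad_adj, grad_fsm))
```

The dictionary plays the role of the selector function delta(k, k_i): at non-observation times the forcing contributed to the backward sweep is simply zero.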
Now we are going to talk about how the distribution of observations impacts the computation of the adjoint sensitivity, because that is one of the fundamental questions one has to be concerned with. To answer it, I am going to use the same model that we discussed in the previous lecture: the discrete version of cold air moving over a warm sea surface, with heat transfer due to turbulent mixing. Equation (22) is the model and (23) is its solution. The discrete-time sensitivities of the solution with respect to the initial condition, the boundary condition, and the parameter are all given. From the plots of the solution, you may remember that the sensitivity of the solution with respect to the parameter exhibits a maximum; the maximum occurs when k = k*, whose value is given there. You may also remember that beta is related to K: beta = delta t times K, where delta t is the time discretization and K is the original parameter in the continuous-time version. Now we are going to do a thought experiment. I start with x_0 = 1 and sea surface temperature x_s = 11: the air, when it first comes into contact with the water, has a temperature of 1 degree centigrade, while the water is at 11 degrees centigrade. So there is going to be a transfer of heat from the water surface to the air. The heat transfer coefficient is assumed to be K = 0.25. So x_0 = 1, x_s = 11, and K = 0.25 constitute the truth; that is how nature has arranged matters.
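The model and its sensitivities can be written down directly. Here is a short sketch with the truth values just stated (delta t = 1 is an assumption; the lecture does not restate the step size): the closed-form solution is x_k = x_s + (1 - beta)^k (x_0 - x_s), the initial condition sensitivity decays to 0, the boundary condition sensitivity grows toward 1, and the parameter sensitivity rises and then falls, peaking near k* ~ -1/ln(1 - beta).

```python
import numpy as np

# Relaxation model with the truth values from the lecture; dt = 1 is assumed.
dt, K, x0, xs = 1.0, 0.25, 1.0, 11.0
beta = dt * K
k = np.arange(0, 19)
xk = xs + (1 - beta)**k * (x0 - xs)                # closed-form solution (23)
dx_dx0 = (1 - beta)**k                             # decays toward 0
dx_dxs = 1 - (1 - beta)**k                         # grows toward 1
dx_dK = dt * k * (1 - beta)**(k - 1) * (xs - x0)   # rises, then falls
k_star = int(dx_dK.argmax())                       # near -1/ln(1 - beta)
print(k_star, xk[-1].round(4))
```

These three curves are the ones referred to in the experiments below: one dying down, one saturating at unity, and one with an interior maximum.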
You generate the true solution by running the model forward in time, and I am going to make observations of the air temperature. So the observation z_k is the air temperature plus noise. In this case h(x) is x itself: h is the identity function and the Jacobian of h is unity. It is a very simple problem; we simplified it because we want to bring out the beauty of the underlying argument, namely how the distribution of the observations affects the quality and the value of the adjoint sensitivity. The noise v_k is Gaussian. That is what mother nature does. Now, not knowing what mother nature has planned, I assume my initial condition is 2, the sea surface temperature is 10, and K is 0.3. Look at this: I now have errors in the initial condition, in the boundary condition, and in the parameter. So if I make a forecast by running the model forward, the forecast is going to have errors, and the forecast error results from the confounding of the errors in the initial condition, the parameter, and the boundary condition. It is one of the hard cases, because we are assuming everything is wrong. We summarize the observations so computed, the forecasts calculated from these control values, and the forecast errors in the following table: the first column is the observation z_k, the second column is the forecast x_k^f from the erroneous controls, and the last column is the forecast error e_k.
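The table can be reenacted with a few lines of Python. The noise scale and the random seed here are assumptions chosen for illustration (the lecture only says v_k is Gaussian); everything else follows the stated truth and control values.

```python
import numpy as np

# Truth run, noisy observations, and forecast from the erroneous controls.
# The noise scale (0.1) and the seed are illustrative assumptions.
dt, N = 1.0, 18
rng = np.random.default_rng(1)

def trajectory(x0, xs, K):
    x, out = x0, []
    for _ in range(N):
        x = x + dt * K * (xs - x)   # x_{k+1} = x_k + dt*K*(xs - x_k)
        out.append(x)
    return np.array(out)

truth = trajectory(1.0, 11.0, 0.25)        # what nature does
z = truth + 0.1 * rng.standard_normal(N)   # z_k = x_k + v_k
xf = trajectory(2.0, 10.0, 0.3)            # forecast with wrong x0, xs, K
e = z - xf                                 # forecast error column
print(np.column_stack([z, xf, e]).round(3))
```

The three printed columns correspond to the three columns of the table: observation, forecast, and forecast error, with the errors confounding all three control errors at once.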
So the first column is z_k, the second column is x_k^f, and the last column is e_k = z_k minus x_k^f. I hope you now understand what these numbers are. We run the model to time 18; you can see from the forecast, starting from the assumed initial condition of 2, that the temperature is rising: the air is getting hotter as the water transfers heat to the atmosphere. The actual observations generated from the true initial conditions are given alongside, and the difference between the two gives the forecast errors. Sometimes the forecast errors are negative, sometimes positive; they vary widely, from about minus 1 to close to plus 1, as time evolves. Now we conduct experiments; let me summarize them. In experiment 1, I take observations at times 15, 16, 17, and 18, which means I am taking observations very late in the game, and I compute delta J using both 4DVAR and the forward sensitivity method, starting from the erroneous controls x_0 = 2, x_s = 10, K = 0.3. The adjusted values of the control, namely the initial condition, the sea surface temperature, and the parameter, are tabulated: applying the adjoint method to these four late observations gives one set of recovered values, and applying the FSM to the same observations gives another.
Now, the true value of the initial condition was 1 and the true value of the parameter was 0.25. FSM has recovered the true initial condition and the true parameter, and the boundary condition, which was 11, pretty closely. In the case of 4DVAR, the initial condition was not recovered correctly and there is an error in the parameter recovery as well, although the sea surface temperature was recovered reasonably closely. So one recovery is good, but the other two are not; this gives a relative comparison of the performance of the two methods. You may ask why this happens. Look at the sensitivity functions we plotted previously: the sensitivity with respect to the initial condition, the partial of x_t with respect to x_0, comes down with time; the sensitivity with respect to the boundary condition x_s rises toward unity; and the sensitivity with respect to the parameter K goes up and comes down. At times 15, 16, 17, and 18, when I collect the observations, the initial condition sensitivity has already become very low, close to 0, and the parameter sensitivity has also become very close to 0, but the boundary condition sensitivity is quite large, close to unity. Therefore, the adjoint method is not able to recover the initial condition, largely because the sensitivity with respect to the initial condition has died down to 0, and likewise for the parameter. If you look at the structure of the adjoint sensitivity, it is the sum of the transposes of the forward sensitivities times the forecast errors; if one factor of a product is close to 0, the product is close to 0.
So the adjoint sensitivity in the 4DVAR case does not have enough information about the initial condition and the parameter; that is why it is not able to recover their true values, and there are large errors in both. However, suppose you change the observation timing to 1, 2, 17, 18: two observations at the early times and two at the final times. What is the difference between the two experiments? In the first experiment we put all four observations at the end; here we put two at the beginning and two at the end. Why is this important? If you look at the sensitivity curves on slide 19, the initial condition sensitivity is large initially, and the sensitivity of the solution with respect to the parameter is also large initially. So by sampling at times 1 and 2, I get a lot more information about x_0 and about K; by sampling at 17 and 18, I get a lot more information about the boundary condition x_s. Therefore, by distributing the observations, some at the initial times and some at the final times, we are able to maximize the amount of information that can be transferred from the observations to the control. In the next slide you can readily see the recovered values: one set from the 4DVAR method and one from the FSM. Coming back to the comparison, the 4DVAR method does better with the observations distributed at 1, 2, 17, 18 than with the previous placement, and the FSM result is better still, because FSM is able to take full advantage of the distribution of the observations with respect to the sensitivities.
So what is the moral of the story? So far in data assimilation we have assumed we are given a bunch of data; we never worried about the impact of the data distribution on the quality of the assimilation. That is what motivated us to think about a method by which we can characterize the impact of the distribution of observations on the adjoint sensitivity computation. Why the adjoint sensitivity computation? Because the adjoint sensitivity gives the gradient of the cost function with respect to the control, and once I compute that gradient I can use it in a minimization algorithm. For the whole framework of data assimilation within the dynamical context to work, I should be able to compute this gradient, this adjoint sensitivity, reasonably accurately. All the information in the observations and the forecast errors has to be transferred to this quantity called the adjoint sensitivity, and this transfer of information depends critically on the spatio-temporal location of the observations. Experiments 1 and 2 for this simple problem illustrate the fact that if we put all the observations at one end, we may not be able to maximize the information, because at that end some of the forward sensitivities may be close to 0. Therefore, to maximize the impact of the observations, the lesson is this: run the model, run the forward sensitivities, and ascertain the spatio-temporal regions where the forward sensitivities are not close to 0. If you put observations in the locations where the forward sensitivities are bounded away from 0, those observations will contribute the maximum amount of information back to the computation of the adjoint sensitivity. That is the moral of the story.
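This lesson can be turned into a small diagnostic. Using the closed-form sensitivities of the relaxation model with the truth values from the experiment (delta t = 1 and the 30 percent threshold are assumptions), we flag the times at which each normalized forward sensitivity is bounded away from 0; these are candidate observation times for each control variable.

```python
import numpy as np

# Observation-placement diagnostic for the relaxation model (truth values
# from the experiment; dt and the threshold tol are assumptions).
dt, K, x0, xs = 1.0, 0.25, 1.0, 11.0
beta = dt * K
k = np.arange(1, 19)
names = ["x0", "xs", "K"]
S = np.vstack([(1 - beta)**k,                             # d x_k / d x0
               1 - (1 - beta)**k,                         # d x_k / d xs
               dt * k * (1 - beta)**(k - 1) * (xs - x0)]) # d x_k / d K
# Normalize each row by its own maximum and flag times above the threshold.
S_norm = np.abs(S) / np.abs(S).max(axis=1, keepdims=True)
tol = 0.3
for name, row in zip(names, S_norm):
    print(name, "informative at k =", k[row > tol])
```

The output reproduces the lecture's conclusion: early times are informative for x_0 and K, late times for x_s, so a mixed placement such as 1, 2, 17, 18 covers all three controls.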
Some comments. In experiment 1 we used observations at 15, 16, 17, and 18, which led to poor recovery of the correction to the control. This is largely because, at later times, the only sensitivity bounded away from 0 is the boundary condition sensitivity; the initial condition sensitivity and the parameter sensitivity have already died down to 0. So we will have great difficulty recovering the initial condition and the parameter if we put all the observations where those two sensitivities are close to 0; that is the moral of experiment 1. The moral of experiment 2 is that with two observations at the beginning and two at the end, we cover both regimes: at the end the boundary condition sensitivity is large, and at the beginning the initial condition and parameter sensitivities are quite good. The initial observations help you derive information about the initial condition and the parameter, and the later observations help you derive information about the boundary condition. By having a combined distribution, some initial and some final, we maximize the transfer of information from the observations to the control; that is why the recovery in experiment 2 was much better than in experiment 1. This essentially tells you that using the forward sensitivity method one can not only do the data assimilation, which in principle is equivalent to 4DVAR, but also ascertain the regions in the spatio-temporal domain where placing observations yields the maximum benefit. It is this dual advantage that we believe is one of the strengths of the forward sensitivity method.
As with everything in life, if there is an advantage there has to be a disadvantage. In the case of 4DVAR, the backward recurrence relation involves only matrix-vector multiplication, but in the case of FSM the forward recurrence relation needs matrix-matrix multiplication. So for large-scale problems the use of FSM would impose excessive computational demands, and 4DVAR is still preferable. But if you want to do some diagnostics on the distribution of the observations, then within the framework of FSM, by running the model forward in time and plotting the variation of the sensitivities over the spatio-temporal domain, one can ascertain a priori the regions where the sensitivities are bounded away from 0; placing observations in those locations gives the maximum benefit for the data assimilation. So both methods have advantages and disadvantages, while also being equivalent in a certain sense; that is the moral of the story. This module follows our paper by Lakshmivarahan and Lewis in Advances in Meteorology, titled "Forward Sensitivity Approach to Dynamic Data Assimilation". Thank you very much.