In this module 7 we are going to talk about some of the classical methods for data assimilation. These techniques treat prediction as an initial value problem. I will first provide some background to motivate the kind of algorithms that were used in the early 1950s, when data assimilation began to be used by national centres for weather prediction. It was realized long ago, at least since the early 1900s, that numerical weather prediction is, as a mathematical problem, an initial value problem. We already alluded to this in module 1.2, where we argued that data mining, data assimilation, and prediction are three different components of predictive science, so the notion of prediction is important here. Accepting that prediction is an initial value problem for a given class of dynamic models means that we must know the values of the state variables on the computational grid at the initial time. For example, if my forecast products are functions of the solutions of a dynamical model, then to run that model I need to get it started at a given time. Taking t = 0 as that given time, I need the values of the state variables on the entire computational grid at the initial time; in mathematics this is called the initial condition. So to get the initial value problem going, I need an initial condition.

Here I would like to bring in a couple of other constraints. The size of the computational grid used in prediction is limited by the available computing power: the larger the number of grid points, the larger the time required. So before deciding on the size of the computational grid for solving the prediction problem as an initial value problem, you need to know what kind of computing power is available to you. We will now assume a few things in order to get going. I know the computing power; contingent on that, I have decided the maximum size of grid I can afford. I have the model dynamics, discretized on the chosen computational grid consistently with the computing power, so continuous-time models have been reduced to discrete-time models. Whether it is a continuous-time or a discrete-time model, as long as it is a dynamic model I need initial conditions. So I am now going to explore some of the classical methods for transferring data from the observation network to the computational grid so as to initialize the model. In the early days, when computing power was limited, people were restricted to regional models or very coarse global models. The standard models are given by partial differential equations; you discretize the given model on the grid, and to initialize it you need an initial condition. We have been talking about estimating various quantities; here the quantity to be estimated is the initial condition.
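To make the idea of prediction as an initial value problem concrete, here is a minimal sketch in Python; the toy one-dimensional advection dynamics, the grid size, and the initial state are purely illustrative assumptions, not any operational model:

```python
import numpy as np

def step(x, c=0.4):
    """One time step of a toy upwind-advection model on a periodic 1-D grid."""
    return x - c * (x - np.roll(x, 1))

def forecast(x0, n_steps):
    """Solve the initial value problem: the entire forecast is determined by x0."""
    x = x0.copy()
    for _ in range(n_steps):
        x = step(x)
    return x

# The prediction problem hinges on supplying x0 on the computational grid at t = 0.
s = np.linspace(0.0, 1.0, 50)
x0 = np.exp(-0.5 * ((s - 0.3) / 0.05) ** 2)   # an assumed initial condition
x_forecast = forecast(x0, n_steps=100)
```

Everything that follows in this module is about how the early schemes produced that initial condition from sparse observations.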
What is the input data from which the initial condition is to be estimated? It comes from observations. So I am assuming we are given a set of observation stations. In the early days, in the pre-satellite, pre-radar era, these were essentially ground stations and balloon soundings in different parts of the world, so the sensor network was very sparse and the amount of observation available for estimating the initial condition was much smaller. This is background information to keep in mind in order to understand how, before powerful computers and powerful dynamic data assimilation methods came to be, people managed to create reasonably good predictions, that is, how they estimated the initial condition from the data. This is the class of problems we are going to deal with in this module and the next couple of modules.

In the early days the observation network was very sparse, essentially land-based observations available at fixed sites around the globe. Nowadays remotely sensed data are also available along specific tracks from satellites or radars. Either way, this calls for lifting, or interpolating, the data from the observation network to the computational grid. In the early days this task of transferring information from the sparse observation network to a denser computational grid was called objective analysis. In some sense, objective analysis schemes were the forerunners of the modern data assimilation systems used by meteorological centres around the world. This lifted data constituted the initial condition for the model; once the initial condition was fixed, the model was run forward and a prediction was made.

At this point I would like to recall the relation between this and what we did in 4D-Var. In 4D-Var we assumed the same kind of setting: I have a model, I have observations at different times and perhaps at different locations, and I want to fit the model solution to the observations in the least squares sense. We used an objective function to determine the initial condition. Once the optimal initial condition was determined, using 4D-Var or the forward sensitivity method, we ran the model forward in time, and this forward run starting from the optimal initial condition provided a reasonably good forecast. 4D-Var is a method that came to be used around the 1980s; the classical methods began in the 1950s. So you can see the similarity in the way they tackled the problem of finding a good initial condition in the pre-4D-Var, pre-formal data assimilation era. That is the background I am trying to provide.

Until the late 1940s, atmospheric forecasts were generated essentially by subjective analysis, based on the analysis of weather maps; the only tool available was the isobaric surfaces depicted on weather maps. The new era of objective analysis began in 1949 with Hans Panofsky.
He began, for the first time, a formal method for transferring information from a sparse observation network to a denser computational grid, that is, for lifting or spreading the information from the observation network to the computational network. To that end he fitted a third-degree polynomial, with 10 coefficients, to observations of wind and geopotential. So within the parlance of atmospheric sciences, Panofsky's 1949 work, which he called objective analysis, may have been one of the earliest efforts resembling the modern data assimilation framework. But remember that in 1949 stored-program digital computers did not yet exist; they came into existence only around 1951-52, when von Neumann and his group used the first available stored-program machine to solve the barotropic vorticity equation and made the first 24-hour forecasts. In 1949 the computer era was just about to break loose, so Panofsky faced a lot of computational challenges, and unfortunately, while the idea was decent and workable, it met with several operational, mathematical, and computational difficulties. Nevertheless, it is not unreasonable to think of Panofsky as the one who sowed the seed for objective analysis, for transferring information from observations to the estimation of unknown states. So you can see the semblance of estimation ideas coming from the world of atmospheric sciences.

Shortly after that, in the mid-fifties, Bergthorsson and Döös used a 300-kilometre computational grid covering the North Atlantic region. Computers were still very limited in 1955: stored-program digital computers had just come into existence, and to give you an idea of how primitive the technology was, they only had the computer itself, with no programming language and no compiler. When von Neumann and his group used the computer to make the 24-hour prediction, they essentially programmed everything in machine language. Anybody who has programmed in an assembly-level language knows how involved it can be to code everything at the machine level, using strings of 0s and 1s. Fortran and its compiler came to be only around the mid-1950s, and only then did writing programs in a general-purpose language become commonplace. So everything was very raw; people did not yet quite understand how to utilize this monster called the computer, and system software, programming languages, compilers, and operating systems were all still being developed. It was at that time that Bergthorsson and Döös used a computational grid with 300-kilometre spacing to do a regional analysis covering the North Atlantic region, parts of North America, and Europe. They wanted to bring the observations onto a grid in order to have a good initial condition from which to run the model forward; the model solution would then be the forecast. And they had a very good idea. What did they do? They thought of two pieces of information. First, they said: I already have climatology-based information on the grid. So they introduced the notion of what is called the background state.
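As a parenthetical illustration of the flavour of Panofsky's fit (a toy sketch, not his actual procedure or data): a complete third-degree polynomial in two variables has exactly 10 coefficients, and it can be fitted to scattered observations by least squares and then evaluated on a regular grid.

```python
import numpy as np

def cubic_design_matrix(x, y):
    """The 10 monomials of a bivariate cubic: 1, x, y, x^2, xy, y^2, x^3, x^2 y, x y^2, y^3."""
    return np.column_stack([np.ones_like(x), x, y,
                            x**2, x*y, y**2,
                            x**3, x**2*y, x*y**2, y**3])

rng = np.random.default_rng(0)
xo, yo = rng.uniform(0, 1, 30), rng.uniform(0, 1, 30)        # assumed observation sites
zo = 5500 + 30*xo - 20*yo + 5*xo*yo + rng.normal(0, 1, 30)   # stand-in height observations

# Least-squares fit of the 10 polynomial coefficients to the observations.
coeffs, *_ = np.linalg.lstsq(cubic_design_matrix(xo, yo), zo, rcond=None)

# "Lift" the fitted field from the observation network onto a computational grid.
gx, gy = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
analysis = cubic_design_matrix(gx.ravel(), gy.ravel()) @ coeffs
```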
In modern language, what is the background state? The background state is the prior: the belief you have about the state of the system before you take the very first observation. How did they create the first background? They created it as a linear combination of climatology and a forecast, specifically a 12-hour forecast. If the present time is t = 0, they ran the model from t = -12 hours to t = 0, and that forecast carried information on the grid about the state of the system; of course they also had the information from climatology. They simply took a linear combination of the climatology and the 12-hour forecast. You may ask: to start with, where did they get the forecast from? To start with they did not have any forecast, so they essentially used climatology alone. This is one way of initializing the model; in this particular step no observation is used, but having background information on the grid is one piece of information.

You can see the beginnings of Bayesian philosophy right there. What is the element of Bayesian philosophy? There is a prior, the belief I hold before I do anything; then I make observations, which give me new information; and Bayesian philosophy always combines the prior belief with the new information to get the posterior. You can see the elements of that philosophy inherent, in 1955, within the context of atmospheric prediction, in the work of Bergthorsson and Döös. In other words, they created a state from all the information available at the time.

Then they had access to observations, observations of the 500-millibar height. Think of it like this: here is the computational grid, say 5 by 4, so n = 20 grid points, and let us pretend the observation locations are dots scattered among them. Given this grid and the observation locations, one can mathematically interpolate data from the observation network to the grid, or from the grid to the observation network; a simple bilinear interpolation scheme will do. So what did they do? They first built the background information on the grid, then they interpolated this background from the grid to the observation locations. So at each observation location they have a background value, obtained by interpolating the gridded background to the observation network. Then, at the observation network, the observations themselves are made available; that is the second piece of information at the observation locations. What are the observations here? Essentially the 500-millibar height. So at each observation site they have two pieces of information: an estimate of the 500-millibar height from the background, and the observed 500-millibar height from reality.
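Here is a minimal sketch of those two ingredients; the blending weight, the fields, and the toy selection matrix standing in for the interpolation are assumptions made purely for illustration:

```python
import numpy as np

n, m = 20, 5                              # grid points and observation sites, as in the 5-by-4 example
rng = np.random.default_rng(1)

climatology = np.full(n, 5520.0)          # assumed climatological 500 mb height on the grid
forecast_12h = climatology + rng.normal(0, 10, n)        # stand-in for the previous 12-hour forecast

alpha = 0.5                               # assumed blending weight
x_b = alpha * climatology + (1 - alpha) * forecast_12h   # background state on the grid

# A toy H that "interpolates" to the observation sites by picking nearby grid points.
H = np.zeros((m, n))
H[np.arange(m), [0, 4, 9, 13, 18]] = 1.0

background_at_obs = H @ x_b                   # background estimate at each observation site
z = background_at_obs + rng.normal(0, 5, m)   # stand-in 500 mb height observations
# Each observation site now carries two numbers: background_at_obs[j] and z[j].
```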
Then what did they do? They used a distance-dependent weight function to iteratively blend the background and the observations. It is a very beautiful heuristic scheme; it worked, and it came to be used in operational centres. You can see the fundamental Bayesian philosophy inherent in it. Notice that instead of working on the computational grid, they worked in the observation space; in modern language one can do assimilation either in the observation space or in the model space. They had a two-way communication between the observation network and the computational grid, and by building this bridge between the two worlds, the world of the observation network and the world of the computational grid, they could go back and forth. They converted everything on the computational grid to the observation network; at the observation network they then had representative background information and representative observation information, and they simply blended the two in a heuristic way. How did they blend them? Heuristically, using a weighting scheme that depends on distance. That is the fundamental idea behind some of the earliest work.

Now we are going to describe mathematically a general iterative scheme that embodies Bergthorsson and Döös's ideas. As I mentioned on the previous slide, you can carry out this iterative process either on the computational grid or on the observation network; I am now going to formulate the problem on the computational grid itself. So let us start. Assume I have a computational grid such as the one in the previous illustration, a 20-point grid, so x is a 20-dimensional vector with components x_1 up to x_20. In this particular case the state being operated on is the 500-millibar height. Let x_0 be a vector on this grid, of size 20, that represents the background information; so x_0 belongs to R^n, with n = 20 in this example. The background is obtained as a linear combination of everything I know: previous forecasts that are available, climatology, and so on. I develop a mosaic on my grid; x_0, the background, is simply a smooth mosaic of the states obtained from anything I can put my fingers on. Then z is the observation vector that comes to us, z in R^m. Please understand that m and n are different; in general m is less than n. There were more grid points than observations; that is how it was from the 1950s until fairly recently. Until remote sensing by radar and satellite became possible, the number of observations was always smaller than the size of the grid. With satellites and radars the number of observations has grown, computing power has become better and better, and our desire to enlarge the computational grid, namely to reduce the grid spacing, has become more and more feasible.
So there is a race between how fine I can make the computational grid, making n larger, and how many more observations I can gather from satellites, radars, and everything else put together, making m larger. Both m and n keep increasing as our ability to observe nature and our ability to build computers improve; they are not fixed, they change from era to era.

Next, H is the interpolation matrix; we have already talked about how to design H in module 3.6. Just to refresh your memory: H is a matrix mapping from the model space to the observation space, and it is called the forward operator. Sometimes H comes out of the physics, sometimes out of empirical rules, sometimes out of interpolation formulas. If I have a grid as in the previous example, then n is 20, and with 5 observation locations m is 5, so m is less than n; that is the kind of situation they faced in those days. The point of this presentation is to make clear how clever they were in the pre-formal data assimilation era, how, with a lot of intuition, they anticipated and developed ideas that we now know much more formally.

So H is the interpolation matrix with which I can go between the computational grid and the observation network. I now have everything: prior information given by x_0, observations z, and H. If I were doing Bayesian estimation, I would simply combine x_0 with z in a Bayesian framework, create a posterior, and use the posterior as my new initial condition to run the model forward in time. But in those days, what did they do? They picked a weighting matrix, and this weight matrix is the one that carries the distance-dependent weighting scheme. Almost all the methods used in the 1950s and 60s in different parts of the world, essentially the USA and Sweden, which spearheaded the development of these schemes, were of this kind. While Bergthorsson and Döös were working in Sweden, in the USA the national weather centre also developed similar iterative schemes. Both have the same mathematical structure; they differed in the way the weight matrix W was defined and used. This type of scheme became operational in the 1950s.

So what is the basic idea? Let us go to equation (1). x_0 is the background and z is the observation; H x_0 can be thought of as the model-predicted observation, so z minus H x_0 is the new information that the observation carries and the background does not. You can think of z minus H x_0 as the innovation. You multiply it by a weight and add it to x_0, and what do you get? You get x_1 = x_0 + W (z - H x_0). Now, x_0 did not have any information about z, and z did not have any information about x_0, but x_1 is a linear combination of the two; you could also rewrite this as x_1 = (I - W H) x_0 + W z.
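Written out, equation (1) in both forms is:

```latex
x_1 \;=\; x_0 \;+\; W\,\bigl(z - H x_0\bigr)
    \;=\; (I - W H)\,x_0 \;+\; W\,z . \qquad (1)
```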
In this form, (I - W H) is the weight applied to the initial condition, the background, and W is the weight applied to the observations; so you can think of x_1 as a linear combination of the background information and the observations. The emphasis here is that you can see the Bayesian philosophy: they were using it without explicitly formalizing it in a Bayesian framework, and they were clever enough to anticipate some of the things to come.

Then what did they do? They computed x_2 = x_1 + W (z - H x_1). What is the idea here? z - H x_1 is the residual from using x_1, and z - H x_0 was the residual from using x_0; this second residual should in principle be better than the first, because x_0 did not depend on z at all, whereas x_1 already depends on both x_0 and z. So you can readily see the iterative scheme x_{k+1} = x_k + W (z - H x_k); it is a very simple-minded scheme.

You can also see the beginnings of the Kalman filter right here. We have already seen, in the lecture on Kalman filtering, that the posterior, which we call the analysis, is x_a = x_p + K (z - H x_p), where x_p is the prior; this is the form of the Kalman filter update we derived earlier. I have the prior information; z - H x_p is the new information the prior did not have, which comes from the new observation; I multiply that new information by K, the Kalman gain; the prior is also called the background; and the previous information and the new information together make x_a, which is called the analysis. Posterior, prior, observation, innovation: even though they did not speak in this language, equation (1) has all the underpinnings of the modern methodology, except for the mathematical artifice that goes with it. To me that is very interesting, because without knowing many things they had very good intuition. Let me also point out that the Kalman filter was not invented until 1960-61, so in the 1950s, when these people were working in operational centres entrusted with the job of producing forecasts, there was no Kalman filter to speak of. But they were clever people: they knew dynamics very well, they knew reasonably good statistics, they were aware of sound statistical principles for estimation, and above all they knew that if you have two pieces of information you need to be able to mix them. That fundamental idea carries even today and is the centrepiece of each and every data assimilation scheme of all types. So equation (1) became the workhorse of the data assimilation industry and was used both in the USA and in Sweden to generate appropriate initial conditions. Let us talk about that part now.
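Here is a minimal sketch of this successive-correction iteration; the stopping tolerance and the iteration cap are illustrative choices, not part of the original schemes:

```python
import numpy as np

def successive_correction(x0, z, H, W, tol=1e-6, max_iter=200):
    """Iterate x_{k+1} = x_k + W (z - H x_k) until the innovation is negligible."""
    x = x0.copy()
    for _ in range(max_iter):
        innovation = z - H @ x          # the new information the current state does not yet contain
        if np.linalg.norm(innovation) < tol:
            break
        x = x + W @ innovation          # blend the innovation into the state on the grid
    return x                            # x*, the state used as the initial condition
```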
If you run this iteration, you need to truncate it; you cannot continue forever. When do you truncate? You truncate the iteration when the norm of the vector z - H x_k becomes very small. That means there is no more juice left in z that has not already been transferred to x_k: if all the information in z has been absorbed into x_k, then H x_k gets close to z, the innovation becomes smaller and smaller, and there is nothing more to gain, so I can truncate. Let us call the state that results from the truncation x*. What is x*? This x* is the state that would now be used as the initial condition in the model from which the forecast is generated. So x* is the initial condition obtained as an estimate by running (1) iteratively until convergence. The reason I wanted to bring this up is that, even though it does not carry the mathematics we have seen, you can see the underpinnings of the later data assimilation methodology already inherent in these ideas proposed in the early and mid-1950s. In the United States the person who spearheaded this effort was Cressman; in Sweden it was Bergthorsson and Döös. If you read the paper by Cressman and the paper by Bergthorsson and Döös even today, it is inspirational to see how they thought of incorporating data with prior information, and how they built the facility to transfer data between the observation network and the computational grid.

Now I am going to talk about the weighting scheme, the one thing we have not yet specified: how did they pick the weight matrix, and how did the Cressman scheme and the Bergthorsson-Döös scheme differ? They differed in the way they defined and picked the weight matrix; essentially both were running along similar tracks, but they used different weights because of their different beliefs about how observations should influence the grid.

So let me describe the basic idea of the weighting scheme that was used by Cressman. Consider the i-th grid point; the grid points are numbered. Let us go back to the little diagram: suppose the computational grid is 5 by 5, so n = 25 points, and assume the observations are sparsely scattered among them. Now pick some D greater than 0 and consider it to be the radius of influence. The radius of influence is a very good idea, so let us talk about it. Does the temperature in Bangalore, India depend on the temperature in Delhi, India? The distance between Bangalore and Delhi is very large, so you would expect the temperatures in Bangalore and Delhi to be only weakly correlated.
However, if you consider a town only about 10 kilometres from downtown Bangalore, there will be a much better correlation with that nearby town than with a city 2000 kilometres away. So what does that mean? If I consider a grid point i, and there are observations around it, which observation locations should affect the estimate of the state at location i? They said: as with everything else in life, let me consider what is called a circle of influence, or a radius of influence; call this radius D. Take a circle with i as the centre and D as the radius; I am considering a two-dimensional problem now, and you can imagine the three-dimensional version of the problem later. We postulate that only the observations lying within a distance D from the point i are going to influence the state at the point i. That is a very beautiful concept arising from the notion of the influence of one grid point, or one observation station, on another location.

So let N_i(D) be the circle (or sphere) of radius D centred at i, let m_i be the number of observations inside it, and let r_ij be the distance between the chosen grid point i and the location of the j-th observation inside the circle of radius D centred at i. In the little example, suppose three observations fall inside the circle; then m_i = 3, and I can compute the distances r_i1, r_i2, r_i3. In general r_ij is the distance from the i-th grid point to the j-th observation, with j running from 1 to m_i, where m_i is the total number of observations contained within the circle of radius D centred at i. I hope that is clear.

So I am dividing the observations into two groups: those that influence i and those that do not. If I pick D to be very large, every observation affects i; if I pick D to be very small, only a small number of observations affect i. There is no formula for the choice of D, but it makes sense to think of the influence of an observation on a grid point as depending on distance. That is a very beautiful and fundamental idea.
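A small sketch of this bookkeeping for one grid point; the coordinates below are invented purely for illustration:

```python
import numpy as np

def influencing_observations(grid_point, obs_locations, D):
    """Indices and distances r_ij of the observations within radius D of one grid point i."""
    d = np.linalg.norm(obs_locations - grid_point, axis=1)   # distances r_ij for all observations j
    inside = np.where(d < D)[0]                              # the m_i observations that influence point i
    return inside, d[inside]

obs_xy = np.array([[0.2, 0.3], [0.6, 0.7], [0.9, 0.1], [0.45, 0.5], [0.1, 0.8]])  # assumed sites
idx, r_ij = influencing_observations(np.array([0.5, 0.5]), obs_xy, D=0.3)
m_i = len(idx)   # only these observations receive a non-zero weight for this grid point
```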
So what did Cressman and his group do? They devised a weighting scheme w_ij. Remember that W is a matrix; the w̄_ij are its elements, and w̄_ij is the weight with which the j-th observation affects the i-th grid point. They chose w̄_ij = (D² − r²_ij) / (D² + r²_ij). This weight is at most 1, and it is the non-zero weight for all r_ij less than D; otherwise it is 0. So equation (2) provides a method for choosing the weight. Once you choose the weight like this, go back to the iterative scheme in equation (1): I now have a means of choosing W, and everything else is already specified, so I can run the iteration, truncate it at an appropriate condition, and get x*. Once I have x*, I have an initial condition; I use that as the initial state to solve my initial value problem. That is how operational forecast centres worked in those days.

Now you may want to ask: why this scheme and not some other? Again, I want you to understand that this is simply a heuristic, so you probably cannot quibble with it too much, because it makes sense. In what way does it make sense? When r_ij becomes D, the weight becomes 0; that means for points on the circle the weight is 0, for points inside the circle the weight is larger, and for points outside the circle the weight is simply set to 0. It is this ability to provide a clean cutoff that presumably led them to a formula of this form. There are many different methods for computing this weight: Bergthorsson and Döös picked the weight according to one formula, Cressman by another, and if you look at the literature of those days there are at least half a dozen different ways of picking W. Mathematically speaking they play the same role: W decides the strength with which x_k is updated to x_{k+1}, in other words how much weight is given to the innovation.

So the Cressman algorithm, continued: I have x_{k+1} = x_k + W (z − H x_k), where x_0 is the background state and z − H x_k is called the innovation. With w̄_ij given by equation (2), I am now going to normalize the weights: define w_ij = w̄_ij / S_i, where S_i is simply the sum of the numerators w̄_ij over the observations inside the circle. This way the weights are normalized and the individual values lie between 0 and 1; that is scheme (4). So if I use scheme (2) to define w̄, scheme (4) to normalize it, and the normalized weight in (3), you get the Cressman scheme. I want to re-emphasize: Cressman, in the mid-1950s, had anticipated the Kalman filter even before the Kalman filter was invented; to me that is the remarkable aspect of it, and it is important to keep in mind.
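Putting equations (2) and (4) together, here is one way the Cressman weight matrix could be assembled; the grid and observation coordinates are placeholders supplied by the caller:

```python
import numpy as np

def cressman_weights(grid_xy, obs_xy, D):
    """W[i, j]: normalized Cressman weight of observation j on grid point i."""
    r = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=2)   # distances r_ij
    W_bar = np.where(r < D, (D**2 - r**2) / (D**2 + r**2), 0.0)            # equation (2)
    S = W_bar.sum(axis=1, keepdims=True)                                    # S_i, sum over observations
    return np.divide(W_bar, S, out=np.zeros_like(W_bar), where=S > 0)       # normalization (4)
```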
Now, for the sake of completeness, let me tell you about one more weighting scheme, which came in 1964: the Barnes scheme. There is a little story I would like to tell. Barnes was working at the University of Oklahoma, where I am currently working, and with the National Severe Storms Laboratory; they essentially pioneered the use of Doppler radar in meteorology. So they wanted to develop schemes by which radar information could be utilized and transferred onto the grid. In the mid-60s they did something similar to what Cressman and Bergthorsson had done in the mid-50s, but instead of the 500-millibar height, the emphasis for Barnes was on radar-related data.

Barnes developed a weighting scheme that still rests on the notion of a radius of influence. If i is the grid point and the observations 1, 2, ..., m_i lie in its region of influence, with r_ij the distance from i to observation j, then for r_ij < D, where D is the radius of the circle, the weight has a Gaussian structure: w̄_ij = exp(−r_ij² / D²), and it is 0 otherwise. So it is a Gaussian-shaped function, a kind of truncated Gaussian sitting over the circle of radius D; you can easily picture that weight function. This Gaussian weighting was used, and Barnes was also a little more sophisticated: he did not keep the radius fixed. He started with a larger radius; what does that mean? He wanted to bring in influence from as many observations as possible, since with a large D a larger number of observations affect each grid point. He then continuously shrank the region of influence; the scheme he used to reduce the radius was D_{k+1} = γ D_k, with γ less than 1. That is, he started from a larger radius and kept shrinking it, because he wanted the benefit of as many observations as possible while making sure that those closer by had more influence than those far away. So he had an adaptive scheme for the radius. Did he shrink it to zero? No: once it reached a particular value he held it there, because he did not want the region of influence to collapse to zero.

Why am I mentioning this? I just want you to appreciate that there are all kinds of interesting heuristic ideas, fixed D, variable D, Gaussian-based weight functions, other heuristic weight functions; these are all variations on a theme. The ultimate aim is to take a background, take the observations, and transfer information from the observations to the grid; the grid is what we want to update. This iterative process is, in some sense, the data assimilation scheme that was used in the 50s and 60s: the background information is combined with the observations to create a new mosaic, and that mosaic is a good initial condition from which to run the model forward. All of these schemes were offline experiments done to create the initial condition x*, where x* is the value of x_k at the point where the iteration is cut off; from x* they then generated the forecast.
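A sketch of the Barnes-style procedure just described: a truncated Gaussian weight inside the radius of influence, with the radius shrunk by a fixed factor at each pass down to a floor. The shrink factor, the floor, and the number of passes here are illustrative choices:

```python
import numpy as np

def barnes_weights(grid_xy, obs_xy, D):
    """Truncated Gaussian weights exp(-r^2 / D^2) inside radius D, zero outside, row-normalized."""
    r = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    W_bar = np.where(r < D, np.exp(-(r**2) / D**2), 0.0)
    S = W_bar.sum(axis=1, keepdims=True)
    return np.divide(W_bar, S, out=np.zeros_like(W_bar), where=S > 0)

def barnes_passes(x_b, z, H, grid_xy, obs_xy, D0, gamma=0.7, D_min=0.1, n_pass=4):
    """Successive-correction passes with a shrinking radius of influence D <- gamma * D."""
    x, D = x_b.copy(), D0
    for _ in range(n_pass):
        W = barnes_weights(grid_xy, obs_xy, D)
        x = x + W @ (z - H @ x)          # the same update as before, with the current radius
        D = max(gamma * D, D_min)        # shrink the radius, but never below the floor D_min
    return x
```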
From a historical perspective, and to appreciate how people worked in the early days of the computer era, I want to emphasize that subjective analysis was the order of the day until the late 1940s, although there was already an anticipation that computers were just around the corner. Hans Panofsky described an interpolation scheme, but it was riddled with problems. Then computers arrived, and von Neumann and his group pioneered their use in making 24-hour forecasts. Once you have the ability to make a 24-hour forecast, the interest in obtaining a good initial condition from which to run the model forward takes centre stage. So that work provided the motivation to create a very good initial condition, and with that motivation research sprang up in Sweden and in the United States. Bergthorsson and Döös, Cressman, Barnes: these are the early examples of data assimilation schemes, in which background information and data were very cleverly combined to create this mosaic. And what is this mosaic in modern language? In some sense we can think of it as an analysis, because it is a combination of the prior with the observations. If you think about what data assimilation does in general: I have a prior, I have observations, and I get a posterior; what statisticians call the posterior is called the analysis in the geosciences, so analysis and posterior are exactly one and the same. So I want to re-emphasize the cleverness of the idea, the cleverness of the formula, the anticipation of a Kalman-like scheme even before the Kalman filter came to be; to me these are very important observations historically, and these ideas should be inspiring to anybody.

So again, if w̄_ij is given by (5), I can normalize it as in (7). The initial field, please understand, is what we are after: it is obtained by iterating, starting from x_0, the background, and using the observations; that is (8). Now I would like to engage your interest in a theoretical question. Suppose you iterate like this: will it ever converge, and under what conditions will this iterative scheme converge? In other words, am I guaranteed to get an analysis if I run the iterative scheme of Cressman, as an example, out to asymptotic time? I think the convergence question is fundamental and important, and it has already been worked out, so I am going to give you a glimpse of the convergence theory for the iterative schemes used in the early days of data assimilation.
To do that, take equation (1), x_{k+1} = x_k + W (z − H x_k), and multiply both sides by H; I am going to change variables. Until now everything was a little heuristic; now I want to bring in some mathematical structure relating to the convergence of the iterative scheme, to understand whether it will ever converge. If I know that it converges after a long enough iteration, then I can truncate it, and the truncated state should not be too far from the limit. So, to understand the quality of the state I get when I truncate after some finite number of iterations, I need to ask what would happen had I not stopped but continued.

I now introduce a new variable, η_k = H x_k, and I rename the product matrix of H and W as T. With this change of notation the equation becomes η_{k+1} = η_k + T (z − η_k), where T is an m-by-m matrix. Look at this: while scheme (1) was originally defined on the computational grid, the equivalent scheme (10) is defined on the observation network. How did I bring the iteration from the computational world to the observation network? By the magic of H: remember, H is the bridge between the observation world and the computational grid, the model space. So you can think about convergence in either domain, because results translate from one to the other.

So I want to understand the asymptotic properties of the iterative scheme (10). Subtract z from both sides; after a little algebra you get equation (11): η_{k+1} − z = (I − T)(η_k − z). Iterating (11) gives equation (12): η_k − z = (I − T)^k (η_0 − z). What is η_0? η_0 = H x_0: x_0 is my background, so η_0 is the predicted observation from the background, obtained using H, while z is the actual observation. The difference between the two is the innovation, and that innovation is multiplied by (I − T)^k to give η_k − z. You can think of η_0 − z as the initial increment, the increment at k = 0, and η_k − z as the increment at a general step k; the increment at step k is the k-th power of the matrix times the initial increment. By analogy with the scalar equation x_k = a^k x_0, where you readily know that if the absolute value of a is less than 1 then x_k tends to 0 as k tends to infinity, you can now see that if the k-th power of (I − T) goes to 0 as k goes to infinity, then η_k − z goes to 0, so η_k matches the observations. And remember what η_k is: η_k = H x_k, so convergence means the state on the grid, mapped by H to the observation sites, matches the observations.
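Let me collect these steps in one place (with η_k = H x_k and T = H W):

```latex
\begin{aligned}
\eta_{k+1} &= \eta_k + T\,(z - \eta_k), && (10)\\
\eta_{k+1} - z &= (I - T)\,(\eta_k - z), && (11)\\
\eta_k - z &= (I - T)^{k}\,(\eta_0 - z), \qquad \eta_0 = H\,x_0 . && (12)
\end{aligned}
```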
So the whole question of convergence rests on the properties of the matrix (I − T)^k, or in principle on the properties of the matrix I − T. Remember what T is: T = H W. What is H? H is the interpolation matrix. What is W? W is the weight matrix, created heuristically. There are infinitely many ways of choosing H and infinitely many ways of choosing W, and we now ask: what particular choices of W and H give a T with the property that (I − T)^k goes to 0? That is the mathematical question, and that is what convergence depends on.

Since η_k = H x_k, I can substitute (12) back into the iteration; with that substitution I come back from the observation network to the computational grid, so I have transferred the result from the observation world to the model world. Iterating (13), and since x_0 = x_b, the background, we obtain the quantity x_k − x_b. What is x_k − x_b? That is what is called the analysis increment: x_k is the current analysis, the result of the k-th iteration, and x_b is the background. And z − η_0 is the initial innovation coming from the prior information. So the analysis increment is expressed as a sum. For convergence, as we have already pointed out, (I − T)^k must tend to 0 as k tends to infinity; when convergence happens, we are guaranteed to get a decent combination of the prior and the observations.

So under what conditions does convergence happen? Consider the matrix I − T; we now go back to matrix theory to formalize the convergence result. Under what conditions does (I − T)^k tend to 0? It happens exactly when the spectral radius of the matrix I − T is less than 1. Please remember that the spectral radius is the maximum of the absolute values of the eigenvalues; for symmetric matrices it coincides with the largest singular value, and for general non-symmetric matrices the largest singular value provides an upper bound, so eigenvalues and singular values are related concepts here. We write ρ(A) for the spectral radius of A, and here A = I − T. If the spectral radius is less than 1, it can be shown that the powers of the matrix tend to 0.

Now I am going to provide a few more steps. In general, if A is a matrix with spectral radius less than 1, then (I − A)^{-1} can be expressed as a power series; this series is the matrix analogue of 1/(1 − x) = 1 + x + x² + ..., the standard geometric series you all know from basic calculus. Putting x = A gives the matrix series. This series can be split into two parts, the first k terms and the rest; I can take A^k as a common factor from the second part, and since that remaining sum is again the full infinite series, the finite sum of the first k terms can be written in closed form, as follows.
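Let me write these out. With A = I − T, so that (I − A)^{-1} = T^{-1}, equations (14), (16), and (17) read:

```latex
\begin{aligned}
x_k - x_b &= W \sum_{j=0}^{k-1} (I - T)^{j}\,(z - \eta_0), && (14)\\[2pt]
\sum_{j=0}^{k-1} A^{j} &= \bigl(I - A^{k}\bigr)\,(I - A)^{-1}, \qquad A = I - T, && (16)\\[2pt]
x_k - x_b &= W\,\bigl(I - (I - T)^{k}\bigr)\,T^{-1}\,(z - \eta_0). && (17)
\end{aligned}
```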
So the partial sum of the powers of A is given by equation (16), and substituting (16) into (14) gives equation (17) for x_k − x_b, as written above. Therefore, if (I − T)^k tends to 0, x_k tends to x_a, which is what we call the analysis: the limiting value of x_k is called the analysis. We have therefore found the condition under which this Cressman-like scheme converges: it is simply that the spectral radius of I − T must be less than 1.

That gives you a design criterion, and I am going to spend a couple of minutes on it. One can do a lot of simple experiments: you fix a particular method for H, you pick a particular method for W, you multiply them to get T, and then you do an eigen-analysis of I − T. You can keep H fixed and change W, or keep W fixed and change H, and ask: for which combinations of H and W does one get a matrix I − T whose spectral radius is less than 1? If you can strike such a combination, you have reached the goal. That is the basic idea, the design question. Here we are not interested in carrying out the design itself; we are interested in guidelines for the design, and the requirement that the spectral radius of I − T be less than 1 provides the broad guideline for the design of H and W.

Now, when the term (I − T)^k goes to 0, (17) reduces to (18). What is that? It is x_a − x_b, where x_b is the prior and x_a is the analysis; x_a − x_b is called the analysis increment, and it is simply W T^{-1} (z − η_0); you know what η_0 is. And how do you compute this? To do so you do not have to compute the inverse of T. If you want T^{-1}(z − η_0), you simply need to do the following: solve the equation T y = z − η_0; then y is equal to T^{-1}(z − η_0). Why am I mentioning this? I want to tell you a couple of things. Your interest is not in computing T^{-1} itself; you could do this by computing T^{-1} and multiplying, but you do not want T^{-1} as such, you only want T^{-1}(z − η_0). So what is the best way to compute it? Please do not indulge in inverting the matrix T, because it will take a long time; simply write the linear equation T y = z − η_0 and solve it, and y is indeed T^{-1}(z − η_0). Solve the system, get y, and then x_a = x_b + W y. So I have now given you a pathway to compute the analysis, starting from the background and an increment y, where y comes from solving equation (19). That is the pathway provided by the convergence analysis, and it is very instructive, because it tells you the role of W, it tells you the role of H, and it also indirectly provides conditions not on H alone or on W alone, but on the product that creates T. So what was thought of as a good heuristic method can be put on a formal mathematical footing once we do the convergence analysis. With this convergence analysis we have equation (20), and equation (20) is the essence of the schemes that were used in the 1950s and 60s.
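A minimal sketch of this computation, equations (19) and (20): solve T y = z − η_0 directly rather than forming T^{-1}, then set x_a = x_b + W y. The arrays are placeholders supplied by the caller:

```python
import numpy as np

def analysis_from_limit(x_b, z, H, W):
    """Limiting analysis of the successive-correction scheme: T y = z - H x_b, x_a = x_b + W y."""
    T = H @ W                              # the m-by-m matrix T = H W
    eta0 = H @ x_b                         # background mapped into observation space
    y = np.linalg.solve(T, z - eta0)       # equation (19): never form T^{-1} explicitly
    return x_b + W @ y                     # equation (20)
```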
So what is the basic scheme? You give me a background x_b on the grid; based on the grid and a weighting scheme, you create a weight matrix W; you create an interpolation matrix H; you solve for y from (19); and then x_a, the initial-condition mosaic from which I am going to run the model forward in time, is simply x_b + W y. That is the very nice and important methodology that was used, very successfully, in the 50s and 60s: using these schemes they repeated this procedure every day to create an analysis, every day to run the model forward in time, and in the early computer age they were quite successful at creating good forecasts.

Now I am going to leave you with a couple of very good problems; these are important problems, so I will spend a couple of minutes on them. First, generate a 10 by 10 two-dimensional grid on the unit square, that is, 100 grid points, and locate 40 observations at random. Note that if there are 10 grid points along each side, there are only 9 times 9 grid boxes: in a small example with 3 grid points along each side, there are 9 grid points but only 2 times 2, that is 4, grid boxes. The observations go inside the grid boxes, so you are going to distribute 40 observations among the 81 grid boxes. Then generate x_b in R^100: I would like you to generate the background with mean 90, that is, the i-th component is 90 plus v_i, where v_i is normally distributed with mean 0 and variance sigma_b squared; take sigma_b squared to be 5. So you generate the noise, add it to 90, and you have 100 random values: that is the background information. Then compute H, which is a 40 by 100 matrix, using the bilinear interpolation that we developed in module 3.6. So we have generated the background and we have H; now generate the observations, which are produced by an analogous equation with normal noise, but note that the observational variance is different from the background variance. I now have all the components: H, x_b, and z. You then need to compute the weight matrix, once using the Cressman method and once using the Barnes method, two ways of doing it, implement the successive iteration described above, and iterate until convergence. It is a very beautiful scheme, and if you can do this on a computer, you can say you thoroughly understand some of the classical methods. These methods could still be used; they are powerful and meaningful, and although they were developed heuristically, for their power, simplicity, and elegance I believe exercises of this kind should be done by anybody interested in learning the tools and techniques of data assimilation.

Part 2 is a continuation: I would like you to compute the matrix T = H W, using the H and W you picked in the previous problem, compute the eigenvalues of T, compute the spectral radius of A = I − T, and check whether the spectral radius is less than 1. If it is, you are done; if it is not, the scheme still makes sense, but it may not converge.
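Here is one possible way to set up the exercise, as a sketch under assumptions for the details the problem statement leaves open: the true field used to manufacture observations, the observation variance, the radius of influence, and the random seed are all choices made here, and the weight and iteration routines repeat the earlier sketches so that this block stands alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# 10 x 10 grid on the unit square: n = 100 grid points, 9 x 9 = 81 grid boxes.
g = np.linspace(0.0, 1.0, 10)
gx, gy = np.meshgrid(g, g, indexing="ij")
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
n, m = 100, 40

# 40 observation locations placed at random inside the unit square (hence inside the grid boxes).
obs_xy = rng.uniform(0.0, 1.0, size=(m, 2))

def bilinear_H(obs_xy, g):
    """m x n bilinear interpolation matrix: each observation is a weighted sum of its 4 box corners."""
    H = np.zeros((len(obs_xy), len(g) ** 2))
    dx = g[1] - g[0]
    for k, (x, y) in enumerate(obs_xy):
        i = min(int(x / dx), len(g) - 2)          # lower-left corner indices of the containing box
        j = min(int(y / dx), len(g) - 2)
        tx, ty = (x - g[i]) / dx, (y - g[j]) / dx
        for di, dj, w in [(0, 0, (1 - tx) * (1 - ty)), (1, 0, tx * (1 - ty)),
                          (0, 1, (1 - tx) * ty), (1, 1, tx * ty)]:
            H[k, (i + di) * len(g) + (j + dj)] = w
    return H

def cressman_weights(grid_xy, obs_xy, D):
    """n x m normalized Cressman weights, as sketched earlier."""
    r = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    W_bar = np.where(r < D, (D**2 - r**2) / (D**2 + r**2), 0.0)
    S = W_bar.sum(axis=1, keepdims=True)
    return np.divide(W_bar, S, out=np.zeros_like(W_bar), where=S > 0)

def successive_correction(x0, z, H, W, tol=1e-8, max_iter=500):
    """Iterate x_{k+1} = x_k + W (z - H x_k) until the innovation is small."""
    x = x0.copy()
    for _ in range(max_iter):
        d = z - H @ x
        if np.linalg.norm(d) < tol:
            break
        x = x + W @ d
    return x

H = bilinear_H(obs_xy, g)                                 # 40 x 100
x_b = 90.0 + rng.normal(0.0, np.sqrt(5.0), n)             # background: mean 90, variance 5
x_true = 90.0 + rng.normal(0.0, 1.0, n)                   # an assumed "true" field
z = H @ x_true + rng.normal(0.0, np.sqrt(1.0), m)         # observations with a different variance

W = cressman_weights(grid_xy, obs_xy, D=0.3)              # D = 0.3 is an assumed radius of influence
x_star = successive_correction(x_b, z, H, W)              # the analysis / initial condition

# Part 2: spectral radius of A = I - T with T = H W; a value below 1 guarantees convergence.
A = np.eye(m) - H @ W
print("spectral radius of I - T:", np.max(np.abs(np.linalg.eigvals(A))))
```

The same setup can be rerun with the Barnes weights, and with different numbers and placements of observations, to study how the spectral radius behaves.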
What I would like you to do now is vary the location and the number of observations; in each case compute T = H W and the spectral radius, and find out for what kinds of combinations the spectral radius of A = I − T is greater than 1 and for what kinds it is less than 1. I believe this is a very interesting and worthwhile study; one could even develop a master's thesis, or perhaps part of a PhD thesis, out of it. We have only provided the general theory for convergence; taking that general theory and transferring it to the actual design questions for W and H would, I believe, be a worthwhile and interesting research topic, at least part of a PhD, and it could certainly be a full master's thesis project. This module follows chapter 19 of our book (Lewis, Lakshmivarahan, and Dhall), and with that we conclude our discussion of the classical methods for data assimilation. Thank you.