In this talk we are going to look at a method that has come to be known as optimal interpolation, and I would like to begin with a quick historical perspective. During the 1960s and 70s, operational centers in the USA, Sweden, Japan and elsewhere routinely used iterative schemes of the Bergthórsson and Döös or Cressman type that we talked about in the last lecture. These methods have also come to be known as successive error correction: the term z - h(x_k) in the iterative scheme, which we can think of as the innovation, can also be thought of as an error, and by iteratively performing the update we are trying to transfer the information from the observation network to the computational network. That is why this class of Cressman-type schemes is also called the successive error correction method.

While this was going on in the USA, Sweden, Japan and other countries, in the Soviet Union in the 1960s a technique called optimal interpolation was championed by Lev Gandin. Optimal interpolation had been developed earlier and independently by Norbert Wiener in the USA and by Kolmogorov in the Soviet Union. The story goes that Wiener developed the method in the early 1940s, but he did this work under a defence contract, so it was classified, and before he could publish it, it had to be declassified. It therefore took several years before Wiener's ideas became known to the public; the open publication date is around 1949. He had developed them in the early 40s not knowing that Kolmogorov was working on similar problems. This goes to show that great minds think alike, even when they work in geographically distant places.

What is the basic idea of this method called optimal interpolation? I am going to illustrate it using a simple 2D grid problem; it can be extended to a 3D grid as well. Take a two-dimensional grid with nx times ny points: nx is the number of points along the x direction, ny is the number of points along the y direction, and n is the total number of grid points. Let there be m observations inside this grid. As an example, take nx = 4 and ny = 4, so n = 16, with a set of 5 observations z1, z2, z3, z4 and z5. We can think of each observation as a scalar: temperature, pressure, humidity, the concentration of some chemical, whatever it may be; they are quite different physically, but essentially each is a scalar.

I am now going to consider two cases in our illustration. First, let us assume the observations are perfect, meaning there is no noise. Why do we do that? Even though we know observations generally carry noise, I think it is good to get a grip on the idea by first assuming the observations are perfect. The field variable of interest, as I said, is a scalar field; it could be temperature, pressure, and so on. Consider a time k, say noon on January 1st, 2016; k is a specific instant in time. At that time k there are m observations of the scalar field variable, one from each of the m observation stations, and I collect them into an m-dimensional vector z_k with components z_{k,1}, z_{k,2}, ..., z_{k,m}.
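Before going further, here is a minimal sketch of this toy setup in Python; the grid dimensions and the number of stations are the ones from the example above, while the variable names and the sample readings are purely illustrative.

```python
import numpy as np

# Toy setup from the example: a 4 x 4 grid (n = 16 points) with m = 5 stations.
nx, ny = 4, 4
n = nx * ny            # total number of grid points
m = 5                  # number of observation stations

# One observation vector z_k at a fixed time k (e.g. noon, January 1st):
# five scalar readings (temperature, pressure, ...), one per station.
z_k = np.array([12.3, 11.7, 13.1, 12.9, 12.0])
assert z_k.shape == (m,)
```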
Now, the observations we are concerned with have a natural variability of their own. For example, consider the temperature in downtown Paris at noon on January 1st of every year: there is a natural variation, and no two successive years will have exactly the same noon temperature on January 1st, even over a hundred years. So you can think of these scalar variables as naturally varying, and this natural variability is captured, or described, as a random process. Given this natural variability, we are going to treat z_k as a random vector. Note that z_k is random not because of observation errors; z_k is random because of the inherent natural variability of climate variables. The observations themselves are assumed error free; that is what "perfect observations" means.

The whole method of Wiener and Kolmogorov, which has come to be called optimal interpolation, rests on the assumption that the statistical properties of the random vector z_k are stationary. What does that mean? Every random vector has an associated probability distribution; if that distribution is invariant in time it is called stationary, and if it changes in time it is called non-stationary. The fundamental assumption is that the temperature in downtown Paris at noon on January 1st of every year is drawn from a stationary distribution, and that stationary distribution is the one describing the underlying natural variability.

Given the stationary distribution behind the realizations of z_k: z_bar is the mean of z, z_tilde = z - z_bar is the anomaly, the expected value of z_tilde is 0, and the covariance of z is C = E[z_tilde z_tilde^T]. So C is an m by m matrix describing the covariance among the observations at the different locations, and it is assumed that both the mean and the covariance of the observations are known.

Let us go back to our example: I have a grid with 16 points and 5 observations, and I am now talking about the observation vector. Let us fix the time to noon on January 1st, so I do not have to carry the index k. At noon on January 1st I get 5 observations, z = (z1, z2, ..., z5); this has a mean z_bar and a covariance matrix C with entries C_ij, where i runs from 1 to 5 and j runs from 1 to 5. What is the meaning of C_ij? C_ij is the covariance between the scalar variable at location i and at location j, and it is assumed that this covariance is known; that is the key to the methodology.

So the whole question is this: we assume the distribution is stationary, and while we may not be able to get a handle on the full stationary distribution, what we can do is estimate z_bar and C. How would you estimate them? Let us look at the time series of observations. Bringing time back in, z becomes z_k = (z_{k,1}, z_{k,2}, ..., z_{k,5}); if I consider station 1, I have a time series of that measurement over a long period of time.
For example, suppose I am measuring the temperature in downtown Paris at noon every day, and that time series is known. What are we going to assume? I fix the location, downtown Paris, and a particular instant in time, namely noon of a given day; every day the temperature in downtown Paris is measured, there is a record of it, and that long time series is made available to me. Downtown Paris is just one observation station, picked for this illustration; there are 5 such observation towers or observation locations, and from each location, at each hour of the day, there is a measurement of the scalar variable of interest. These sensors spit out the values of the observed quantity, and I am assuming all of it has already been recorded; that is the fundamental assumption.

Given a long time series of observations from each of these locations, I can now compute the mean: what is the mean temperature in downtown Paris at noon on January 1st, at noon on January 2nd, and so on, day by day, by processing the data. Once I have the mean, I can compute the anomaly, and once I have the anomaly I can compute the variance of the measurements at a given location. I can do this for every location, so for a given time I will have m means and m anomalies, in fact a time series of anomalies, and using the time series of anomalies I can also compute the covariance.

How do I compute the covariance? Let z_tilde_{k,i} denote the anomaly at the k-th instant in time at the i-th station; these are scalars. Squaring the anomaly and averaging over k = 1 to N gives the variance at station i:

C_ii = (1/N) * sum_{k=1..N} (z_tilde_{k,i})^2.

Similarly, the off-diagonal elements C_ij come from the product of the anomalies at stations i and j, summed over k:

C_ij = (1/N) * sum_{k=1..N} z_tilde_{k,i} * z_tilde_{k,j},

which is an estimate of the covariance between station i and station j. These two formulas together determine the elements of the covariance matrix C.

So what are the fundamental assumptions? There are two. First, the scalar field variable of interest, such as pressure or temperature, has a natural variability, and this natural variability can be captured by a stationary distribution. Analysis of non-stationary stochastic processes is extremely difficult; there are very few results for the analysis and quantification of the properties of non-stationary stochastic processes. What we really know how to do is pin down, analyze and characterize the properties of stationary stochastic processes. So here z_k, the observation vector at time k, is assumed to arise from a stationary distribution.
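A minimal sketch of these estimates in Python, assuming the m parallel time series have been stacked into a T x m array Z (T time instants, one column per station); the function name, array name and synthetic data are illustrative only.

```python
import numpy as np

def estimate_mean_and_covariance(Z):
    """Z: (T, m) array; row k is the observation vector z_k at time k.
    Returns the sample mean z_bar (m,) and sample covariance C (m, m)."""
    T, m = Z.shape
    z_bar = Z.mean(axis=0)              # componentwise mean over the T samples
    Z_tilde = Z - z_bar                 # anomalies z_tilde_k = z_k - z_bar
    # C_ij = (1/T) * sum_k z_tilde_{k,i} * z_tilde_{k,j}
    C = (Z_tilde.T @ Z_tilde) / T
    return z_bar, C

# Illustrative use with synthetic data: m = 5 stations over T = 1000 instants.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 5))
z_bar, C = estimate_mean_and_covariance(Z)
print(np.allclose(C, C.T))              # C is symmetric by construction
```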
Stationarity is fundamental to anything we can do computationally; that is the key. It is a limitation imposed not because we want it, but because if the process is not stationary there are not many things we know how to do analytically. The stationarity assumption is fundamental to almost all of time series analysis. Likewise, the second assumption is that I have parallel time series at each of the m locations, one for each station. From these parallel time series I can compute the statistics: the individual means, giving the mean vector, and the individual variances and covariances. These two together are summarized by the vector z_bar and the matrix C, and they can be computed if you give me a long time series at the various stations. So we assume that z_bar and C for the m stations are available.

What I have just described is reinforced in the slide, so let me quickly run through it. The computation of the spatial covariance C is the computation of the spatial covariance among the m observation stations. Z is the time series of the field variable and z_bar is the vector of means; instead of working station by station, I now do all stations collectively. z_k is a vector, z_bar is a vector obtained by averaging over T observations, and z_tilde_k is the vector of anomalies; earlier I talked about individual anomalies, now I collect them into a vector. Once the anomaly vectors are known, I can compute the covariance matrix as the average of z_tilde_k z_tilde_k^T, that is, (1/T) times the sum over k.

This C is assumed to be symmetric and positive definite. Why? A covariance matrix is always symmetric, as one can readily see. If the number of data points is small it may not be positive definite, but if the number of samples is large, collected over a long period of time, there is reason to believe that the C so computed will indeed be symmetric and positive definite as well. So I am going to assume C is SPD.

Now, what does C capture? That is the important thing, and it is the fundamental idea of both Kolmogorov and Wiener, proposed in the early forties, 1941-42. The matrix C captures the natural spatial variation in the observations. Please remember, this is not the observational error covariance; I should stress the word "error" there. The observational error covariance is R, and in this case R is 0 because we have assumed the observations are perfect. This spatial covariance essentially captures the natural spatial variability of the field variable of interest in the geographical region covered by the m observation stations. That is the fundamental idea, and that is the starting point.

Now let me go back and talk about why we are interested in this correlation. Please understand, our ultimate aim is to predict, and to predict I need to assimilate. There are only two kinds of things you can bank on to generate a prediction: one is causality, the other is correlation. If you are using models to predict, the models represent the causal relations that exist and that describe the underlying physical process.
For example, the combined dynamical system that describes the motion of the earth around the sun and the moon around the earth is one we have understood very well, and using it we are able to predict lunar and solar eclipses to a very high degree of precision. That dynamical system has captured the causality principle underlying the motion of the moon around the earth and the earth around the sun. Likewise, every model, be it static or dynamic, deterministic or stochastic, in some sense encapsulates some form of this causality principle; conservation of momentum and conservation of energy are causality principles.

In the case of time series, we derive empirical models, such as autoregressive moving average models, and these empirical models are essentially derived from correlation. What do we do? We take a time series, compute its temporal correlation, and see how the correlation decays with time: if it decays fast, the process has short memory; if it decays very slowly, the process has long memory. Mathematicians have analyzed the properties of several different types of time series models and have catalogued, created an album if you wish, of the correlation structures of the various models. Given a time series, you compute its correlation structure, compare it against the album, and look for the closest match; it may not be a single one, it could be a subset of two or three, so you narrow the window and then try each of these models to decide which one is better.

So the underlying theme is: to be able to predict, you need either causality or correlation, and this approach to optimal interpolation rests on the ability to predict based on correlation. What does it mean for two spatially separated quantities to be positively correlated? If the variable at one of the stations increases, there is a likelihood that the variable at the other station will also increase, because there is a positive correlation; if they are negatively correlated, when one increases the other will decrease. So from the increase or decrease of a particular quantity at a given station, and knowing the correlation, I should be able to predict what will happen at the other station. That is the fundamental principle underlying predictive science based on correlation.

Ultimately the aim of data assimilation is to predict. Data assimilation essentially helps you fit a model to data, and fitting a model to data is model calibration; forecasts generated from calibrated models are better than those generated from uncalibrated models, and that is why we do data assimilation. The alternative is to understand the spatial and temporal correlation structure. Wiener's theory applies both to temporal correlation analysis, as involved in time series analysis, and to spatial correlation analysis, as would be of interest in any geophysical science.
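As an aside on the temporal side: the "compute the correlation and watch how it decays" step just described can be sketched in a few lines. A minimal example of a sample autocorrelation function; the biased normalization used here is one common convention, not the only one, and the white-noise data are illustrative.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelation of a 1-D series x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - h], x[h:]) / (len(x) * var)
                     for h in range(max_lag + 1)])

# Fast decay of the ACF suggests short memory; slow decay suggests long memory.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
print(sample_acf(x, 5))   # near zero beyond lag 0 for white noise
```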
So the matrix C captures the natural variability of the variable of interest in the chosen geographical domain. Now please go back to my original picture: I have a grid and I have observation stations, and C is the covariance among the observation stations, but my interest is in the computational grid. Once an observation network is established, its spatial distribution is fixed; the computational grid, on the other hand, depends on the computing power, and I can change it at will. With larger computing power I can have a smaller grid size and a larger number of grid points; with smaller computing power I may have a coarse grid and a smaller total number of grid points. So there is no fixed computational grid; it depends on what processes I am interested in analyzing, and so on.

Now, what would we like to do? We would like to transfer the knowledge from the observation network to the computational network. That has been the theme in the successive error correction and iterative methods we talked about in the last lecture as well, so the theme is very similar, except that this new idea is rooted deeply in the correlation structure. Recall our ability to transfer information between the two networks: the observation space is R^m, the grid space is R^n, H maps the grid to the observations, and H^T maps back from R^m to R^n. So I can go from one network to the other by this interpolation scheme.

From the observation stations, we can therefore also think of having a time series of values at the grid points, obtained by this interpolation. Let us fix a particular time, noon on January 1st: I have m stations and an H matrix that can interpolate between the computational grid and the observation network. You may say: hey, if you interpolate, the interpolation is going to incur error. Yes, it may; I am cognizant of that. But I want the reader to appreciate that I have the ability to lift information from one network to the other through H and H^T.

So let us pretend that I have a corresponding time series of the same physical field variable at each one of the n grid points, with the same time instants: at noon I have the observations at the stations and the interpolated values on the grid, at noon today, 2 o'clock today, 5 o'clock today, every day, hour by hour, a time series going over, say, 50 years of data. I can do that. Again, I want you to recognize that while the observation network is fixed, the computational grid may be changing; but if I fix the computational grid, and the computational grid embeds the observation network, there is a way to lift values from the observation network to the computational grid.

Once I have the parallel time series on the computational grid, I have a vector x alongside the vector z; x belongs to R^n and z belongs to R^m. With a time series on the grid, I can compute its expected value, and then the anomaly, whose mean is 0.
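The lifting just described can be made concrete: one way to picture H is as an m x n matrix whose i-th row holds the bilinear interpolation weights of station i with respect to its four surrounding grid points. A minimal sketch, assuming a unit-spaced nx x ny grid and stations given by continuous (x, y) coordinates; the function name and station positions are illustrative.

```python
import numpy as np

def bilinear_H(stations, nx, ny):
    """Build H (m x nx*ny): row i holds the bilinear interpolation weights
    that take a grid field x in R^n to the value at station i, z_i = (H x)_i.
    Grid points sit at integer coordinates (0..nx-1, 0..ny-1)."""
    m = len(stations)
    H = np.zeros((m, nx * ny))
    for i, (sx, sy) in enumerate(stations):
        x0, y0 = int(np.floor(sx)), int(np.floor(sy))   # lower-left corner
        x0, y0 = min(x0, nx - 2), min(y0, ny - 2)       # stay inside the grid
        fx, fy = sx - x0, sy - y0                       # fractional offsets
        for dx, dy, w in [(0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                          (0, 1, (1 - fx) * fy),        (1, 1, fx * fy)]:
            H[i, (x0 + dx) * ny + (y0 + dy)] = w
    return H

H = bilinear_H([(0.5, 0.5), (2.2, 1.7), (1.0, 3.0), (3.0, 0.4), (2.8, 2.9)], 4, 4)
print(H.shape, H.sum(axis=1))   # (5, 16); each row of weights sums to 1
```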
Now comes the important thing: let D be the matrix that captures the cross covariance between the grid variables and the observation network. x_tilde is a vector defined on the grid, belonging to R^n, and z_tilde is a vector belonging to R^m. If I multiply x_tilde by z_tilde^T, a column times a row, I get an outer product matrix of size n by m; if I take the expectation of this outer product matrix, that is D. So D = E[x_tilde z_tilde^T] is an n by m matrix: the cross covariance between the grid values and the observation variables.

C and D are going to play a major role in our analysis. Please remember, C may be fixed in time, because once we establish an observation network we do not change it; but the computational grid can change in time, n can change in time, and therefore D can change in time. How do you get from the observations to C and then to D? By interpolating from the observation network to the grid network. There are several such interpolation schemes; I have already talked about bilinear interpolation, which interpolates between the grid and the observation network, and we can use one of them.

So what do we have? I have access to C and I have access to D: C is the covariance of the given field variable of interest at the observation locations, and D is the cross covariance between the grid and the observation network. There is a lot of statistical computation here: if you have a time series over 100 years, hour by hour or minute by minute, your dataset may be very large, and from that dataset you have to crunch out the matrix C and the matrix D. That can be done; it is a routine calculation and does not take too much time. Here I am just formally describing the computation of the elements of these matrices, and I want to reinforce it again, so let us quickly run through the calculation. Let x_k be the state vector of the same field variable at the grid locations at time k; if z has a time series, there is a corresponding time series for x. Once I have the time series, I can compute the mean, then the anomaly, then the cross product, which is D. So there we talked about concepts, here we talk about algorithms: the algorithm for C and the algorithm for D are given.
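A minimal sketch of the D computation, assuming parallel anomaly time series on the grid (X, shape T x n, e.g. obtained by interpolation as above) and at the stations (Z, shape T x m); the names, the stand-in lifting operator and the synthetic data are illustrative.

```python
import numpy as np

def estimate_cross_covariance(X, Z):
    """X: (T, n) grid time series; Z: (T, m) observation time series.
    Returns D (n, m), the sample cross covariance between grid and stations:
    D_ij = (1/T) * sum_k x_tilde_{k,i} * z_tilde_{k,j}."""
    T = X.shape[0]
    X_tilde = X - X.mean(axis=0)        # grid anomalies
    Z_tilde = Z - Z.mean(axis=0)        # observation anomalies
    return (X_tilde.T @ Z_tilde) / T

# Illustrative use: synthetic station series lifted to the grid by a stand-in
# operator L (playing the role of H^T), then D is estimated from the pair.
rng = np.random.default_rng(2)
Z = rng.normal(size=(1000, 5))          # m = 5 stations, T = 1000 instants
L = rng.normal(size=(5, 16))            # stand-in lifting operator to n = 16
X = Z @ L                               # (T, 5) @ (5, 16) -> (T, 16)
D = estimate_cross_covariance(X, Z)
print(D.shape)                          # (16, 5), i.e. n x m
```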
Now, what is the statement of the problem? (There is a spelling slip in the slide here; it should read "let".) Let the observational covariance C and the cross covariance D be known. What do C and D represent? They represent the stationary values of the covariance among the observations and of the cross covariance between the observations and the grid. Because the underlying process is stationary, C and D do not change in time, especially if you compute them from a long time series.

I want to comment a little on what stationarity means in practice. If the regime of the climate has changed, the underlying distribution will have changed; so the stationarity assumption means we are assuming the regime under which the climate operates has not changed statistically. How do you know whether the regime has changed? That is a different question: you need a long time series, you break it into different parts, you calculate the statistical quantities in the different parts, and you see whether they have changed; the first 100 years versus the second and third 100 years, the first 50 years versus the second, the first 10 years versus the second, and so on, for decadal variation, century-scale variation, annual variation. One can do statistical tests for regime changes, regime shifts, if one has access to a reasonably long time series; in other words, one can test the hypothesis that the underlying statistics are invariant. It all depends on how much data you have: if you do not have much data you cannot even do that testing, and you simply have to assume stationarity. These are some of the key things to keep in mind.

So what do C and D refer to? In some sense they refer to the climatology, and that is the important thing here: C and D encode the stationary aspects of the climatology. Now suppose a new year dawns, January 1st 2016. I have computed C and D over the past years, and on this new day a new observation arrives from my observation network. The observations are available only on the observation network, but I would like to optimally compute the values induced on the grid. I hope you see what I am driving at: C and D are known, they are based on past values, they represent the climatology; a new day brings a new observation z; the new observation is confined to the m observation locations, but I would like to lift that information from the m stations to the n grid points. How do you do that, and how do you do it optimally? That is the statement of the problem: given a new observation z and the climatological information embedded in C and D, how do we optimally compute the induced grid value x from z?

I know z_bar: z_bar is the long-term average of z, and by stationarity I assume it has stabilized, so z_bar is invariant in time. Then z_tilde = z - z_bar is the anomaly of z with respect to the long-term average. Similarly, let x_tilde = x - x_bar. I know x_bar, because I computed it along with D; what I do not know is x, and x is the one I want to determine. Instead of computing x directly, I am going to compute x_tilde: computing x is equivalent to computing x_tilde, because once I know x_tilde I can add it to x_bar to get x. So I am going to work with anomalies. Let z_tilde be the observation anomaly on the new day, and let x_tilde be the corresponding induced anomaly on the grid. My job is to optimally determine x_tilde from z_tilde, knowing C and D. Please understand, the background x_bar is known, and we can recover x once x_tilde is computed, so we simply concentrate on computing the anomaly x_tilde.

So what is the optimal interpolation approach? Please understand, this is one of the earliest known methods in predictive science. The basic idea of optimal interpolation is to express the analysis increment, x - x_bar, componentwise: x_tilde_i is the new increment at grid point i that is to be gleaned from the new observation. I am assuming I can express x_tilde_i as a linear combination of the observation increments z_tilde, the vector of observation increments.
I am going to compute this grid location by grid location: i is the i-th grid location and x_tilde_i is what I want to compute. Now I have to concoct a model that relates the known to the unknown: x_tilde_i is not known, z_tilde is known. So I am going to confine my attention to a class of estimators where x_tilde_i is expressed simply as a linear combination of the elements of z_tilde; I want you to think about that. If I am going to have a linear combination, I need weights: let w be the vector of weights used in the linear combination, the unknown weight vector. Given this philosophy of letting the unknown be a linear combination of the known, I can express the estimate as x_tilde_i = w^T z_tilde. Under what condition will x_tilde_i be optimal? Under the optimal choice of the vector w. So the whole problem reduces to finding an optimal weight vector w.

Optimal in what sense? Again we come back to least squares. Both Kolmogorov and Wiener, independently in 1941, geographically separated and without talking to each other, came up with very similar ideas. The mathematical problem reduces to the following: find a vector w in R^m that minimizes the mean square error between the increment at grid point i and the linear combination. Because the underlying quantities are all naturally random, I have to take the expected value, so the objective function is the mean of the square of the error: f(w) = E[(x_tilde_i - w^T z_tilde)^2]. Here x_tilde_i is the i-th component of the analysis increment vector at a given time; all of this is done at a given time, and if you are interested in other times you repeat it. So without loss of generality, assume the time k is fixed; z_tilde is the vector of observation increments, z_tilde_i is its i-th component, and for convenience we suppress the time index because the time is fixed. This is an analysis done at a given time: if you have a subroutine that can do this, you can use it repeatedly for various times. I hope the formulation of the problem is clear.

Now look at this: I am falling back on least squares, but not with respect to a static model or a dynamic model; this least squares problem is built on the correlation structure, the cross correlation structure, the stationarity assumption, and so on. f(w) can now be expanded, because it is like (a - b)^2: the expectation operator can be pulled into each term, since the expectation of a sum is the sum of the expectations. The first term, the expectation of the square of x_tilde_i, is the variance of x_tilde_i. The cross term involves E[x_tilde_i z_tilde^T], which is essentially the i-th row of D, denoted D_{i*}; please remember, D is the cross covariance between the grid and the observation locations, so every grid point is related to every observation location.

Here I would like to point out a particular difference between the Cressman scheme and this scheme. In the Cressman scheme, what did we assume? For a given grid point, the observation stations that can affect it are those lying within the radius of influence; that came in the mid 1950s. In Wiener's time, 1941-42, they did not restrict the influence of one point on another: they assumed everything influences everything, but they measured that influence through correlation and cross correlation.
So the i-th grid point has a cross correlation with every observation station, and that is given by the i-th row D_{i*}; the cross term thus becomes 2 D_{i*} w. I want you to understand that w is a column vector and D_{i*} is a row vector, so D_{i*} w is an inner product, a scalar. The last term is essentially w^T C w; you remember C, the covariance among the observation locations. Now look at equation (9): it involves the i-th row of D, the entire matrix C, and the variance of the field variable at the i-th grid point, all of which are known; the only thing that is not known is w.

Please also realize that f(w) in (9) is a quadratic form, so the mathematical problem is: pick w such that this quadratic form is a minimum. How many times have we minimized quadratic forms in this class? A million times. One term is quadratic in the variable, one is linear, one is constant. Computing the gradient of f(w) in (9), and I am not going to do the arithmetic, I very strongly encourage you to apply the principles of multivariate calculus we have talked about, the gradient is given by (10), and the Hessian is 2C. We have already assumed C is SPD, so the Hessian is SPD; if I equate the gradient to 0, then, given that the Hessian is SPD, the stationary point must be a minimum, because a convex quadratic function has a unique minimum. The solution of setting (10) to zero is given by (12): C w = D_{i*}^T, where C is an m by m matrix and w is an m by 1 vector. Remember, D is an n by m matrix, so its i-th row D_{i*} is an m-dimensional row vector, and D_{i*}^T is an m-dimensional column vector.

So (12) is of the form A x = b where A is SPD. Do you remember? This is a linear system with a symmetric positive definite matrix. How many different methods have we seen to solve it? I can solve it by Cholesky, by QR, by SVD, by iterative schemes such as Gauss-Seidel; there are ever so many methods. I can also solve it by the gradient method or by the conjugate gradient method.
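Collecting the derivation just described in one place, here is a compact summary in the notation above; the lines correspond to the quadratic form (9), the gradient (10) and the optimal weights (12) cited in the lecture.

```latex
\begin{align*}
f(w) &= \mathbb{E}\big[(\tilde{x}_i - w^{\mathsf T}\tilde{z})^2\big]
      = \underbrace{\mathbb{E}[\tilde{x}_i^2]}_{\mathrm{Var}(\tilde{x}_i)}
        \; - \; 2\,\underbrace{\mathbb{E}[\tilde{x}_i\,\tilde{z}^{\mathsf T}]}_{D_{i*}}\,w
        \; + \; w^{\mathsf T}\underbrace{\mathbb{E}[\tilde{z}\,\tilde{z}^{\mathsf T}]}_{C}\,w, \\
\nabla f(w) &= 2\,(C\,w - D_{i*}^{\mathsf T}),
\qquad \nabla^2 f(w) = 2\,C \ \text{(SPD, so the stationary point is the minimum)}, \\
\text{optimum:}\quad & C\,w = D_{i*}^{\mathsf T},
\qquad \tilde{x}_i = w^{\mathsf T}\tilde{z} = D_{i*}\,C^{-1}\,\tilde{z}.
\end{align*}
```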
Now you can see the power of the mathematical tools we have already developed; that is the key. We are looking at the foundational aspects of algorithms for data assimilation, no matter what variety of assimilation process we are involved in. This idea is quite different from the other ones we have seen, yet the tools we have developed are very helpful: computing the quadratic form, computing the gradient and Hessian, minimizing the quadratic form, solving the resulting equation by matrix methods or by direct optimization methods; those are all tools in your toolbox. In fact, I would like to remind all of you of the famous saying: if a hammer is the only tool you have, every job looks like a nail. I want all of you to write this down and put it up where you study. If you have only one tool, and that tool happens to be a hammer, what can you do with it? You can only hit the nail. So what you can do depends on what you know and what tools you have: the larger your toolbox, the better. And our toolbox has been filled with rich tools that can be used in solving problems.

Therefore (12) is a symmetric positive definite system, and I can solve it by one of several methods; I would like the reader to indulge in this process and convince oneself that (12) can be solved by one of many methods. By repeating this process for each i, we can obtain the analysis increment over the entire grid. But now look at this: for every i the left-hand side remains the same, only the right-hand side changes. So we have a collection of n linear systems with the same matrix C but different right-hand sides. What does that mean? If you compute the Cholesky factor of C once, you can use it repeatedly for the solution at every grid point; that is a computational advantage. We can solve this efficiently by performing the Cholesky decomposition of C once and repeatedly using the Cholesky factor to solve for the grid variables at each of the n locations.

This method has come to be called optimal interpolation. In the area of geophysics there is a method called kriging; I am sure many of you in the geophysical sciences have heard of it. Kriging is essentially an optimal interpolation method: if you look at its mathematics, kriging is essentially the Wiener-Kolmogorov optimal interpolation. Wiener's method has been applied in many, many different fields of activity, under the camouflage of very many different names, and kriging, used in geology and geophysics, is one such name. The person who popularized the application of optimal interpolation, especially in the atmospheric and oceanographic sciences, is the Russian scientist Lev Gandin. Gandin published a book devoted entirely to the application of OI to several different problems of interest in climatology, oceanography, atmospheric sciences and so on, and I would strongly recommend that the reader take a look at this extremely good book by Lev Gandin.
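A minimal sketch of the perfect-observation OI analysis in Python, factoring C once and reusing the factor for all n grid points, as just described; scipy's cho_factor/cho_solve are one standard way to do this, and the function name and random data are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def oi_analysis(C, D, z, z_bar, x_bar):
    """Optimal interpolation with perfect observations.
    C: (m, m) SPD observation covariance; D: (n, m) cross covariance;
    z: (m,) new observation. Returns the analysis x (n,)."""
    z_tilde = z - z_bar                    # observation increment
    factor = cho_factor(C)                 # Cholesky factorization, done once
    # Solve C W = D^T for all grid points in one call: column i of W
    # solves C w = D_{i*}^T, the system (12) for grid point i.
    W = cho_solve(factor, D.T)             # (m, n)
    x_tilde = W.T @ z_tilde                # analysis increments at all n points
    return x_bar + x_tilde                 # recover x from the anomaly

# Illustrative use with a random SPD C (n = 16 grid points, m = 5 stations).
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5)); C = A @ A.T + 5 * np.eye(5)
D = rng.normal(size=(16, 5))
x = oi_analysis(C, D, rng.normal(size=5), np.zeros(5), np.zeros(16))
print(x.shape)                             # (16,)
```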
Now let me talk very quickly about the extension, case 2: noisy observations. In practice observations are noisy; noise is unavoidable. So my observation is now z = z_bar + z_tilde + v: until now we had only z = z_bar + z_tilde, but a new player comes to the table, to the game, and that is v. I am going to assume v is Gaussian with covariance R, and that R is SPD. I am also going to assume that the natural variability in z and the observation noise are uncorrelated; that makes sense, because the climate does not care what instrument you use to measure it. The noise v comes from the measuring process, and the measuring process does not affect the natural variability of the climate system.

So the cross correlation between the natural variability of the field variable of interest and the observation noise is 0. I can now compute the covariance of the observation; since we have discussed many of these manipulations in detail already, I will just hit the major points. The observation anomaly is now z_tilde + v, and its covariance expands into four terms; the cross terms vanish because there is no covariance between v and z_tilde, so we are left with the covariance C + R. Now look at the role of R: R increases the covariance. Earlier R was 0 because there was no observation noise, the observations were perfect; this is the only difference, and the rest is all the same. Wherever there was C, you replace C by C + R and solve the problem.

Again, it is reasonable to assume that the analysis increment is uncorrelated with the observation noise, and that is described here, so I can define D by the same formula as before. I again have an f(w) involving the weight vector w: the whole idea is the same, I am simply assuming the grid analysis increment is a linear combination of the observation analysis increments.

How good is this linear assumption? You can change it, but to get the ball rolling, Kolmogorov and Wiener assumed that the grid increment can be expressed as a linear combination of the observed one. In a given situation you could try other forms of dependence of x_tilde on z_tilde and redo the analysis, but the trouble is that the instant you bring in nonlinearity, you are headed for computational trouble. Then you have to ask whether the trouble is worth it: if you take the trouble and spend more money to solve the nonlinear problem, is it going to improve your prediction estimates? These are open questions; I am not going to indulge in all the other possibilities, this is simply to provide you a new way of thinking about prediction.

If you consider f(w), and I have given all the details, I will leave this as an exercise: it is a quadratic form, you compute the gradient, you compute the Hessian, and you can see that wherever there was C, C is replaced by C + R; except for that difference there is not much. The optimal w is now obtained as the solution of (C + R) w = D_{i*}^T, and if you set R = 0 you recover the previous equation.
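In code, the noisy case is essentially a one-line change to the sketch above: factor C + R instead of C. A minimal self-contained version, with R an assumed-known SPD observation-error covariance and all data illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def oi_analysis_noisy(C, R, D, z, z_bar, x_bar):
    """OI with noisy observations: identical to the perfect case except that
    the weights solve (C + R) w = D_{i*}^T; setting R = 0 recovers case 1."""
    factor = cho_factor(C + R)             # factor C + R instead of C
    W = cho_solve(factor, D.T)             # (m, n): one column per grid point
    return x_bar + W.T @ (z - z_bar)

# Illustrative use: same shapes as before, with a diagonal error covariance R.
rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5)); C = A @ A.T + 5 * np.eye(5)
R = 0.25 * np.eye(5)                       # instrument noise variance per station
D = rng.normal(size=(16, 5))
x = oi_analysis_noisy(C, R, D, rng.normal(size=5), np.zeros(5), np.zeros(16))
print(x.shape)                             # (16,)
```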
So what is the fundamental idea? Knowing the correlation structure of the field variable, we can easily lift the information from the observation network to the computational grid; that is the essence of this discussion. This module is contained in chapter 19, where we discuss all of this at greater length, of the 2006 book by Lewis, Lakshmivarahan and Dhall. With this, I hope we have given you an opportunity to develop an appreciation for another way of thinking about prediction, about creating an analysis: how to transfer the new information and combine it with the old information.

I want to talk about that part a little bit. The new observation z is the new information; C and D are the old information, so in some sense you can think of C and D as the equivalent of a prior, while z carries the new information. Maybe Bayes is creeping in very quietly here: there are two pieces of information, and I am trying to combine them so as to minimize the error in the estimate. Least squares comes in explicitly; Bayes comes in secretly. You can see the commonality between the various techniques, even though the basic assumptions with which these techniques are developed are totally different. With that we conclude our discussion of optimal interpolation techniques. Thank you.