In this module, Module 8, we are going to talk about prediction in the context of stochastic models and observations corrupted by noise: everything is stochastic. Again, here we have a stochastic model, which could be linear or nonlinear, and a noisy observation, which could be a linear or a nonlinear function of the state. So one can consider four different possibilities: the model being linear or nonlinear, and the observation being a linear or a nonlinear function of the state. I am going to start with the simplest possible case, where the dynamic model is a discrete-time linear model and the observations are linear functions of the state. The model is not perfect; the imperfections in the model are compensated by some random input, and we will talk about the properties of this random input that is meant to compensate for the deficiencies in the model. The whole business of data assimilation, assimilating noisy observations into stochastic dynamic models, is what we are after. The data assimilation algorithm, when we ultimately derive it, is called the Kalman filter, so the name "filter" in Kalman filter has a very special connotation; we will talk about what filtering is, what prediction is, and so on a bit later, but we will start with some of the basic descriptions of the model and the observations. The Kalman filter also refers to sequential state estimation. You may recall from our discussion of the static deterministic estimation case that we can have offline or online estimation techniques; online estimation techniques are also called sequential state estimation. In sequential state estimation things keep moving forward; in 4D-Var, on the other hand, the adjoint takes you back, and 4D-Var methods are in general offline techniques. So this is an alternative to 4D-Var where there is no going back: everything keeps moving forward, and the estimation, the inverse problem, is solved sequentially, much like the recursive least
squares we had in the context of static deterministic problems. So the model is a discrete-time stochastic, or random, model. We have already seen the general description of the nonlinear discrete-time model: M is the model map, M maps R^n to R^n, and X_k is the state; X_{k+1} = M(X_k) is an example of a nonlinear map, and X_{k+1} = M X_k is an example of a linear system. What is new here is the addition of the noise W_{k+1}:

X_{k+1} = M X_k + W_{k+1}     (linear model)
X_{k+1} = M(X_k) + W_{k+1}    (nonlinear model)

So let us talk about the timing diagram, and a little bit of notation. This is time k, and this is time k+1. At time k, X_k is known, and I would like to know the state at time k+1. The model map takes X_k to X_{k+1}; M is called the one-step state transition map if it is a nonlinear function, and the one-step state transition matrix if it is a matrix. We use the same symbol M in both cases; the context will tell whether it is a matrix or a map. W_{k+1} is the noise that occurs after time k and before time k+1. That means: I know X_k and I would like to compute X_{k+1}; if there were no noise, X_{k+1} would have been M X_k or M(X_k), depending on whether the model is linear or nonlinear. So X_{k+1} is the sum of what a deterministic model would have given you plus a noise that comes after time k. To emphasize that the noise comes after time k, I denote it W_{k+1}: W_{k+1} is the noise that affects the evolution of the system given X_k; it occurs after time k and before time k+1. In some textbooks you will see W_{k+1} called W_k; it really does not matter, but making it explicit that this is the noise affecting the system after the state at time k is known leaves less room for confusion. So what is W_k?
It represents a compensation for the model error, and it is itself a random process. One of the simplest random processes one can think of is white noise. What is white noise? W_1, W_2, W_3, ... is the sequence of noises that affects the system; we say the sequence is white if there is no temporal correlation. What does that mean? E[W_i W_j^T] = 0 for all i not equal to j. In other words, if i and j are two distinct moments in time, the noises W_i and W_j are temporally uncorrelated; that is what "white" means: a sequence of uncorrelated noises affecting the evolution of the system. X_k is the true state; I do not know it, so I am adding a noise to make up for the deficiency of the model, and M could be a matrix or a map. Now you have all the things associated with the model. With respect to the observation, again, essentially nothing changes: Z_k = h(X_k) + V_k is the nonlinear observation model, and Z_k = H X_k + V_k is the linear one. I am treating both of them simultaneously because we have gained a lot of expertise in handling linear and nonlinear observation models; h is a map, H is a matrix, the mean of V_k is 0, and the covariance of V_k is R_k. I should also have said that V_k belongs to R^m, while W_k, the model noise, belongs to R^n. I hope the description of the model and the observations is clear. Now I am going to talk about the technical definitions of the words filtering, smoothing, and prediction. These definitions are due to Wiener and also to Kolmogorov; definitions of this kind have been in the literature since the early 1940s, Wiener 1942 and Kolmogorov 1942. As I mentioned when discussing optimal interpolation, Wiener and Kolmogorov were independently thinking about the same problem. Wiener was working in the frequency
domain, because he was an expert in Fourier analysis; Kolmogorov, on the other hand, was working in the time domain. So except for this difference in the domain of analysis, they essentially uncovered the same set of results. So let us give a technical definition of what filtering is, in what sense the word filtering is used in Kalman filters. Colloquially, a filter is something that stops certain things from going through: if I have a radio, the radio has a tuner, and the tuner essentially filters out all the signals that do not belong to a particular band of the spectrum. So we talk about low-pass filters, high-pass filters, band-pass filters; we even talk about coffee filters. We know in what sense filtering is used in the ordinary sense; technically, filtering has a slightly different connotation, so let us talk about that. Suppose I have observations in the interval 1 to n; the observations z_i are coming at discrete instants in time 1, 2, 3, ..., n. Let {z_i : i = 1, ..., n} be the collection of observations, and call this collection F_n; the subscript n denotes the number of observations we have, and n is also the last instant at which we have an observation. Now, given the set of observations z_1 to z_n, suppose I want an estimate of the state of the system at a time k with k greater than n. Please understand: 1 to n is the time interval over which I have the observations, and I want an estimate of the state of the system at a time k beyond n. That problem is called the prediction problem, as you rightly know. So x-bar_k is the estimate of the state of the system at a time in the future; how do I tell it is in the future? k is greater than n.
So you can think of n as today; k greater than n means the future. Given all the information up to now, trying to estimate the state of the system at a time in the future: that is the prediction problem, that is the definition of prediction or forecasting. Knowing what I know today, what will be the price of IBM stock tomorrow? That is prediction. Knowing what I know today, what will be the temperature distribution in early spring in North America? That is a prediction problem. On the other hand, suppose I know all the information from 1 to n and I want to go back: I want to evaluate the state of the system x_k for some k less than n. That means I have the benefit of the information from 1 to n, and I am trying to estimate a quantity at a time k in the past. So k less than n means the past and k greater than n means the future, and estimating a quantity in the past, k less than n, is called smoothing, because I have the benefit of the entire set of observations from 1 to n while I am interested in an estimate at a time k in between 1 and n; I am allowed to exploit all the observations, and that problem is called the smoothing problem. So we have the prediction problem and the smoothing problem; what if I want an estimate at the time k equal to n, that is, now? What is the idea here? I have been given a bunch of observations from 1 to n, and I would like to get the state of the system at time k equal to n: that is called the filtering problem. So the filtering problem is an estimation problem where I use all the information up to time n and want the best estimate at that very time. To summarize: the smoothing problem is, given a set of observations from 1 to n, to estimate the state at a time in the past; the filtering problem is to estimate it at the current time; the prediction problem is to estimate the state at a time in the future.
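The three definitions above reduce to a comparison of the estimation time k with the last observation time n; a minimal sketch in Python (the function name `problem_type` is my own, not from the lecture):

```python
def problem_type(k, n):
    """Classify the estimation problem: estimate the state x_k
    given the observations z_1, ..., z_n (n = last observation time)."""
    if k > n:
        return "prediction"   # estimate at a future time
    elif k < n:
        return "smoothing"    # estimate at a past time, using all n observations
    else:
        return "filtering"    # estimate at the current time n

print(problem_type(12, 10), problem_type(7, 10), problem_type(10, 10))
# prediction smoothing filtering
```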
So: what is given to you, what you want to estimate, and what is the relation of the time index k at which you want the estimate to the time n at which the last observation is available. Depending on that relation we have three problems: smoothing, filtering, prediction. These are three classes of problems; this is a classic, widely accepted classification, due to the pioneering work of Wiener, and it has been known since the early 1940s. So what is the problem of Kalman filtering? That is what we are going to talk about. We are going to assume everything is linear; what does everything linear mean? Let us talk about this now. I also want to take one more moment to talk about the problems related to models. We said the model is stochastic; consider the linear model x_{k+1} = M x_k + w_{k+1}, where w_{k+1} is white noise; therefore x_{k+1} is also random. Every model needs an initial condition x_0, and I am going to assume that even the initial condition is random: the initial condition is a realization picked from a prior distribution, which you can think of as a normal distribution with mean m_0 and covariance P_0. So the initial condition is random. If there is no noise but the initial condition is random, the solution is random; if the initial condition is deterministic but there is a noise affecting the model evolution, the model solution is again random. Here I am considering both sources of randomness that affect the state of the system: the randomness in the initial condition and the randomness in the noise that forces the system. So: the noise that forces the system is random, the initial condition is random, and the observations are noisy.
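The two sources of randomness just described, a random initial condition x_0 ~ N(m_0, P_0) and white model noise w_{k+1}, can be simulated directly; a minimal sketch in Python/NumPy, where all the specific numbers (M, Q, m_0, P_0) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M  = np.array([[0.95, 0.10],
               [0.00, 0.90]])          # one-step transition matrix (illustrative)
Q  = 0.01 * np.eye(2)                  # covariance of the white noise w_{k+1}
m0 = np.array([1.0, 1.0])              # mean of the initial condition
P0 = 0.25 * np.eye(2)                  # covariance of the initial condition

x = rng.multivariate_normal(m0, P0)    # random initial condition x_0 ~ N(m0, P0)
traj = [x]
for k in range(5):
    w = rng.multivariate_normal(np.zeros(2), Q)   # w_{k+1}, drawn after time k
    x = M @ x + w                                 # x_{k+1} = M x_k + w_{k+1}
    traj.append(x)

print(np.array(traj).shape)            # 6 states: x_0 through x_5
```

Every run of the loop produces a different random trajectory, which is exactly the point: the model solution is itself a random process.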
So even if I had no observations, the analysis of a dynamical system with a random initial condition and random forcing is called the analysis of a stochastic dynamical system. I first need to understand how to analyze the properties of a stochastic dynamical system: how do I characterize the evolution of the state, what are its probabilistic properties? That is the first task. The second task is: suppose, on top of that, I give you observations; how do I bring in the observations, in addition to the stochastic model analysis, and combine the model and the observations to get an analysis? The model solution is now going to play the role of the background, and the observations are going to play the role they have played all along. So the model forecast, playing the role of a background, provides the prior; the observation provides the new information; and we are going to combine them. You can therefore think of the Kalman filter again within the Bayesian framework. So there are three sources of randomness: the initial condition is random, the model forcing is random, and the observation noise is random; we are going to assume that all three are mutually uncorrelated. So what is the basic idea? Given a set of observations F_k from time 1 to k, find the least squares estimate x-hat_k of x_k. Look at this now: I am given k observations, from 1 to k, and I want the best estimate x-hat_k; you can see from the previous slide that this is the filtering problem. So x-hat_k is the filtered estimate. What is the characterization of the filtered estimate? It minimizes the mean squared error; again least squares, the magic of least squares comes again and again, it is inseparable. So what is the idea here? x_k is unknown, x-hat_k is the estimate, and x_k minus x-hat_k is the error in the estimate.
So consider (x_k - x-hat_k)^T (x_k - x-hat_k). This is not the covariance of the error; it is the inner product of the error with itself, a scalar, and its expectation is the sum of the variances of all the components of the error in the filtered estimate. That expectation is given by the trace of the error covariance matrix:

E[ (x_k - x-hat_k)^T (x_k - x-hat_k) ] = trace( E[ (x_k - x-hat_k)(x_k - x-hat_k)^T ] ).

So the expected value of this inner product, a scalar, equals the trace of the covariance matrix. So now we have stated the problem: I want an estimate x-hat_k of x_k that minimizes this mean squared error; that is the statement of the problem. Because I am given all the information up to k and I am interested in estimating the state x_k, x-hat_k is called the filtered estimate, and the estimator that produces x-hat_k is called the filter; that is where the notion of the filter comes in. If I can also show that this filtered estimate is unbiased, then, as we have already seen, minimizing the mean squared error is the same as minimizing the variance when the bias is 0; we saw this relation when we talked about the Bayesian setup. Therefore it is very prudent to analyze and arrange things such that the estimate not only minimizes the mean squared error but is also unbiased; the two combined give you the minimum variance estimate. Now please understand: we are not talking about the linear minimum variance estimate; we are not bringing in linearity right now. We simply say we want a minimum variance estimate; linearity refers to the structure of the estimator. I want the best, and that is exactly the whole idea here.
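The identity "mean squared error equals the trace of the error covariance" is easy to check by sampling; a sketch with an invented 2x2 covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])     # assumed error covariance (illustrative)
L = np.linalg.cholesky(P)
e = rng.standard_normal((500_000, 2)) @ L.T   # zero-mean errors with covariance P

mse = np.mean(np.sum(e * e, axis=1))  # sample estimate of E[e^T e]
print(mse, np.trace(P))               # the mean squared error matches trace(P)
```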
I also want to bring out one more point: this problem is called linear quadratic Gaussian, LQG, and that is a lethal combination: linearity in the model and in the observation, the quadratic nature of the objective function to be minimized, and the Gaussian nature of the noises involved. Kalman first showed that this LQG combination is lethal in the sense that we can get absolutely beautiful results; it is one of the very few cases where we have absolutely beautiful closed-form results. So one can ask: well, seldom is anything in life linear, so why are you banking on linearity? The trouble is that nonlinear problems are hard to solve; in general I cannot solve them exactly, I can only approximate them. So mathematically it is interesting to ask which problems are solvable in closed form and what the properties of the solution are; at least I want to enjoy the moment, and the moment of enjoyment occurs when you deal with the LQG problem. In the literature on control theory, and Kalman was in fact a control theorist who introduced this within control-theoretic arguments, LQG theory is a very famous, very popular, very fundamental theory, and the Kalman filter is an instantiation of the beauties of LQG. So where are we? We have described our model; the model has two sources of randomness, the initial condition and the forcing; the observation has another source of randomness, the observation noise. Given a bunch of observations and given the model information, I would like to combine everything, whatever can be done, and get the best estimate in the sense of minimizing the mean squared error, which, if I add the requirement of unbiasedness, also gives the minimum variance estimate. That is the problem we have set out to solve, and when everything is linear that problem is called LQG.
So I would like to separate the scheme into two phases; the first is the model forecast step. Let us take a baby step, from time 0 to time 1; once I understand what goes on from 0 to 1, I can go from k to k+1. So let us consider the transition from 0 to 1. Please recall that my initial condition is random: I have assumed the initial condition comes from a normal distribution with mean m_0 and covariance P_0. Now I would like to separate several quantities of interest. x_k is the true state; x_k^f is the forecast estimate of the true state; x-hat_k is the analysis. So x_k is the state of the system, x_k^f is the forecast, and x-hat_k is the analysis; these are the two estimates we will go back and forth between. Initially, at time 0, I do not have any observation; the only information I have about the initial state is that it is normally distributed. Therefore what is my initial analysis? It is the mean of the initial distribution. And what is the initial analysis covariance P-hat_0? It is equal to P_0. Please understand: the analysis is supposed to represent the best information I have, and initially the analysis contains only the information derived from the initial condition. If you give me a Gaussian random variable as an initial condition, what is the best estimate of the random variable? The mean. What is the best estimate of its covariance? The covariance of the underlying distribution. So I am going to postulate that the initial analysis x-hat_0 is m_0 and the initial analysis covariance P-hat_0 is P_0; that is exactly the statement. Once I have initialized the analysis and its covariance, I want to be able to generate the forecast, so I would like to now use them.
So, again, there is no observation yet, only the model. Knowing what I know at time 0, if I feed this information to the model and the model gives me an output, how do I generate the forecast from the model output, knowing that the model output is random? That is the question; this is what is called stochastic dynamical system analysis. So, given x-hat_0, I want to compute the prediction of the state X_1. X_1 is the state of the system, and X_1 is going to be random. When I am trying to predict a random phenomenon, I need to think about decomposing the random quantity into two parts: a deterministic, predictable part and an unpredictable part. We can only control the predictable part; over the unpredictable part, for example the noise, we have no control, since noise by its very nature is not predictable. So what am I going to define now? I want to bring out one more fact from when we did mean squared error estimation: what is the theorem we proved within the mean squared error analysis? The best estimate is the conditional mean; we have already shown that. Therefore I am going to create x_1^f. Please understand my notation: x_1^f is the part of the forecast of the state at time 1 that I have control over, the predictable part, and it is equal to the conditional expectation of X_1 given x-hat_0. What is x-hat_0? x-hat_0 is the information about the initial state. Who is going to create x_1^f? The model is going to propagate x-hat_0 forward to time 1.
X_1 is the true state of the model, and according to the model equation it is M_0 X_0 + W_1. Now I am going to mention one more little thing. You can write the model equation as x_{k+1} = M x_k + w_{k+1}, call it (*); I can also write it as x_{k+1} = M_k x_k + w_{k+1}, call it (**). These are important points, which is why I am spending a little time on them: both are linear, but in (*) M does not vary in time, while in (**) M varies in time. The algebra, the mathematics, is not much different between M varying in time and M invariant in time, so I am attaching a subscript k to M. What does (**) mean? I have a linear time-varying model. If I can analyze the linear time-varying model, the time-invariant model follows by simply dropping the k from M_k; so without loss of generality I can assume the model is time-varying. Therefore, using the time-varying model, X_1 = M_0 X_0 + W_1; that is the model equation. Please go back to my earlier model equation, x_{k+1} = M x_k + w_{k+1}: for simplicity, to get started, I assumed M was constant; now I am writing M_k, which refers to the fact that the model could also vary in time. The algebra is no different, and if I can get something for free, why not? I would like to get the maximum benefit out of it.
So if I use the model, this is what the model will tell you about where X_1 is; but X_1 is random. What is the best estimate of the true state? The conditional expectation. So the conditional expectation of X_1 given x-hat_0 is

x_1^f = E[X_1 | x-hat_0] = E[M_0 X_0 + W_1 | x-hat_0] = M_0 x-hat_0,

because, look at this now: conditional expectation is a linear operator, so the conditional expectation of a sum is the sum of the conditional expectations, and the second term, the conditional expectation of W_1 given x-hat_0, is 0 because W_1 is white; it does not depend on anybody else. M_0 I already know, and x-hat_0 I already know from time 0, so given x-hat_0 the best forecast I can make for time 1 is M_0 x-hat_0. Now this prediction is going to have an error: X_1 is the actual state, x_1^f is the predicted state, and the difference is called the forecast error e_1^f; the superscript f always refers to forecast or predicted quantities, and the hat always refers to analysis quantities. So I am now going to get an expression for e_1^f. We have X_1 = M_0 X_0 + W_1 and x_1^f = M_0 x-hat_0, so if I substitute and simplify,

e_1^f = X_1 - x_1^f = M_0 (X_0 - x-hat_0) + W_1 = M_0 e-hat_0 + W_1,

where by definition e-hat_0 = X_0 - x-hat_0 is the analysis error at time 0. So now I have a recurrence relation for the evolution of the forecast error, and it is a beautiful expression: the forecast error at time 1 is M_0 times the analysis error at time 0, plus W_1. The analysis error in the previous step dictates the forecast error at the next time; that is how the analysis and the forecast are related. Okay, now I would like to move on.
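The error recursion e_1^f = M_0 e-hat_0 + W_1 can be checked by simulation; sampling many realizations also previews the covariance identity derived next in the lecture. All the specific numbers here (M_0, P_0, Q_1) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
M0 = np.array([[1.0, 0.1],
               [0.0, 0.9]])            # assumed transition matrix (illustrative)
P0 = np.diag([0.5, 0.2])               # initial analysis-error covariance
Q1 = np.diag([0.05, 0.05])             # model-noise covariance

N  = 400_000
e0 = rng.multivariate_normal(np.zeros(2), P0, size=N)   # analysis errors e0_hat
w1 = rng.multivariate_normal(np.zeros(2), Q1, size=N)   # white noise W_1
e1f = e0 @ M0.T + w1                   # forecast errors: e1f = M0 e0_hat + W1

P1f_mc = e1f.T @ e1f / N               # sample forecast-error covariance
P1f    = M0 @ P0 @ M0.T + Q1           # closed form, as derived in the lecture
print(np.abs(P1f_mc - P1f).max())      # small: the two agree
```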
So analysis is filtering at the given time, and forecast is prediction; we have talked about smoothing, prediction, and filtering, and in this process we already have the filtering and prediction parts, two of the three. You can think of e-hat_0 as the filtered estimate error and e_1^f as the forecast, or predicted, estimate error. I hope this is clear; this is where the rubber meets the road, and we need to combine several things here. So what is P_1^f? P_1^f is the forecast covariance at time 1. Let us go back: here is time 0, here is time 1; I had x-hat_0, I had P-hat_0 = P_0, I have X_1, I have x_1^f = M_0 x-hat_0, and now I need to compute P_1^f. If you understand this step of going from 0 to 1, the step from k to k+1 will become trivial. So

P_1^f = E[ e_1^f (e_1^f)^T ],

and from the previous page e_1^f = M_0 e-hat_0 + W_1. There are two terms in each factor, so if you multiply them out there are going to be four terms. The analysis error in the previous step, e-hat_0, and the noise W_1 are uncorrelated, and W_1 has mean 0, so two of the four terms will die; I am left with only two quantities. Because we are doing this for the first time, let me write it down. The product is

e_1^f (e_1^f)^T = M_0 e-hat_0 (e-hat_0)^T M_0^T + W_1 W_1^T + M_0 e-hat_0 W_1^T + W_1 (e-hat_0)^T M_0^T;

you can readily see that the multiplication of the two factors of two terms each leads to these four quantities. The expectation of the whole is equal to the sum of the expectations of the individual quantities, so I am going to distribute the expectation to every term: that is, E of the first term plus E of the second term plus
E of the third term plus E of the fourth term. Now M_0, being a constant, comes out. In the third term we have the previous analysis error and the future noise; these are uncorrelated, so this expectation is 0. In the fourth term, again M_0 comes out, and the noise and the previous error are uncorrelated, so this term is 0 too. The second term is essentially the noise covariance, Q_1, and the first term is M_0 E[e-hat_0 (e-hat_0)^T] M_0^T, which is M_0 P-hat_0 M_0^T. Therefore we get the expression

P_1^f = M_0 P-hat_0 M_0^T + Q_1.

There are two things to notice here: the predicted covariance at time 1 consists of two parts. The first part is the initial covariance magnified by the model; the second is the covariance introduced by the model noise. Are you all with me, please? This is important; you have to have an aha moment here. In stochastic dynamics the randomness comes from two directions: Q_1 is the uncertainty in the prediction coming from the model noise, and the first term is the uncertainty coming from the uncertainty in the initial condition. Therefore the prediction has two sources of randomness, the initial condition and the forcing, and those two together contribute additively to the total covariance of the prediction. Why is this additive? Because we are assuming everything is uncorrelated. If there were correlation between e-hat_0 and W_1, there would be extra cross terms in this equation, and the correlation would change the value. Why do we assume things are uncorrelated? Because I would like to have a plain, simple, elegant formulation, and nothing can be simpler, nothing can be more beautiful than this formulation. I hope you got the idea of going from step 0 to step 1. Therefore I now know the forecast and the forecast covariance, and once I have a forecast I can
create some mischief. What is the mischief? The forecast is the best estimate I have of the state at time 1, and I have the forward operator; now I can use the forward operator and the predicted state to create what is called the model-predicted observation. So what is the expected value of Z_1 given X_1 = x_1^f? Z_1 itself is given to you by mother nature, but I am interested in the conditional expectation of Z_1 given X_1 = x_1^f; that is what now conditions my knowledge. From the observation model, Z_1 = H_1 X_1 + V_1 with X_1 = x_1^f; again the conditional expectation is a linear operator, and the conditional expectation of V_1 given X_1 = x_1^f is 0, therefore the model-predicted observation is

E[Z_1 | X_1 = x_1^f] = H_1 x_1^f.

Again I am allowing the observation operator to change in time: am I considering H or H_k, M or M_k? It turns out the algebra of replacing H by H_k and M by M_k is no different from keeping them time-invariant, so without loss of generality we will carry the time index all through.
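The model-predicted observation H_1 x_1^f can be sketched in a few lines; the claim that, given X_1 = x_1^f, the observation prediction error is just V_1 with covariance R_1 is also checked by sampling (H_1, R_1, and x_1^f are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
H1  = np.array([[1.0, 0.0]])           # assumed observation matrix (m=1, n=2)
R1  = np.array([[0.1]])                # observation-noise covariance
x1f = np.array([1.2, -0.4])            # a forecast state (illustrative)

z_pred = H1 @ x1f                      # model-predicted observation H1 x1f

# Given X1 = x1f, Z1 = H1 x1f + V1, so the prediction error is just V1.
N  = 300_000
v1 = rng.multivariate_normal(np.zeros(1), R1, size=N)
z1 = x1f @ H1.T + v1                   # simulated observations of the forecast state
err = z1 - z_pred
print(np.abs(err.T @ err / N - R1).max())   # close to 0: error covariance is R1
```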
So now let us look at this: Z_1 is the actual observation and H_1 x_1^f is the model-predicted observation, so Z_1 - H_1 x_1^f is the error in the predicted value of the observation given the model state forecast. Given X_1 = x_1^f, this error is just V_1, so the expectation of its outer product, E[V_1 V_1^T], is equal to R_1, as it should be; R_1 is the observation error covariance. That is a check on what we are doing. So this is the basic idea, which I have already illustrated: you gave me x-hat_0 and P-hat_0; using my model M_0 and the statistics of W_1, I created my forecast and my forecast covariance. As I do that, somebody gives you the observation Z_1 and the observation covariance R_1. So I have two pieces of information here, and I would like to combine them to create an analysis x-hat_1 and an analysis covariance P-hat_1 at time 1. Combining the two to get the analysis is called filtering; going from the analysis at time 0 to the forecast at time 1 is called prediction. All we have done so far is the prediction part; we have not yet done the combination part. The combination step is called the data assimilation step, the DA step, and the other part is called the forecast step. So now you can imagine: I started with x-hat_0 and P-hat_0, I made a forecast x_1^f, P_1^f, then I got the observation and created the new analysis x-hat_1, P-hat_1; then I use the model to get x_2^f, P_2^f; then I get Z_2 and R_2, and from these I get x-hat_2, P-hat_2; and so the system proceeds. That is a sequential process. Where does the data assimilation step come in? It comes in after the forecast is made: the forecast plays the role of the background, the observation is the new information, and I am combining them; you can see the Bayesian point of view, and it is repeated. It is because of this that we call it sequential. I think it is better to remind ourselves what we do in 4D-Var: in 4D-Var we first decide a time horizon n, and we get the observations Z_1, Z_2, Z_3, ..., Z_n;
we get all the observations, and then we try to fit all of them to decide the best initial condition; once we decide the best initial condition, we run the model forward, and anything beyond n is called the forecast. That is what we do in 4D-Var. In the sequential approach we never look back; we keep going forward. Sequential data assimilation is exactly what is practiced in all the forecast centres of the world these days. Sequential, in other words, means: I know what I know, and I simply want to update it, given the new information, to get the new analysis. If I know how to go from 0 to 1, that is essentially the same step as going from k-1 to k, or from k to k+1. So what is the general step? Now that I have described the process of going from 0 to 1 very clearly, I can take some liberty with the details in going from k-1 to k. I have been given the analysis and the analysis covariance at time k-1; from these two, using the model M, I create the forecast and the forecast covariance; these two are generated using the model equations. Then the observations come, and I do the data assimilation part; it is here that the data assimilation is done. So: forecast is prediction, data assimilation is filtering, and prediction and filtering continue sequentially; this combination of prediction and filtering is what we call Kalman filtering. So what is Kalman filtering? Kalman filtering is the process that underlies assimilating data into a linear model when the observations are linear; the Kalman filter essentially assumes LQG. Kalman is in fact a master of the LQG world; he has done many wonderful things in control theory, and Kalman filtering is only one aspect of his multifaceted contribution to it. But it is Kalman filtering that is applicable in the geophysical domain, because in the geophysical domain, since we are concerned with naturally occurring systems, there is no way to control them: you cannot control a
hurricane you cannot control earthquake you cannot control the blowing out of a top of a mountain a volcano natural occurring system we can only observe we can only predict but engineering occurring systems you can analyze you can predict you can design you can control so that is the fundamental difference between engineering approach to engineering problem and scientific approach to natural occurring systems so geologist geophysicist and atmospheric science people are also interested in same kind of problems that engineers are interested the only thing is engineers have an added advantage of being able to control whereas in sciences you simply have to be able to predict so for example if there is a going to hurricane we cannot change the motion of a hurricane well engineers have sometimes suggested when at the time in the hurricane forms why not to just drop a bomb and dissipate it this is a definitely typically an engineering idea if you talk to an atmospheric scientist or anybody else they would simply laugh at it and go they would not even care to think of answering that so engineers mind is always if you know that there is a danger why do not we control it and prevent the danger from occurring that is the engineering that is how engineers are designed but it is very difficult to be able to control natural occurring systems but that is why much of what Kalman has done is unknown other than the Kalman filtering because it is the Kalman filtering is the only one aspect of the engineering solution that he has developed is applicable to the data simulation setup in fact Kalman did not call it data simulation in fact the ultimate paradigm in data simulation was stated by Kalman is embodied in Kalman filtering what is that I have a stochastic dynamic model I have observations I would like to be able to sequentially keep updating and creating newer analysis from previous forecast and an observation. 
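This sequential paradigm can be sketched as a short loop. The following is an illustrative Python/NumPy sketch, not code from the lecture: the names M (model matrix), Q (model-error covariance), H (observation operator), and R (observation-error covariance) follow the linear-model notation used in this module, and the gain formula is quoted ahead of its derivation later in the lecture.

```python
import numpy as np

def kalman_cycle(x_a, P_a, M, Q, H, R, observations):
    """Sequential data assimilation for a linear model and linear observations.

    x_a, P_a : initial analysis and its covariance (x0 hat, P0 hat)
    M, Q     : linear model matrix and model-error covariance
    H, R     : observation operator and observation-error covariance
    """
    n = len(x_a)
    analyses = []
    for z in observations:
        # prediction (forecast) step: no data, only model
        x_f = M @ x_a
        P_f = M @ P_a @ M.T + Q
        # data assimilation (filtering) step: combine forecast and observation
        K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)  # Kalman gain
        x_a = x_f + K @ (z - H @ x_f)                     # innovation z - H x_f
        P_a = (np.eye(n) - K @ H) @ P_f
        analyses.append((x_a.copy(), P_a.copy()))
    return analyses
```

Each iteration is exactly the 0-to-1 step repeated: the previous analysis feeds the forecast, and the forecast plus the new observation decide the next analysis.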
So in my view, when Kalman published his papers in 1960-61 on what came to be called Kalman filtering, that was the forerunner of all the data assimilation systems known to mankind. For example, 4D-Var came out only in the mid 80s, and the Kalman filter was applied to meteorological problems only in the early 80s. So the Kalman filter as a solution to a data assimilation problem came far earlier than many of the folks in the geophysical world had imagined or dreamed; that is what I would like you to think about. In the context of weather forecasting in 1960, what kind of tools were they using? They were still using successive approximation; there was no 4D-Var, there was no 3D-Var; all these things came much later. Even for the Kalman filtering idea to seep from the engineering literature into the scientific literature took well over 20 years. So in my view the Kalman filter is one of the earliest complete solutions to the data assimilation problem in the context of a linear stochastic model and observations that are linear functions of the state corrupted by noise. I hope the sequential aspect of the idea is clear now.

So what am I now going to do? I have been given x hat k-1 and P hat k-1, and I am simply going to run them through the model. My forecast at time k: pull the analysis through the model and you get the forecast; from that you get the forecast error, and this is the expression for the forecast error. The forecast covariance is then the expectation of the outer product of the forecast error with itself, because the cross terms vanish; this is the expression for the forecast covariance. Look at this now: no data, only model.
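The "no data, only model" remark can be made concrete: if we only repeat the covariance forecast step P ← M P Mᵀ + Q and never assimilate anything, the total uncertainty keeps accumulating. A small illustrative sketch (the model matrix and covariance values here are made-up numbers, not from the lecture):

```python
import numpy as np

M = np.array([[1.0, 0.1],
              [0.0, 1.0]])       # illustrative linear model (assumed)
Q = 0.01 * np.eye(2)             # model-error (random input W) covariance
P = 0.1 * np.eye(2)              # initial analysis covariance (P0 hat)

traces = []
for _ in range(50):
    P = M @ P @ M.T + Q          # covariance forecast step: no data, only model
    traces.append(np.trace(P))   # total variance at each step

print(traces[0], traces[-1])     # the trace keeps growing without assimilation
```

Filtering is what interrupts this growth: each analysis step contracts the covariance before the model inflates it again.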
So I go from 0 to 1, and likewise from k-1 to k. Now let us look at the computational aspects. To generate the forecast I run the model, and running the model is essentially a matrix-vector multiplication, which is cheap: O(n^2). But let us see what happens in the update of the forecast error covariance. Assume P k-1 is given; P k-1 is an n by n matrix and M is an n by n matrix, so I have to do two matrix-matrix multiplications, each costing O(n^3), and then add two matrices, which costs O(n^2). So which is the most expensive part of the Kalman filter equations? It is not the forecast; it is updating the forecast error covariance, which is of complexity O(n^3). Please remember that we already examined this: to multiply two matrices of size one million by one million on a teraflop machine took about twelve and a half to thirteen days. So this multiplication will take thirteen days, that multiplication will take another thirteen days, and the addition probably several hours; you are talking about a month, and that is only to update the forecast error covariance. So what does this mean for large systems, where the state is of order 10^6 or higher? While I know what to do in the Kalman filter, it is impractical to get it done because of the curse of dimensionality. Such problems are called infeasible in the computing field: it is not that I do not know how to solve them; it is simply that with the computing environment I have, I cannot finish the computation.

What is an added outcome of this? It promotes an idea, telling the computing folks: you need to build me larger machines. Originally there were megaflop machines, then teraflop machines, then petaflop machines; a petaflop machine performs 10 to the power of 15 floating-point operations per second, and there are very few petaflop machines around the world. Now they are talking about exascale machines, where the flop rating is 10 to the power of 18. Japan, China, America, and I am sure India as well, have joined this race: almost all the wealthy countries in the world are putting enormous government resources into the development of faster and faster computers, because it is the availability of faster computers that is going to enable major technological innovations in the future. Weather forecasting in this sense is one of the hardest computational problems, not because we do not know how to solve it but because we do not have computers powerful enough to perform such large computations in a short time. That is an aside: the push for larger computers essentially comes from these kinds of arguments.

Again, I am going to quickly run through some things we have already discussed in going from 0 to 1. The conditional expectation of zk given the forecast is this, and I can likewise talk about the conditional covariance of zk given xk. Please understand that everything is conditional. Why? Because I am conditioning everything on the information that is available. What information is available? One piece coming from the model and one coming from the observation; so everything is a conditional analysis. Now I would like to come back to the data assimilation step; I want to cut through the mess for you. This is k-1, this is k. I had x hat k-1 and P hat k-1; I have xk f and Pk f; I have zk and Rk; I am going to combine the two to get x hat k and P hat k. So this is the analysis, that is the forecast, that is the observation.

So what am I now trying to do? Please go back: we already know a lot; we have already done the static Kalman filter, and do you see, this is the static Kalman filter. Look at this now: I have the forecast xk f and its covariance, and I have the observation and its covariance, and I want to combine them. I can combine them the Bayesian way, the linear minimum variance way, or the 3D-Var way; we have seen all of them. Almost all this earlier experience leads to the fact that the analysis at time k equals the forecast at time k plus Kk times (zk - Hk xk f). What is Hk xk f? The model-predicted observation. What is zk? The actual observation. My position is: if I have a model, I should be able to predict, and you should never be afraid of prediction. The prediction may come through, or it may be a bust; if the difference is large, the prediction is a bust, but even then I am learning something. So zk - Hk xk f is the innovation: the new information the observation brought that I could not have known at the time I made the prediction. So I am going to combine the forecast with the innovation, and the coefficient of the linear combination is a matrix Kk, called the Kalman gain matrix; Kk is a rectangular matrix of size n by m. So what have we done? We have made the best forecast: xk f is the best forecast available, the best background information available to me. Now I need the estimator, and this is its structure, a linear structure: x hat k = (I - Kk Hk) xk f + Kk zk. Do you remember? This looks like L times xk f plus K times zk, where L and K are matrices; in this particular case K is Kk and L is I - Kk Hk. Why am I bringing this up? Because the estimator has a linear
structure. I would like this estimate to be unbiased and to have minimum variance, so I am going to fall back on linear minimum variance estimation; that is exactly what Kalman did, and I do not have to do much here because I have already covered linear minimum variance estimation. What have I done? I have essentially prepared all the concoctions needed; I simply need to mix them. It is like a fast food chain: in McDonald's every order is met within five minutes. How do they do that? They anticipate a certain amount of sales and prepare all the ingredients in advance; whenever an order comes, they simply assemble the already-prepared ingredients into the product, so every product can be assembled in a short time; that is why it is called a fast food chain. That is the approach I have taken here: I have prepared all the concoctions needed to do Kalman filtering. What are the various things we did? Given two pieces of information, one called the background and the other called the observation, we have looked at how to mix them in several different ways: the Bayesian way, the linear minimum variance way, and the 3D-Var way. So I have all kinds of concoctions ready, and now I can simply invoke any one of these frameworks to solve the problem. I already know from the theory how to determine Kk; we have the formula from the "From Gauss to Kalman" module we have seen.

So this gives rise to, and I am simply reminding ourselves: this is the forecast value, this is the observation, this is the actual model; true state, forecast, observations, and the Kalman filter equations. I would also like you to understand that the analysis feeds into the forecast and the forecast feeds into the analysis. Look at that now: the previous analysis provides the next forecast, and the current forecast together with the new observation decides the current analysis. That is filtering; this is the filtering step. The forecast step I already have here. So the whole point is that my analysis is a linear function of the forecast and the observation. I will erase this part, since you already know it. What is involved here? I already know Hk, so the only thing I need to determine is Kk. A linear structure is involved; I only need to compute the covariance of x hat k, which will be a function of Kk, and I am going to minimize the trace of that covariance with respect to the elements of Kk; that is minimum variance estimation. I have already presented the methodology for doing this in a previous class, so I have earned the right to simply quote the results; but instead of merely quoting them, let me quickly walk you through the various steps.

This is the analysis structure, and I already know from the previous step how the variables at time k are related to the variables at time k-1. I take the forecast error, substitute it in here, and get the structure of the analysis error. Please understand that the structure of the analysis error involves the Kalman gain, which I have not yet determined; I have only described what it is. Therefore the analysis covariance is given by this structure, and if you do these substitutions and simplify, you get the result. It is easy to do because ek is a sum of three terms, 1, 2, and 3, so there will be multiple product terms when you expand Pk; if you carefully analyze these terms and simplify, you get Pk to be this expression, the ultimate expression. Please recall that this is exactly the expression we derived in one of the earlier modules: this term is quadratic in K, and these terms are linear in K. I need to minimize the trace of Pk with respect to Kk, and Kk has nm elements. The conditions for minimizing the total variance we have already derived; quoting that analysis, the optimal Kalman gain is given by this expression. Once I have computed the optimal gain, I go back: the analysis covariance depends on Kk, so substituting the optimal gain value in here gives the expression for the minimum value of the analysis covariance, which can also be written in an equivalent alternate form. Yes, there is a ton of algebra to be done; I am assuming you will do the algebra, and not only do it but enjoy the lessons coming out of it; it is very educative algebra, as any algebra is. So the expression for the Kalman gain is given by this, the expression for the optimal covariance is given by this, and I have completed the Kalman filter equations.

Let me say it once more. I know some of you might feel that I have gone a little fast, but nothing I have done here is new; I had already built up everything needed. If I am going from k-1 to k, I know how to compute x hat k and P hat k; those are the Kalman filter equations. We will summarize the Kalman filter equations in tabular form a little later; before that, we are going to provide several comments relating to the structure of the Kalman gain.
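As a numerical check on the algebra just quoted, the sketch below (with randomly generated, purely illustrative Pk f, Rk, and Hk) computes the optimal gain K = P_f Hᵀ (H P_f Hᵀ + R)⁻¹ and verifies that the short form (I − K H) P_f of the analysis covariance agrees with the longer quadratic-in-K form; the two coincide only at the optimal gain. I am assuming the "equivalent alternate form" of the covariance is this quadratic form, often called the Joseph form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2                                   # state and observation dimensions

# illustrative symmetric positive definite covariances (assumed values)
A = rng.standard_normal((n, n)); P_f = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)
H = rng.standard_normal((m, n))               # observation operator

# optimal Kalman gain: K = P_f H^T (H P_f H^T + R)^{-1}
K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)

I_KH = np.eye(n) - K @ H
P_short = I_KH @ P_f                            # (I - K H) P_f
P_quad = I_KH @ P_f @ I_KH.T + K @ R @ K.T      # quadratic-in-K (Joseph) form

# at the optimal gain the two expressions for P hat k coincide
print(np.allclose(P_short, P_quad))
```

Trying the same comparison with a non-optimal K (say, K scaled by 0.5) makes the two expressions disagree, which is a good way to convince yourself that the short form is valid only at the minimizing gain.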