Now that we have an expression for the Kalman gain, we also have expressions for the forecast covariance and the analysis covariance. I am going to spend a little time understanding the structure of the Kalman filter in special cases, an interpretation, if you wish, of the Kalman gain. So let us consider the special case when m is equal to n. In practice m is not equal to n, but this is mathematics: sometimes doing some mathematical analysis under special cases can throw new insight onto what we have already done. So let us see what kind of insight we can gain by specializing some of the quantities in the expression for the Kalman gain.

Let us assume m = n. Let us also assume H_k = I. What does that mean? It means z_k = x_k + v_k: I am measuring the state itself. That is a very simple case. Let us also assume the forecast covariance P_k^f is diagonal. In general it may not be, but I am trying to interpret things, so I am specializing. Likewise, I am assuming R_k is also diagonal. Look at this: there are a lot of assumptions going in. Why these assumptions? Why not? Let us have fun, if nothing else, and see what comes out; that is the curiosity.

So I am assuming P_k^f is diagonal, R_k is diagonal, H_k = I, and m = n. In that case the Kalman gain is a square matrix: K_k = P_k^f (P_k^f + R_k)^{-1}. Since P_k^f and R_k are both diagonal, P_k^f (P_k^f + R_k)^{-1} has an explicit expression. If I substitute this form of the Kalman gain into the analysis equation, it becomes x̂_k = (I - K_k) x_k^f + K_k z_k. This is what I am after. The special structure tells you K_k is diagonal, and if K_k is diagonal, I - K_k is diagonal. Therefore x̂_k equals a diagonal matrix times the forecast plus a diagonal matrix times the observation.

If I consider the i-th element of x̂_k, it now has an explicit expression:

x̂_{k,i} = [r_ii / (p^f_ii + r_ii)] x^f_{k,i} + [p^f_ii / (p^f_ii + r_ii)] z_{k,i},

where r_ii is the i-th diagonal element of the R matrix and p^f_ii is the i-th diagonal element of the forecast covariance. Look at the sum of these two coefficients: each one is a weight, and they sum to one. So the analysis is a weighted sum of the forecast and the observation. That is very simple. What are the weights? If you call one of them alpha, the other is 1 - alpha: a convex combination. So you can readily see that in this special case, I have a forecast, I have an observation, and the analysis is simply a point on the line segment joining the forecast point and the observation point. Since m = n, the observation space equals the model space, so these two points lie in the same space. So what does this tell you?
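To make this concrete, here is a minimal sketch of the diagonal special case in Python. The function and variable names are my own illustration, not from the lecture: the gain reduces to componentwise weights alpha_i = p^f_ii / (p^f_ii + r_ii), and each component of the analysis is a convex combination of forecast and observation.

```python
import numpy as np

def analysis_diagonal(x_f, z, p_f, r):
    """Kalman analysis when m = n, H = I, and P^f, R are diagonal.

    x_f : forecast state, shape (n,)
    z   : observation, shape (n,)
    p_f : diagonal of the forecast covariance, shape (n,)
    r   : diagonal of the observation covariance, shape (n,)
    """
    alpha = p_f / (p_f + r)                    # Kalman gain is diag(alpha)
    return (1.0 - alpha) * x_f + alpha * z     # convex combination, componentwise

# Each component of the analysis lies between forecast and observation.
x_f = np.array([1.0, 2.0])
z = np.array([3.0, 0.0])
print(analysis_diagonal(x_f, z, p_f=np.array([1.0, 4.0]), r=np.array([1.0, 1.0])))
```

Each printed component lies between the corresponding forecast and observation values, which is exactly the line-segment picture described above.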
The Kalman filter equation essentially tells you that in this special case the analysis is the convex combination of the forecast and the observation. We have already seen this result earlier in a simple case. Therefore the Kalman filter equation is very much consistent with everything we already know; this internal-consistency check is one result of analyzing the special case.

Not only that, I would like to talk about the adaptivity. Suppose r_ii is much larger than p^f_ii. What does it mean? The observation is less accurate than the forecast; a large variance means less accuracy. So if r_ii is larger than p^f_ii, the forecast is more reliable than the observation. The denominators of the two weights are the same, so this equation gives more weight to the forecast. That means the analysis always favours the one that is more accurate. On the other hand, if r_ii is less than p^f_ii, the observation is more accurate than the forecast, in which case the Kalman filter gives more weight to the observation.

So what does it mean? It is like a committee of seven people where every member has the same vote; we do not distinguish one vote as more valuable than another, for otherwise there is no democracy. In a democracy all votes are equal; in a committee all votes are equal. So you can think of the analysis as a committee decision. A committee of what? A committee of two, whose members are the forecast and the observation. But do they have the same vote? No. Under what condition would they have the same vote? When p^f_ii equals r_ii, alpha equals 1 - alpha, so alpha is one half and 1 - alpha is one half; both have the same weight, and the analysis is simply the average of the forecast and the observation. But seldom is it the case that p^f_ii and r_ii are the same: r_ii comes from the instruments, p^f_ii comes from the model. So either r_ii is more or it is less. That essentially tells you this Kalman scheme is very intelligent, very adaptive. It gives more weight to the information with less variance, more weight to the information which is more accurate. That is the beauty; that is the specialty. And we have already talked about this when we derived the linear minimum variance estimate, and in the context of Bayesian estimation: the Bayesian structure and the linear minimum variance structure all have this property.

So what is another important point? The forecast has some variance, the observation has some variance, and the analysis has some variance. When you combine these two random quantities, I am getting a new quantity which is random, but the variance of the combination, the analysis, is less than the variance of either individual component. That means I have two bad decisions, and I am able to create a better decision from the two, if you want to call them bad in the sense that their variance is not zero. That is the beauty of data assimilation: it provides a linear combination whose variance is less than the variance of what goes in. And that should not be new to anyone who has done anything in probability theory. Let me quote a result we have already seen earlier.
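To make the variance-reduction claim concrete in the scalar notation of this special case, here is a one-line check. This is my own restatement, assuming independent forecast and observation errors, as in the filter derivation:

\[
p^a_{ii} = (1-\alpha)^2\, p^f_{ii} + \alpha^2\, r_{ii},
\qquad \alpha = \frac{p^f_{ii}}{p^f_{ii} + r_{ii}},
\]
\[
p^a_{ii} = \frac{p^f_{ii}\, r_{ii}}{p^f_{ii} + r_{ii}}
         = \left(\frac{1}{p^f_{ii}} + \frac{1}{r_{ii}}\right)^{-1}
         \le \min\left(p^f_{ii},\, r_{ii}\right).
\]

The analysis variance is the harmonic sum of the two input variances, so it is smaller than both: the committee of two always does at least as well as each of its members.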
If X_i are i.i.d. random variables with zero mean and variance sigma squared, and I compute the average X̄ = (1/n) Σ_{i=1}^{n} X_i, then X̄ is also random, but its variance is sigma squared over n. So individually they all have a larger variance, but I combine them linearly with equal weights; the average is a linear combination, and the variance of the average is sigma squared over n. When n goes to infinity, the variance goes to zero. So what are we trying to do? From random, unreliable quantities we are trying to develop a quantity which is more reliable than what goes in, and that is the general context of the central limit theorem. What does the central limit theorem say? If you have a sequence of i.i.d. random variables and you compute their average, the variance of the average goes to zero as n goes to infinity; and if you normalize the average by its standard deviation, the normalized average tends to a normal distribution. That is the central limit theorem. So the idea we are seeing here is very similar to what the central limit theorem says, except that the central limit theorem is an asymptotic theory. We see an embodiment of the principle: if you give me two random quantities, I can combine them in a clever way so that the combined quantity is random, but its variance is less than the variance of the two quantities that went in. That is the basic principle of data assimilation, and it is borne out by this analysis.

I also want to make another comment, about P̂_k. To compute P̂_k, I do not need the observation; I can pre-compute it. Let us go back to the expression for P̂_k. The expression involves the forecast covariance, and I already know the forecast needs only the model. Look at the other terms: they all need only the forward operator. They do not need the observation itself. So what does it mean? The expression for the analysis covariance, while it depends on the observation covariance and the forward operator, does not depend on the actual values of the observations. So if you set up a problem and you know the forward operator, then even before you take the first observation you can analyze the structure of these covariances offline, ahead of time. That is the idea.

Now I am going to talk about the case when there is no observation. What do you mean, no observation? I simply have the model, and I am going to run the model forward. What happens? Let us have fun. In this case x_k^f = x̂_k: the analysis equals the forecast, because the analysis differs from the forecast only when there is an observation. When there is no observation, the forecast is the analysis. In that case the analysis covariance equals the forecast covariance for all k. I hope that is very clear. The forecast itself is generated from the previous forecast: the forecast at time k is M_{k-1} x^f_{k-1}, because there is nothing else. That essentially tells you x_k^f is the product of all the model matrices times the initial forecast.
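Here is a minimal sketch of that offline pre-computation in Python, assuming the time-varying system matrices M_k, H_k, Q_k, R_k are supplied as lists; the function name and arguments are illustrative, not from the lecture. Note that the observations z_k never appear:

```python
import numpy as np

def precompute_gains(P0, models, Hs, Qs, Rs):
    """Run the covariance and gain recursions of the Kalman filter offline.

    None of these recursions touch the observation values z_k, so the
    gains K_k and analysis covariances can be tabulated before any
    data arrive.
    """
    P = P0
    gains, covs = [], []
    for M, H, Q, R in zip(models, Hs, Qs, Rs):
        P_f = M @ P @ M.T + Q                     # forecast covariance
        S = H @ P_f @ H.T + R                     # innovation covariance
        K = P_f @ H.T @ np.linalg.inv(S)          # Kalman gain
        P = (np.eye(P0.shape[0]) - K @ H) @ P_f   # analysis covariance
        gains.append(K)
        covs.append(P)
    return gains, covs
```

Once the real observations start arriving, only the cheap state updates x̂_k = x_k^f + K_k (z_k - H_k x_k^f) remain to be done online.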
But the initial forecast is the initial analysis: x_0^f = x̂_0. And what is x̂_0? It is the mean of the initial distribution. Now let us consider the forecast covariance. Look at the recurrence: the covariance at time k depends on the covariance at time k - 1. So if you open this up, you get a product of model matrices times P_0, times the transpose of that product, plus a sum of products of model matrices with the Q_j terms:

P_k^f = M_{k:0} P_0 M_{k:0}^T + Σ_{j=0}^{k-1} M_{k:j+1} Q_j M_{k:j+1}^T,

where M_{k:j} denotes the product M_{k-1} M_{k-2} ... M_j. So when there is no observation, this is the forecast covariance. Now, when there is no model noise, I can set Q_j = 0: no observation, no model noise. Then only the first term survives, and this is the variance of the forecast when there is no observation and no model noise. Please understand: the forecast covariance at time k then depends only on the initial covariance and the product of the model matrices along the trajectory.

So I want you to be able to look at all the special cases: when m is not equal to n, when H_k is not equal to I, when there is no observation, when there is no model noise. What happens to these expressions? This is simply a systematic rearrangement of the analysis, which is helpful in recognizing the role of each of these terms; each case provides a different interpretation of the derivation.

Now let us consider the case when there is no dynamics, the static case. No dynamics means M_k = I, w_k = 0, Q_k = 0: there is no model noise covariance, and the model matrix is the identity, in which case x_{k+1} = x_k = x. Then z_k = H_k x + v_k. That is the static case, because there is no dynamics; there is only the one unknown state. In this case the present forecast equals the previous analysis, and the initial forecast covariance is the initial covariance. If you substitute these into the Kalman gain, it takes a form that is exactly the same as in the static case. Again, I am going to leave the verification of these as exercises. Why? Because by doing these exercises we are going to establish something. I have already done static deterministic data assimilation; I have already done static data assimilation; we derived formulas for the optimal estimate, its covariance, and everything else. Now I want to understand: are they and the Kalman filter related? Yes. If you take the dynamics off, if you take the model noise off, it becomes the static case, and the Kalman filter equations reduce to a form already discovered within the context of the static analysis.

All these exercises are meant to show the beauty of the nesting. What is nesting? When do you say two results are nested? When you specialize one, you get the other. Linear and nonlinear results are nested in the sense that when you specialize the nonlinear result to the linear case, you get the old results back. In mathematics, obtaining a set of nested results is beautiful in itself. Why? It gives you room for a consistency check. If nothing else, we can check consistency, and we can understand that the static theory and the dynamic theory belong together: one is the extension of the other, and the other is the specialization of the first. As animals, they are not too different from each other. It is this realization of nesting that enables you to see that this theory is not an amorphous collection of ideas; it has a monolithic structure.
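As a hedged illustration of this nesting, the following sketch runs the Kalman recursions with M_k = I and Q_k = 0 on repeated noisy observations of a fixed state. Under exactly these assumptions the filter is the static (recursive least squares) estimator, and the estimate converges toward the true state. All names and the chosen dimensions are my own:

```python
import numpy as np

# Nesting check: with M_k = I and Q_k = 0 the Kalman filter reduces to
# the static estimator.  x_true is the fixed state observed repeatedly.
rng = np.random.default_rng(0)
n, m, steps = 2, 3, 200
x_true = rng.normal(size=n)
H = rng.normal(size=(m, n))           # forward operator, full column rank
R = 0.1 * np.eye(m)                   # observation noise covariance

x_hat, P = np.zeros(n), np.eye(n)     # initial analysis and covariance
for _ in range(steps):
    # "Forecast" step is trivial here: M = I, Q = 0, so x_f = x_hat, P_f = P.
    z = H @ x_true + rng.multivariate_normal(np.zeros(m), R)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_hat = x_hat + K @ (z - H @ x_hat)
    P = (np.eye(n) - K @ H) @ P

print(np.linalg.norm(x_hat - x_true))  # small: estimate converges to x_true
```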
It is essentially by understanding and appreciating this monolithic structure, this monolithic nature of the ideas, that the underlying concept of least squares comes to the forefront. It is least squares that ties everything together: static, dynamic, special cases, generalizations. And that is the beauty of the whole discipline of dynamic data assimilation, with the static case as a special case.

Now one last case; you can see there are lots of important special cases to consider. When the observations are perfect, R_k = 0. In this case the analysis covariance takes a particular form, and so does the Kalman gain. Go back to the Kalman gain expression,

K_k = P_k^f H_k^T (H_k P_k^f H_k^T + R_k)^{-1},

and set R_k = 0. Then I have to compute the inverse of H_k P_k^f H_k^T. Please understand, this is the quantity that decides the Kalman gain. Now H_k is m by n, P_k^f is n by n, and H_k^T is n by m, so this matrix is m by m, and the whole question becomes: how do you know this matrix is invertible? That could be trouble. So when you have a stochastic dynamic model with perfect observations, the Kalman filter could face computational difficulty, because this matrix may be singular.

Why? Let us look at it. Suppose m is greater than n and H is a full-rank matrix, so the rank of H is n and the rank of P_k^f is n. But H_k P_k^f H_k^T is m by m with m larger than n: a larger matrix built out of matrices of smaller rank, and that is the key. So even though the expression is simple, and even if H_k is full rank, this m by m matrix need not be full rank, and if it is not full rank I cannot take the inverse. In this case, one way to deal with it computationally is to take the generalized inverse, which we have seen. But that could be a numerical difficulty; computing the generalized inverse is not easy.

Therefore, what is the story? We would tend to think that perfect observations, observations with no error, would help us. But within the context of the Kalman filter equations, perfect observations are a nuisance in this particular case. As for the rank of P̂_k: from here you can see the rank of P̂_k is at most n - 1, so that matrix may not be SPD, and if it is not SPD that can cause computational difficulty. So the story is: when R_k is small, this can cause computational instability. That again comes out of the analysis of a special case.
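A small numerical sketch of the difficulty, with illustrative dimensions m = 4 and n = 2 of my own choosing: the m by m matrix H P^f H^T has rank at most n, so the ordinary inverse fails and one falls back on the Moore-Penrose generalized inverse.

```python
import numpy as np

# Perfect observations (R_k = 0) with m > n: H P^f H^T is m-by-m but has
# rank at most n, hence it is singular and np.linalg.inv would fail.
rng = np.random.default_rng(1)
n, m = 2, 4
H = rng.normal(size=(m, n))           # full column rank n < m
P_f = np.eye(n)
S = H @ P_f @ H.T                     # rank <= n < m, not invertible
print(np.linalg.matrix_rank(S))       # prints 2, not 4
K = P_f @ H.T @ np.linalg.pinv(S)     # gain via the generalized inverse
```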
The last topic is called residual checking. Suppose you have written a filter and you think your program is right; how do you check its correctness? That is residual checking. What is the residual? The residual is r_k = z_k - H_k x_k^f, the innovation; the Kalman filter analysis is the forecast plus the Kalman gain times the innovation. Now substitute for z_k from the observation model z_k = H_k x_k + v_k, and the residual can be rewritten as r_k = H_k (x_k - x_k^f) + v_k. From this equivalent expression, you can readily see that the covariance of the residual is H_k P_k^f H_k^T + R_k. So what could one do? If you are trying to implement the Kalman filter, at every stage you can extract the residual from your program. You will then have a time series of residuals, and from that time series you can compute the sample covariance. The computed covariance from the residual time series must match the theoretical one: you know H_k, you know the forecast covariance, so you can compute it theoretically and compute it practically. If the two agree, the implementation is pretty good. That is the way to check with residuals; a small numerical sketch of this check appears at the end of this passage.

Now I have talked about correctness; let us look very quickly at computational cost, starting with the forecast step, which is matrix multiplication, and to multiply matrices you have to multiply and add. If I want to compute the inner product of two 3-vectors (a, b, c) and (x, y, z), that is ax + by + cz: I need 3 multiplications and 2 additions. So for the inner product of two n-vectors I have to do n multiplications and n - 1 additions; those are the basic operations. A matrix-vector multiply therefore costs about 2n² operations. Computing the quantity on the slide takes 4n³ + n² operations, computing the inverse takes the indicated amount, and computing the Kalman gain takes the indicated amount, so the total cost of computing the Kalman gain is given by this expression. This is the price you have to pay. It tells you the number of additions, multiplications, subtractions, and divisions you have to perform, and your computer must be able to perform all of these in a short time; that means I need more powerful computers. A small counting sketch also appears below.

So this is about the workload, and who should be interested in it? Anyone who is going to write a program, to develop a system, has to worry about these things. Continuing: this is the analysis covariance; its computation takes the indicated amount, and the total cost of computing the analysis covariance, through these various steps, is of order n³. The residual computation takes the indicated time, multiplying the residual by the Kalman gain takes the indicated time, and then adding the forecast to that takes n operations, so the total cost of the analysis computation is as shown. Previously we talked about the total cost of computing the Kalman gain; now we are talking about the total cost of computing the analysis.

I believe we all have a good handle on what is happening. Yes, it is a very dense set of lectures; these lectures use many of the results from the previous analysis. This way of presentation is modular: I can present many results ahead of time, we become familiar with the individual components of these results, and now we are assembling all the results within the context of the Kalman filter.
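Here is a minimal sketch of the residual check, assuming for simplicity a fixed H, P^f, and R of my own choosing. Rather than running a full filter, it draws the two error terms directly and verifies the identity Cov(r_k) = H P^f H^T + R against the sample covariance of the residual time series:

```python
import numpy as np

# Residual (innovation) checking: the time series r_k = z_k - H x_k^f
# from a correct implementation has covariance H P^f H^T + R.
rng = np.random.default_rng(2)
m, n, steps = 2, 3, 5000
H = rng.normal(size=(m, n))
P_f = np.eye(n)
R = 0.5 * np.eye(m)

residuals = []
for _ in range(steps):
    e_f = rng.multivariate_normal(np.zeros(n), P_f)  # forecast error x_k - x_k^f
    v = rng.multivariate_normal(np.zeros(m), R)      # observation noise
    residuals.append(H @ e_f + v)                    # r_k = H (x_k - x_k^f) + v_k

empirical = np.cov(np.array(residuals).T)
theoretical = H @ P_f @ H.T + R
print(np.max(np.abs(empirical - theoretical)))       # small if the identity holds
```

In a real implementation you would collect r_k = z_k - H_k x_k^f from the running filter and make the same comparison.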
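And a small counting sketch for the workload discussion, built only from the inner-product rule stated above (n multiplications and n - 1 additions, about 2n - 1 operations per inner product). The tallies and the rough m³ estimate for the inverse are illustrative, not the exact slide figures from the lecture:

```python
def matvec_ops(rows, cols):
    """Matrix (rows x cols) times vector: one inner product per row."""
    return rows * (2 * cols - 1)          # ~ 2 * rows * cols

def matmul_ops(rows, inner, cols):
    """(rows x inner) @ (inner x cols): one inner product per entry."""
    return rows * cols * (2 * inner - 1)

n, m = 100, 30                            # illustrative state/observation sizes
gain_ops = (matmul_ops(m, n, n)           # H @ P^f
            + matmul_ops(m, n, m)         # (H P^f) @ H^T
            + m * m                       # + R
            + 2 * m**3                    # rough cost of the m x m inverse
            + matmul_ops(n, n, m)         # P^f @ H^T
            + matmul_ops(n, m, m))        # (P^f H^T) @ S^{-1}
print(gain_ops)                           # grows like O(n^2 m + n m^2 + m^3)
```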
So with this we have, in a sense, completed the analysis of the linear Kalman filter equations. This classic result is the sequential data assimilation scheme: even though Kalman did not call it data assimilation, that is exactly what he was doing, and I want folks in the geophysical sciences to realize that Kalman, coming from an engineering discipline, actually solved the data assimilation problem, and was the first to solve it. That is the importance of this. Now, where do we go from here? I am going to illustrate these things with some examples, and I am also planning to give a table with the complete algorithm; that is what we will do as the first order of business in the next lecture. Thank you.