So, now let us write out the equations of the Kalman filter. If you remember, we are referring to the linear Gaussian system given here, whose state and observations are given by these equations. Remember that the w_k's and v_k's are independent Gaussian noise sequences, and the initial state x_0 is itself a Gaussian vector, independent of the w_k's and v_k's. The states x_k evolve linearly, the observations we get are also linear in the state, and the disturbances u_k are known exogenous sequences. What I will do next is first write out the Kalman filter equations, and then we will try to interpret and understand them. So, here is the Kalman filtering theorem. Consider the linear Gaussian system above, and assume that all the parameters are known. What are the parameters? They are the A_k's; the C_k's, which appear in the observations; F_k and G_k, which are the coefficients of u; Q_k, the covariance of w_k; R_k, the covariance of v_k; and the known prior distribution pi_0, specified by the mean x_hat_0 and covariance Sigma_0 of the initial state. Then pi_k(x), which you will recall is the conditional distribution of x_k given all the observations up to time k, is a Gaussian distribution with mean x_hat_k and covariance Sigma_k. Here x_hat_k is the conditional expectation E[x_k | y_1, ..., y_k], and Sigma_k is the conditional covariance E[(x_k - x_hat_k)(x_k - x_hat_k)^T | y_1, ..., y_k]. So, pi_k is a Gaussian distribution with mean x_hat_k and covariance Sigma_k.
Now, the distribution is specified entirely in terms of x_hat_k and Sigma_k. How do we find these x_hat_k's and Sigma_k's? That is where the recursive computation comes up: x_hat_k and Sigma_k are computed recursively as follows. Here is the Kalman filtering algorithm. For each time k, given the new observation, compute the following recursively. First, as in the more general filtering recursion, we have a prediction step. We predict the mean, given the observations we have up until time k:

x_hat_{k+1|k} = A_k x_hat_k + F_k u_k.

We also predict, sitting at time k, the measurement we would get at time k+1, which in a straightforward way is

y_hat_{k+1|k} = C_{k+1} x_hat_{k+1|k} + G_{k+1} u_{k+1}.

Then we have the predicted covariance:

Sigma_{k+1|k} = A_k Sigma_k A_k^T + Q_k.

Now, what is the significance of Sigma_{k+1|k}? Well, x_hat_{k+1|k} is only the mean of x_{k+1} given the observations up until time k; Sigma_{k+1|k} is the covariance of the same conditional distribution. But the important thing is that these two quantities are enough to specify the full distribution of x_{k+1} given the information up until time k: the predicted distribution. The reason is that the predicted distribution is itself Gaussian, because, as I said, all the x's and y's are jointly Gaussian.
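As a minimal numerical sketch of the prediction step above (using numpy; the function and argument names are my own illustrative choices, not from the lecture):

```python
import numpy as np

def predict(x_hat_k, Sigma_k, A_k, F_k, u_k, Q_k, C_k1, G_k1, u_k1):
    """One prediction step, in the notation of the lecture."""
    # Predicted state mean: x_hat_{k+1|k} = A_k x_hat_k + F_k u_k
    x_pred = A_k @ x_hat_k + F_k @ u_k
    # Predicted measurement: y_hat_{k+1|k} = C_{k+1} x_pred + G_{k+1} u_{k+1}
    y_pred = C_k1 @ x_pred + G_k1 @ u_k1
    # Predicted covariance: Sigma_{k+1|k} = A_k Sigma_k A_k^T + Q_k
    Sigma_pred = A_k @ Sigma_k @ A_k.T + Q_k
    return x_pred, y_pred, Sigma_pred
```

Note that the returned pair (x_pred, Sigma_pred) is exactly the mean and covariance that, as discussed above, fully specify the Gaussian predicted distribution.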
So, these two quantities together specify a Gaussian distribution: this is the prediction step, and it specifies the Gaussian with mean x_hat_{k+1|k} and covariance Sigma_{k+1|k}. Notice that the prediction step depends on x_hat_k, the mean from the previous time step, and on Sigma_k, the covariance from the previous time step. Putting these together, what we need to do next is use them to compute a similar mean and covariance at the next time step. In other words, the next step is the measurement update step. To carry it out, let us first define an intermediate quantity:

S_{k+1} = C_{k+1} Sigma_{k+1|k} C_{k+1}^T + R_{k+1}.

Now, using S_{k+1}, we do our measurement update. The new x_hat_{k+1}, which is the conditional mean of x_{k+1} given the observations up until time k+1, is

x_hat_{k+1} = x_hat_{k+1|k} + Sigma_{k+1|k} C_{k+1}^T S_{k+1}^{-1} (y_{k+1} - y_hat_{k+1|k}).

Here is where the measurement comes up: the term y_{k+1} - y_hat_{k+1|k} is the new measurement minus what we had as the prediction of the new measurement. This gives us the mean of the distribution pi_{k+1}.
And what is the covariance of that distribution? The covariance is

Sigma_{k+1} = Sigma_{k+1|k} - Sigma_{k+1|k} C_{k+1}^T S_{k+1}^{-1} C_{k+1} Sigma_{k+1|k}.

So, this completes our description. At any time k, remember, pi_k is the Gaussian with mean x_hat_k and covariance Sigma_k; you start off with that, and what you get is a new Gaussian pi_{k+1} with mean x_hat_{k+1} and covariance Sigma_{k+1}. As I mentioned, there are two steps: one is the prediction step, the other is the measurement update. These are precisely the two steps we had when we wrote out the more general filtering recursion. Now, the important thing to note in the measurement update is the beautiful term that shows up in a very explicit manner, and that is what allows us to make this so easily recursive: the term y_{k+1} - y_hat_{k+1|k}. What is this term? We can think of it as the difference between tomorrow's observation and today's prediction of tomorrow's observation: the difference between what you would actually see tomorrow and what you predict you would see tomorrow based on the information you have so far.
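The measurement update step can be sketched numerically as well (numpy; names are illustrative, not from the lecture, and the inputs are the outputs of the prediction step):

```python
import numpy as np

def measurement_update(x_pred, y_pred, Sigma_pred, C_k1, R_k1, y_new):
    """Measurement-update step, in the notation of the lecture."""
    # Innovation covariance: S_{k+1} = C_{k+1} Sigma_{k+1|k} C_{k+1}^T + R_{k+1}
    S = C_k1 @ Sigma_pred @ C_k1.T + R_k1
    # Gain multiplying the innovation: Sigma_{k+1|k} C_{k+1}^T S_{k+1}^{-1}
    K = Sigma_pred @ C_k1.T @ np.linalg.inv(S)
    # Updated mean: prediction plus gain times the innovation y_{k+1} - y_hat_{k+1|k}
    x_new = x_pred + K @ (y_new - y_pred)
    # Updated covariance: Sigma_{k+1} = Sigma_{k+1|k} - K C_{k+1} Sigma_{k+1|k}
    Sigma_new = Sigma_pred - K @ C_k1 @ Sigma_pred
    return x_new, Sigma_new
```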
So, this is precisely, you can say, the additional information you have learnt from the new observation that you did not already know from all the previous information. If y_{k+1}, the new observation, turns out to be perfectly predictable from all the previous observations, then this difference would be 0: the new observation gives you no additional information beyond what you already had from your previous observations. So, rightly, this term is often called the innovation, and the sequence of such terms is called the innovation sequence, because it is essentially the novel or new information that has come to us from the new observation y_{k+1}. What does this tell us? It tells us that the way to do the measurement update is to take our earlier prediction and take a linear combination of it with the innovation. Why the innovation? Because the innovation, remember, is exactly the additional information that has come in. And why a linear combination? Because, after all, our new estimate has to be a linear function of everything we knew so far: it is therefore a linear function of our best prediction and of the new information, and in incorporating the new information we are really only keeping the part that we could not have predicted from our previous observations. As a consequence, this is how the update looks: you take the prediction, add the incremental new information that has come in from the new observation in the form of the innovation, and the linear combination of the two gives the new estimate.
So, this only gives us the mean; we also need to do something similar for the covariance, and exactly that is what is worked out above for the covariance as well. The prediction step and the measurement update step together give us the Kalman filter. So, what is the secret that makes the Kalman filtering equations work out so easily? As I said, in general we have a complicated integral for which we do not have an explicit solution; why did we get an explicit solution so neatly for this linear Gaussian problem? The reason is a property of Gaussians itself, and that property is often known by a very colorful name: it is called the Swiss army knife of Gaussians. If you remember, a Swiss army knife is a multi-purpose knife which can be used for many little assorted tasks, and this result is something of that sort: it is a combination of many different tools all wrapped into one. So, what is the result and how does one apply it? Let us write it down. Consider two Gaussian densities: one is a density in y whose mean is C times x and whose covariance is R, and the other is a density in x whose mean is mu and whose covariance is P.
Now, if I just multiply these, the result basically says that the product of these two densities can be factored in this neat way:

N(y; C x, R) N(x; mu, P) = N(y; C mu, C P C^T + R) N(x; m, P - K_bar C P),

where K_bar = P C^T (C P C^T + R)^{-1} and m = mu + K_bar (y - C mu). Let us absorb this a little, and then I will explain its consequences. So, what do we have here? We have one density written in terms of y and another written in terms of x. But notice that the mean of the density of y actually depends on x, and it is the same x as in the other density. In other words, the first density is really a conditional density of y given x: you fix an x and compute the density of y, and it has mean C x and covariance R. So, this is really capturing a model of the form y = C x + noise; a model of that form is captured here, and the density is the density of y given x = x. The next term is simply the marginal density of x itself, the probability density of x = x. So, when you take the product of the two, the first factor is p(y | x = x) and the second factor is p(x = x).
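To see the factorization concretely, here is a scalar sanity check of the identity; the identity itself is from the lecture, but the function names and the particular numbers are my own illustrative choices (standard library only):

```python
import math

def npdf(v, mean, var):
    """Scalar Gaussian density N(v; mean, var)."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def factor_both_ways(y, x, c, r, mu, p):
    """Evaluate both sides of the scalar Swiss army knife identity."""
    # Left side: N(y; c x, r) * N(x; mu, p)
    lhs = npdf(y, c * x, r) * npdf(x, mu, p)
    # Right side: N(y; c mu, c p c + r) * N(x; m, p - kbar c p)
    kbar = p * c / (c * p * c + r)       # K_bar = P C^T (C P C^T + R)^{-1}
    m = mu + kbar * (y - c * mu)         # m = mu + K_bar (y - C mu)
    rhs = npdf(y, c * mu, c * p * c + r) * npdf(x, m, p - kbar * c * p)
    return lhs, rhs
```

Evaluating both sides at any point (y, x) gives the same value, which is exactly the factorization of the joint density discussed next.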
So, as a consequence, the left hand side is the joint density of y and x, written in the form p(y | x = x) times p(x = x). Now, what do we have on the right hand side? Well, on the right hand side the roles have been reversed: the second factor now looks like a density of x given y = y, because, if you see, a y has now come up in the mean of x, the same y that was present on the left; and the first factor is just the marginal density of y itself. So, really, what has been done here is to write the joint density of x and y in two different ways: once as the density of y given x times the density of x, and once as the density of x given y times the density of y. But what is the importance of this? Well, it gives us a way of going from prior to posterior. If you have a model with a prior distribution on x and an observation y, and you want to know the posterior, then the posterior distribution given the observation y is given by the second factor on the right hand side: it is exactly p(x | y). So, let me write out the consequences more explicitly. We therefore have a couple of results.
So, first, if I take the integral of the left hand side with respect to x, this becomes easy because of the decomposition we have on the right hand side. On the left hand side we would have had x sitting in both of the densities; on the right hand side, the x appears only in the second factor. So, if I integrate out x on the right hand side, I am left with just the first term, the normal density of y with mean C mu and covariance C P C^T + R. Therefore, the term that did not get integrated out, the second factor, is in fact the posterior density. In other words,

N(y; C x, R) N(x; mu, P) / ∫ N(y; C x, R) N(x; mu, P) dx = N(x; m, P - K_bar C P).

Now, from here it is very easy to compute the Kalman filtering equations: all one really needs to do is carry out the prediction and the measurement update steps using this particular Swiss army knife formula. So, let us do that. First we have the prediction step; remember, in the prediction step we are computing p(x_{k+1} | y_1, ..., y_k). We will argue by induction: we will assume that the filtered density pi_k(x_k) is in fact Gaussian, and from there conclude that the density at the next time step is also Gaussian. Since we are starting from a Gaussian density, this ensures that the density at each time step is Gaussian.
So, we will assume pi_k is Gaussian with mean x_hat_k and covariance Sigma_k. Now, let us compute

p(x_{k+1} | y_1, ..., y_k) = ∫ N(x_{k+1}; A_k x_k + F_k u_k, Q_k) N(x_k; x_hat_k, Sigma_k) dx_k.

This, remember, is just coming from our master filtering equation: the first factor is the system model and the second factor is the prior. Using the Swiss army knife equation, we find that the prediction is itself Gaussian: it is the Gaussian in x_{k+1} with mean x_hat_{k+1|k} and covariance Sigma_{k+1|k}. And now we can apply the measurement update. The measurement update gives us p(x_{k+1} | y_1, ..., y_{k+1}), since we are now estimating using the additional observation y_{k+1}, and that is simply

N(y_{k+1}; C_{k+1} x_{k+1}, R_{k+1}) N(x_{k+1}; x_hat_{k+1|k}, Sigma_{k+1|k}) divided by the integral of the same thing over x_{k+1}.

Again using the Swiss army knife equation, we find that this is nothing but a Gaussian with mean x_hat_{k+1|k} + K_bar_{k+1} (y_{k+1} - y_hat_{k+1|k}) and covariance Sigma_{k+1|k} - K_bar_{k+1} C_{k+1} Sigma_{k+1|k}, where K_bar_{k+1} = Sigma_{k+1|k} C_{k+1}^T S_{k+1}^{-1} and S_{k+1} is as defined above.
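Putting the induction together, here is a compact, self-contained sketch of the full recursion (numpy; for brevity I assume time-invariant parameters and no exogenous input, i.e. F = G = 0 in the lecture's model, and the names are my own):

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, x_hat0, Sigma0):
    """Full recursion: start from the Gaussian prior (x_hat0, Sigma0) and
    alternate prediction and measurement update for each observation."""
    x_hat, Sigma = x_hat0, Sigma0
    means = []
    for y in ys:
        # Prediction step (Gaussian in, Gaussian out, by the Swiss army knife)
        x_pred = A @ x_hat
        Sigma_pred = A @ Sigma @ A.T + Q
        # Measurement update via the innovation y - C x_pred
        S = C @ Sigma_pred @ C.T + R
        K = Sigma_pred @ C.T @ np.linalg.inv(S)
        x_hat = x_pred + K @ (y - C @ x_pred)
        Sigma = Sigma_pred - K @ C @ Sigma_pred
        means.append(x_hat)
    return means, Sigma
```

Each pass through the loop maps the Gaussian (x_hat, Sigma) at time k to the Gaussian at time k+1, which is exactly the inductive step above.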
So, that is what we have: all that we have used here is really just the Swiss army knife formula, and by induction it has given us that if you start off with a Gaussian prior and have Gaussian noise in your dynamics and measurements, then you get a posterior which is Gaussian, and the explicit formulas can be computed using the Swiss army knife equation. This completes our explicit study of the Kalman filter. What we will see in the next class is how to combine this filter with the linear Gaussian problem, make a few observations about it, and then take the theory of partially observed systems to the next step. Thank you.