Welcome everyone. In the previous lecture we posed the problem of filtering. Recall the setup: there is a state x_k of a system that we want to estimate using observations y_1, ..., y_k of the system. The state evolves as a Markov chain, so the dynamics are specified by a state transition density p(x_{k+1} | x_k). At each time k we receive an observation y_k whose density, given the state, is p(y_k | x_k). Writing y_{1:n} for the observations from time 1 to time n, the task is to use y_{1:n} to estimate x_k. Filtering is the case k = n: the time at which we receive our latest observation is also the time whose state we want to estimate. The estimate x̂_{k|k} = E[x_k | y_1, ..., y_k], the conditional expectation of x_k given the observations, is what the filtering problem asks for. Computing this in turn requires the conditional density of x_k given y_1, ..., y_k. So the problem of filtering is really to compute π_k(x) = p(x_k = x | y_1, ..., y_k), and to compute it recursively. In other words, we need a map T that takes the previous density π_{k-1} and the newly received observation y_k, and produces the new density π_k. Recall that π_k is also called the posterior density; the problem of filtering is really about computing this posterior in a recursive manner.
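To make the model concrete, here is a minimal sketch of such a system on a finite state space. The two-state transition matrix P and observation likelihoods B are illustrative choices of mine, not from the lecture; the point is only that the state evolves by p(x_{k+1} | x_k) and each observation is drawn from p(y_k | x_k).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state model (illustrative numbers):
P = np.array([[0.9, 0.1],      # P[i, j] = p(x_{k+1} = j | x_k = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],      # B[i, y] = p(y_k = y | x_k = i)
              [0.1, 0.9]])

def simulate(n, x0=0):
    """Draw a state trajectory x_1..x_n and observations y_1..y_n."""
    xs, ys = [], []
    x = x0
    for _ in range(n):
        x = rng.choice(2, p=P[x])   # Markov transition of the hidden state
        y = rng.choice(2, p=B[x])   # observation drawn given the current state
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate(10)
```

The filter never sees xs; its job is to recover a belief about x_k from ys alone.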
So, let us now write out the main theorem of filtering, the one simple result on which the whole logic of filtering rests. Consider the model above: the state evolves with transition density p(x_{k+1} | x_k), and the observation density is p(y_k | x_k). Then π_{k+1}(x_{k+1}) = p(x_{k+1} | y_1, ..., y_{k+1}) satisfies the following recursion:

π_{k+1}(x_{k+1}) = T(π_k, y_{k+1})
= [ p(y_{k+1} | x_{k+1}) ∫ p(x_{k+1} | x_k) π_k(x_k) dx_k ] / [ ∫ p(y_{k+1} | x_{k+1}) ( ∫ p(x_{k+1} | x_k) π_k(x_k) dx_k ) dx_{k+1} ].

This formula is the filtering recursion. It takes as input the density at the previous time step, π_k, and the newly received observation y_{k+1}; every other variable has been integrated over. The only other variable involved is x_k, which is integrated out, and the new density is evaluated at x_{k+1}, which is why x_{k+1} appears in the numerator. After all of this is done, what we are left with is a function of x_{k+1} and the observations y_1, ..., y_{k+1}, and that function is precisely the new posterior density.
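On a finite state space the integrals in the recursion become sums, and one application of T is a few lines of code. This is a sketch under the same hypothetical two-state model as before (P and B are illustrative parameters, not from the lecture), with P[i, j] = p(x_{k+1}=j | x_k=i) and B[i, y] = p(y | x=i).

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])

def filter_step(pi_k, y_next):
    """T(pi_k, y_{k+1}): one application of the filtering recursion."""
    # numerator: p(y_{k+1}|x_{k+1}) * sum_{x_k} p(x_{k+1}|x_k) * pi_k(x_k)
    num = B[:, y_next] * (pi_k @ P)
    # denominator: the numerator summed over x_{k+1} (normalization)
    return num / num.sum()

pi = np.array([0.5, 0.5])      # initial belief over the state
for y in [1, 1, 0]:            # a short observation sequence
    pi = filter_step(pi, y)
# pi now approximates p(x_k | y_1..y_k); it sums to 1 by construction
```

Note how the recursion needs only the previous π_k and the new observation, exactly the map T(π_k, y_{k+1}) from the theorem.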
So, how does one prove this? The proof is rather simple, so let me quickly run you through it. We really only need the definition of the posterior conditional density. We have

π_{k+1}(x_{k+1}) = p(x_{k+1}, y_1, ..., y_{k+1}) / p(y_1, ..., y_{k+1}),

the joint density of x_{k+1} and y_{1:k+1} divided by the marginal density of y_{1:k+1}. This in turn can be rewritten as follows. Since the numerator involves y_{k+1} together with x_{k+1} and y_{1:k}, we can factor it as the density of y_{k+1} given all the other variables, times the density of those other variables:

p(x_{k+1}, y_{1:k+1}) = p(y_{k+1} | x_{k+1}, y_{1:k}) p(x_{k+1}, y_{1:k}).

The second factor can be written in a better way: we introduce x_k alongside x_{k+1} and y_{1:k}, and integrate it out,

p(x_{k+1}, y_{1:k}) = ∫ p(x_{k+1}, x_k, y_{1:k}) dx_k.

What is the advantage of doing that? The denominator is simply the integral of the numerator with respect to x_{k+1}, so we can repeat the same factorization there and integrate additionally over x_{k+1}. Now let us focus a bit more closely on the term inside the integral: the joint density p(x_{k+1}, x_k, y_{1:k}).
So, let me pull out that term and write it separately. The joint density factors as

p(x_{k+1}, x_k, y_{1:k}) = p(x_{k+1} | x_k, y_{1:k}) p(x_k, y_{1:k}).

We can now simplify the first factor further. Observe that it is just the probability of the state transitioning from x_k to x_{k+1}: given that you are at state x_k, the probability that at the next time step you are at x_{k+1}, given also the observations y_{1:k}. But because the state is a Markov chain, once x_k is given, y_{1:k} carries no further information about the transition to x_{k+1}. Consequently the first factor becomes simply p(x_{k+1} | x_k). The second factor, p(x_k, y_{1:k}), can itself be simplified:

p(x_k, y_{1:k}) = p(x_k | y_{1:k}) p(y_{1:k}) = π_k(x_k) p(y_{1:k}),

since, recall, π_k(x_k) is simply p(x_k | y_{1:k}). Now, the factor p(y_{1:k}) is present in the numerator, and it will also be present when we carry out the same expansion in the denominator. As a result it cancels out once we substitute this expression in both places.
As a consequence, we can write the above expression as

[ p(y_{k+1} | x_{k+1}, y_{1:k}) ∫ p(x_{k+1} | x_k) π_k(x_k) dx_k ] / [ ∫ p(y_{k+1} | x_{k+1}, y_{1:k}) ( ∫ p(x_{k+1} | x_k) π_k(x_k) dx_k ) dx_{k+1} ].

Now, the cancellation of p(y_{1:k}) that I carried out involves a little mathematical subtlety, and the reason we had to cancel it at all is that we started from the joint density of x_{k+1}, x_k and y_{1:k}. Instead of that joint density, one could have worked with conditional probabilities from the start: condition on y_{1:k} throughout, and the derivation goes through with no cancellation of p(y_{1:k}) required. These two approaches are more or less the same, some mathematical subtleties aside, and either way you should arrive at the answer claimed in the theorem. So this is where we have reached so far. If you compare with the expression in the theorem, it is almost the same: we have the same inner integrals and the same outer integral over x_{k+1}. The only difference is the first factor in the numerator and denominator: here we have p(y_{k+1} | x_{k+1}, y_{1:k}), whereas the theorem has p(y_{k+1} | x_{k+1}). So what is this particular conditional probability? Let us look at it on the side.
So, what is p(y_{k+1} | x_{k+1}, y_{1:k})? It is the probability of obtaining the next observation y_{k+1} given that the current state is x_{k+1} and the previous observations are y_{1:k}. But the next observation depends only on the current state: given the current state, it is independent of all the previous observations. So this density is in fact equal to the term we need,

p(y_{k+1} | x_{k+1}, y_{1:k}) = p(y_{k+1} | x_{k+1}).

The reason this equality holds is that we are essentially assuming the observation noise is independent across time instants: the noise that affects your observations at any time is independent of the noise that affects your observations at any other time. Once you substitute this identity into the two terms above, you get the boxed filtering equation that we wrote out earlier. This completes the derivation of the filtering equation we desired.

Now, this filtering equation is often written in two steps. The first step is what is called the prediction step: it involves predicting the next state given the observations so far,

π_{k+1|k}(x_{k+1}) = p(x_{k+1} | y_{1:k}) = ∫ p(x_{k+1} | x_k) π_k(x_k) dx_k.

Note that this conditions on y_{1:k}, not y_{1:k+1}; the latter is what we need as far as filtering is concerned. This prediction is exactly the term we have in the numerator of the recursion.
So, the prediction step is the first step, and the next step involves what is called a measurement update; we can think of filtering as happening in these two steps. The measurement update takes the new measurement and uses it to update π_{k+1|k} to π_{k+1}, that is, to π_{k+1|k+1}. You take the prediction from the previous step and update your posterior distribution based on the new information you have received. In that sense the prediction π_{k+1|k} can be thought of as a prior: it is as if we have this prior distribution on x_{k+1}, we then receive a new observation y_{k+1}, and based on that observation we compute a posterior on x_{k+1}. That posterior is

π_{k+1}(x_{k+1}) = p(y_{k+1} | x_{k+1}) π_{k+1|k}(x_{k+1}) / ∫ p(y_{k+1} | x_{k+1}) π_{k+1|k}(x_{k+1}) dx_{k+1},

where the denominator is the integral of the numerator with respect to x_{k+1}. You can see that this is nothing but Bayes' rule itself: you have a prior distribution, namely π_{k+1|k}; a measurement comes in that depends only on the present state; and based on the measurement and the prior you update your distribution over the state using Bayes' rule. That is really the measurement update step.
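The two-step form above can be sketched directly as code on a finite state space, where the prediction integral becomes a matrix-vector product and the update is Bayes' rule with a sum for the normalizer. P and B are the same hypothetical illustrative parameters as in the earlier sketches.

```python
import numpy as np

P = np.array([[0.9, 0.1],      # P[i, j] = p(x_{k+1} = j | x_k = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],      # B[i, y] = p(y | x = i)
              [0.1, 0.9]])

def predict(pi_k):
    """Prediction: pi_{k+1|k}(x') = sum_x p(x' | x) * pi_k(x)."""
    return pi_k @ P

def update(pi_pred, y_next):
    """Measurement update (Bayes' rule): prior pi_{k+1|k}, likelihood p(y|x)."""
    post = B[:, y_next] * pi_pred
    return post / post.sum()   # normalize over x_{k+1}

pi = np.array([0.5, 0.5])
pi = update(predict(pi), y_next=1)   # one full filtering step
```

Composing update(predict(.), y) gives exactly the one-shot recursion T from the theorem; splitting it out just makes the prior/posterior structure explicit.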
So, recall that I had mentioned that the filtering equation is really an application of Bayes' rule, and this is borne out right here. After all, we find that the first step of filtering is prediction, and that prediction gives you the prior distribution you want to work with. Then, with the new information that comes in, the new measurement y_{k+1}, you do a measurement update using Bayes' rule together with the prior obtained from the prediction, and the result is your updated belief, the new posterior. So you can think of these two quantities as a prior and a posterior. This, then, is our filtering equation. In the next lecture we will actually compute it in closed form for one extremely popular class of problems: the one where the state evolves as a linear system, the noise is Gaussian, and the observations are linear functions of the state plus noise. This results in a specific kind of filter called the Kalman filter. Note that in all of these filtering problems, including the one the Kalman filter solves, the computational hardship in going from the previous posterior density to the next one, in other words in computing the map T, lies in computing these integrals.
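For reference, here is a minimal sketch of the model class the next lecture will address: a scalar linear system with Gaussian noise and linear observations. The coefficients A, C and the noise levels q, r are illustrative choices of mine; this only simulates the model, it is not the Kalman filter itself.

```python
import numpy as np

rng = np.random.default_rng(1)

A, C = 0.95, 1.0    # state and observation coefficients (illustrative)
q, r = 0.1, 0.5     # std devs of process and observation noise (illustrative)

def simulate(n, x0=0.0):
    """x_{k+1} = A x_k + w_k, y_k = C x_k + v_k, with Gaussian w_k, v_k."""
    xs, ys = [], []
    x = x0
    for _ in range(n):
        x = A * x + q * rng.normal()          # linear dynamics, Gaussian noise
        ys.append(C * x + r * rng.normal())   # linear observation, Gaussian noise
        xs.append(x)
    return np.array(xs), np.array(ys)

xs, ys = simulate(50)
```

For this model the posterior π_k stays Gaussian at every step, which is why the integrals in T collapse to closed-form updates of a mean and a variance.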
So, the genius of the Kalman filter is that these integrals can be computed almost effortlessly: it turns out that for the system I just mentioned, these calculations can be done extremely easily in closed form. That is what we will do in the next lecture.