So, welcome back everyone. Let us go back to the stochastic optimal control problem that we were considering. We had seen that the solution comprised two parts: an estimator, which computes the best estimate of the state given the available information, and a controller, which applies a control action based on whatever the estimator provides. Now, recall how the controller was computed: it was computed in a recursive fashion. There were expressions we had written out, and they required recursive computation. The control was a function of the estimate with a coefficient L_k; L_k in turn depended on K_{k+1}, and the K_k were computed recursively, by first computing K_N, then computing P_k, using P_k to compute K_k, and so on backwards. So we had this recursive computation to do, and that is basically our method for computing the optimal control. Now, what we would like is a similar method for computing the other quantity, the conditional expectation of x_k given I_k. If this conditional expectation of the state given the information could be computed from the previous estimate, that is, the estimate of the previous state given the previous information, that is what we would like. Then the conditional expectation would not need to be computed afresh every time through some optimization or other means; it could simply be updated from the previous estimate. Remember, though, that the previous estimate is of the previous state and uses only the previous information.
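The backward recursion for the controller gains can be sketched in a few lines. This is a minimal illustration, assuming the standard finite-horizon linear-quadratic setup; the matrices A, B, Q, R, Q_N and the function name are illustrative choices, not fixed by the lecture.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Q_N, N):
    """Backward recursion for the feedback gains L_0, ..., L_{N-1}.

    The optimal control is u_k = L_k @ x_hat_k.  We start from the
    terminal condition K_N and step backwards: each L_k depends on
    K_{k+1}, and K_k is then computed from K_{k+1}.
    """
    K = Q_N                       # terminal condition K_N
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        P = R + B.T @ K @ B                      # term to be inverted
        L = -np.linalg.solve(P, B.T @ K @ A)     # gain L_k from K_{k+1}
        gains[k] = L
        # Riccati-type update: K_k from K_{k+1}
        K = Q + A.T @ K @ A + A.T @ K @ B @ L
    return gains
```

For a scalar system with A = B = Q = R = Q_N = 1 and horizon N = 2, this gives L_1 = -0.5 and L_0 = -0.6, which can be checked by hand.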
So, this is what makes translating the estimate from the previous time step to the next one a little involved. This type of problem is called the filtering problem: it involves recursively computing the expectation of x_k given I_k from the previous estimate and any additional observations, any additional information. So the problem of filtering is all about how to make use of the additional information, together with the estimate of the state that you already have, in order to get a good estimate of the next state. This is essentially the main challenge in filtering. So what we will do now is study filtering as a problem in its own right. We will develop a few tools and acquaint ourselves with a few results that apply when we are filtering for a linear system, but one in which the noise is Gaussian. So, to study filtering, let us first establish some notation. We have a Markov chain that evolves on a state space X; this is the state space on which the chain evolves. The state transition density of that Markov chain is denoted p(x_{k+1} | x_k), where x_k is an element of the state space. We also get observations: the observation space is denoted Y, and the observations come to us with a given density, the observation likelihood p(y_k | x_k). This is the probability density of getting observation y_k given that the current state is x_k; here y_k belongs to the observation space and x_k belongs to the state space of the Markov chain.
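A concrete instance of these primitives may help. The following sketch assumes a scalar linear-Gaussian model (the values a, c, q, r are illustrative, not from the lecture): the transition density is p(x_{k+1} | x_k) = N(a x_k, q) and the observation likelihood is p(y_k | x_k) = N(c x_k, r).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar model (assumed values):
#   transition density     p(x_{k+1} | x_k) = N(a * x_k, q)
#   observation likelihood p(y_k     | x_k) = N(c * x_k, r)
a, c, q, r = 0.9, 1.0, 0.5, 0.2

def simulate(n, x0=0.0):
    """Draw a state trajectory x_1..x_n and observations y_1..y_n."""
    xs, ys = [], []
    x = x0
    for _ in range(n):
        x = a * x + rng.normal(0.0, np.sqrt(q))   # sample from p(x_{k+1} | x_k)
        y = c * x + rng.normal(0.0, np.sqrt(r))   # sample from p(y_k | x_k)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

xs, ys = simulate(100)
```

The observations y_k carry information about x_k precisely through this likelihood; filtering is about extracting that information recursively.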
So, at any point in time we have a history of observations; let us denote this history by y_{1:n} = (y_1, ..., y_n), the history of n observations. What we want to do at any time k is to build an estimate of the state at time k using this history of observations. That means we want an estimate which is a function of these observations and which minimizes a certain error. Let me write out the error first: it is the error between the state x_k and a function of the observations, call it γ(y_1, ..., y_n), measured as E[(x_k − γ(y_1, ..., y_n))^T (x_k − γ(y_1, ..., y_n))]. What I have written out here is the mean squared error, and since it is being minimized, we call the result the minimum mean squared error estimate. The minimization is done over all functions γ chosen from a certain class, usually the class of Borel functions. So we are allowed to pick any Borel function of the observations made so far in order to estimate x_k. Now, you will notice one thing here which is kind of odd: the observations run from 1 to n while the state is at time k, and n and k may not be the same. This is deliberate, because I wanted to tell you that there are actually three different problem classes that appear. The three problem classes are as follows.
The problem of prediction comes up when k is greater than n: the state is in the future, and you have observations only up to a time before the one for which you are estimating the state. You are then estimating a future state based on information available now, not the current state; this is what is called the problem of prediction. Then we have the problem of filtering, the one I talked about at the start of the lecture, which is when k is equal to n. Here you also have some information about the present state while you are estimating it; it is of course noisy and incomplete, imperfect information, but it is information about the present state. And then there is the problem of what is called smoothing. The problem of smoothing is about estimating the state in retrospect: you have observations up until time n and you want to estimate the state at a previous time k, where k is less than n. Here we are revising or improving our estimate of an event that happened in the past as more information comes to us at the present time. So, in all cases the present time is n, which is when we have the information, and the estimate is of the state at a time k: if k is in the future we call it prediction, if it is the present we call it filtering, and if it is in the past we call it smoothing. The problem that concerns us in stochastic control problems with imperfect state information is the filtering problem, so that is the one we will concentrate on.
Now, all these problems are posed in exactly the same way: you are minimizing the mean squared error between x_k and a function of y_1 to y_n. What is the solution of this particular problem? This is something I alluded to in the previous lecture, and it is in fact a theorem that we should know. The theorem is that the minimum mean squared error estimate is given by γ*(y_1, ..., y_n) = E[x_k | y_1, ..., y_n], the conditional expectation of x_k given y_1 to y_n, and we will have a notation for this: we denote it x̂_{k|n}. So what does this say? It says that when you are minimizing the mean squared error, what you are really computing, in effect, is the conditional expectation of the state given the information. This justifies why, in the stochastic control problem, I was always referring to the conditional expectation as the best estimate: best in the sense that it minimizes the squared error between the state and the estimate. Of course, there can be other measures of error, in which case you would get other notions of a best estimate, but since we are talking of the squared error, this is the one we care about. So whenever I refer to the best estimate, we implicitly have in mind that we are trying to minimize the squared error, which is also one of the most standard notions applied in control theory. What this is effectively telling us is that the best estimate you can get in the mean square error sense is the conditional expectation.
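The theorem can be checked numerically on a toy model. The sketch below assumes a jointly Gaussian pair (not from the lecture): x ~ N(0, 1) and y = x + v with v ~ N(0, r), for which E[x | y] = y / (1 + r). No other function of y should achieve a smaller mean squared error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: x ~ N(0, 1), y = x + v, v ~ N(0, r).
r = 0.5
n = 200_000
x = rng.normal(0.0, 1.0, n)
y = x + rng.normal(0.0, np.sqrt(r), n)

def mse(estimate):
    """Empirical mean squared error of an estimate of x."""
    return np.mean((x - estimate) ** 2)

mse_cond = mse(y / (1 + r))   # the conditional mean E[x | y]
mse_raw  = mse(y)             # use the observation directly
mse_half = mse(0.5 * y)       # some other linear guess
```

Here mse_cond should come out near its theoretical value r / (1 + r) = 1/3, and both competing estimators should do strictly worse.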
So, solving the filtering problem, computing this term recursively, really amounts to computing the optimal solution of the earlier optimization problem recursively as we keep getting more information. One other point should be made about this. Notice that I can write this expectation as an integral: the integral of x_k with respect to the density p(x_k | y_1, ..., y_n), the integral being over x_k. This is simply the conditional expectation of x_k computed by plugging in the density of x_k given y_1 to y_n. Now, notice that this density is not really what is given to us in the primitives of the problem: the problem was given to us in terms of the state transition density and the observation density p(y_k | x_k), whereas what we need here is the density of x_k given y_1 to y_n. This quantity is what in statistics is known as the posterior distribution. It is the probability distribution that you have about a certain event after you have received the information; it is what you would revise your probability density to be after you have been given a certain amount of information. So this is called the posterior, and the way to compute the posterior is using Bayes rule.
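A one-shot application of Bayes rule can be sketched on a discretised state space. The grid, the prior, and the Gaussian likelihood below are illustrative assumptions, not part of the lecture's derivation.

```python
import numpy as np

# Discretised state space and a prior p(x) ~ N(0, 1) on it.
grid = np.linspace(-4.0, 4.0, 401)
prior = np.exp(-0.5 * grid ** 2)
prior /= prior.sum()

def posterior(prior, y, r=0.5):
    """Posterior p(x | y) via Bayes rule, for the likelihood p(y | x) = N(x, r)."""
    lik = np.exp(-0.5 * (y - grid) ** 2 / r)   # observation likelihood
    post = lik * prior                          # Bayes rule: likelihood * prior
    return post / post.sum()                    # normalise

post = posterior(prior, y=1.0)
post_mean = np.sum(grid * post)                 # E[x | y], the Bayes estimate
```

For this Gaussian model the exact posterior mean is y / (1 + r) = 2/3, and the grid computation reproduces it closely; the conditional expectation is just this integral of x against the posterior.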
So, we compute the posterior using Bayes rule, and as a result this estimate is often also called the Bayes or Bayesian estimate, because implicit in computing it is computing the posterior; after all, without the posterior I do not see how one could compute this expectation. So we effectively need to compute the posterior, and the posterior is computed using Bayes rule: what we are doing here is applying Bayes rule to this particular problem in order to compute this particular quantity. Now, the trick, however, is that it is not just a one-time application of Bayes rule; remember, we need to do this recursively, or rather the goal is to solve this recursively. As a result, all our challenges are about computing the posterior in a recursive manner. This is where the root of the hardness lies: in coming up with the posterior distribution recursively, using the previous posterior distribution, which is itself conditioned on the previous information. The other thing to note here, another term that comes up very often, is that this estimate is what is sometimes called an unbiased estimate. Why is it called unbiased? Because if you take the mean value of the estimate, it is actually equal to the mean of the quantity that you are looking to estimate.
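Unbiasedness, E[E[x | y]] = E[x], can also be checked by simulation. Again this uses an assumed toy model, x ~ N(mu, 1) and y = x + v with v ~ N(0, r), for which E[x | y] = mu + (y − mu) / (1 + r); none of these specifics come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model: x ~ N(mu, 1), y = x + v, v ~ N(0, r).
mu, r, n = 2.0, 0.5, 500_000
x = rng.normal(mu, 1.0, n)
y = x + rng.normal(0.0, np.sqrt(r), n)

x_hat = mu + (y - mu) / (1 + r)   # the conditional mean E[x | y]

# The estimator introduces no bias of its own: both averages agree
# with mu, up to Monte Carlo error.
mean_of_estimate = x_hat.mean()
mean_of_state = x.mean()
```

The average of the estimate matches the average of the state being estimated, which is exactly the sense of "unbiased" used here.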
So this means that the estimate preserves the mean of x_k, of whatever you are looking to estimate; in that case we say it is an unbiased estimate, meaning it does not introduce any bias of its own: on average, the value of the estimate is no different from the average value of the thing you are estimating in the first place. So, our focus is going to be on the filtering problem, which, as I said, is the case k equal to n. Our focus is therefore on computing x̂_{k|k}; this is what we want to derive. In other words, what we are looking to derive is x̂_{k|k} = E[x_k | y_1, ..., y_k], which is itself the integral of x_k with respect to the density p(x_k | y_1, ..., y_k), the integral being over X. Now, what is this density? This density is the posterior density I just mentioned, and we will denote it π_k(x): π_k(x) is simply the value of the probability density of x_k at x, conditioned on y_1 to y_k, for all x in X. One other point about our notation: remember that this p itself could depend on k, but that dependence on k has been suppressed here; the probability density could of course vary with time, and the dependence has been suppressed just to keep the notation light and easy to follow. So remember that this density is not necessarily a time invariant density; in fact, all the densities mentioned here could in general vary with time.
So, the problem of filtering then comes down to how to compute this π_k(x) recursively. As I said, a fresh computation can always be done, but as time goes on your histories of observations become longer, and it becomes harder and harder to compute something like this from scratch. So we want to do it recursively, and that is the problem we are looking to solve when we say we want to solve filtering. As a result, what we need is basically a formula of the following form: π_k = T(π_{k−1}, y_k). Now, what is π_{k−1} here? Remember, π_{k−1} is the density you have at time k − 1: the probability density of the previous state given the observations you have up until the previous time instant; y_k is the new observation. So what we want to do in filtering is to derive a formula, to find a formula T, that does this. We will write out the actual formula in the next lecture. But one thing I want you to note is that there is really only one technique for doing this, and that technique is simply Bayes rule. Any kind of recursion like this, any relation that connects π_k to π_{k−1}, can be derived very simply, basically from Bayes rule. The only difficulty lies in the hardness of computing the individual integrals or individual terms that arise in Bayes rule. Conceptually there is really no alternative to doing that; computationally there can be multiple different alternatives, and that is where estimation and filtering theory has been prolific.
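The recursion π_k = T(π_{k−1}, y_k) can be sketched on a discrete grid, where T splits naturally into a prediction step (push π_{k−1} through the transition density) and a Bayes-rule update with the new observation. The model below, a Gaussian transition N(a·x, q) and likelihood N(x, r) on a fixed grid, is an illustrative assumption, not the lecture's formula.

```python
import numpy as np

# Discretised state space and an assumed linear-Gaussian model.
grid = np.linspace(-5.0, 5.0, 201)
a, q, r = 0.9, 0.5, 0.2

# Transition matrix: P[i, j] ~ p(x_k = grid[i] | x_{k-1} = grid[j]).
P = np.exp(-0.5 * (grid[:, None] - a * grid[None, :]) ** 2 / q)
P /= P.sum(axis=0, keepdims=True)

def filter_step(pi_prev, y):
    """One application of T: predict with the transition, update with Bayes rule."""
    predicted = P @ pi_prev                        # p(x_k | y_{1:k-1})
    lik = np.exp(-0.5 * (y - grid) ** 2 / r)       # p(y_k | x_k)
    pi = lik * predicted                           # Bayes rule, numerator
    return pi / pi.sum()                           # normalise

# Start from a flat prior and run two observations through the recursion.
pi = np.full(grid.size, 1.0 / grid.size)
for y in [1.0, 1.2]:
    pi = filter_step(pi, y)
estimate = np.sum(grid * pi)                       # filtered mean x_hat_{k|k}
```

Notice that each step uses only π_{k−1} and the new y_k, never the full observation history, which is exactly the point of the recursive formulation.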
But we are not going into the computational aspects so much, nor the diversity of approaches we have for computation. Rather, we want to focus on one particular case, the one I mentioned with Gaussian noise. We will do that in the next lecture. But remember that in all of these things, conceptually the only thing really being done is the application of Bayes rule. So, we will see that in the next lecture.