So, in the previous lecture we established the lemma which showed that the error term between the state and the conditional expectation of the state given the information, $x_{N-1} - \mathbb{E}[x_{N-1} \mid I_{N-1}]$, is a function of only the noise in the system, and that this function does not depend on the policy being used; it is independent of the policy. As a result, we can now revisit the dynamic programming equation we had written out at time step $N-2$. We had observed that the first, green term is independent of $u_{N-2}$, that the yellow terms did depend on $u_{N-2}$, and that the last green term is also independent of $u_{N-2}$. The question mark was about the blue term: we were not sure whether it depended on $u_{N-2}$. But the blue term contains exactly this error, $x_{N-1} - \mathbb{E}[x_{N-1} \mid I_{N-1}]$, and we have just concluded that this term is independent of $u_{N-2}$. As a consequence, the entire blue term can be recoloured green, because it too does not depend on $u_{N-2}$. We are therefore left with only the two yellow terms that depend on $u_{N-2}$, and we can observe that these two yellow terms are the same as what we obtained when looking at the problem with perfect state information. Therefore, continuing the dynamic programming (DP) algorithm, we once again conclude that $u_{N-2}^\ast = \mu_{N-2}^\ast(I_{N-2})$, and that it has the same sort of form as we got for time step $N-1$: $u_{N-2}^\ast$ is a function of the conditional expectation of the state $x_{N-2}$ given $I_{N-2}$, just as at time step $N-1$. In fact, it is equal to a linear function of it; let me write it as $L_{N-2}\,\mathbb{E}[x_{N-2} \mid I_{N-2}]$. Now that this has been done, we can substitute for $u_{N-2}$ back into the dynamic programming equation, compute $J_{N-2}$ as a function of $I_{N-2}$, and then proceed to step $N-3$, and so on. At each step, all the arguments we have used so far continue to hold, and therefore we can conclude that the optimal policy is in fact a linear function of the conditional expectation of the state given the information. All the observations we have made, namely that the optimal policy of this problem is the same as the one with perfect state information, also continue to hold. In other words, these observations are valid for all times $t$: at every step, the optimal policy is equal to the optimal policy of the perfect-state-information case applied to the conditional expectation of the state given the information.
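To summarize the induction in one line (this is just a restatement of the conclusion above in display form, with $L_k$ denoting the gain of the perfect-state-information problem):

\[
u_k^\ast \;=\; \mu_k^\ast(I_k) \;=\; L_k\,\mathbb{E}\!\left[x_k \mid I_k\right], \qquad k = 0, 1, \dots, N-1,
\]

so the imperfect-information policy is obtained from the perfect-state-information policy simply by substituting the estimate $\mathbb{E}[x_k \mid I_k]$ for the true state $x_k$.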
So, what we have shown so far is a landmark result. We have shown that, essentially, the way to proceed with a problem where we do not have perfect state information is, in some sense, to ignore that fact and just work with what we have: we make the best estimate of the state, assume it is the true state, and then apply the control that we would have applied if it were the true state. That is the form of the optimal policy. In fact, the optimal policy also has a structural feature associated with it, which I had mentioned earlier as well; indeed there is more. If you recall, the optimal policy for the perfect-state-information case was in fact the same as the optimal policy for the noiseless case. As a consequence, we have an equivalence across three different problem classes that give us the same optimal policy. The first is the problem class where there is no noise in the system; the second is the problem class where there is noise in the system but we have perfect state information; the third is the problem class where there is noise in the system and we have noisy observations of the state. In all three problem classes, provided the system is linear, the cost is quadratic, and the observations are linear, the optimal policy is the same; it is just applied to a different entity. In the first two cases, where you have the state information, you apply it to the state itself; in the third case you apply it to the best estimate of the state. In fact, the optimal policy in the problem with imperfect state information can also be understood in a different, structural form, as follows. Let me write out the form of the optimal policy so that we can appreciate this structural result. At every $k$, the optimal control is
$u_k^\ast = \mu_k^\ast(I_k) = L_k\,\mathbb{E}[x_k \mid I_k]$,
where the matrix $L_k$ can be computed as
$L_k = -(R_k + B_k^\top K_{k+1} B_k)^{-1} B_k^\top K_{k+1} A_k$,
and the $K_k$ matrices are given recursively. We have $K_N = Q_N$, and $K_k$ is found in the following way: first define
$P_k = A_k^\top K_{k+1} B_k \,(R_k + B_k^\top K_{k+1} B_k)^{-1} B_k^\top K_{k+1} A_k$,
and then
$K_k = A_k^\top K_{k+1} A_k - P_k + Q_k$.
This is the form of the optimal policy. Now, the structural observation we can make is that the optimal policy can be understood in terms of the following block diagram, which I will draw in a moment. First you have your state equation, which runs as $x_{k+1} = A_k x_k + B_k u_k + w_k$; the state equation is driven by noise that comes from outside. The state $x_k$ that you have at time $k$, together with observation noise, results in an observation $z_k = C_k x_k + v_k$.
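As a numerical sanity check on these recursions, here is a minimal sketch in Python/NumPy. The function name lqg_gains and the list-of-matrices interface are hypothetical choices of my own; the body simply implements the $L_k$, $P_k$, and $K_k$ formulas written above, running backward from $K_N = Q_N$.

```python
import numpy as np

def lqg_gains(A, B, Q, R, Q_N, N):
    """Backward Riccati recursion for the gains L_0, ..., L_{N-1}.

    A, B, Q, R are length-N lists of the matrices A_k, B_k, Q_k, R_k;
    Q_N is the terminal cost matrix.
    """
    K = Q_N                                   # K_N = Q_N
    L = [None] * N
    for k in reversed(range(N)):
        S = R[k] + B[k].T @ K @ B[k]
        # L_k = -(R_k + B_k' K_{k+1} B_k)^{-1} B_k' K_{k+1} A_k
        L[k] = -np.linalg.solve(S, B[k].T @ K @ A[k])
        # P_k = A_k' K_{k+1} B_k (R_k + B_k' K_{k+1} B_k)^{-1} B_k' K_{k+1} A_k
        P = A[k].T @ K @ B[k] @ np.linalg.solve(S, B[k].T @ K @ A[k])
        # K_k = A_k' K_{k+1} A_k - P_k + Q_k
        K = A[k].T @ K @ A[k] - P + Q[k]
    return L
```

Notice that nothing in this computation refers to the observation matrices $C_k$ or to the estimator: the gains are exactly those of the perfect-state-information (LQR) problem.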
Now this $z_k$ will contribute to another block that I will call an estimator. What else goes into this estimator block? The previous control action. In order to see where this previous control action enters, let us first understand what the estimator block is going to produce: it produces for us the conditional expectation of the state, $\mathbb{E}[x_k \mid I_k]$. This then gets multiplied by $L_k$, the gain we have just computed, and that results in the control action $u_k$, which goes into the state equation and gives rise to the next state. Now, remember that $u_k$ does not reach the estimator immediately: at time $k$, the estimator has information about the controls applied up until the previous time step, which means $u_k$ comes into the estimator with a one-step delay. So the estimator takes the information $z_k$ that comes out of the observation block and $u_{k-1}$, and of course it also has all the history of previous information; that is what it accumulates as $I_k$. So you can see what is going on here: this is the structure of the optimal policy. The optimal policy is doing two things separately. On the one hand it is doing estimation using an estimator; the estimator takes all the observations we have so far and all the control actions applied in the past and produces an estimate of the state. The estimate of the state is then fed into $L_k$, which produces the control action that needs to be applied. Now, the remarkable thing here is that your method of producing the best estimate has nothing to do with the control action, or with the control policy, that you would be choosing; and the control policy that you would be choosing is the same as the one you would have chosen even if there were perfect state information. In other words, the estimator and the controller here are being designed independently of each other. What does it mean for them to be designed independently? Well, we can think of the estimator as solving a problem of its own, which is the problem of coming up with the best estimate of the state given the information. Remember, we just showed in the previous lemma that the estimation error is really a function of just the noise; it does not depend on the control policy that you would use. In other words, the estimation error incurred in this problem has nothing to do with the control policy that you would be applying downstream. So what we have got in this particular case can be thought of as, and is often called, a separation principle.
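To make the block diagram concrete, here is a structural sketch of the closed loop in Python, with time-invariant matrices for brevity. The function estimate is only a placeholder for whatever computes $\mathbb{E}[x_k \mid I_k]$ (we have not yet derived it), and the Gaussian noise draws are a hypothetical model; the point is the wiring: the estimator receives $z_k$ and the one-step-delayed controls, and the controller sees only the estimate.

```python
import numpy as np

def simulate(A, B, C, L, estimate, x0, N, rng):
    """Structural sketch of the estimator/controller loop.

    estimate(zs, us) is a placeholder for E[x_k | I_k], where the
    information I_k consists of z_0..z_k and the past controls u_0..u_{k-1}.
    """
    x = x0
    zs, us = [], []
    for k in range(N):
        v = rng.standard_normal(C.shape[0])  # observation noise v_k (hypothetical)
        z = C @ x + v                        # observation: z_k = C x_k + v_k
        zs.append(z)
        x_hat = estimate(zs, us)             # estimator: sees z_0..z_k, u_0..u_{k-1}
        u = L[k] @ x_hat                     # controller: u_k = L_k E[x_k | I_k]
        us.append(u)                         # u_k reaches the estimator one step late
        w = rng.standard_normal(x.shape)     # process noise w_k (hypothetical)
        x = A @ x + B @ u + w                # state equation: x_{k+1} = A x_k + B u_k + w_k
    return zs, us
```

The two blocks touch only through the estimate $\hat{x}_k$ and the delayed control; this is precisely the independence discussed above.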
It means that the design of the controller for this particular problem can be broken down into two separate designs. The first is the design of a regular controller assuming no estimation is needed, that is, assuming the system is deterministic, or stochastic but with perfect state information. It is a design that is agnostic to how you are estimating the state: the controller design does not care about how you are producing your estimates. Whereas on the estimation side, the estimation error that you get has nothing to do with what policy is being used. In fact, we can give an optimization-based interpretation to the estimator as well: we can show that the conditional expectation of $x_k$ given $I_k$ is the minimum mean square error estimate,
$\mathbb{E}[x_k \mid I_k] = \arg\min_{y}\; \mathbb{E}\big[\,\|y - x_k\|^2 \mid I_k\,\big]$,
where the minimization is over all $y$ that are functions of $I_k$. That is, if you are allowed to pick any $y$ that is a function of $I_k$ in order to minimize the expected squared error between $y$ and $x_k$, it turns out the optimal $y$ is the conditional expectation of $x_k$ given $I_k$. So the estimator here is doing its own thing, independently of what is happening downstream with the controller, and the controller is being designed independently of how the estimate is actually being arrived at. This, as a result, is called the separation principle. The separation principle has an enormously important place in industrial control systems, because it essentially helps us separate two different disciplinary tasks: one is a control design task, which is based on control-theoretic principles; the other is an estimation task, which is based on statistical principles. These two are separated out, and it turns out that they can both be done independently, each optimally in its own way, and combined in a manner such that the result is still optimal. This is a stronger result than just the certainty equivalence result we had seen earlier. Certainty equivalence only said that you can replace all the noise in the system by its mean, and the answer you get from the resulting deterministic system is also optimal for the stochastic system. Here, in fact, the noise is not being replaced by its mean; we are really saying that the control engineer can do the control task, the statistician can do the estimation task, and they can both come together and the resulting design will be optimal for the stochastic control problem. This kind of separation usually results in an explosion of technological progress, because each party that is an expert in its respective sub-discipline can continue to perfect its art regardless of what is happening in the other sub-discipline. As a result, this is one of the most widely applied results in practical and theoretical control, and it has a special place in all of control theory.
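The minimum mean square error property is easy to illustrate numerically. Below is a small Monte Carlo sketch for a jointly Gaussian pair, where the conditional expectation has the familiar closed form $\mathbb{E}[x \mid z] = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_v^2}\, z$; the specific variances are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Jointly Gaussian pair: z = x + v, with x ~ N(0, 4) and v ~ N(0, 1).
sigma_x, sigma_v = 2.0, 1.0
x = sigma_x * rng.standard_normal(n)
z = x + sigma_v * rng.standard_normal(n)

# Conditional mean: E[x | z] = sigma_x^2 / (sigma_x^2 + sigma_v^2) * z.
x_hat = (sigma_x**2 / (sigma_x**2 + sigma_v**2)) * z

print(np.mean((x_hat - x) ** 2))  # ~0.8: the minimum achievable squared error
print(np.mean((z - x) ** 2))      # ~1.0: the naive estimate y(z) = z does worse
```

Any other function $y(z)$, such as the raw observation itself, incurs a strictly larger mean squared error, consistent with the argmin characterization above.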
Despite all the praises I am singing of this result, we should make sure that we understand the premises under which it has been derived. The result assumes that we have a linear system, that the noise is independent across time, that the cost is quadratic, that the observations are linear, and that the information we have at each time step is equal to all the observations we have made so far together with all the actions we have taken so far. Under this premise, all the conclusions I have just mentioned continue to hold. In the next part, what we will do is go a little deeper into the estimation term here. So far we only know that the optimal controller is a function of the conditional expectation of the state given the information. Now, this conditional expectation may or may not be easy to calculate, so this requires a little bit of work, and in order to see how the calculation of the conditional expectation can be made simpler and more widely applicable, it turns out that we need to make assumptions about the nature of the noise. So far we have only assumed that the noise is independent and zero mean; we have not assumed any specific probability distribution for the noise itself. We will use the probability distribution of the noise in that part, when we try to derive expressions for the conditional expectation of the state given the information. As I had mentioned, there is also a certainty-equivalence type of angle here, because we have seen that the optimal controller is in fact applying the control that it would have applied if the system were deterministic. It is the same controller that we would get if we removed all the noise from the system and replaced the state by its conditional mean; effectively, that is the certainty-equivalence angle in this particular problem. So, in the following lectures, we will first go in depth into how the conditional expectation of the state given the information can be computed, and then we will start relaxing some of the key assumptions that we have made so far.