Welcome everyone. In the last few lectures we have been discussing a phenomenon called the dual effect in stochastic control problems. The premise behind this phenomenon is that, typically, the control action has just one primary role, which is to minimize the cost. The policy therefore enters only through the cost function (or loss function) of the problem and does not affect any other aspect of it. However, what we saw in Witsenhausen's problem is that the policy chosen at any time step affects not only the cost, because the control action appears in the cost, but also the information available to the later-acting controllers. The information of a later-acting controller depends on the earlier policies, and therefore the policy chosen at a later stage is a function of the policies chosen at earlier stages. This policy dependence is what is known as the dual effect. Recall that we say there is a dual effect in the problem if the information of u2 depends on the policy gamma1, that is, the policy the first controller chooses influences the information available to the second controller. One thing I want to emphasize today is that the dual effect is not something present only in esoteric information structures like the non-classical information structure; it is also present in information structures we have already studied. Recall that we studied the information structure of Markov decision processes, where the state is perfectly known, and there we studied a class of policies called Markov policies.
Markov policies are those policies where the action is taken as a function of just the present state. One can then look at an information structure in which only the present state is known to a controller, so the action has to be chosen as a function of the present state alone. What happens in this case? An earlier-acting controller takes its action based on the state at its time, and a later-acting controller takes its action based on the state at its own time. As a consequence, the information of the earlier-acting controller is not available to the later-acting controller, even though the earlier controller's action influences the state at the future time, which is precisely the information of the later-acting controller. So the information of the later-acting controller depends on the policies of the earlier-acting controllers, but the later controller does not have access to the information the earlier controllers had when their actions were chosen. This is essentially the same thing we saw in Witsenhausen's problem: an earlier controller affects the information of a later-acting controller, but the later controller does not have access to the information of the earlier controller. So even in an MDP, if we restrict ourselves to Markov policies, there is in fact a dual effect there too. But the dual effect has no bearing on the hardness of solving the problem; the problem can be solved quite easily in spite of it. That is because the cost is a function of the state, the dynamics are a function of the state, and we already know the state. It is this structure that is exploited in solving the problem despite the fact that there is a dual effect.
To see concretely why there is a dual effect in this sort of problem, let us look at one more variation of the Witsenhausen problem. In this variation, as I was just discussing, we will assume a Markovian information structure. So this is variant 4, with Markovian information structure, and our goal is to check whether there is a dual effect with this information structure. What do I mean by Markovian information structure? u1 is a function of the initial state: u1 belongs to sigma(x0). This is the same as in the Witsenhausen problem, but the difference is in u2. u2 is now a function of the state at the next time step: u2 belongs to sigma(x1). Remember that earlier u2 was a function of x1 plus the noise; now the state x1 is observed perfectly. By choosing u1 and u2 in this manner, we are effectively choosing a Markov policy for this problem. My claim is that there is a dual effect here. How do we establish that? Let us first write this out in a slightly different way. Remember u1 belongs to sigma(x0) and u2 belongs to sigma(x1), but x1 is nothing but x0 + u1. To show that there is a dual effect, we need to show that the choice of two different policies within the class sigma(x0) can result in two different pieces of information for the second controller.
As we did earlier, let us look at extreme cases; to show the dual effect it is always useful to do so. First suppose u1 = gamma1(x0), with gamma1 chosen such that the function mapping x to x + gamma1(x) is invertible. In this case, what does u2 know? u2 sees x0 + u1, in other words x0 + gamma1(x0). But gamma1 was chosen so that x + gamma1(x) is invertible, which means that from this information u2 can equivalently recover x0 itself. Therefore u2 knows x0, that is, u2 belongs to sigma(x0). This is effectively the same as u2 knowing the initial state that was known to controller 1. So when gamma1 is chosen such that x + gamma1(x) is invertible, u2 has the information x0. Now consider the other extreme. Suppose gamma1(x) = c - x for some constant c, just an affine function. What happens here? u2 again belongs to sigma(x0 + gamma1(x0)), and substituting gamma1(x) = c - x gives that u2 belongs to sigma(c), where c is a constant. This means u2 has no information about x0: knowing a constant amounts to knowing nothing about what is happening in the problem.
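The two extreme cases above can be checked numerically. The following is a minimal sketch, not from the lecture; the particular choices gamma1(x) = x (so that x + gamma1(x) = 2x is invertible) and gamma1(x) = c - x, and the value of c, are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=5)  # a few sample draws of the initial state

# Policy A: gamma1(x) = x, so x + gamma1(x) = 2x is invertible.
# The second controller observes x1 = 2*x0 and can recover x0 exactly.
x1_A = x0 + x0
recovered = x1_A / 2.0

# Policy B: gamma1(x) = c - x, so x1 = x0 + (c - x0) = c, a constant.
# The observation carries no information about x0.
c = 3.0
x1_B = x0 + (c - x0)

print(np.allclose(recovered, x0))  # under policy A, x0 is recoverable
print(np.allclose(x1_B, c))        # under policy B, u2 only sees the constant c
```

Two policies in the same class sigma(x0) thus give the second controller entirely different information, which is exactly the dual effect.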
So, based on the choice of gamma1, the information of u2 changes: if gamma1 is chosen such that x + gamma1(x) is invertible, then u2 has access to the initial state x0; if gamma1(x) is chosen as c - x, it has no knowledge of the initial state; the initial state is lost to it. As a result, there is a dual effect. Notice that this particular manifestation of the dual effect has nothing to do with the cost function of the Witsenhausen problem and is not specific to the kind of example we have considered. Of course we have made certain uses of the example here, but this dual effect can manifest itself even in an MDP or a stochastic control problem with perfect state information. As I said at the start of the lecture, such problems can also exhibit a dual effect. But the issue at hand is not just whether there is a dual effect; of course there is. The question is: does the dual effect matter? In this case it turns out that it does not. Why? Because the second controller knows x1, and remember the goal of the second controller is to estimate x1 from its information. Since it knows x1, it can estimate x1 directly. Let me go back to the cost function we had written out for the problem: k^2 u1^2 + (x1 - u2)^2. If u2 has knowledge of x1, then x1 - u2 can be made 0, and u1 can then be chosen to be 0 itself. As a consequence, the entire cost can be made 0. Let me write this out. Our cost was: minimize over gamma1, gamma2 the expectation of k^2 u1^2 + (x1 - u2)^2, where u2 = gamma2(x1).
So we can take gamma2(x1) to be equal to x1 itself (this is u2), and u1 = gamma1(x0) to be identically equal to 0. In this case the cost equals 0 and the above controllers are optimal. What do we learn from this? Although there is a dual effect, it has no bearing on the hardness of solving the problem. There is a dual effect, but it has no influence on how difficult the problem is to solve, because we already know the state, and once we know the state we can use that information to find the optimal policy; in this case the optimal policy is rather trivial. The same situation manifests itself in stochastic control problems with perfect information: there too, the cost is a function of the state, the dynamics are given by the state (or the transition probability matrix), and we have perfect information of the state. We make use of all of this to compute the optimal policy using Bellman's dynamic programming equation. That is the lesson so far as the dual effect is concerned. Now let us go back to the Witsenhausen problem. I have been telling you that the Witsenhausen problem is a hard problem and that it is a counterexample to the belief of the time that linear controllers are optimal in any LQG problem; it gives a counterexample by showing that if the information structure is not classical, then linear controllers are not optimal anymore. What we have not yet discussed is how Witsenhausen actually showed this. So what I will do now is give you a brief outline of Witsenhausen's argument for showing that linear controllers are not optimal. Let us go back to the Witsenhausen problem.
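The trivial optimal policy for variant 4 can be verified directly. This is a small Monte Carlo sketch; the value of k and the distribution of x0 are illustrative choices, not fixed by the lecture.

```python
import numpy as np

# With perfect observation of x1, the choice gamma1 = 0 and
# gamma2(x1) = x1 drives the cost E[k^2 u1^2 + (x1 - u2)^2] to zero.
rng = np.random.default_rng(1)
k = 0.2                       # illustrative value of k
x0 = rng.normal(size=10_000)  # illustrative Gaussian initial state

u1 = np.zeros_like(x0)        # gamma1(x0) = 0
x1 = x0 + u1
u2 = x1                       # gamma2(x1) = x1

cost = np.mean(k**2 * u1**2 + (x1 - u2)**2)
print(cost)  # 0.0
```

Both terms of the cost vanish sample by sample, so the dual effect, although present, makes the problem no harder.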
The original problem is the following. You have an initial state x0, then x1 = x0 + u1 and x2 = x1 - u2. The observations are y0 = x0 and y1 = x1 + v, with u1 = gamma1(y0) and u2 = gamma2(y1), and the objective is to minimize over gamma1 and gamma2 the expectation of k^2 u1^2 + x2^2, where x0 and v are independent. Later we will of course take them to be Gaussian, but for the moment it is enough that they are independent. What we will do now is a reformulation of this problem. Instead of gamma1, I will write f(x) = x + gamma1(x); g(x) will simply denote gamma2(x), and x0 will be denoted by x. Then the problem gets reformulated in the following way. I can write the cost as a function of f and g: J(f, g) is the expectation of k^2 (x - f(x))^2 + (f(x) - g(f(x) + v))^2, since u1 = gamma1(x) = f(x) - x and x1 = f(x). This is the cost as a function of f and g. Now let us take note of a few things. What is the optimal g here? The optimal g is g*(t) = E[f(x) | f(x) + v = t]. This is something we have seen multiple times before: the optimal choice for the second controller is simply to estimate the state given its information, and even with this reformulation that continues to be the case, because in the reformulated cost g appears only in the second term, so all that g is doing is minimizing that error given f.
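The reformulated cost J(f, g) is easy to evaluate by Monte Carlo. The sketch below uses illustrative placeholder choices (k = 0.2, f(x) = 0.5x, g(t) = 0.2t, unit-variance Gaussians) just to exercise the formula; none of these values come from the lecture.

```python
import numpy as np

# Monte Carlo evaluation of
#   J(f, g) = E[ k^2 (x - f(x))^2 + (f(x) - g(f(x) + v))^2 ],
# where x = x0 and f(x) = x + gamma1(x), g = gamma2.
rng = np.random.default_rng(2)
k = 0.2
n = 100_000
x = rng.normal(size=n)  # x0 ~ N(0, 1), independent of v
v = rng.normal(size=n)  # v  ~ N(0, 1)

f = lambda x: 0.5 * x   # an arbitrary linear first stage
g = lambda t: 0.2 * t   # an arbitrary linear second stage

J = np.mean(k**2 * (x - f(x))**2 + (f(x) - g(f(x) + v))**2)
print(J)  # analytically 0.01 + 0.20 = 0.21 for these choices
```

For these linear placeholders the expectation can be computed in closed form (first term k^2 * 0.25 = 0.01, second term E[(0.4x - 0.2v)^2] = 0.20), which gives a useful sanity check on the simulation.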
So g is finding the conditional expectation of f(x) given f(x) + v; evaluated at t, this is g*. Now, note that g* depends on the function f. This is also a point we made before: to evaluate g*(t) for even a single value of t, I need to know the entire function f. The reason is that the conditional expectation is an integral of f(x) against the conditional probability density of f(x) given f(x) + v = t, and that conditional density is the joint density divided by the integral of the joint density over the first variable. When I integrate this out, the entire function f matters, not just the value of the function at any one particular point. Therefore g*(t) should really be written as g*_f(t), to make it clear that the optimal g* depends on f. Now let us dig a little deeper into this and see what kind of structure we can find in the problem. We will now assume that x and v are both Gaussian, and I am going to take them to be zero mean. Suppose in addition that f is linear, say f(x) = lambda x for some scalar lambda.
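The dependence of g*_f on the whole function f can be seen numerically by approximating the conditional expectation with a kernel-weighted average of samples (a Nadaraya-Watson estimator). This is only an illustrative sketch; the two first-stage maps f1, f2 and the bandwidth are my own choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
x = rng.normal(size=n)  # x ~ N(0, 1)
v = rng.normal(size=n)  # v ~ N(0, 1), independent of x

def g_star(f, t, h=0.05):
    """Kernel estimate of E[f(x) | f(x) + v = t] from the samples."""
    y = f(x) + v
    w = np.exp(-0.5 * ((y - t) / h) ** 2)  # Gaussian kernel weights
    return np.sum(w * f(x)) / np.sum(w)

f1 = lambda x: x          # identity first stage
f2 = lambda x: np.abs(x)  # a nonlinear alternative

g1 = g_star(f1, 1.0)  # for f1, E[x | x + v = 1] = 0.5 in closed form
g2 = g_star(f2, 1.0)
print(g1, g2)  # noticeably different values at the same point t = 1
```

Even at a single point t, the optimal second-stage response differs depending on which f generated the observation, which is exactly why we write g*_f rather than g*.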
So suppose f(x) = lambda x; let us see what happens to all the other variables. In the objective, x1 becomes lambda x, and the information that the second controller has, the information that g sees, is lambda x + v. If f(x) = lambda x and x itself is Gaussian, then f(x) = lambda x is Gaussian; and remember v is also Gaussian and independent of x, so f(x) + v = lambda x + v is Gaussian as well. Now let us go back to finding the optimal g, which remember is simply the conditional expectation of f(x) given f(x) + v: g*_f(t) = E[lambda x | lambda x + v = t]. Because x and v are independent Gaussians, lambda x and lambda x + v are jointly Gaussian, so this comes down to finding the expectation of one Gaussian conditioned on another Gaussian with which it is jointly Gaussian. Consequently, mean-square estimation theory tells us that, because lambda x and lambda x + v are jointly Gaussian, this conditional expectation has to be a linear function of the information: it has to be mu t for some scalar mu.
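For zero-mean jointly Gaussian variables, the standard MMSE formula gives the coefficient explicitly: mu = lambda^2 sx^2 / (lambda^2 sx^2 + sv^2), where sx^2 and sv^2 are the variances of x and v. The sketch below (illustrative values of lambda, sx, sv) checks this closed form against the empirical best linear estimator fitted on samples.

```python
import numpy as np

rng = np.random.default_rng(3)
lambda_, sx, sv = 2.0, 1.0, 1.0  # illustrative parameters
n = 200_000
x = rng.normal(0.0, sx, size=n)
v = rng.normal(0.0, sv, size=n)

t = lambda_ * x + v  # the second controller's information f(x) + v

# Closed-form MMSE coefficient: g*(t) = mu * t
mu = lambda_**2 * sx**2 / (lambda_**2 * sx**2 + sv**2)

# Empirical least-squares coefficient for estimating lambda*x from t
mu_hat = np.dot(t, lambda_ * x) / np.dot(t, t)
print(mu, mu_hat)  # the two agree up to Monte Carlo error
```

So with f fixed to be linear, the optimal g*_f is the linear map t ↦ mu t, confirming the structural claim in the argument.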
What do we conclude from this? If f is linear, then the optimal choice for g is also linear: we computed g*_f and found that it is linear, equal to mu t. Note that this does not mean f has to be linear; that is not what is being claimed here. It says only that if the first-stage controller is chosen to be linear, then the optimal second-stage response to it is also a linear controller. This is interesting because it seems to suggest that there is a linear solution nearby. We will delve a little deeper into this in the next lecture.