Welcome everyone. In the last couple of lectures we began our study of a landmark paper in stochastic control: the 1968 paper of Hans Witsenhausen. That paper gave a counterexample to a belief of the time regarding linear quadratic Gaussian systems. The belief was that any problem with a linear system, linear observations, Gaussian noise, and a quadratic cost would always admit optimal controllers that are linear in their information. This was believed because of the separation result we had shown earlier, where the optimal controller takes the form of a deterministic controller composed with an estimator. Because of the jointly Gaussian nature of the random variables involved, the estimator always gives an estimate that is linear in the information, and as a result the optimal control also turns out to be linear in the information. This belief persisted for a while because it seemed that the three assumptions, L, Q, and G (linear system, quadratic cost, Gaussian noise), were all that was needed to guarantee a linear optimal controller. Witsenhausen's counterexample, which we discussed last time, shows that this does not hold if the information pattern is not classical: if at some stage the information known at a previous stage is no longer known at the current stage, so that there is a loss of information across stages, then the claim is not true anymore.
The example Witsenhausen gave was a simple two-stage problem. If you remember, there was an initial state x_0, taken to be Gaussian. This state is observed perfectly by the first controller, so y_0 = x_0, and the first controller produces an action u_1 = γ_1(y_0). The state then changes to x_1 = x_0 + u_1. This state is observed in a noisy fashion by the second controller: y_1 = x_1 + v, where the noise v is again Gaussian and independent of x_0. The second controller takes an action based on this observation, u_2 = γ_2(y_1), and the resulting state is x_2 = x_1 − u_2. The problem is to minimize the quadratic cost E[k² u_1² + x_2²]. The challenge is that, unlike in the classical information pattern, u_2 is chosen as a function of y_1 alone, and u_1 as a function of y_0 alone. In the classical information pattern, u_1 would be chosen as a function of y_0 and u_2 as a function of both y_1 and y_0, so the second controller would have access to the information the first controller had. That is not the case here, and this is the non-classical information pattern, or information structure.
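To make the setup concrete, here is a small Monte Carlo sketch of the two-stage system described above. The parameter values (k = 0.2, Var(x_0) = 25, unit noise variance) and the two linear policies are illustrative assumptions, not values stated in the lecture.

```python
import numpy as np

# Monte Carlo sketch of the two-stage Witsenhausen setup.
# Parameters and the linear policies below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 200_000
k, sigma0, sigma_v = 0.2, 5.0, 1.0

x0 = rng.normal(0.0, sigma0, n)       # initial state, Gaussian
v = rng.normal(0.0, sigma_v, n)       # observation noise, independent of x0

def gamma1(y0):                       # first controller: sees x0 perfectly
    return 0.5 * y0                   # hypothetical linear policy

def gamma2(y1):                       # second controller: sees x1 + v only
    return 0.9 * y1                   # hypothetical linear policy

u1 = gamma1(x0)
x1 = x0 + u1                          # state after the first stage
y1 = x1 + v                           # noisy observation
u2 = gamma2(y1)
x2 = x1 - u2                          # final state

cost = np.mean(k**2 * u1**2 + x2**2)  # estimate of E[k^2 u1^2 + x2^2]
print(f"estimated cost J = {cost:.3f}")
```

Changing either policy changes the empirical cost, which is the object the two controllers jointly try to minimize.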
Now, if you write out the cost explicitly as a function of γ_1 and γ_2, here is what you see: J(γ_1, γ_2) = E[ k² γ_1(x_0)² + ( x_0 + γ_1(x_0) − γ_2( x_0 + γ_1(x_0) + v ) )² ]. The first term comes from substituting u_1 = γ_1(x_0). In the second term, u_2 has been substituted as a function of y_1, which is x_1 + v with x_1 = x_0 + γ_1(x_0), and the term itself arises because x_2 = x_1 − u_2. The important thing happening in this problem is that the information of the second controller is influenced by the action of the first controller, but controller 2 does not know what controller 1 knew, namely y_0, while choosing that action. This is the essence of the non-classical information pattern. One point I want to make: the first part, the information of controller 2 being influenced by the action of controller 1, also holds in an MDP; indeed it holds in every dynamic problem, because the information that later controllers have depends on the state, and the state is influenced by the actions of earlier controllers. The non-classicality comes from the second part: controller 2 does not know what controller 1 knew while choosing this action.
So how is this different from the classical information structure? The difference arises because, when the second controller does not know what the first controller knew while choosing its action, the policy of the first controller affects the information of the second controller. You can see this explicitly: γ_2 must be chosen as a function of its information y_1 = x_1 + v, but x_1 itself is x_0 + γ_1(x_0), so γ_1 is implicitly present. γ_1 needs to be known in order to evaluate what the optimal γ_2 should be. On the other hand, if you also knew x_0, in other words y_0, then γ_1(x_0) could be reconstructed from x_0 itself, and the presence of γ_1 would no longer matter, because all the information you can get from γ_1(x_0) is recoverable once you know x_0. So under a non-classical information pattern, the information of the later-acting controller depends on the policy of the earlier-acting controller, not just its action: the entire policy makes an appearance. How is this different from an MDP? In an MDP too, the past action affects the information in the future; the important distinction is that in an MDP, although the past action affects the information, the past policy does not.
One does not need to know what policy produced that action, because the information that went into the policy is already available for choosing the action; everything we need is encompassed in that input information, so the policy itself does not matter. In an MDP, the past action affects information but the past policy does not. This is the key difference created by the non-classical information pattern. If you recall, we were also discussing exactly how the past policy, which here means γ_1, affects the choice of γ_2. We can see that explicitly. Consider the optimal choice of γ_2: for any value t, γ_2*(t) is simply the conditional expectation γ_2*(t) = E[x_1 | x_1 + v = t], which in turn equals E[ x_0 + γ_1(x_0) | x_0 + γ_1(x_0) + v = t ].
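This dependence of γ_2* on γ_1 can be seen numerically: we can approximate γ_2*(t) = E[x_1 | x_1 + v = t] by averaging x_1 over samples whose observation y_1 falls near t, for two different first-stage policies. The two policies below (a linear one and a two-point quantizer) are illustrative assumptions; under the unit-variance Gaussian assumptions used here, the corresponding closed forms are 0.8·t for the linear policy and tanh(t) for the quantizer.

```python
import numpy as np

# gamma2*(t) = E[x1 | x1 + v = t] depends on gamma1, because the
# distribution of x1 = x0 + gamma1(x0) does.  We approximate the
# conditional expectation at a point t by averaging x1 over samples whose
# observation y1 lands near t.  Both policies are illustrative choices.
rng = np.random.default_rng(1)
n = 2_000_000
x0 = rng.normal(0.0, 1.0, n)
v = rng.normal(0.0, 1.0, n)

def gamma2_star_at(t, gamma1, width=0.05):
    x1 = x0 + gamma1(x0)
    y1 = x1 + v
    near = np.abs(y1 - t) < width     # samples with y1 close to t
    return x1[near].mean()            # ~ E[x1 | x1 + v = t]

linear = lambda x: x                  # x1 = 2 x0: closed form 0.8 * t
quantizer = lambda x: np.sign(x) - x  # x1 = sign(x0): closed form tanh(t)

print(gamma2_star_at(1.0, linear))    # close to 0.8
print(gamma2_star_at(1.0, quantizer)) # close to tanh(1) ~ 0.762
```

The two printed values differ because the distribution of x_1, and hence the estimator, changes with γ_1: the same observation y_1 = 1 is decoded differently under the two first-stage policies.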
Now suppose instead that you had the classical information pattern, so that γ_2* also had access to x_0. The conditional expectation would then change: we would get E[x_1 | x_1 + v, x_0] = E[ x_0 + γ_1(x_0) | x_0 + γ_1(x_0) + v, x_0 ], and this becomes very simple: it is just x_0 + γ_1(x_0), because the knowledge of x_0 lets this term come out of the expectation. But more importantly, the knowledge of x_0 does more than that. Once x_0 is known, I do not need to know the distribution induced by γ_1: knowing x_0 and y_1 we can also compute v, so conditioning on (x_1 + v, x_0) is the same as conditioning on (x_0, v). In general, suppose you are evaluating something of the following form: g(t) = E[ f(x) | f(x) + v = t ]. This is a function of t, and to evaluate it for any value of t you need to know the probability distribution of f(x) + v. Remember that f(x) + v is a random variable; call it z. Its distribution depends on the distributions of x and v, and also on the function f.
So the probability distribution of the random variable you are conditioning on depends on the distributions of x and v, as well as on the function f itself. On the other hand, notice what happens once access to x_0 is given: the dependence on γ_1 is gone, and all that remains is x_0 and v. In the same way, if access to x had been given here, the conditional expectation would become E[ f(x) | x, v ], and whatever you are conditioning on would be independent of f. Of course f still appears inside the expectation, but that is not the issue; the point is that what you condition on becomes independent of f. This is the difference when you go from classical to non-classical: in the classical information structure you have access to x, and what you condition on is a random variable whose distribution is independent of the function f. In a stochastic control problem, the thing you condition on becomes independent of the previous policy. This is exactly what happens in an MDP: the past action affects the information but the past policy does not, whereas in a non-classical problem the past policy can affect the information of the future.
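The classical case is easy to check numerically: once x_0 is in the conditioning information, x_1 = x_0 + γ_1(x_0) is determined exactly, so the estimator needs no knowledge of the distribution induced by γ_1. A minimal sketch, with two arbitrary illustrative policies:

```python
import numpy as np

# Classical case: with x0 in the conditioning information, the estimate
# E[x1 | y1, x0] = x0 + gamma1(x0) is exact for ANY gamma1 -- no knowledge
# of the distribution induced by gamma1 is needed.  Both policies are
# arbitrary illustrative choices.
rng = np.random.default_rng(2)
x0 = rng.normal(0.0, 1.0, 5)
v = rng.normal(0.0, 1.0, 5)

max_err = 0.0
for gamma1 in (lambda x: 0.3 * x, np.sin):
    x1 = x0 + gamma1(x0)
    y1 = x1 + v                    # noisy observation (not needed below)
    estimate = x0 + gamma1(x0)     # E[x1 | y1, x0]: x0 determines x1
    max_err = max(max_err, float(np.max(np.abs(estimate - x1))))

print(max_err)   # 0.0: the estimate is exact, regardless of gamma1
```

The estimator here evaluates γ_1 pointwise at the known x_0; it never needs the distribution of γ_1(x_0), which is exactly the contrast with the non-classical case.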
As a result, the optimal choice of a future policy depends on the policies of the past; future optimal policies are in fact functions of past policies, meaning the optimal policy at time two is a function of the policy at time one. If you remember the way we solved MDPs, solutions came about through a nesting of policies: an action taken at a certain time resulted in a state, that state resulted in an observation, and that observation resulted in the next action. The previous policy was composed with the dynamics, which was composed with the next policy, and so on. So in an MDP what we see is a composition of policies: the policy at time one gets composed with the dynamics, which gets composed with the policy at time two, which gets composed with the dynamics again, and so on. In a non-classical problem, on the other hand, you do not have a composition; instead, the policy at time two is a function of the policy at time one. The later policies are functions of the past policies. For this reason you cannot nest these policies: no nesting of policies is possible, and no dynamic programming arguments are possible.
So there is no dynamic programming argument available. As a result, the policies of the past have to be determined alongside the policies of the future, knowing that the future policy is a function of the past policy. Any plausible way of solving this problem requires writing the policy at time two as a function of the policy at time one, then searching over the space of policies at time one and optimizing over that set. This is the enormous complexity I was talking about in the previous lecture. Now, the other thing that manifests in the Witsenhausen problem is this. As I just mentioned, the policy of the past affects the information of the future, but the policy also has its native purpose, which is to minimize the cost. So the policy ends up having two roles to play: on the one hand it has to minimize the cost term, and on the other hand it has to give the right amount of information to the future, because it is implicitly present in both terms. In the cost expression, γ_1 appears once as a cost term and again inside the argument of the policy at the second stage. Consequently the policy makes an appearance in a dual way: the policy of the past contributes to the cost through its action, but also to the information available in the future. In that case we say the policy has a dual effect. I will define this more formally in the coming lecture, but this is what the dual effect basically means: the information of the future depends on the policy of the past.
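One way to make the "search over past policies" concrete is to restrict γ_1 to a parametric family, pair each candidate with its best-response γ_2, and optimize over the family. The sketch below does this for linear first-stage policies u_1 = (λ − 1)x_0, so that x_1 = λx_0, in which case the best response is the linear MMSE estimator and the cost has a closed form. The parameter values k² = 0.04 and Var(x_0) = 25 are commonly used benchmark values, assumed here purely for illustration.

```python
import numpy as np

# Search over linear first-stage policies u1 = (lam - 1) * x0, i.e.
# x1 = lam * x0.  For each lam, gamma2 is replaced by its best response
# (the linear MMSE estimator of x1 from y1 = x1 + v), so the cost of the
# pair has a closed form.  Parameter values are assumed benchmark values.
k2, var_x0, var_v = 0.04, 25.0, 1.0   # k^2, Var(x0), Var(v)

def cost_with_best_response(lam):
    var_x1 = lam**2 * var_x0
    mmse = var_x1 * var_v / (var_x1 + var_v)        # E[(x1 - E[x1|y1])^2]
    return k2 * (lam - 1.0)**2 * var_x0 + mmse      # E[k^2 u1^2] + E[x2^2]

lams = np.linspace(-2.0, 2.0, 4001)
costs = np.array([cost_with_best_response(l) for l in lams])
best = lams[np.argmin(costs)]
print(f"best linear lam = {best:.3f}, cost = {costs.min():.4f}")
```

Witsenhausen's point was precisely that for parameter values like these, a nonlinear (quantizing) γ_1, paired with its own best-response γ_2, achieves strictly lower cost than the best linear pair found by this search.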
Of course the information also depends on the action of the past, and that is true for every problem; the issue at hand is that it depends on the policy of the past. In that case we say there is a dual effect in the problem; people also describe this by saying that signaling is present. In the next lecture we will look at some variants of the Witsenhausen problem, formally define the dual effect, and check whether the dual effect actually holds in those variants.