Welcome back everyone. At the end of the previous lecture we were discussing the form of gamma_2^*. Recall that gamma_2^* is nothing but the conditional expectation of x_1 given y_1, where x_1 is the state at time 1 and y_1 = x_1 + v. So all the second stage is really doing is estimating x_1 from x_1 + v. The problem is that x_1 is not a given constant as far as the optimization in this problem is concerned: x_1 is itself shaped by our other decision variable, gamma_1. Writing this out explicitly, x_1 = x_0 + u_1 with u_1 = gamma_1(x_0), and hence y_1 = x_0 + gamma_1(x_0) + v. So the difficulty that arises, and the difference from the earlier problems, is that there is now an explicit dependence on gamma_1 both in the state we are estimating and in the information we have available for estimating it. We will make this more explicit and take examples to see precisely what is happening, but this is something I want you to note.

Let us now go back and try to understand intuitively what is happening in this problem. As we said, the second-stage problem is really about estimating x_1 using the information we have. Knowing that the second stage is going to do an estimation of this kind, we should ask: what is the first stage doing? What is the choice of gamma_1 really about? gamma_1 is trying to do two things. It has its own business, which is to minimize its own stage-wise cost term k^2 u_1^2; it influences this directly, since u_1 is just gamma_1(y_0). But it also influences what happens at the next stage, because it appears in the estimation error term as well. So gamma_1 is making a compromise between minimizing its own stage-wise cost and influencing what happens downstream.

You might ask: was this not also the case in a standard stochastic control problem? Was it not the case there too that the action taken at the current stage influences what happens at the next stage? Yes, indeed, but there it is the action that has that effect. With a classical information pattern, it is the action that shows up both in the stage-wise cost and in the cost at the next stage: in the Bellman equation we minimize over the action, and we have a stage-wise cost that is a function of the action plus a cost-to-go that is a function of the action, and that action is what links the two. The thing that changes once you have a non-classical information pattern is that it is no longer the action that forms this link, but the policy.
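Before we dig into why, let me restate the problem compactly for reference; this matches the standard statement of the Witsenhausen counterexample, in which x_0 and v are independent (Gaussian in the standard setting):

$$x_1 = x_0 + u_1, \qquad u_1 = \gamma_1(y_0), \qquad y_0 = x_0,$$
$$x_2 = x_1 - u_2, \qquad u_2 = \gamma_2(y_1), \qquad y_1 = x_1 + v,$$
$$\min_{\gamma_1,\gamma_2}\ \mathbb{E}\big[k^2 u_1^2 + x_2^2\big],$$

and for any fixed gamma_1 the optimal second stage is gamma_2^*(y_1) = E[x_1 | y_1].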
So it is the entire policy that you choose at the first stage that influences the effect you have at the next stage. This is the fundamental difference. Let us now go deeper and try to understand the difference between an action linking one stage to the next versus a policy linking one stage to the next.

The main observation you need to make is the following. We have u_2^* = gamma_2^*(y_1), the conditional expectation of x_1 given y_1. How would you compute this conditional expectation? You would compute it by writing out the conditional distribution of x_1 given y_1. And how is a conditional distribution defined? For any two random variables X and Y, the conditional distribution of X given Y is the joint distribution of X and Y divided by the distribution of Y:

$$P(x \mid y) = \frac{P(x, y)}{P(y)}.$$

If these are continuous random variables, the expressions should be interpreted as densities: the numerator is the joint density of X and Y, the denominator is the density of Y, and the density of Y is itself the integral of the joint density of X and Y, the integral being taken over x.

Now see what happens if you replace X by x_0 + gamma_1(x_0), which is x_1, and Y by x_0 + gamma_1(x_0) + v, which is y_1. Then gamma_1(x_0) is present in each of these expressions, it gets integrated over, and the resulting expression is a function of gamma_1. So u_2^* here is really u_2^*(gamma_1), and likewise gamma_2^* is a function of gamma_1. But note carefully: this is not the same as saying that gamma_2^* is a composition with gamma_1.
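To make this completely explicit, suppose x_0 has density p_0 and v has density phi, independent of x_0 (in the standard setting both are Gaussian densities; the notation here is just for illustration). Substituting x_1 = x_0 + gamma_1(x_0) and y_1 = x_1 + v into the conditional expectation gives

$$\gamma_2^*(y) = \mathbb{E}[x_1 \mid y_1 = y] = \frac{\int \big(x + \gamma_1(x)\big)\,\phi\big(y - x - \gamma_1(x)\big)\,p_0(x)\,dx}{\int \phi\big(y - x - \gamma_1(x)\big)\,p_0(x)\,dx}.$$

The function gamma_1 appears inside both integrals, evaluated at every x in the support of x_0. This is the precise sense in which gamma_2^* depends on the whole of gamma_1.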
When we have two functions, they can interact in two different ways. Take two functions f and g. You can have a composition of f and g, written f ∘ g, or you can have f as a function of g. f as a function of g means that if you want to evaluate f at any point, you need to know the entire function g; f composed with g means that to evaluate it at a point you only need to know the value of g at that particular point. So if you want to evaluate (f ∘ g)(x), you just need to know the value g(x). But if f depends on g as a function, knowing only g(x) is not enough: you need to know g(z) for every value z of its argument.

Why does that happen here? Because these conditional expectations involve integrals, and when you integrate over the space of x_0, as we did above, you end up integrating over the entire range of gamma_1. It is not any single point value of gamma_1 that matters, but the entire shape of gamma_1 over the space. As a result, gamma_2^* becomes a function of gamma_1, but not a composition with gamma_1. Whenever you have this structure you will get something that is a function of gamma_1 but not a composition with it, and the reason is that gamma_1 is present both in what we are conditioning on and in what we are taking the expectation of. This is the impact of the non-classical information structure. What I will show you later is that with a classical information structure all these complications simply do not happen; this kind of manifestation never comes up under a classical information pattern.

One way of understanding what is happening in this particular problem is the following. In the second stage, given gamma_1, it is very easy to decide what gamma_2 has to be: it is just the conditional expectation. But the dependence of that gamma_2 on gamma_1 is horrendously complicated, because gamma_2^* is itself a function of gamma_1. Once we plug in that the optimal second stage is gamma_2^*[gamma_1], the problem we are left with is

$$\min_{\gamma_1}\ \mathbb{E}\Big[k^2 u_1^2 + \big(x_1 - \gamma_2^*[\gamma_1](x_1 + v)\big)^2\Big], \qquad x_1 = x_0 + \gamma_1(x_0),\quad u_1 = \gamma_1(x_0),$$

which is an extremely complicated, highly non-linear expression in gamma_1, and the minimization is over a function space.
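To pin down the function-versus-composition distinction, here is a toy contrast written as a short sketch; the function names and the Gaussian weighting are purely illustrative, chosen only to mirror the structure of the conditional expectation above:

```python
import numpy as np

def f_composed_with_g(g, x):
    # (f o g)(x): only the single value g(x) is needed.
    return g(x) ** 2

def f_of_g(g, x):
    # f[g](x): the value at one point x depends on g over its whole domain,
    # because g is integrated out -- the same structure as gamma_2*'s
    # dependence on gamma_1.
    zs = np.linspace(-5.0, 5.0, 1001)         # grid standing in for the range of x_0
    w = np.exp(-0.5 * (x - zs - g(zs)) ** 2)  # g evaluated at *every* grid point
    return np.sum((zs + g(zs)) * w) / np.sum(w)
```

Changing g anywhere on the grid changes f_of_g(g, x), even if the value g(x) itself is untouched; f_composed_with_g is immune to such changes.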
As a result of all this, it will not surprise you to hear what the state of research on this problem is. Even for a simple problem like the one I have written, the Witsenhausen counterexample, standing today in 2022, more than 50 years after it was posed, we still do not know the optimal control. The basic reason is exactly this high complexity: the control you choose at the first stage impacts, in an extremely non-linear and complicated way, the optimal control you would choose at the second stage, and that dependence is something we have not been able to completely understand.

Intuitively, let me explain this in another way. You can think of two controllers acting in this problem: one controller, gamma_1, and a second controller, gamma_2. The second controller simply wants to estimate the state given his information. His problem is very easy to understand, and all he does is take a conditional expectation. It is the first controller's problem that is complicated, because the first controller has a dilemma: he must balance his own stage cost against the way his choice of gamma_1 influences the optimal cost of the second stage.

And this choice cannot be broken down into a choice of action values. You cannot choose the optimal action for each value of the information, which is what you would do if you were writing out the Bellman equation: for every value of the information, that is, for every value of y_0, you would choose a u_1. That can be done provided the second-stage cost also depends only on u_1. Unfortunately, the second-stage cost here does not depend only on u_1; it depends on the entire function gamma_1 -- not only on the value of gamma_1 at y_0, but also at other values y_0'. That is why, once you write out the problem this way, you cannot break it down into a stage-wise or step-wise problem the way we did in the case of the classical information structure.
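To see this concretely, here is a minimal numerical sketch of evaluating the cost of a single candidate gamma_1, assuming, as in the standard setting, that x_0 ~ N(0, sigma_x^2) and v ~ N(0, 1); the parameter values and names here are illustrative, not part of the problem statement:

```python
import numpy as np

def gamma2_star(y, gamma1, sigma_x, grid):
    # gamma_2*(y) = E[x_1 | y_1 = y]. Note that gamma1 is evaluated on the
    # *entire* grid: a function of gamma1, not a composition with it.
    x1 = grid + gamma1(grid)
    log_w = -0.5 * (y - x1) ** 2 - 0.5 * (grid / sigma_x) ** 2  # log joint density, up to constants
    w = np.exp(log_w - log_w.max())  # normalizing constants cancel in the ratio below
    return np.sum(x1 * w) / np.sum(w)

def witsenhausen_cost(gamma1, k=0.2, sigma_x=5.0, n=5000, seed=0):
    # Monte Carlo estimate of J(gamma1) = E[k^2 u_1^2 + (x_1 - gamma_2*(y_1))^2].
    rng = np.random.default_rng(seed)
    grid = np.linspace(-6 * sigma_x, 6 * sigma_x, 2001)
    x0 = rng.normal(0.0, sigma_x, n)
    v = rng.normal(0.0, 1.0, n)
    u1 = gamma1(x0)
    x1 = x0 + u1
    y1 = x1 + v
    u2 = np.array([gamma2_star(y, gamma1, sigma_x, grid) for y in y1])
    return np.mean(k ** 2 * u1 ** 2 + (x1 - u2) ** 2)

# Evaluating any one fixed candidate is routine, e.g. a linear guess:
print(witsenhausen_cost(lambda x0: -0.5 * x0))
```

Evaluating the cost of any fixed gamma_1 is easy; the open part of the problem is the search over all (in general non-linear) functions gamma_1.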
So the dilemma in this problem is essentially on the part of the first controller: he has to decide the optimal policy by which he optimizes his first-stage cost as well as his influence, which shows up in an indirect way, on the second-stage cost. This dilemma has been given various names; the most common is the dual effect, and what we are seeing here is a manifestation of the dual effect. I will define this term more formally later, but as an introduction: the dual effect comes up when the policy affects not just the action chosen at a particular stage, but also the information available to another agent at a future stage.

Why is all of this happening? Because you have these two controllers, gamma_1 and gamma_2, trying to pick actions, and the action of the second controller, u_2, has to be chosen as a function of y_1. Now y_1 itself is impacted by the choice of gamma_1, but the second controller does not have access to the information that the first controller had when he chose the action that ends up influencing the second controller's information. The first controller can influence the information of the second controller, because he can influence y_1; but the second controller does not have access to y_0, the information held by the first controller. That comes down exactly to the violation of the inequality I highlighted: I_1 is not a subset of I_2. This is the core dilemma going on in this particular problem.

What we will do now is look at a few special cases of the Witsenhausen problem and try to understand where exactly this difficulty is showing up. First, suppose the cost were just E[x_2^2]. In this case there is no first-stage term, no cost on u_1^2, so the entire problem is about minimizing the norm of x_2. Recall that x_2 = x_1 - u_2, so the cost is just E[(x_1 - u_2)^2]. What would the optimal controllers be in this case? Regardless of what happens in the first stage, gamma_2 is always chosen to minimize this error, so it is always the conditional expectation of x_1 given the information: u_2 = gamma_2(y_1) = E[x_1 | y_1] = E[x_1 | x_1 + v]. Now, how should gamma_1 be chosen? How does gamma_1 influence all of this? Recall x_1 = x_0 + u_1 and u_1 = gamma_1(x_0), so x_1 = x_0 + gamma_1(x_0). All we want to do is minimize the error between x_1 and the conditional expectation of x_1 given y_1.
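In other words, with the first-stage cost removed, the problem facing gamma_1 reduces to

$$\min_{\gamma_1}\ \mathbb{E}\Big[\big(x_1 - \mathbb{E}[x_1 \mid x_1 + v]\big)^2\Big], \qquad x_1 = x_0 + \gamma_1(x_0).$$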
Here is one way gamma_1 could be chosen. The first controller could say: I will choose gamma_1 in such a way that it nullifies x_0; in other words, choose gamma_1 so that x_1 becomes a constant. All the uncertainty, the lack of information that the second controller suffers from, is getting nullified at this stage. In particular, take gamma_1(x_0) = -x_0. Then x_1 is a constant equal to 0 regardless of what x_0 is. Now, if x_1 = 0, then no matter what information is available to the second controller, he can always produce the constant estimate 0: he can take u_2 = gamma_2(x_1 + v) identically equal to 0, and this is necessarily the best estimate of the constant value 0. With this choice, E[x_2^2] = E[(x_1 - u_2)^2] = 0, since x_1 = 0 and u_2 = 0. And since we are trying to minimize a non-negative quantity and have achieved the value 0, this choice of (gamma_1, gamma_2) is optimal.

In other words, if you did not have the first-stage cost at all, the optimal thing for the first controller to do would be to just nullify the effect of the uncertainty: he chooses an action u_1 such that the state at the next time step is a constant regardless of what x_0 is. The fact that the second controller does not know what is known to the first controller, in particular that he does not know x_0, becomes irrelevant as a result, and we find that this cost is optimal. But remember, this policy is optimal not for the original Witsenhausen problem; it is optimal for this particular problem where the cost is E[x_2^2].

Now, what happens if we try this policy on the original Witsenhausen problem? You can try it out. Applying this policy there, the optimal cost of the Witsenhausen problem, call it J^*, satisfies J^* <= cost of this policy = k^2 E[x_0^2], since u_1 = -x_0 and the second-stage error is 0. So the optimal cost of the Witsenhausen problem is upper bounded by this particular term.
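A quick sanity check of this bound with illustrative parameter values (the particular k, sigma_x, and Gaussian distributions below are assumptions for the sketch, not fixed by the argument):

```python
import numpy as np

# Zero-forcing policy: gamma_1(x_0) = -x_0 and gamma_2 identically 0.
rng = np.random.default_rng(0)
k, sigma_x, n = 0.2, 5.0, 1_000_000
x0 = rng.normal(0.0, sigma_x, n)
u1 = -x0                 # nullifies x_0
x1 = x0 + u1             # identically 0, whatever x_0 was
u2 = np.zeros(n)         # best estimate of the known constant 0
cost = np.mean(k ** 2 * u1 ** 2 + (x1 - u2) ** 2)
print(cost, k ** 2 * sigma_x ** 2)  # both come out to about 1.0 here
```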
It turns out that this bound is very far from optimal; the policy we used is a very simple, naive one. The reason it is so naive is that it resolves the information asymmetry between the two controllers in the crudest possible way: it makes the state at the next time step independent of the information that the second controller does not have, which is why the constant action then turns out to be optimal. In the next lecture we will look at another type of problem: we will change the information structure and see whether the problem manifests a different nature. We will do that in the next part.