So, our journey in stochastic control so far has been as follows. We started with stochastic control problems where the state was perfectly known, wrote out different classes of policies for these problems, and realized that for such problems Markov policies are optimal, with the optimal policies found by solving Bellman's dynamic programming equation. From there we went to stochastic control problems with imperfect state information, and we saw that all such problems can be reformulated as problems with perfect state information in one of two ways: either take the information vector as the state, which leads to a blow-up of the state space, or take the belief state as the state, which keeps the state space the same at all times. The dynamic programming equation could then be written by combining the perfect-state dynamic programming equation with the non-linear state transition given by the filtering equation that updates the belief state. This gave us the theory for stochastic control problems with imperfect state information. From there we went to a special case, the linear quadratic Gaussian (LQG) problem, where we saw that the optimal controller is linear in the information. This is because the optimal controller is a linear function of the conditional expectation of the state given the information, and this conditional expectation is in turn linear in the information, thanks to the properties of mean square estimation for Gaussians; the Kalman filter algorithm then lets us compute this estimate recursively. Then we came to the Witsenhausen problem.
The Witsenhausen problem taught us that all of these results are contingent on the information structure being assumed, namely that the information structure is classical: information known at previous time steps is also known at future time steps. Once the information structure is non-classical, the linearity of the optimal controller in the LQG problem no longer holds. In fact, the optimal controller is not just non-linear; it is not even known. We do not even know what the structure of the optimal controller is. All of the problem formulations we have studied so far are in the state space domain, in the sense that there is a state, and the state is the key description of the system: it determines the cost, it undergoes transitions, and so on. The issue of information is then about whether you have information about the state, and if you do not, how that imperfect information varies with time. What we will study now is a different model for stochastic control problems, in which we will not think of the controller as a single entity but rather think of every control action as being performed by a different controller. So in general you could have multiple controllers acting at multiple times, and this gives us a model in which the state of the system is abstracted out. What we care about then is only what a particular controller at a particular time knows about the events that have happened in the past, which includes the actions taken in the past and the noise arising from the environment in the past. This will be our description of the information structure.
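As a reminder of the example referred to above, the Witsenhausen counterexample can be written out in its standard two-controller form; the second controller observes only a noisy version of the intermediate state, not the first controller's information, so the information structure is non-classical, and for suitable values of k and sigma a non-linear pair of policies outperforms every affine pair:

```latex
% Witsenhausen counterexample: x_0 ~ N(0, \sigma^2), v ~ N(0,1), independent.
\begin{aligned}
  u_1 &= \gamma_1(x_0), & x_1 &= x_0 + u_1, \\
  y   &= x_1 + v,       & u_2 &= \gamma_2(y), \\
  J(\gamma_1, \gamma_2) &= \mathbb{E}\!\left[\, k^2 u_1^2 + (x_1 - u_2)^2 \,\right].
\end{aligned}
```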
What this effectively amounts to is eliminating the state using the state equation. One does not really need an explicit state equation; one only thinks of the system as evolving like an input-output model. There are actions, actions lead to information, information leads to the next actions, and life goes on: this is how the system evolves. This sort of model, where you eliminate the state completely and think only of inputs to the system and information coming from the system, is what is often known as the intrinsic model of stochastic control, because it describes the system intrinsically, without reference to any particular subjective choice of what we would like to think of as the state. Consequently, the cost of the system is also described completely in terms of the actions we choose and any external noise that affects the system. So this is a new way of modeling systems. It is not a new type of system altogether; it is just a new way of modeling systems. But the advantage of this way of modeling is that it makes very clear what is really known to every controller at any instant in time, and this allows us to talk of information structures in a much more holistic way, without the interference of the state variable coming in between. Here, then, is the intrinsic model of stochastic control. The model is as follows. We assume we have N agents; since this model concerns information structures, there is no loss of generality in assuming N agents from the outset instead of one. Agent i receives information y^i_t at time t. This information is, of course, a function of all the events that have happened in the past.
This includes the actions that other controllers have taken and the actions the agent has itself taken, but it is also affected by random noise from the environment. So we will accumulate all the sources of noise that occur in the problem: this includes the initial state, the noise present in the system (what we call system noise), and the noise that appears in measurements (what is often called measurement noise). We will consider all of this as noise arising from the environment, and gather all of these variables into one variable called psi, the environmental noise. The environmental noise thus comprises all the sources of noise whose distribution we cannot affect. The information y^i_t that agent i receives is then a function of the actions that have been taken in the problem and the environmental noise: y^i_t = eta^i_t(psi, u). What is u? Well, u comprises u_1 to u_T, where T is the time horizon and N is the number of agents. Notice that we have changed the notation somewhat: earlier N was our time horizon, now T is our time horizon, but hopefully that will not lead to any confusion. Each u_t is itself the collection (u^1_t, ..., u^N_t) of the control actions of the N agents at time t. Putting all of this together, we can write out this particular equation, which we will call the observation equation of agent i at time t.
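The ingredients introduced so far can be sketched in code. This is a minimal illustration under assumptions of my own (N = 2 agents, horizon T = 2, scalar actions, and a hypothetical eta in which an agent sees a noisy sum of all strictly past actions); it is not any particular system from the lecture.

```python
import random

N, T = 2, 2  # number of agents and time horizon (illustrative)

def sample_psi():
    """Environmental noise psi: every random primitive in the problem
    (initial state, system noise, measurement noise), gathered into one
    object whose distribution the agents cannot affect."""
    return {"x0": random.gauss(0, 1),
            "w": [random.gauss(0, 1) for _ in range(T)],   # system noise
            "v": [random.gauss(0, 1) for _ in range(T)]}   # measurement noise

def eta(i, t, psi, u):
    """Observation equation y^i_t = eta^i_t(psi, u).
    u maps (agent, time) -> action already chosen; this hypothetical eta
    reads only actions taken at strictly earlier times."""
    past_actions = sum(u.get((j, s), 0.0) for j in range(N) for s in range(t))
    return psi["x0"] + past_actions + psi["v"][t]
```

Note that eta takes the whole collection u as an argument, exactly as in the general form above; which entries of u it actually reads is what the causality discussion that follows pins down.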
Now notice that we need to make this a little sharper, because it is written with a lot of generality: y^i_t, the information of agent i at time t, is written as a function of the environmental noise and all the actions. It is all right to write it in this general form, but one has to be careful, because after all u itself is being produced using information. Remember, the action u^i_t is itself chosen as a function of y^i_t: u^i_t = gamma^i_t(y^i_t). I have called this the observation equation, but we can also think of it as an information equation, because all the information the agent has, including any previous observations, can be written in this particular form. So one can either think of y^i_t as the observation at time t and accumulate the previous observations into it, or simply put the entire vector of information the agent has into one equation. Thus the observation equation can also be considered the information equation. If this is the information at time t, then the action of agent i at time t is chosen as a function of this information.
The subtlety that comes up because of these closed-loop equations, where action leads to information, information leads to the next action, and so on, is that they can actually be ill-defined: in order to define u^i_t you need knowledge of y^i_t, but y^i_t itself is being written as a function of u^i_t. So these equations could fail to be well defined because of the closed-loop nature of the problem, or they could lead to a deadlock where neither y^i_t nor u^i_t can be determined from the set of equations. One therefore has to make sure that causality is followed when one writes these equations: the functions eta^i_t(psi, u) must respect causality. What is causality? Causality here means that the action u^j_t taken by agent j at time t does not affect the information y^i_{t'} for all i, j and t' < t. So an action taken in the future cannot affect the information present in the past. In other words, information only travels forwards in time: actions taken in the past affect the information of the future, and actions taken in the future should not, mathematically, be allowed to affect the information of the past.
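The causality requirement can be checked numerically for a concrete eta: perturbing any strictly future action u^j_s, s > t, must leave y^i_t unchanged. The following is an illustrative sketch with two hypothetical eta functions of my own, one causal and one that illegally peeks ahead.

```python
N, T = 2, 3  # agents and horizon (illustrative)

def causal_eta(i, t, psi, u):
    # depends only on noise and on actions taken strictly before time t
    return psi[t] + sum(u.get((j, s), 0.0) for j in range(N) for s in range(t))

def noncausal_eta(i, t, psi, u):
    # illegally peeks at an action taken in the future
    return causal_eta(i, t, psi, u) + u.get((i, t + 1), 0.0)

def respects_causality(eta, i, t, psi, u):
    """Check that y^i_t = eta(i, t, psi, u) ignores actions at times s > t."""
    base = eta(i, t, psi, u)
    for j in range(N):
        for s in range(t + 1, T):          # perturb strictly future actions
            perturbed = dict(u)
            perturbed[(j, s)] = perturbed.get((j, s), 0.0) + 1.0
            if eta(i, t, psi, perturbed) != base:
                return False
    return True
```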
This may seem like quibbling, but since one is talking of an intrinsic model, we have to invoke a direction of time: unless it is given in an explicit way, time is only an index, so one has to invoke some directionality to time, and therefore causality, when one talks of information structures. Another subtlety one has to be careful about, since we are talking of actions affecting information across time, is that we are assuming there is a fixed clock. Time follows a fixed clock, which means that the order in which the players act, that is, the times of action of the players, is not affected by the actions chosen in the problem. Such problems are called sequential problems, and we are only concerned with sequential problems. If one violates this assumption, then time itself is no longer a fixed variable or an index but rather a random variable whose value depends on the actions, and that leads to a completely new level of complication: what are called non-sequential problems. So without a fixed clock we get non-sequential problems. Assuming, therefore, that there is a fixed clock and that we are not violating causality, we eventually get a well-defined set of loop equations: the information equation and the action equation. What, then, is the purpose of taking these actions? The actions are to be chosen in order to minimize a
certain cost. The goal of the problem is to minimize the following cost: we minimize L(u^1, ..., u^N, psi), where each u^i is chosen as a function of its respective information. Notice the superscript i here: u^i is the vector of all the actions taken by agent i over time, u^i = (u^i_1, ..., u^i_T), the actions of agent i over times 1 to T. And remember that these u's are chosen as functions of their respective information: u^i_t = gamma^i_t(y^i_t). This gives us a new way of formulating the problem: we minimize the cost over the functions gamma^1 to gamma^N. You can see that if one introduces a state space model in between, that is, a state variable and so on, those sorts of problems can also be reduced to problems of this form. But the main advantage of formulating the problem in this particular way is that we have gotten rid of the distraction of the state. We really do not need to concern ourselves with what the state of the system is, because the privileged position the state had in the earlier model is removed. We only care about what information is available to the agents, what the cost is, how the actions affect the cost, and how each action affects the information of the other agents. Our way of looking at information structures now also has to change: earlier, an information structure meant specifying which subset of observations and actions was available to each agent.
Now we do not need to think in that way: the information structure for us is completely defined through the functions eta^i_t, where i goes from 1 to N and t goes from 1 to T. This collection of functions describes who knows what at each instant of time, that is, which agent knows what information at each time. These functions can therefore together be thought of as our description of the information structure; they define the information structure of a problem. Thanks to this, we can in fact talk of information structures just by looking at the description of these functions: what their arguments are, what values they take, and so on. Let me also introduce one small piece of notation. The expected cost written out here has, of course, to be computed once you plug in the u^i as functions of their information; once one does that, we get the cost as a function of (gamma^1, gamma^2, ..., gamma^N, psi). Here gamma^i comprises the T functions gamma^i_t, with t going from 1 to T, and each gamma^i_t takes y^i_t as its argument. When I write gamma^1 here with a bracket like this, it does not mean that the same argument is supplied to all T functions; they each have their own respective arguments, which have to be plugged in, and I am assuming that is understood. The main point of introducing this notation is that it gives us a notation for the cost as a function of the policy of each
agent. So gamma^1 is agent 1's policy and gamma^N is agent N's policy, and in terms of these we can write out the cost. The problem for us, therefore, is to find functions gamma^1 to gamma^N that minimize the cost J(gamma^1, ..., gamma^N). This is our stochastic control problem in intrinsic form. In the next lecture we will see how this particular intrinsic formulation actually helps us understand information structures in a much cleaner way; that is coming up in the next lecture.
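To tie the pieces together, here is a minimal end-to-end sketch of the intrinsic formulation, under illustrative assumptions of my own (two agents, horizon two, a hypothetical eta, a simple and by no means optimal linear gamma, and a hypothetical quadratic cost L): sample psi, run the loop information -> action -> information under a fixed clock, and estimate J(gamma^1, gamma^2) by Monte Carlo averaging over psi.

```python
import random

N, T = 2, 2  # agents and horizon (illustrative)

def eta(i, t, psi, u):
    # hypothetical observation equation: agent i sees a noisy sum of the
    # initial state and all actions already chosen under the fixed clock
    return psi["x0"] + sum(u.values()) + psi["v"][i][t]

def gamma(i, t, y):
    # a simple linear policy, purely for illustration (not claimed optimal)
    return -0.5 * y

def L(u, psi):
    # hypothetical cost: quadratic in the final "state" plus control effort
    x_final = psi["x0"] + sum(u.values())
    return x_final ** 2 + 0.1 * sum(a ** 2 for a in u.values())

def estimate_J(num_samples=10000, seed=0):
    """Monte Carlo estimate of J(gamma^1, ..., gamma^N) = E[L(u, psi)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        psi = {"x0": rng.gauss(0, 1),
               "v": [[rng.gauss(0, 0.1) for _ in range(T)] for _ in range(N)]}
        u = {}
        for t in range(T):                  # fixed clock: agents act in order
            for i in range(N):
                y = eta(i, t, psi, u)       # information equation
                u[(i, t)] = gamma(i, t, y)  # action equation
        total += L(u, psi)
    return total / num_samples
```

Note that the state never appears as a privileged object: the loop touches only psi, the observations y^i_t, and the actions u^i_t, which is exactly the point of the intrinsic model.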