Before we go to the next example, let us revisit once more the example we just saw, in which the agents had imperfect but identical measurements. Notice that in that case, whatever the measurement was, whether it indicated omega1/omega2 or omega3, both agents were acting on the same information. In the policy we considered, agent 1's action was to choose U and agent 2's action was to choose R, so the action pair that appeared was (U, R) in both measurement outcomes. Of course, the cost itself still changes, because the cost also depends on the value of psi: when psi is omega1 the cost is 0, whereas when it is omega2 the cost is 3. Nonetheless, there is something to be exploited here: because both agents have the same information, we can make smart use of that particular piece of structure. To do so, let us look once again at the expression for J. Recall that J(gamma1, gamma2) is the expectation of L(gamma1(y1), gamma2(y2), psi). But because both agents have the same information, y1 = y2; the two measurements are equal. We can therefore write the cost as the expectation of L(gamma-tilde(y), psi), where y = y1 = y2 is the common information of both agents, and gamma-tilde is a new function which, instead of taking one action, takes a pair of actions.
The pair of actions it takes is (gamma-tilde-1, gamma-tilde-2), drawn from the Cartesian product {U, D} x {L, R}, which comprises all the possible pairs: (U, L), (U, R), (D, L), and (D, R). These are the values gamma-tilde can take. Thanks to this, the new problem looks like a problem with one agent, an agent whose information is y and whose actions are taken in this set of pairs. Why is this equivalent to the earlier problem? It is easy to see: to specify a policy now, all I have to do is write down the value of gamma-tilde for each value of y, and y = y1 = y2 can take two possible values, y = omega1/omega2 or y = omega3. In each of these cases gamma-tilde takes one of four possible values, each value being a pair of actions. For example, one strategy could be: when the information is omega1 or omega2, play the paired action (D, R), and when it is omega3, play the paired action (D, L).
What is the cost of this particular strategy? Once you substitute, you realize that the cost of this policy becomes L(D, R, omega1) times the probability of omega1, plus L(D, R, omega2) times the probability of omega2, plus L(D, L, omega3) times the probability of omega3, since psi can take the values omega1, omega2, or omega3. Now, this expression is nothing but what we had earlier written out as J of the policy pair in which agent 1 plays (D, D) and agent 2 plays (R, L). So, as you can see, by combining the two agents into one agent that plays the strategy gamma-tilde, we have reduced this problem to a single-agent problem. And J((D, D), (R, L)), if you recall, was in fact the optimal policy we had found, whose value was equal to 0.6. Why have we been able to do this combination? The reason we could coalesce gamma1 and gamma2 into one function whose values are simply pairs of actions is precisely that y1 = y2, which is what we are denoting by y. Because y1 equals y2, having two different functions is equivalent to having one function that takes a pair of values. That is the only little simplification done here, and it effectively makes this a single-agent problem.
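As a small sketch, this reduction can also be checked numerically. The individual costs L(u1, u2, psi) and the prior probabilities below are hypothetical placeholders (the lecture only reports the resulting expected cost); the point is only the identity that the policy pair (gamma1, gamma2) and the combined pair-valued policy gamma-tilde produce the same expected cost.

```python
# Hypothetical placeholder values -- the lecture does not list the raw cost
# table, so these numbers are for illustration only.
P = {"w1": 0.3, "w2": 0.3, "w3": 0.4}            # prior on psi
L = {("D", "R", "w1"): 0.0, ("D", "R", "w2"): 1.0,
     ("D", "L", "w3"): 1.2}                      # cost L(u1, u2, psi)

# Both agents observe the same measurement y: w1 and w2 are indistinguishable.
def y(omega):
    return "w12" if omega in ("w1", "w2") else "w3"

# Two-agent policy pair: agent 1 plays (D, D), agent 2 plays (R, L).
gamma1 = {"w12": "D", "w3": "D"}
gamma2 = {"w12": "R", "w3": "L"}

# Combined single-agent policy: gamma_tilde maps y to a *pair* of actions.
gamma_tilde = {v: (gamma1[v], gamma2[v]) for v in ("w12", "w3")}

J_pair = sum(L[(gamma1[y(w)], gamma2[y(w)], w)] * P[w] for w in P)
J_combined = sum(L[gamma_tilde[y(w)] + (w,)] * P[w] for w in P)
assert abs(J_pair - J_combined) < 1e-12  # same expected cost either way
```

The assertion holds for any choice of costs and probabilities, which is exactly the observation that makes the single-agent rewriting valid.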
The lesson, in short, is that once agents have identical information, there is nothing genuinely multi-agent about the problem: the two agents can be thought of as one agent who simply takes actions in a different space. All the complexity in team problems therefore arises when agents have non-identical measurements; agents having imperfect or perfect but identical measurements reduces the problem to a single-agent problem. You can also see that this problem reduces, in some ways, to a problem of perfect information. Looking at the cost expression, we can take the combined cost as the cost of a new event, omega1/omega2. That is, we can think of omega as comprising not three but two events: the first event is omega1/omega2 and the second is omega3 itself. The probability of omega1/omega2 is the erstwhile probability of omega1 or omega2, which equals 0.6, while the probability of omega3 remains the same, 0.4. What, then, is the cost in this new event? For any pair of actions (a, b), the cost can be defined as the cost of taking action (a, b) in omega1 times the probability of omega1, plus the cost of taking action (a, b) in omega2 times the probability of omega2, all divided by the total probability of omega1 and omega2. Effectively, we are taking a weighted average of the costs corresponding to omega1 and omega2.
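Here is a minimal sketch of this remodeling for one fixed action pair (a, b). The per-outcome costs below are placeholder numbers, not values from the lecture; only the probabilities 0.6 and 0.4 come from the example. What the sketch verifies is that merging omega1 and omega2 into one event, with the weighted-average cost, leaves the expected cost unchanged.

```python
# Hypothetical per-outcome costs for one fixed action pair (a, b);
# the cost numbers are placeholders, only the identity below matters.
p1, p2, p3 = 0.3, 0.3, 0.4        # P(w1), P(w2), P(w3); note p1 + p2 = 0.6
c1, c2, c3 = 2.0, 5.0, 1.0        # L(a, b, w1), L(a, b, w2), L(a, b, w3)

# Remodeled space Omega-tilde = {w12, w3}: merge w1 and w2 into one event.
p12 = p1 + p2                      # = 0.6, as in the lecture
c12 = (c1 * p1 + c2 * p2) / p12    # weighted-average cost on the merged event

# The expected cost is unchanged by the remodeling.
original = c1 * p1 + c2 * p2 + c3 * p3
remodeled = c12 * p12 + c3 * p3
assert abs(original - remodeled) < 1e-12
```

Since the merged event is exactly what the common measurement reveals, the agent's information is perfect on the remodeled space.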
We can now think of the problem as living on this new space, let us denote it Omega-tilde, in which the new event omega1/omega2 occurs with probability 0.6; you can check for yourself that everything comes down to the same expressions as before. So what is happening is that a problem with imperfect but identical measurements for both agents is not only equivalent to a problem with one agent having that information, but in fact equivalent to that one agent having perfect information, because we can remodel the cost and the space of events in such a way that the information is effectively perfect on the remodeled space. This is a little bit of a digression, but it helps you think about what the complexity really is in team problems like this. In team problems, the complexity all arises from the diversity, or asymmetry, of the agents' information. If multiple agents have the same information, then not much interesting is going on, because in many ways it is really a single-agent problem. With this lesson, let us take another example, also a rather extreme one, but worth doing for completeness: suppose neither agent gets any measurements whatsoever. This is the case of no measurements, a special case of identical measurements, though an extreme one, since the agents have no information at all. Now, if the agents have no information whatsoever, then their actions are the same regardless of the value of psi.
The value of psi does not affect their information: the information they get is a constant, independent of psi, and as a consequence their actions are also constant. Each agent therefore has to pick not a policy but just an action, because there is nothing to decide as a function of information. As a result, there are only two trivial policies for each agent: agent 1 can always play up or always play down, and agent 2 can always play left or always play right. The cost matrix is then easy to write out, with agent 2's actions L and R along one side and agent 1's actions U and D along the other, evaluating the costs by the same logic as above. It turns out that (U, L) gives 1.3, (U, R) gives 1.7, (D, L) gives 1.5, and (D, R) gives 1.4. It is then easy to see that the optimal cost is 1.3, and the optimal policy is for agent 1 to always play U and agent 2 to always play L, regardless of anything else. Since they get no information, this is all they can do. Let us call this example 3. Now let us look at another, more involved example, in which the agents have non-identical measurements. Because we are building things up slowly, we will first consider the case where the first agent gets no information: Y1 is empty regardless of the value of psi. The second agent, however, gets to see the exact value of psi.
So Y2 here is either omega1, omega2, or omega3: agent 2 gets to see the exact value of psi. In other words, I1 = sigma({omega1, omega2, omega3}). This says that regardless of whether the value of psi is omega1, omega2, or omega3, agent 1 has no way of distinguishing them; he simply cannot tell which has occurred, and for him they are all identical, which is equivalent to getting no information. Agent 2, on the other hand, has information sigma of the individual singleton sets, so he can tell separately whether omega1, omega2, or omega3 has occurred; this is what agent 2 can distinguish. Now, how many policies are there for each agent? Let us do this calculation a little more closely. Agent 1 has no information, so his policies are trivial: he just has to choose an action, because there is nothing to base the action on. Since he has two actions, he has two policies, which we denote U and D. Agent 2, on the other hand, has much finer information, and he can make use of it to decide his action: he has to choose an action for every possible piece of information he gets. His information can take three different values, omega1, omega2, omega3, and his action can take two different values, L and R. The number of policies for agent 2 is therefore 2 raised to 3, because for every piece of information, omega1, omega2, or omega3, he has two possible actions: L or R in omega1, L or R in omega2, L or R in omega3.
This gives two choices for each value of psi, and therefore 2 raised to 3, or eight, different policies: agent 2 has eight possible policies. How do we list these policies out? We can do it in a very simple way: a policy in which gamma2(y2) equals a, b, or c according as y2 equals omega1, omega2, or omega3 can be written briefly as the triple (a, b, c). For instance, (L, L, R) is the policy in which agent 2 plays L if the information is omega1, plays L if it is omega2, and plays R if it is omega3. There are going to be eight different combinations like this. Once again we can now write out J(gamma1, gamma2) for every value of (gamma1, gamma2); this is going to be an even bigger table, so let me make some space to write it out. For agent 1 the choices are simple: he has to choose either up or down. For agent 2 we have the eight policies: (L, L, L), which is effectively the policy where agent 2 always plays L regardless of his information, then (L, L, R), (L, R, L), (L, R, R), (R, L, L), (R, L, R), (R, R, L), and finally (R, R, R).
Now, as we have done before, we fill out the values of J for each pair of policies, computing them in exactly the same way as earlier. The table of values we find is:

Agent 2:   LLL  LLR  LRL  LRR  RLL  RLR  RRL  RRR
U          1.3  1.7  1.6  2.0  1.0  1.4  1.3  1.7
D          1.5  2.3  1.2  2.0  0.9  1.7  0.6  1.4

It is quite easy to see that once again the lowest possible value is 0.6, and it is achieved when agent 1 always plays D (since he gets no information, he plays D regardless) and agent 2 plays (R, R, L), meaning he plays R if psi is omega1 or omega2 and L if it is omega3. Now notice that although agent 2 has the exact value of psi, he is only partially using it: he only distinguishes omega3 from not-omega3, and his policy is the same whether psi is omega1 or omega2, since it is (R, R, L). As a result, this policy is effectively the same as the policy from the previous example, where agent 2 was choosing R when the measurement indicated omega1 or omega2, and similarly for agent 1, who there was also playing the constant D regardless of his information. In that example agent 2 actually did not have the finer information, so he was obliged to play the same action regardless of the value of psi. And you can see that, as a result, we also get the same cost of 0.6 for this pair of policies as we got earlier.
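Since the search space is finite, the optimal pair can be found by exhaustive search over the table just as we did by eye. A minimal sketch, with the J values copied from the table above:

```python
from itertools import product

# Agent 2's eight policies, in the lecture's order: LLL, LLR, ..., RRR.
policies2 = ["".join(p) for p in product("LR", repeat=3)]

# J table: one row per constant action of agent 1, one column per triple.
J = {"U": [1.3, 1.7, 1.6, 2.0, 1.0, 1.4, 1.3, 1.7],
     "D": [1.5, 2.3, 1.2, 2.0, 0.9, 1.7, 0.6, 1.4]}

# Exhaustive search over all 2 x 8 = 16 policy pairs.
best = min((J[a1][i], a1, g2) for a1 in "UD"
           for i, g2 in enumerate(policies2))
assert best == (0.6, "D", "RRL")  # agent 1 always plays D; agent 2 plays R, R, L
```

For two agents this brute force is trivial, but the table, and hence the search, grows exponentially with the number of observations and agents, which is the difficulty alluded to at the end of this section.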
You can check that this holds not just for this particular policy but for other policies as well. For example, the cost 1.5 is for the policy where agent 1 always plays D and agent 2 always plays L. In our new problem, where agent 1 has no information, this corresponds to agent 1 choosing D and agent 2 having the information but disregarding it, that is, playing the policy (L, L, L), and the corresponding cost there is indeed 1.5. You can check the same for every other interesting pair you can find here. This, then, is another example of a team problem. This kind of problem is simplified significantly because, although the agents have non-identical measurements, agent 1 has only trivial information: he basically has no information and is only choosing an action, and thanks to that our search became rather easy. If the problem were more complex, our search would also become more complex, and we would have to write out a much larger table. This is something we will see more of in the next lecture.
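The consistency check suggested above, that agent 2's constant policies reproduce the no-measurement cost matrix of example 3, can be done mechanically. A short sketch, using the values read off in the lecture:

```python
# J table from the non-identical-measurement example: rows are agent 1's
# constant action, columns are agent 2's policy triples over (w1, w2, w3).
cols = ["LLL", "LLR", "LRL", "LRR", "RLL", "RLR", "RRL", "RRR"]
J = {"U": [1.3, 1.7, 1.6, 2.0, 1.0, 1.4, 1.3, 1.7],
     "D": [1.5, 2.3, 1.2, 2.0, 0.9, 1.7, 0.6, 1.4]}

# Cost matrix of example 3, where neither agent had any measurements.
example3 = {("U", "L"): 1.3, ("U", "R"): 1.7,
            ("D", "L"): 1.5, ("D", "R"): 1.4}

# A constant policy for agent 2 ignores the information entirely, so it must
# give the same cost as the corresponding no-measurement policy.
for a1 in "UD":
    for a2 in "LR":
        assert J[a1][cols.index(a2 * 3)] == example3[(a1, a2)]
```

Here `a2 * 3` just builds the constant triple, for example "L" becomes "LLL"; all four constant pairs match, as claimed.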