Welcome, everyone. So far we have been discussing static teams with various types of information for the players. We looked at an example with two players in which sometimes both players had identical information, sometimes they had no information, sometimes they had asymmetric information, and sometimes they had noisy information. In all of these cases we saw the method for finding the optimal policy, or optimal strategy, for the team problem.

Now what we will do is look at a slightly different notion of optimality for team decision problems. The notion we have looked at so far is what you can call a global optimum, or a team optimal solution. It is team optimal in the sense that it is optimal for the entire team, and it presumes that the strategy has been chosen keeping the entire team's goal in mind.

Another notion that comes up, which is the one I will define today, is what is called person-by-person optimal. A person-by-person optimal strategy is optimal for a person given a set of strategies for the other players. So a person-by-person optimal solution is a tuple of strategies such that it is optimal for each player to play his strategy, assuming that the others continue to play their strategies from that tuple. In other words, this is a tuple from which no player in the team would want to change if the others do not change. So in some sense this is a kind of local optimum, and that kind of solution is called a person-by-person optimal solution.

So the notion we have seen so far is the team optimal solution. Now what does this mean? Suppose you have a team problem with n agents, or n players, who have to choose strategies $\gamma_1, \dots, \gamma_n$, and the cost that they incur is $J(\gamma_1, \dots, \gamma_n)$. This is the setting of a typical team problem.
Now, a team optimal solution: $(\gamma_1^*, \dots, \gamma_n^*)$ is team optimal if $J(\gamma_1^*, \dots, \gamma_n^*) \le J(\gamma_1, \dots, \gamma_n)$, and this holds for all $\gamma_1, \gamma_2$, up to $\gamma_n$. In other words, $(\gamma_1^*, \dots, \gamma_n^*)$ is a profile of strategies, or a policy for the n players, such that collectively they would not want to deviate from it. No other combination of policies is strictly better than this one. There may be combinations that are as good, but no other combination is strictly better. That is what is called a team optimal solution.

The other notion, as I said, is what is called person-by-person optimal. Let me use a different color for this. $(\gamma_1^*, \dots, \gamma_n^*)$ is person-by-person optimal if $J(\gamma_1^*, \dots, \gamma_n^*) \le J(\gamma_1^*, \dots, \gamma_{i-1}^*, \gamma_i, \gamma_{i+1}^*, \dots, \gamma_n^*)$. Notice what I am writing on the right-hand side. I have written the starred strategies for players 1 to i-1; then for player i I write a non-starred one, just $\gamma_i$; then for player i+1 I again write a starred strategy, and so on all the way to player n. So the remaining ones are starred and only the ith one is without the star. The tuple $(\gamma_1^*, \dots, \gamma_n^*)$ is person-by-person optimal if the left-hand side is less than or equal to this right-hand side for all $\gamma_i$, and this has to hold for all players $i$.
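For reference, the two definitions just stated can be written side by side. This is only a restatement of what was said above in display form; no new symbols are introduced beyond $J$, $\gamma_i$ and the starred profile.

```latex
\begin{align*}
\text{Team optimal:}\quad
  & J(\gamma_1^*, \dots, \gamma_n^*) \le J(\gamma_1, \dots, \gamma_n)
  && \text{for all } (\gamma_1, \dots, \gamma_n), \\[4pt]
\text{Person-by-person optimal:}\quad
  & J(\gamma_1^*, \dots, \gamma_n^*) \le
    J(\gamma_1^*, \dots, \gamma_{i-1}^*, \gamma_i, \gamma_{i+1}^*, \dots, \gamma_n^*)
  && \text{for all } \gamma_i \text{ and all } i = 1, \dots, n.
\end{align*}
```

The only difference is in the quantifier: the first inequality compares against every joint profile, while the second compares only against profiles that differ from the starred one in a single coordinate.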
So it means that $(\gamma_1^*, \dots, \gamma_n^*)$ is person-by-person optimal if it has the following property: keeping the others fixed at their starred strategies, any player i would not want to deviate from his starred strategy. That means there is no better strategy for him. There may be other strategies that are as good, but there is no strategy that is better for him than his starred strategy. Keeping the others fixed at their starred strategies, it is optimal for this player to continue to play his starred strategy, and this is true not just for one particular player but for all players.

In other words, suppose I take i equal to 1. In that case the strategies of players 2 to n are held fixed at their starred values, and the strategy of player 1 is allowed to change from $\gamma_1^*$ to any other one. Player 1 is allowed to explore all the other strategies, and it turns out that $\gamma_1^*$ is optimal in spite of this. And it is not just true for player 1; one can also do this for player 2. For player 2 you keep the starred strategies for player 1 and for players 3 to n, and then allow player 2 to change its strategy from $\gamma_2^*$ to some $\gamma_2$, and it turns out that any such deviation does not improve the cost. Similarly for player 3, player 4, and so on.

So this tuple of strategies $(\gamma_1^*, \dots, \gamma_n^*)$, this policy, is such that no unilateral deviation from it is profitable. Let me write this here: no unilateral deviation from $(\gamma_1^*, \dots, \gamma_n^*)$ can reduce the cost.
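To make the "no unilateral deviation" test concrete, here is a minimal sketch on a made-up two-player team with no information, so that a strategy is just an action. The cost table `J`, the action set `ACTIONS`, and the helper names are all hypothetical and chosen only for illustration; they do not come from the lecture. The sketch checks both definitions by brute force.

```python
from itertools import product

# Hypothetical cost table (not from the lecture): J[a1][a2] is the team cost
# when player 1 plays a1 and player 2 plays a2.
J = [[1, 3],
     [3, 0]]
ACTIONS = [0, 1]


def cost(profile):
    a1, a2 = profile
    return J[a1][a2]


def is_team_optimal(profile):
    # Compare against every joint profile.
    return all(cost(profile) <= cost(other)
               for other in product(ACTIONS, repeat=2))


def is_pbp_optimal(profile):
    # Compare only against profiles where exactly one player deviates.
    for i in range(2):
        for a in ACTIONS:
            deviated = list(profile)
            deviated[i] = a
            if cost(tuple(deviated)) < cost(profile):
                return False
    return True


profiles = list(product(ACTIONS, repeat=2))
team_optimal = [p for p in profiles if is_team_optimal(p)]
pbp_optimal = [p for p in profiles if is_pbp_optimal(p)]
print("team optimal:", team_optimal)
print("person-by-person optimal:", pbp_optimal)
```

In this particular table the profile (0, 0) passes the unilateral-deviation test even though (1, 1) has a strictly lower cost, so with a finite action set a person-by-person optimal profile need not be team optimal.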
Now what is the relation between these two notions? We have defined one notion, team optimal, and another notion, person-by-person optimal. What is the relation between the two?

One relation, which is very obvious to see, is that every team optimal solution must also be person-by-person optimal. Why is that the case? It is easy to see from the definition itself. In order to show that a team optimal $(\gamma_1^*, \dots, \gamma_n^*)$ is also person-by-person optimal, take the right-hand side of the team optimality inequality, $J(\gamma_1, \dots, \gamma_n)$, and in place of $\gamma_1, \dots, \gamma_n$ fix all the strategies of players other than player i at their starred values, leaving $\gamma_i$ as it is. So $\gamma_1$ becomes $\gamma_1^*$, $\gamma_2$ becomes $\gamma_2^*$, and so on, $\gamma_i$ remains $\gamma_i$, and all the others become starred. The right-hand side then becomes exactly $J(\gamma_1^*, \dots, \gamma_{i-1}^*, \gamma_i, \gamma_{i+1}^*, \dots, \gamma_n^*)$. Since team optimality holds for every choice of strategies, it holds in particular for this choice, so the cost under $(\gamma_1^*, \dots, \gamma_n^*)$ is less than or equal to this expression. Consequently $(\gamma_1^*, \dots, \gamma_n^*)$ satisfies the person-by-person inequality, and you can do this not just for one player i but for every player.

As a result, we can conclude that $(\gamma_1^*, \dots, \gamma_n^*)$ is person-by-person optimal if it is team optimal. So every team optimal solution is also a person-by-person optimal solution. Now one can ask if there is any relation in the opposite direction, and remarkably there is in fact such a relation.
Before I write that down, the other thing to note is that these notions of team optimal and person-by-person optimal are not limited to static team problems. They also apply to dynamic teams and to any other kind of team problem. However, if one wants a relation in the opposite direction, that is, a relation in which we can claim that a person-by-person optimal solution is also team optimal, then that sort of relation needs some assumption about the information structure. That result is coming up next.

So we have this theorem, and it is a neat one. Consider a static team and let $L(u_1, \dots, u_n, \psi)$ be the cost function of the team. In my earlier notation, $u_1, \dots, u_n$ are the actions of the n players and $\psi$ is the environmental random variable. Suppose $L$ is strictly convex and continuously differentiable in $u_1, \dots, u_n$ for each value of $\psi$. Then every person-by-person optimal solution of this team is also team optimal.

So if you have this kind of problem, every person-by-person optimal solution is also team optimal. This theorem is not that hard to prove, but I will not get into its proof. Let us notice a few things about it, though. Notice that this theorem asks that L has to have two properties.
It has to be continuously differentiable in $u_1, \dots, u_n$ and it has to be strictly convex in $u_1, \dots, u_n$. It is not asking for any requirement in $\psi$. You just need strict convexity and continuous differentiability to hold in $u_1, \dots, u_n$ for each value of $\psi$; that is all that is being asked.

Now, what this also presumes is the following. Strict convexity, or convexity for that matter, is a notion that is applicable when $u_1, \dots, u_n$ take continuous values. If the actions that the players can take come from a finite set, then this sort of notion is not even defined. If you take the example that we considered earlier, where there were two players, one of whom could take actions up and down and the other could take actions left and right, those kinds of teams would not satisfy this hypothesis.

Here we would therefore need each $u_i$ to be in some $\mathbb{R}^{m_i}$. So it has to be a real vector, not a vector that takes some discrete values. Consequently, if I write $m = \sum_i m_i$, and suppose $\psi$ takes values in some $\mathbb{R}^d$, then $L$ is a function from $\mathbb{R}^{m+d}$ to $\mathbb{R}$. This is the sort of function for which one can talk of strict convexity. Continuous differentiability, again, makes sense only if $u_1, \dots, u_n$ are allowed to take values in a continuous space like $\mathbb{R}^{m_i}$. So this is something to note: this regime is not exactly the one that we used in the examples we have studied so far.
A particular type of problem where these properties hold is one of our familiar friends: what is called the linear quadratic Gaussian team. Linear, quadratic, Gaussian is a combination we have seen multiple times before in this course, and it is reappearing here in the avatar of a team. So let us go to a new page and write that down. This is often referred to as the LQG team.

What is the linear quadratic Gaussian team? In this team, $L$ has the following structure: $L(u_1, \dots, u_n, \psi) = u^\top Q u + u^\top S \psi$. Here $u$ is just the vector formed by stacking up $u_1, \dots, u_n$, $Q$ is a fixed positive definite matrix, and $S$ is just any matrix. So this is the structure of the cost.

In order to define the team I also need to say what the structure of the information is. Remember that the information in a static team was $y_i = \eta_i(\psi)$. In the case of a linear quadratic Gaussian team, $y_i$ is just equal to $H_i \psi$, where $H_i$ is a full row rank matrix. What you have to do, then, as an agent within the team, is choose $u_i = \gamma_i(y_i)$. So this is an example of a static linear quadratic Gaussian team; I should mention here that this is static.
Right. So in this sort of team, look at $L$ and at the hypothesis: we have assumed that $Q$ is positive definite. If you look at $L$ as a function of $u_1, \dots, u_n$ for each fixed $\psi$, this function is convex, and in fact it is strictly convex because $Q$ is positive definite. So $L$ is strictly convex in $u$ for each fixed $\psi$.

Moreover, for each fixed $\psi$, $L$ is just a quadratic function of $u$, so it is also continuously differentiable in $u$ for each fixed $\psi$. Consequently, the previous theorem applies. And what does the previous theorem say? It says that if you can find a person-by-person optimal solution of this team problem, then you have effectively also found a team optimal solution. So every person-by-person optimal solution of this team problem is also team optimal.
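As a rough illustration of why strict convexity helps, here is a minimal sketch for one fixed value of $\psi$, with two players who each pick a scalar action. It is a deterministic simplification, not the full team problem with policies $\gamma_i(y_i)$. The numbers in `Q` and `c` (where `c` stands in for $S\psi$ at that fixed $\psi$) are hypothetical, and `Q` is additionally assumed symmetric. Each player repeatedly minimizes the cost over his own action while the other's is held fixed, and the result is compared with the joint minimizer obtained by setting the gradient to zero.

```python
# Hypothetical two-player instance with scalar actions (values not from the lecture).
Q = [[2.0, 1.0],
     [1.0, 2.0]]    # assumed symmetric and positive definite
c = [2.0, -4.0]     # stands in for S @ psi at one fixed psi


def cost(u1, u2):
    # L(u) = u^T Q u + u^T c for the fixed psi
    quad = Q[0][0] * u1 * u1 + (Q[0][1] + Q[1][0]) * u1 * u2 + Q[1][1] * u2 * u2
    return quad + c[0] * u1 + c[1] * u2


def best_response_1(u2):
    # Solve dL/du1 = 2*Q11*u1 + (Q12 + Q21)*u2 + c1 = 0 for u1
    return -((Q[0][1] + Q[1][0]) * u2 + c[0]) / (2.0 * Q[0][0])


def best_response_2(u1):
    # Solve dL/du2 = 2*Q22*u2 + (Q12 + Q21)*u1 + c2 = 0 for u2
    return -((Q[0][1] + Q[1][0]) * u1 + c[1]) / (2.0 * Q[1][1])


# Alternate unilateral minimizations, starting from an arbitrary point.
u1, u2 = 0.0, 0.0
for _ in range(60):
    u1 = best_response_1(u2)
    u2 = best_response_2(u1)

# Joint minimizer: gradient = 0 gives (Q + Q^T) u = -c, solved by Cramer's rule.
a, b, d = 2.0 * Q[0][0], Q[0][1] + Q[1][0], 2.0 * Q[1][1]
det = a * d - b * b
u1_star = (-c[0] * d + b * c[1]) / det
u2_star = (-a * c[1] + b * c[0]) / det

print("unilateral fixed point:", (u1, u2))
print("joint minimizer:       ", (u1_star, u2_star))
```

A point where neither player can improve alone satisfies both partial first-order conditions, which together are the full gradient condition; with a strictly convex, differentiable cost that condition identifies the unique joint minimum.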
Now, this little observation is in fact extremely powerful, because it gives a very easy way of finding the team optimal solution of the static LQG team. Finding the team optimal solution of an LQG team would otherwise have been a much more tedious task, whereas thanks to this correspondence between person-by-person optimal and team optimal one can solve for it rather easily.

To get an idea of how one can actually solve this, notice that $y_i$ is $H_i \psi$. And I forgot to mention: because this is LQG, the vector $\psi$ itself is a Gaussian vector, with mean 0 and, let us say, covariance matrix $\Sigma$. When the vector $\psi$ is Gaussian, this becomes an LQG team. When $\psi$ is not Gaussian, it is just some linear quadratic team, and even for such a team the previous result would still apply, because you would still have that $L$ is strictly convex in $u$ and continuously differentiable in $u$ for each fixed $\psi$. So the previous theorem continues to apply: every person-by-person optimal solution is also team optimal.

But when $\psi$ is Gaussian one can say a lot more. In this case one can actually find the team optimal solution in closed form. If you think about just the person-by-person optimal solution, what you would be doing is fixing the policies of all the players except for player i and then solving for the optimal policy of player i. That sort of logic suggests a certain type of optimal policy, and we will come to this particular logic in the next lecture.