I want to welcome all of you and thank you for coming to the Engineering Distinguished Lecture. This is the first year of the Engineering Distinguished Lecture Series, and this is the first lecture hosted by the School of Industrial Engineering. I'm delighted to welcome all of you, and at this point I'd like to welcome our Dean, Mung Chiang, the John A. Edwardson Dean of the College of Engineering. He was the winner of the Waterman Award in 2013, his courses have been taken by more than 250,000 students, and he has written several books and started several companies and a non-profit consortium. So, Professor.

Thank you. Good afternoon everyone, and welcome to the fourth Distinguished Lecture of Purdue Engineering's series. We are delighted to have an inaugural cohort of five outstanding lecturers, today's being the fourth, who are sparing time from their very busy intellectual lives to come share their thoughts with us at Purdue Engineering. Today I am personally delighted and excited to welcome Eva Tardos from Cornell, because I have been following Eva's work for many, many years myself. Eva really needs no introduction, but nonetheless: Professor Tardos is the Jacob Gould Schurman Professor of Computer Science at Cornell and was chair of the department from 2006 to 2010. She came from Eötvös University in Budapest and joined Cornell in 1989. As we all know, Professor Tardos has made pioneering and fundamental contributions to a wide range of topics, including approximation algorithms, the efficiency of selfish routing, and network flow algorithms, among many others. Professor Tardos is one of the few who are elected to both the National Academy of Engineering and the National Academy of Sciences. She is also an external member of the Hungarian Academy of Sciences and a member of the American Academy of Arts and Sciences. I can only read a subset of the many distinguished awards she has received, including the Gödel Prize, the Dantzig Prize, the Fulkerson Prize, and the IEEE Technical Achievement Award. She is also the Editor-in-Chief of the Journal of the ACM. The list goes on and on, but we are all very eager to hear about learning and the efficiency of outcomes in games from Professor Tardos. Thank you so much, Eva.

So, thank you very much, and thank you for the invitation and the opportunity to tell you a little bit about the area that I currently find very exciting for research. There are a bunch of words in my title that I'll explain in a second, and a bunch of co-authors listed on my slide whom I'll mention as we get there; this is work I have done over the last number of years with a number of my students. Maybe the one I would promote the most right this second is the one I listed first, Thodoris Lykouris, who will be on the job market next year, so if you are interested, this is a great student to possibly hire. I'm going to be interested in large-population repeated games. There are two words here: one is that it is a game, and second, in two different senses, it is a large game. One example of such a large game is traffic routing, and with these two schematic pictures I want to suggest that this could be either packet traffic or car traffic, depending on whether you want a computer science or a more civil engineering application of traffic routing.
What I mean here is that individual packets or individual cars have a source and a destination, they want to travel between them, and in this particular version they want to avoid congestion, so that they get to the destination faster, or are less likely to be dropped on the way, if not too many other cars or packets are using the very same path. I could think of this as a centralized optimization problem, where you want to get all the cars to their destinations as fast as possible, but instead I want to think of it as a game, in which every packet (think of it as a car driver, or the owner of one of those packets) wants to get to its destination, and you don't care that much what happens to the other cars as long as your car gets there fast. This is a game because as a car driver you are optimizing your own objective, but at the same time you are having an effect on other people: in particular, you are causing congestion on the road, which is not so good for the other people driving on the very same roads.

In the most recent work, the work I'm going to emphasize at the end of my talk, I especially want to make this more realistic. In this driving or packet routing example, you want to think about what happens as people repeat this exercise of routing packets or cars over and over again, but it is not exactly the same traffic each time. If I think of internet packet traffic: maybe you want to read the New York Times, so you get a lot of packets from wherever your New York Times text originates; then you get bored, you have read all the articles, and you go read something else, or you go home and stop reading and someone else reads the New York Times instead. So packet traffic, and car traffic is the same story, is not completely stable, but there is some amount of predictability: we don't usually want a single packet, we want the whole article from the New York Times, and similarly a lot of car traffic consists of cars going between typical home and work destinations, so there is a fair amount of regularity. This is what I'm going to call a repeated game: it is repeated because every single minute, or every single hour, there are new cars driving.

A different application that I will be mixing into the talk is one where, instead of driving or routing packets to avoid congestion, we have something more auction-based: advertising auctions. As you might all know, we get a lot of things on the internet "for free," and what we are really doing is paying for them by watching advertisements. In particular, there is a very vibrant and interesting area of how best to sell advertisements; there are a lot of interesting aspects there, including how to do this so that you don't feel bothered but at the same time you do buy the things being advertised. One way to think of this, again, is as a repeated game. Advertisements on the internet, say next to Google search results, cost fractions of a cent each, so it is not that any one advertisement is particularly expensive or painful; instead, the volume of advertisements is so high that a whole industry, in fact a very rich industry, lives on this, millions of dollars a fraction of a cent at a time.
It is definitely a repeated game in the sense that every single second multiple ad opportunities show up, and advertisers come and go; and again, just as in the previous case, it is not quite the same thing every single second: other things change in the world, something comes out in the news and that affects what is popular and what is not, and other things of that form change. So I want to think of this as a model of a repeated game, and let me start by telling you what I mean by a repeated game in a more abstract setting. What this picture starts to illustrate: time goes from left to right, and in the first time period everyone does something, then something happens to them, then time goes on, and different things happen. As a mathematical model, I will assume the players here, the participants, whether the routers, the advertisers, or Google as the case may be, have some sort of value or cost associated with the outcome: they are either trying to minimize delay, or maximize value or utility, or some other goal like this. The assumption I am going to make for most of the talk, though I will also spend a little time on the more classical alternatives, is that what people are doing in this situation is learning from data. One thing that is prevalent in this class of applications, because of the repetition of these interactions over and over again, is that as you participate you have an enormous resource: an enormous amount of data from your past interactions. Going back to the advertisement auction, even if you participate in ad auctions for just a couple of days, that is an incredible amount of data right at your hand about what happened, or could have happened, during those couple of days.
If you are Google and have a couple of days of data on how much everyone wants to pay for ads, that is an incredibly rich amount of data to learn from. The same thing happens in traffic routing, though maybe there it is more engineered: in the car traffic version you have to listen to the news to get better information, while in packet traffic they try pinging different destinations to figure out how congested various places are, so maybe one has to be a little careful about how much information they are getting. But again it is a repeated game, where past data about what happened in the last ten minutes, last week, or last year is definitely useful. I am going to take the attitude that everyone is learning from data, and what I am going to ask is: if everyone, or most people, are learning from data, what can we say about system performance? That is my basic question. As a follow-up question, since the traffic might not be totally stable and the situation might change, I also want to know: if I can prove, and that is what I want to argue, that good things happen when everyone is learning, how long do they have to be around to make sure they have learned enough? If everything is super-changeable, and every ten seconds humanity becomes a whole other humanity that behaves totally differently than ten seconds ago, then probably we cannot learn; there is not enough data about us around. But if things are somewhat more stable, then we can, and it is interesting to quantify how stable things have to be for learning to start working well. That is the high-level question I want to ask. I should have said at the beginning that I will start by giving you some examples of what games and learning mean, and what the classical solutions are, before thinking about what learning means in this context; if there are questions at any moment, feel free to ask. And for those of you standing back there, there are empty seats here if you are willing to come down.

Okay, so, examples. Traffic routing is always the easiest example to think about, and while I somewhat alternate between the auctions, where you want to buy something, and traffic routing, traffic routing is an example everyone can easily relate to, if not because of internet traffic then because we all drive our cars; so it is a natural example. Here is traffic routing as a very simple basic example. The assumption is that the time it takes to traverse a certain edge depends on the congestion on that edge. I gave you a super simple example: the fixed edges take one hour to travel, and the two other edges are congestion sensitive: if there is x amount of traffic, it takes x over a hundred hours. I offered you a particular solution here, where the traffic splits 50-50 between the upper path and the lower path. One disturbing edge is that really super fast, zero-minute edge connecting the two paths, which at the moment we are not using. Notice that the time to get through this network is an hour and a half: half an hour, because x is 50 and 50 over 100 is a half, plus an hour on the fixed edge, on both paths. However, this is not a game-theoretically, selfishly sustainable solution;
it is not a Nash equilibrium, for a very simple reason: that zero-cost edge. Because that path is there, if I am one of those hundred drivers I would think: wait, I have a much better solution; if I follow that red dotted path I get half an hour, plus zero, plus another half, that is only an hour. And the fact that each driver would want to do this says this is not an equilibrium, not a Nash equilibrium as it is called. But if all hundred of us follow it, we are definitely all in trouble: x unfortunately goes up to a hundred, and a hundred over a hundred, plus zero, plus a hundred over a hundred is now two hours. So what happened here is that we are all worse off because we all behaved selfishly. If you have seen this example before, maybe you are more familiar with this whole way of thinking, but let me point out some basic facts for anyone who hasn't. This is a Nash equilibrium: you might wish to go back to the previous situation, but alone you cannot. The trouble is that the other drivers are doing this: if you were to follow the upper path, as you used to, that now also takes two hours, so you really cannot help it; there is nothing you can do. In fact this is the unique Nash equilibrium: you cannot help it, this is the delay you get. What really happened here, just to drive home the game-theoretic aspect, is that while I was selfishly optimizing for myself I caused trouble for 99 other people, and cumulatively the pain I caused them was bigger than the advantage I gained; and cumulatively the 99 other people caused pain to me, so we are all worse off. This example was originally discovered by Braess, a German mathematician, and it is definitely the example that got me interested in the area.

The very first thing I started to work on, and maybe of all the work mentioned what I am most known for, is the following theorem, which says that in some way this equilibrium is not quite as bad as it seems. Yes, two hours is more than an hour and a half, but if you think about traffic routing as internet packet routing rather than car driving, packets travel pretty fast, so we are talking about two seconds versus one and a half seconds; maybe that doesn't make such a big difference, or maybe it does, but it is not infinitely bad. In fact we proved a theorem that I still very much like, saying that if you cheat a little bit and compare your solution, the cost or total delay at the Nash equilibrium, not to the optimum, which indeed was better, but to an optimum that has to carry a larger amount of traffic, the comparison comes out fine. In plain English: if you design your network to be capable of carrying more traffic than you are actually going to have, then letting people selfishly drive however they want is okay. That is a nice justification of something internet providers all do: if you have a problem, don't try to control the traffic, just add more capacity, that is, design the network to carry more traffic.

These were the early years of an area of thinking about outcomes in games, and the classical notion that many of us started to think about, which got dubbed the price of anarchy, is this ratio of how much damage is caused at the Nash equilibrium: the ratio of the cost of the Nash equilibrium to the socially optimal cost.
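To make those numbers concrete, here is a minimal sketch recomputing the example; the encoding of the network, and the function name, are mine (edge delays of x/100 hours for x drivers, fixed one-hour edges, and a zero-delay connector, as described above), so treat it as an illustration of the slide rather than a quote from it.

    # A small numeric check of the Braess example (my own encoding of the network).
    # 100 drivers; the two congestion-sensitive edges cost x/100 hours when x drivers
    # use them, the two fixed edges cost 1 hour, and the connecting edge costs 0.
    def path_delays(top, bottom, zigzag):
        """Delays of the three routes, given how many drivers take each:
        top    = congestible edge (x/100) then fixed 1h edge
        bottom = fixed 1h edge then congestible edge (x/100)
        zigzag = congestible edge, 0h connector, congestible edge
        """
        first = top + zigzag        # load on the first congestible edge
        second = bottom + zigzag    # load on the second congestible edge
        return {"top": first / 100 + 1.0,
                "bottom": 1.0 + second / 100,
                "zigzag": first / 100 + 0.0 + second / 100}

    print(path_delays(50, 50, 0))    # {'top': 1.5, 'bottom': 1.5, 'zigzag': 1.0}
    # the 50/50 split costs everyone 1.5 hours, but the zig-zag route looks like
    # roughly an hour to a deviating driver, so the split is not an equilibrium
    print(path_delays(0, 0, 100))    # {'top': 2.0, 'bottom': 2.0, 'zigzag': 2.0}
    # once all 100 drivers zig-zag, every route takes 2 hours: the unique Nash equilibrium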
Going back to my Braess's paradox example, that ratio was two over one and a half, and it turns out that one and a half is indeed the optimum there. To be more honest, there are really two definitions; I should be careful, because this ratio only makes sense if everything is a cost or everything is a utility, and I have two definitions up here to make sure the price of anarchy is always at least one: it is a bad thing if it is big. It is either the cost of the Nash equilibrium over the optimum, or, in the case of a utility question, where we are not reaching as high a welfare as we should, I reverse the ratio to keep it above one. This definition was first proposed by Koutsoupias and Papadimitriou a couple of years before our paper, and it really took off since. There has been a beautiful set of many great results, some of them traffic-routing based, understanding this price of anarchy in lots and lots of different congestion or routing games, and also in lots and lots of auctions: first-price auctions, second-price auctions, multi-unit auctions, public good auctions, variations of every particular auction type. Some are better than others, as we now know, but we have a pretty good understanding of what this price of anarchy is. I'll come back to how we prove this, because it will be useful; in fact I have a slide to show you how most of these results get proved, and if I show you, you might get the feeling that, oh, that is kind of easy, and maybe it is not so bad.

So here is the scheme of how this gets proved. You are arguing something about the Nash equilibrium, and at a high level I really only have one thing to go on, and that is the inequality up there: I know that these players don't want to switch to whatever they should be doing in the optimum. That is roughly all I know. I know what I want them to do (I should point out that they don't know what I want them to do, but I do, I am the designer), and for whatever reason they don't want to do it; that is, their current cost is at most what they would pay by switching to their part of the optimum. This is all we end up using, and the class of games where this works, Tim Roughgarden, in a beautiful paper, summarized as smoothness-style proofs: if a game has the following interesting, or weird-looking, property, then it has a good price of anarchy. I wrote down the inequality. It says that for any solution a, if I do something super weird on the left-hand side and sum the costs of every player as they single-handedly change to their optimal strategy while everyone else stays at the current solution, then I have an inequality connecting that sum to the optimal cost and the current cost. Don't worry about its exact form; I'll tell you in a second what it should mean. If you have this inequality, which he termed the smoothness inequality, then it is easy to see that a Nash equilibrium has high quality: remember, the players don't want to switch to those strategies, so their current cost is at most this sum; dropping out the middle term, I now have an inequality connecting the left and the right, and all I have to do is rearrange it and I get the bound we have. You could say that this definition was designed to make the proof easy, and there is some truth in that, or maybe it is backwards: this form came from recognizing that this is what we had all been doing.
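Written out in symbols, with the notation being mine (following the standard cost-minimization form of this smoothness framework), the chain of inequalities being described is, as a sketch:

    \[
      \sum_i c_i(a^{*}_i, a_{-i}) \;\le\; \lambda\,\mathrm{cost}(a^{*}) + \mu\,\mathrm{cost}(a)
      \qquad\text{for all outcomes } a,\ \text{with } \lambda > 0,\ \mu < 1.
    \]
    \[
      \text{At a Nash equilibrium } a:\quad
      \mathrm{cost}(a) \;=\; \sum_i c_i(a) \;\le\; \sum_i c_i(a^{*}_i, a_{-i})
      \;\le\; \lambda\,\mathrm{OPT} + \mu\,\mathrm{cost}(a)
      \;\;\Longrightarrow\;\;
      \mathrm{cost}(a) \;\le\; \frac{\lambda}{1-\mu}\,\mathrm{OPT}.
    \]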
There are many, many papers predating Tim's paper, over the ten years before it, that basically use this framework without being as good as he was at recognizing that there is a framework here. One way to phrase what he is saying, and I think it is my favorite way of phrasing it, is that this inequality says: if our current solution is really bad, much bigger than the optimum, look at that inequality: if cost(a) is significantly above the optimal cost, that is, if lambda times opt plus mu times cost(a) is heavily dominated by the cost(a) term, then I can do something good: one of these players switching to their optimal strategy improves their own situation. So it is not a Nash equilibrium, because someone will want to switch. That is what this inequality really says. Okay, I went through this proof; I'll come back to it in a second, and it is maybe not worth going through the slides again, but the price of anarchy proofs I showed you are all based on this scheme.

Let me add one comment before I switch to the learning, which is what I really wanted to talk about. If you want to design a great game, a super great game, then you might say that these price of anarchy bounds I am proving, which are smallish numbers like one and a half or two, are not good enough: wait a minute, if I can help everyone get to their destination twice as fast, I want to do that. Very true: these are not optimal bounds, and sometimes we do have better bounds. But the classical bounds are in this small constant range, not close to one but also not in the hundreds: one, two, one and a half, things like that. This is in contrast to some games where there are what are called tragedies, and the classical example is the tragedy of the commons. The tragedy of the commons, without going through the details, is a game in which we all get to graze our cattle on the shared pasture, and at the Nash equilibrium there will be a million cattle and not enough grass, and all our cattle will starve, or be on the verge of starvation. That is, there is good welfare to be had in the system, but selfishly we ruin it. Real life is full of examples of the tragedy of the commons: environmental damage is a common one, and actually some traffic routing, congestion, can sometimes be an example too. So there are examples where selfish behavior can cause unlimited damage; the fact that here it is a small constant factor I view as a good thing, or at least not as bad.

Sorry, maybe I should go through this more slowly. Yes, exactly: the cost is something that should be low. At a Nash equilibrium, what we know is the second line here: if A is a Nash equilibrium, then your cost at A is less than at any other action. This A*, A*, is the optimum, the one the social designer wants you to play, and as you have seen in the Braess paradox example, the social designer might want you to do something that is collectively good but is not an equilibrium. Right, A* is what the social designer would want you to do; it is socially good overall, but it is not, or may not be, a Nash equilibrium. A is the Nash equilibrium here, and I am going to keep that notation: the star stands for the socially optimal thing, which might be good for all of us, certainly good as a collective, but not necessarily good individually for you. In the Braess paradox example,
even individually it would have been good for you; what was really wrong is that with a myopic objective function it seemed like I could improve my situation by driving differently, or I could improve it temporarily, while the other people had not yet discovered the same trick. Does that make sense?

Okay, so now comes the question. All this happened over maybe the last fifteen or twenty years, there is a lot of research on it, and I am happy to talk more as long as you want, but I wanted to focus on connecting this to learning, and learning from data. To ask that question you might say: wait, Nash equilibrium, what is a Nash equilibrium and what does it really mean? That the Nash equilibrium is the outcome of selfish behavior is common, standard knowledge, certainly in economic circles, an accepted fact that people find the Nash equilibrium. But if you start asking questions, as economists as well as all the rest of us can, you realize we do not really understand this. Here are some of these questions. First, the Nash equilibrium in most games is not unique: there are multiple Nash equilibria. How do people know which one? How did they find it? Second, and this is where the large games come in: remember, my game consists of all of you reading, and your packets on the internet; that is a lot of you. How does one router have enough information about all of us, and about all the other routers, when we are sitting in front of our computers pulling packets from all over the internet? There is no way a single router has enough information to know all this; so there is an informational problem. A single player in a large game simply does not know everything that is going on, and a single player in the routing game, whether a car driver, even one listening to the radio traffic news, or a router, does not have enough memory and bandwidth to even start memorizing all these things. So a lot of information is needed. And then, if you are a computer scientist, or maybe even if you are not, there is a computational difficulty: as proven more recently by Daskalakis, Goldberg, and Papadimitriou, finding a Nash equilibrium in many games is computationally hard. If it is computationally hard, how did these simplistic routers or people find it? If it is computationally hard to find, they will not be able to find it. So something is wrong with the Nash equilibrium notion.

What I want to do today is switch, and instead think about what happens if they do not find the Nash equilibrium, they are just learning. Now, learning has been a really classical topic in game theory, so let me start with a couple of proposals for how this might all work. When we, meaning the CS-econ community, started to think about these online games and online auctions, maybe the first proposal was that somehow, magically, by the invisible hand of Adam Smith or the economy, we found this equilibrium, and because this equilibrium is stable, what is really going on when we repeat the game is that the equilibrium repeats over and over again, as you see on this slide: even in the second period we do the same thing, because we reached the Nash equilibrium and no one wants to deviate from it. We know how to drive to work to be fastest, and everyone is happy with the current solution. What that means is the inequality at the bottom here: your cost, where a is again your current solution, is smaller than any other cost you could have had. It is a
Nash equilibrium, so you are happy with it; you know everyone else will drive the same way tomorrow, so you continue to be happy with it. I am going to call this the no-regret condition: you do not regret what you did so far, because you know there is nothing better you could have done.

Then you can look at data, and my data came from Microsoft's Bing auctions. Bing is Microsoft's version of Google, in case you are not using it; they too make a fair amount of money on this, and they were willing to share some of the data. This is what bidding in the auction looks like: a week's worth of data for one particular keyword, and I have a bunch of these. It does not look like a stable solution; it is not stable, something is changing. You can decide what it looks like to you; to me it looks like they are running some sort of simple gradient-descent-style optimization algorithm and changing their bids, some going down, some going up. I can show you other plots where a bid goes up for a while and then comes down again. They are gradually adjusting, and I think that is what they really are doing. So I want to propose that what they are doing is trying things and learning. These auctions, and traffic routing too, are an ideal situation for trying and learning, because a single auction, or a single packet on the internet, really does not matter: you can lose a packet, you just resend it, no problem; a single auction is a couple of cents, you lose it, you gain it, it does not matter. You can experiment and try to learn from the data, and I think that is what these bidders are doing.

Learning, when it started to be studied in game theory, and that was a while back, actually asked a different question: can players learn to find the Nash equilibrium? Originally it was thought of as a form of pre-play: maybe we can play a game before we run the real interaction, so that we find the Nash equilibrium. Maybe the first influential paper here is by Julia Robinson. What is interesting about Julia Robinson is that she was the first woman mathematician elected to the National Academy of Sciences, which is a nice thing to point out about her. She was a mathematician; this was her only foray into game theory, but it is maybe what she is most famous for. What fictitious play is, is best responding to the empirical behavior of past play: I watch what happened, I somehow believe, kind of magically and not quite truly, that the others are not going to move, they are just going to repeat what they did the last many times, so I think of their behavior as random samples from a distribution and I take the action that is best for me against it. She was thinking of this as pre-play, not real play, a way to find the Nash equilibrium, and she asked whether it finds one. She proved that in two-person zero-sum games it works; later it was also proven that in two-by-two games, even if not zero-sum, it still finds the Nash equilibrium. But it turns out that is about it; those are about the only classes of games where this kind of thing finds the Nash equilibrium.

Now, what do I want to do? I want to think of something simple like this, almost like this: you look at the past and try to learn from it, which is certainly what fictitious play does, but I want to be a little more careful and not pretend that the other players are not doing anything, so a little bit better learning. What I want to model
is this: when you start playing you might be literally clueless, no idea at all, but somehow you get better over time and you learn something from the data. What can you hope to learn? Here is one thing that turns out to be completely reachable, and in addition to telling you what one can say when you reach it, I also want to spend a little time convincing you that it is reachable: I can reach my no-regret condition without the stability. The no-regret condition, remember, said that my current solution is better, with hindsight, than any other fixed solution x; this will be my general no-regret condition. Instead of saying that I stabilize and find a current solution that is better than any alternative with hindsight, I want something a bit more relaxed: you do not have to find a single solution, you can alternate, but if there is a solution x that is consistently very good, you should please find it. That is the condition I want. You can change your solution as often as you like, so I index my a with time: a_t is what all the players do at time t, I sum your cost over time, which is roughly your average cost, and I compare it to a fixed solution with hindsight. If there is a fixed solution, say driving on Route 101 in Silicon Valley, that turns out to be such a good idea that it is consistently better than your alternatives, please notice it; that way you reach this condition. If Highway 101 is not consistently good, sometimes good and sometimes not, then it is hard to know when it is good and when it is not, so that is a harder condition; this one just says that if there is one solution x that is consistently very good, please do at least as well as it, because you can reach that by learning. I allow you a little error there, which I am going to call regret, but I am hoping the regret does not grow linearly with time, and in fact there are very simple algorithms that make the regret grow like the square root of time, significantly slower than linearly.

I can even convince you that the algorithms that do this are super natural, very simple algorithms; let me spend a slide just to make you feel good about it. It is almost the same as fictitious play. I told you fictitious play is just best response to the past, and I say, well, that is a little drastic; how about this: just randomize a bit. So fictitious play chose (oops, this slide is in utilities, not costs, sorry) the single action that was best historically, and what I am saying instead, and that is what "smooth" means here, is to make it less extreme: if there is a tiny difference between actions, do not make such a big difference in your behavior, and add randomness, to the extent it is reasonable. More concretely, try to choose a distribution maximizing the same objective function but adding a little bit of noise: what I did is add a little bit of entropy, keeping it random when possible. When one action is a lot better than the others, do it, but if it is not a lot better, then a little entropy is good for you, that is, keep it random: if two actions are similarly good, play them with similar probabilities. It turns out that this version of fictitious play, called smooth fictitious play, works very well.
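Since the slide with the algorithm's closed form is skipped, here, in my own notation (the learning-rate parameter eta is an assumed parameter, not something from the talk), is the no-regret condition and the rule that entropy regularization gives you, as a sketch:

    \[
      \sum_{t=1}^{T} c_i(a_t) \;\le\; \min_{x}\sum_{t=1}^{T} c_i(x, a_{t,-i}) + R_i(T),
      \qquad R_i(T) = O(\sqrt{T}).
    \]
    \[
      \text{Smooth fictitious play (utilities): }\quad
      p_t \;=\; \arg\max_{p\in\Delta}\Big\{\sum_{s<t}\mathbb{E}_{a\sim p}\big[u_i(a, a_{s,-i})\big] + \tfrac{1}{\eta}H(p)\Big\}
      \;\;\Longrightarrow\;\;
      p_t(a) \;\propto\; \exp\Big(\eta\sum_{s<t} u_i(a, a_{s,-i})\Big),
    \]

which is the exponential-weights (Hedge) rule.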
Worked out as an algorithm, it generates what I promised you: regret of order square root of T, growing sub-linearly, not linearly in time but only like the square root of T. This is one of many algorithms: instead of entropy you can add any other so-called regularizer, some other source of randomness, and it will take care of it; entropy happens to be good for optimizing the speed at which regret goes down, and there are many other algorithms that work really well.

So this is what I want to defend as learning: you learned if you did at least as well as every single fixed action with hindsight, or, put differently, you have not much regret. This is a condition that is achievable, and a condition that I think is humanly reasonable; I believe people can learn this well, and I'll come back to that in a second. In fact, many of us learn better than this: it is possible that doing something different on Mondays than on Tuesdays is good for you, and maybe you will discover that too. I did not ask you to discover that fact, but you can, and sure, that is even better. That is, regret is not necessarily a positive quantity; you have negative regret if you beat this benchmark, which you are welcome to do. What I am asking of you as a learner, what I am hoping people achieve, is that they meet this benchmark; they might beat it, but they should at least meet it.

What happens in a game when people reach this benchmark? The original Julia Robinson question was: will they reach a Nash equilibrium? And now we know the answer has to be no: this is a very simple algorithm, and we know that finding a Nash equilibrium is computationally hard, I know it because Daskalakis, Goldberg, and Papadimitriou proved it, so a simple algorithm like this is incapable of doing it. But it reaches something else, and that something else is called a correlated equilibrium. What does it reach? It reaches a probability distribution over play that has exactly the equilibrium property I assumed they would reach: in the limit, with regret going to zero, they reach the equilibrium condition that whatever I did over the past is better than any single action with hindsight; that is exactly my condition, I assumed they learn this way, and it is learnable this way. This is precisely a Nash equilibrium, well, almost. Here is what we are missing: we are becoming correlated; this is what is called a correlated equilibrium. A correlated equilibrium is exactly the same as a Nash equilibrium except that our plays might be correlated. If, say, the two of us are playing this game, what can happen is that the two of us do not act independently, as we should in a Nash equilibrium. What are we correlating on? We are correlating on past history: we share the history, we are learning from the same history, and therefore we correlate.
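Written out (again my notation), the condition the limiting distribution D over joint plays satisfies is that no player regrets any fixed deviation on average:

    \[
      \mathbb{E}_{a\sim D}\big[c_i(a)\big] \;\le\; \mathbb{E}_{a\sim D}\big[c_i(x, a_{-i})\big]
      \qquad\text{for every player } i \text{ and every fixed action } x.
    \]

Strictly speaking, with only this per-action (external) no-regret condition, the limit object is usually called a coarse correlated equilibrium, a mild relaxation of correlated equilibrium.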
This is a nice and maybe important concept, so I want to really drive home what I mean here and try it on a simple two-player example that you have probably all seen: rock-paper-scissors. This is the payoff matrix for rock-paper-scissors, R, P, and S for rock, paper, and scissors: you get zero on the diagonal and one if you beat the other person. What happens? First of all, what is the Nash equilibrium of this game? I guess we all know: the Nash equilibrium is to randomize completely uniformly, play one third, one third, one third. By the way, we also know that this is really hard for humans; there are actually rock-paper-scissors competitions, with people winning trophies for being able to beat others at rock-paper-scissors. And if you are not convinced that this is hard for humans, try this: there is an old New York Times app, I think if you google "rock paper scissors New York Times" you will find it, where you can play against an algorithm the New York Times put up for us. I lose, and I bet you all lose too. Well, you do not have to lose if you cheat, that is, if you actually really flip coins; then there is nothing they can do, they cannot magically read what you are going to do. But if you do not cheat, that is, you try to randomize in your head with no randomization device, then apparently they can figure out what you do and they can beat you.

Okay, but let's figure out what happens if you do learning. To make our game a little more interesting, I changed the payoff from zero-sum by putting a minus nine on the diagonal. That does not change the equilibrium, it is still one third, one third, one third, but it is no longer a zero-sum game; and remember, Julia Robinson proved to us that zero-sum games are special. I can draw the equilibrium, what one player is playing, in a 2D picture: there are three strategies here, but the probabilities have to sum to one, so we are really in 2D. This is the picture: the corners are pure scissors, rock, or paper, and the green dot is the mixed Nash equilibrium, one third, one third, one third. Now try to imagine what happens if we play: say I play against you, I start playing the mixed strategy, and you are trying to learn; maybe you start at the mixed Nash equilibrium too, who knows. Now, the thing is, randomization is not ideal, and you are welcome to code this up and try it out: over a thousand rounds you will probably overdo one action, say paper, a bit too often. If I do paper a bit too often, your learning algorithm will pick up on that fact, and as a result you do scissors a bit too often, to which my learning algorithm responds by picking up that you are doing scissors all the time, so I do rock all the time. What we are doing, in this 2D picture, is a spiral, in which we chase each other rather than staying at the center, at the actual equilibrium, as illustrated in the picture: we end up playing the off-diagonal entries while avoiding the diagonal entries. That is, our behavior got correlated: when I am doing rock, my opponent is roughly equally likely to be doing paper or scissors, but is not doing rock. We are avoiding the diagonal, which is cool in this example, because the diagonal has the minus nines. So this is what a correlated equilibrium is: it correlates the behavior in a somewhat weird way.
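Here is a minimal simulation of that spiral: two exponential-weights learners on the modified game, assuming win +1, lose -1, and -9 for a tie (the exact payoff values and the learning rate here are illustrative assumptions), meant only to show the empirical play concentrating off the diagonal.

    # Two exponential-weights (Hedge) learners playing modified rock-paper-scissors.
    # Assumed payoffs: win +1, lose -1, tie -9 (the costly diagonal from the talk).
    import math, random

    ACTIONS = ["R", "P", "S"]
    BEATS = {("R", "S"), ("P", "R"), ("S", "P")}

    def payoff(mine, theirs):
        if mine == theirs:
            return -9.0
        return 1.0 if (mine, theirs) in BEATS else -1.0

    def probs(cum, eta):
        m = max(cum)
        w = [math.exp(eta * (c - m)) for c in cum]   # subtract max for numerical stability
        s = sum(w)
        return [x / s for x in w]

    T, eta = 20000, 0.05
    cum1, cum2 = [0.0] * 3, [0.0] * 3
    ties = 0
    for _ in range(T):
        i = random.choices(range(3), probs(cum1, eta))[0]
        j = random.choices(range(3), probs(cum2, eta))[0]
        ties += (i == j)
        # full-information feedback: each player learns what every action would have paid
        for k in range(3):
            cum1[k] += payoff(ACTIONS[k], ACTIONS[j])
            cum2[k] += payoff(ACTIONS[k], ACTIONS[i])

    # Independent uniform play would tie about 1/3 of the time; the learners'
    # correlated, spiraling play typically ties far less often.
    print("fraction of tied (diagonal) rounds:", ties / T)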
Okay, I have only a few minutes left, actually ten, so I want to tell you something about putting the two pieces together: the quality of Nash equilibria from the first part, and the fact that learners find correlated rather than pure Nash equilibria from the second.

The first thing I want to tell you is that correlated equilibria actually turn out to be equally good for these price of anarchy results. I'll skip this, and maybe even the following slides, in the interest of time. One line of research on this, with my colleagues and students at Cornell, asks how fast you converge to this correlated equilibrium, and in fact we can show that the convergence is faster than the root-T rate you would expect; but I am going to skip it, because what I really want to spend my last ten minutes on is the quality of learning outcomes in games.

There is a little insert here about the extent to which people actually are learning: there are a bunch of papers, including my paper with Denis Nekipelov and Vasilis Syrgkanis, considering different setups and asking whether learning is a reasonable model of human behavior. Maybe the ones worth pointing out are the very top one, which is a human-subject experiment, and the very bottom one, which is the Bing auction data; they are very different domains, in one case the work is on algorithmic bidders, big companies using machine-learning tools to bid, and in the other human subjects, and both of them match no-regret behavior reasonably well. But to focus on the next part, let's go back to the quality of outcomes.

Remember, we talked about the price of anarchy, the cost of the Nash equilibrium compared to the optimal cost; that is what I discussed at the beginning. The natural generalization I want to focus on is: what does this mean when players are learning? Learning would mean, I think, one of two things, this and the next one up. This one says: the average cost of the solutions you reach, as time goes to infinity, if you learn long enough, compared to some fixed optimum; I sum up your cost, divide by T, that is your average cost, and compare it to the optimal cost. This is a bit unrealistic, maybe a bit of a fantasy, because it assumes the optimal solution is not changing: no changing population, everything very stable. What I really want, and this is our recent paper, is the next version, which is maybe asking too much, but I wonder under what conditions I can get something like it: the average cost you paid compared to the average cost of the optimum, where I allow the optimum to change too. That is my real goal, and what I want to do in the last ten minutes is give you a little flavor of how, putting together the two pieces I already told you about, we are almost there; I hope ten minutes will be enough.

Step number one: I have to step back and think about learning outcomes and the no-regret condition; again I have to skip the convergence parts, these slides are a bit too focused on the speed of convergence, but I can still tell you the price of anarchy part using them. The no-regret condition says that you do not regret any single fixed action with hindsight: if there was a consistently good solution, you do not regret it. What do I want you to not regret? The same thing I always wanted: the optimal solution in hindsight. I use the same A* I always had, what the social designer wants you to do, and I want you to not regret that. If I use this condition, I end up with the very same chain of inequalities I used before, using the smoothness condition, and the very same bound on your average cost compared to the opt appearing in the smoothness inequality. What is nice about smoothness-style proofs, and that was indeed Tim's point when he introduced the property, is that they are not only about equilibrium; they are about the no-regret condition. As long as you have the no-regret condition, whether you are at equilibrium or not, you have the price of anarchy bound: you chain the two inequalities together, and you get a little deterioration because of the regret error, which is the term pointed at on the slide.
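Spelled out, the chain being described (in the notation from the smoothness sketch above, with R_i(T) the regret of player i) is roughly:

    \[
      \frac{1}{T}\sum_{t=1}^{T}\mathrm{cost}(a_t)
      \;\le\; \frac{1}{T}\sum_{t}\sum_i c_i(a^{*}_i, a_{t,-i}) + \sum_i\frac{R_i(T)}{T}
      \;\le\; \lambda\,\mathrm{OPT} + \mu\,\frac{1}{T}\sum_{t}\mathrm{cost}(a_t) + \sum_i\frac{R_i(T)}{T},
    \]
    \[
      \text{so}\qquad
      \frac{1}{T}\sum_{t=1}^{T}\mathrm{cost}(a_t)
      \;\le\; \frac{\lambda}{1-\mu}\,\mathrm{OPT} + \frac{1}{1-\mu}\sum_i\frac{R_i(T)}{T}
      \;\longrightarrow\; \frac{\lambda}{1-\mu}\,\mathrm{OPT}
      \quad\text{as } T\to\infty.
    \]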
But as a final piece, something was very unsatisfying in that proof: it assumed, in a painful way, that the population, or the optimum, is unchanging; there is a single strategy with hindsight, this A*, that stays the same as you go, it is always the same optimum, and that is the thing you should not regret. So what happens if I take a dynamic population, which is much more realistic, where people come and go: you repeat the game, but the strategies change a bit? This is a more recent paper by Thodoris Lykouris, Vasilis Syrgkanis, and myself. What we do, as a concrete model, is take a population game in which, at every iteration, every player vanishes with some small probability p. What you should think of is that in expectation everyone stays around for about 1/p steps, so I am going to make that long enough that they can learn; at the same time, with n players, and think of n as big, about n times p people change every step, so the optimum moves all the time, or at least often, because someone is always leaving, but each person stays long enough to be capable of learning.

To make something work out here, I have to be a little more careful with the learning, because players might have to adapt their strategies. Think of a player in this particular traffic routing game choosing between the red and the yellow option: maybe when he arrived, the red road was very busy, so he got used to choosing the yellow one, but as time goes on the players on the red road have vanished, and maybe he should switch his strategy. So I need them to run an adaptive version of the learning algorithm, one that adapts over time; I was maybe too optimistic about what fits in an hour, but these algorithms can adapt, and all they have to do, summarizing in plain English, is be a bit forgetful: recent experience is more relevant than experience from far in the past, because some people may have left since then.

But one trouble I do want to emphasize, and this is the last technical piece I was hoping to cover: if I really just want to copy over the proof, then I end up wishing for something hopeless. This is what I would wish for: that your cost, as you went over time and the other players changed, is good compared to the optimal cost in hindsight, where now the optimum itself also changes.
The one thing that changed on this slide is that the optimum got time-indexed, because the optimum now changes. And this is too hard: with the population changing, the optimum changes, and you cannot possibly learn to do this well; if in every single iteration you should be doing something else, how could you have learned that? So learning cannot achieve this. But here is something. The optimum can be very sensitive, as you probably all know, but in case you don't, here is a simple matching example: there is an optimal matching, one node vanishes, and the optimal solution totally changes; it is an augmenting path with an arbitrary number of changes in it. So the changes can be really big. However, the learning players can adapt to some amount of change; unfortunately, and maybe not surprisingly, the regret, the error term, will be linear in the number of changes they have to tolerate. Not too surprisingly, if there are a lot of changes it gets worse, but this is good enough, and what you need is a theorem of the following sort. I won't go into details, but many of you who do optimization will believe me that this could be possible: in a large enough game it is often the case that you can have a not-quite-optimal, but close to optimal, solution that is stable, in the sense that it is not too sensitive when you take one or two people out. To give you some sense, in the matching case it turns out that the greedy matching is a lot more stable than the optimal matching, just as an example, and there are many other cases where we can find stable solutions.

To summarize our theorem at a very high level: in many, many games where the Nash equilibrium has a smoothness-style price of anarchy proof, we can extend it to work with this changing population. The price of anarchy with the changing population will not be quite as good; we quantify it with three different parameters, alpha, beta, and gamma, so let me explain what they are. Alpha is the original price of anarchy bound, which, if I am careful and make the game big enough, can go to one. Gamma is a regret loss, depending on how long I keep people in the game: if the probability of change is small, gamma goes to one, but otherwise there is a loss because they are not learning well enough. And finally there is beta: beta is how much I lose from the optimum because I wanted stability; again, if the game is really big, beta can go to one too. So in big games, with people staying long enough, there is essentially no price of anarchy; if the game is not that big, these parameters quantify the price of anarchy. The beauty here is that these are all constants, relatively small constants, not deteriorating with the number of players.

So, to wrap up, there are a couple of high-level messages and maybe some more technical ones. I hope you all agree that learning is a good, interesting way to analyze a game. It might also be a good way to actually adapt to an opponent: unlike what I said about Nash, you do not need to know who the opponent is or what they are doing, so there is no need for any prior knowledge about the opponent; and one feature I did not mention, which is not in this work, is that if the opponent plays badly, a learning algorithm takes advantage of the opponent's mistakes, whereas Nash equilibrium does not. On the technical part: in some games, and auctions and traffic routing are two examples, learning players reach high social welfare, and they can do so even in a dynamically changing environment, depending on various parameters.
So learning is good for social welfare, and maybe good for individual welfare too, in two different ways. Thank you very much for your attention.

I will repeat the question; as long as I can hear it, it is fine. I heard it, though I am not sure I understand, so let me try, and you can clarify the question if I misunderstood. The question is: what are some of the common mistakes I come across when designing or analyzing a game? It seems to me there are really two questions here, designing and analyzing. Some of these things, like traffic routing, are really analysis, not design. I did make the assumption, which I think is roughly correct, that routers want to get packets to their destinations fast; and the learning, this smooth, gradient-like behavior, is something traffic routers were doing years before I started to look at it, in fact probably before I was born. These things were happening, and all we are doing is analyzing them, not designing them. Then there are one or two issues in the analysis category that are common. If you look at games out there, there are many tragedies of the commons, many games with bad Nash equilibria, or versions where some equilibria are good and some are bad. The games I showed you and talked about are games where all Nash equilibria have reasonably good properties; those we understand pretty well at this point. We understand much less, and maybe that is a form of mistake, about games where some Nash equilibria are good and other Nash equilibria are bad, and what you really want to understand there is two questions: will these learning algorithms somehow find the good ones or the bad ones? And if the answer is unclear, can I help them? Can I do anything to induce them to migrate towards the good solutions rather than the bad ones? The second part is maybe your design question, of what I can do to design games. Certainly the auction games are designed: there is a lot of discussion at Google or Microsoft about exactly how they should run the auction. Many of you may know about the second-price auction, or the generalized second-price auction, the classical auction for Google; there are lots of interesting questions, not quite of this kind, about exactly what they should do in the more modern, more flexible environment that is running today; that is a fully designed question. And then there is the halfway case: it is a natural game, I have a little bit of control, I can't just design the game from scratch; what would I do with my little bit of control to help them reach a good outcome rather than a bad one? These are very open questions, so I don't know if they are mistakes I come across, but they are certainly great research directions. A mistake we do come across, especially with humans, is that this talk took a very mathematical attitude to what people do, selfishly optimizing their objective function, and one very interesting issue is that most of us don't know what our objective function is, so we certainly are not that good at optimizing it. So in many cases, modeling biased or otherwise interesting human behavior that is not as simple as minimizing delay or maximizing utility is, again, very open, very interesting, with lots of great research to do. I hope I kind of answered what you tried to ask.

Okay, that is a great question, and maybe
I should have paid more attention to this. If I think about the Google auction, that is pretty much a full-information setting. So one question is how much information people get, and importantly, I did not assume you know who else is playing, because that really is a lot of information. What these algorithms really need as feedback is what your payoff would have been had you chosen another path. In the traffic example, what you need as feedback is: when you choose to drive on 101, again using California because those roads are more famous, or because I know them better, you need to know what the delay was on some other road, and at this point Google, or even the news, does provide you with that form of full information. So yes, every bound I showed assumed full information of this sort: you always know what you would have gotten had you played something else. The results do extend to partial information. Partial information would mean that when you drive on 101 you literally only know what happened on 101: you were not on the other roads, you know nothing about them. It definitely slows down the speed of convergence, because you have to try those other things to learn anything about them, so the particular bounds I offered for the speed of convergence do not extend; you lose a factor because of the lack of information. But at the high level of what this converges to and how it depends on time, those stay the same. So yes, though I should be a little careful when full information is not available. Another very open direction: both the full-information model and what is usually called the bandit model, where you literally only learn about things you tried, are very well understood. I think real life is somewhere in between: when you drive on 101 you really know what happened on 101, you were there, and you know something about the other roads, maybe because you heard it on the news, but what you get about things you did not try is usually partial, less accurate information. That is a much more open area; but even with only the information about what you tried, we can recover some of these results, with the convergence bounds getting worse.

Yeah, sure. So I guess the question is about one of these last slides, this one here: there are really two losses, the price of anarchy loss and the loss due to the desired stability of the solution, the alpha and the beta in this example, and whether there is something similar about them. Indeed there is something similar, in the sense that both go to one as the game gets larger; both are helped if there are a lot of players around, and that is the common thing that pushes those numbers close to one. It is a good point, and certainly what helps those numbers be close to one is the same thing, but I am not sure I can unite them beyond the fact that many players with similar goals are really helpful in both cases.

Cool question. So the question was: can I trick players into thinking they have negative regret, and is it beneficial or ethical? It is certainly a cool direction. There is a panel discussion that I gather some of us will do tomorrow, and I was thinking of raising some of these more ethical, or different, sorts of issues that I did not raise here, because I do not know a mathematical answer. Certainly the last part of your question, is it ethical, is a very good question, and I don't have anything
really smart to offer, other than the kinds of things any of us could offer at a panel-discussion level. Is it beneficial? Sure, it can be: I just showed you, in the Braess paradox, the very first example, that if I convinced everyone, please don't use that middle edge, it's bad, then everyone is better off, every single player is better off, and maybe if I can help everyone, that would be one place where it is ethical to nudge them, though I am not sure; the ethics is a great question. And the last part, I don't know, and maybe you have an idea, I'm happy to talk to you offline: what does it even mean, how could I trick them? I guess one way you can ask this question, and maybe it is worth commenting on: when I said that the Google auction is full information, what Google does, and Microsoft also, is that if you bid in the auction, say you want some ad and you are willing to pay five dollars for it, they give you as a response a payoff curve. They not only tell you how many times they showed your ad because of the five dollars, and how much you have to pay for that, but they show you what would have happened had you bid other numbers: they give you a curve of how the number of impressions you would have won, and the cost, would change with the bid; they give you a function on the display. Now one question is, do we trust them? That is, as a bidder, when you see such a curve, do you really believe it is true? We actually have a paper, which I did not touch on here, looking at whether people trust what they are given here, and there is some sense that people do not fully trust that the numbers Google is telling them are all correct. Whether it would be ethical, whether it would be... excellent questions; but there is certainly a trust issue, and certainly in that place Google has a way of cheating them, they could have given a different curve. I don't think they do, but I guess I don't know.