Thank you, and thanks to everyone for coming. For this talk we're very lucky to have Amanda Askell; the talk is sponsored and helped brought to life by the AI Ethics Governance Initiative. I also want to say that we're live-recording this entire event, so anything you say will be forever in the scrapbook of the internet. And next week, next Tuesday, we're going to be having a set of Ignite talks, which are five-minute, really quick lightning talks by various members of the Birkenklein community; that'll also be at lunch here, so you should all come for that. Okay, so I want to introduce Amanda Askell, who's going to be talking to us today. She's a research scientist in Ethics and Policy at OpenAI, where she works on topics like responsible AI development, cooperation, and safety. Before joining OpenAI she did her PhD in philosophy at NYU, where she wrote her dissertation on infinite ethics, which she's going to talk about, or rather which we're going to make her talk about. Before that, she did a BPhil in philosophy at Oxford. In addition, she writes a bunch for popular audiences: Vox, The Guardian, Quartz. So let's give her a warm welcome. Come in this way.

I don't know if anyone else was slightly horrified by the "this will be on the internet forever"; I know I am, but maybe I have more reason to be. So today I'm going to be talking a little bit about AI development, and in particular how to develop AI competitively without falling victim to collective action problems. A key thing I want to emphasize is that people talk a lot about competitive development and how it could end up being really bad for the safety and security of AI systems. And I'm basically going to say that there are lots of mechanisms for preventing that from being the case, and they don't necessarily involve ceasing competition. That's the key thesis.
So here are some cakes; I promise to torture this analogy a bit later. But for now, here are some robots as well. Cool. Okay, so basically there are going to be five sections of varying lengths. The first thing that I'm going to claim is just that AI systems can be more or less beneficial to society. I hope people kind of agree with that already, but in case not, I will go over it. The second is that AI developers in general want to build more beneficial AI systems if possible. I think that's true even of the most self-interested, doesn't-really-care-about-the-world, idealized fictional AI developer, but I think it's even more true of real-world developers. The third thing I'm going to argue is that competitive pressures can reduce investment in system value, or at least the incentives to invest in system value, even if you're a developer that wants to develop things safely and well, and that this is a problem. I'm then going to give a very simple game-theoretic analysis of that, basically to indicate that there are five factors that affect the rationality of investing in system value in this kind of situation where there's competitive pressure. And then I'm just going to argue that these five factors can be leveraged to improve AI development even when you have a lot of competition. Okay. So, first premise: AI systems can be more or less beneficial to society. So, picture of a climber. I feel like since I moved to San Francisco I became the kind of quintessential Bay Area person, which means that I go climbing a lot, so I'm going to use a climbing analogy here. Imagine this is your system: you've got your harness and all of your safety equipment, and you're going climbing outdoors. And the question is, what makes this system beneficial?
And like most systems, in technology but also elsewhere, it has three key components to being beneficial. The first one is just safety. If this system is designed well, it's not going to put me at risk of getting into an accident; it's basically not going to fail unexpectedly. I know what it's going to do, and I know that the risk of an accident is low. And ideally I know that if there is an accident, it's going to be one that isn't very bad for me. The equivalent in technology is just making sure that your systems don't suddenly or unexpectedly fail. It would be really bad if I had a safety-critical system and some aspect of it just failed unexpectedly in new circumstances. This is the kind of failure we want to avoid. The second one is security. Now, ideally this applies less to the case of climbing, but the idea here is that if someone very malicious wants to do me harm, my system should basically be impervious to their attempts to undermine my safety equipment. In this case, doing checks, for example, is a way of making sure that if there's been any tampering with your equipment, you're not going to then go and rely on it. That would be a system that's secure. And obviously in the case of technology this is mainly things like: if someone wants to steal data, I have a good way of protecting the data of my users. If that system is working well, then I have a secure system. Finally, I want to have a good social impact. A couple of natural ways in which my climbing system can have good social impact are that it doesn't ruin the environment, so that more people can come to the park and enjoy it and I haven't destroyed it with my climbing equipment, and also just that the route is enjoyable.
People find it pleasant, no one finds it too hard, et cetera. In the case of technology systems, I think we want to think a lot about this, because it's an easy thing to overlook. You can think, well, I have this system that is safe and secure. But you could have a safe and secure system that, say, decreases trust in reliable information sources, and that would be a social harm unrelated to safety and security in the standard sense. So AI systems have these properties to greater or lesser degrees, as do other technology systems, as do systems more generally; basically, AI systems are in fact systems. And here is an example of where AI systems can go wrong. This is a system that is learning how to control this boat here, and you can see it's kind of crashing into things. But eventually it learns that it gets a bunch of points if it gets these turbos, and so now it just keeps crashing into this over and over and over again. The goal the humans had in mind here was that it would go to the end of the course; instead, what you found was that the system just started crashing in order to get the points from the turbos. So this is an example of a way in which you can think you've designed an AI system for one task, and you haven't realized that the incentives you've created are such that it's just going to set itself on fire. That's bad. We don't want that; we want AI systems that are better than that. Okay: AI developers want to build more beneficial systems if possible. I think one thing worth noting is that it's strongly in companies' incentives, in general, to invest in making sure that their products are safe. Now why is that? I think this is pretty true across domains. The first obvious reason is just money.
People don't want to buy systems where it's really easy for someone to just hack the system and steal all of their data, so that's an incentive for companies to make secure systems. People don't want systems that are just going to fail suddenly, so that's a reason to build systems that are free of accidents. And in general, people want systems that enhance their lives, so that's a reason to want systems that have high social value. In the ideal economic model, this is all you would need: if you had a perfect market with perfect information, then you would end up developing things with the socially optimal level of risk, because otherwise people wouldn't buy them. You wouldn't over-invest in safety, spending unnecessarily, but you also wouldn't under-invest. A key point worth noting there is that you can in fact over-invest in these things. If I had a food production company and were to say, I know, I'll double the cost of the food that I'm producing, because what I want is taste testers for absolutely every piece of food that goes out, so you can be guaranteed that someone has in fact eaten a small corner of your food before you eat it, I think most people would just say: that's unnecessary, and I'm not paying twenty dollars for a sandwich for that to be the case. So ideally you want to get to this socially optimal point. The second key incentive is litigation, basically. Liability law is a way of ensuring that you internalize any externalities: you harm me, I get to sue you as a company, and now you are in turn harmed by the harm that you caused me. A standard example used here is pharmaceuticals that go wrong.
Very famously, drugs for morning sickness caused birth defects and led to lawsuits for millions of dollars against the companies that produced those drugs. The third one is regulation, and you'll notice here that I have an umpire rather than, say, a government building. The reason for this is just that regulation doesn't always have to come from government. It obviously can, and governments are external actors that can force safety regulations on an industry, but lots of industries do engage in things like self-regulation, and in that case there's the same kind of regulatory incentive to develop safely. Finally, I think that companies in general just want to build things that are in fact good, and I think it's always worth remembering this. When people join companies, they don't do it thinking, oh well, I don't really care that I spend my nine-to-five working on this system that I think is going to harm a bunch of people. I think most people just couldn't do that, and so for the most part people in companies are actually compelled to build products that are good for people, good for their users, and they do in fact care about this. In many ways we're designing society so that we don't need to rely on that, but it is an important component. Okay, so now we're going to do the fun game theory bit, where I torture the cake analogy quite a lot, which I was really excited by, so I hope you like the tortured cake analogy. I also kind of like baking shows, so maybe this is just me going a bit overboard on that. I swear this has to do with AI, and I will bring it back to AI at the end, but for now, think about this as a potential development contest between two developers.
So imagine this imaginary baking show. There are two bakers involved, and they are each going to bake a cake. The way it works is that the bakes are scored from zero to five, but the first baker to finish gets an advantage: a plus two added to their score at the end. So if you get a bake of three, then, if you're first, you're going to get a five overall. Now, the time ends as soon as the first baker is finished, so you can see this is basically trying to model a first-mover advantage: if I'm the first to market, I get the product out, and that's it. Suppose we just look at what each baker gets at that point: they get their score times a hundred dollars. So if I bake something faster than you do and it's a two out of five, then I get four, and you get whatever the score of your current bake is; if you're halfway through and it's terrible, then maybe you get a one. Then I get four hundred dollars and you get a hundred dollars. So this is replicating something akin to a simple version of a development race: I want to be the first out with the product, because I get more, but I also want a better product, because ultimately, if I just rushed to market with something terrible, I'm not going to gain anything. In the case where there was no race, where I wasn't trying to race against you, I would just take all of the time and make a five-out-of-five bake; maybe I'd still get the plus two for winning, and then I'd get seven hundred dollars. So if there were zero competition, I would in no way rush. But the bake race does give me an incentive to reduce quality in order to bake faster. This is the basic idea behind the way competitive pressure can make things
go badly wrong if you don't have enough incentives to actually make things safe. So in this competition I could do what I'm going to call cooperating, which is using up all of the time to make a perfect bake, or I could defect: bake as fast as possible and make a mediocre bake. In the case of an AI system, this is going to be spending a lot of time trying to make sure my system is socially beneficial, versus just trying to get it out to market as fast as I can. Now, if we both choose the same strategy, we each get a fifty percent chance of winning. To keep this simple (you can work this out in more complicated games, but I just want to get the simple thoughts out), suppose I just have a fifty-fifty chance of winning. If we have different strategies, though, then whoever defects is going to finish first: if you just decide to make the mediocre bake, you're going to beat me if I try to make a really good fancy cake with lots of layers and stuff, if I could do that, which I can't. So now I'm imagining a case where there's Anne and Ben, and these are the two people who are competing. The idea is, if they both cook an excellent bake, then they each have a fifty percent chance of winning, but they're both guaranteed a five-out-of-five bake. They don't know which will come first, so the plus two is distributed equally among them in expectation: you're guaranteed a five, but you don't know who the extra two is going to go to. Now, if Ben does a quick bake and Anne does a slow bake (hopefully I actually have a little pointer... oh no, I don't, okay, never mind, I don't want to Skype call someone, that would be bad), then Ben is guaranteed to win, and Ben is going to get a payoff of four, because his bake is a two out of five but he's getting the extra two. I hope I did the numbers
correctly here; I often change my numbers at the last minute and then they become wrong, so I know everything that follows is correct, but if I used the wrong number earlier when I talked about the score of the bad bake, then I apologize. Anne is guaranteed to get two, because she's going to be last: time ran out on her, so she has this two-out-of-five bake, and she doesn't get the extra two points. Now, if they both rush, they both have a fifty percent chance of getting the extra two, because it's a fifty-fifty chance of who will win, but their bake is a two out of five, so they each expect three. So in this game, as in many games, a lot depends on what each thinks the other person will do. Here Anne should cooperate, going for the excellent bake, if she thinks there's at least a one-third chance that Ben will, and you can see that Ben is in a symmetric situation: Ben should also bake excellently and take up all the time if he thinks there's at least a one-third chance that Anne will. For the sake of this we can just imagine that they have to put the bakes in the oven at a certain time, so there's no chance of last-minute defection, but I can talk about that in questions if people have thoughts there. Okay. So again: in the no-race case, we would aim for a five-out-of-five bake, but in the bake race, if we think it's pretty likely that the other will race, then we should go for a mediocre bake. This is the thing that's concerning about competition with respect to things like product safety. What if I'm just really confident that other developers are going to under-invest in making sure that their systems are socially beneficial? Well, then I'm in a collective action problem: we'd all be better off if we all invested in beneficial AI systems, but we find ourselves with individual incentives to instead defect and just try to build something really quickly. And so this is unfortunate; it's basically a tension between the
group interests and the individual interests. In this case, responsible AI development can become a collective action problem when you're confident that the other person is going to race, or basically just as a result of competitive pressures, and I'll talk a little bit more about that. Cool. So in the case of AI, just to really bring it back, because I've been talking about baking quite a lot: let's say building AI is like baking a cake, where when you cooperate you're saying, we're going to make sure we invest in making our systems beneficial, and when you defect you say, no, we need to develop faster, and we can do this at the cost of making a worse cake, or, in the case of AI, making a worse system, knowing that it might be less secure than it otherwise could be, et cetera. Cool. Okay, so that was literally exactly the same thing, but showing you how the analogy works. And people might say, look, you're doing something that's a one-shot game; to those people who have that thought, if anyone does, I say: yes, we can talk about non-one-shot games later. It's just harder to represent them, and it's hard enough to represent these ones. Okay, so a fun thing that falls out of this, which I think is actually really important, is that trust now has a real kind of monetary value, and I don't mean that to be dismissive: it means you should actually be investing a lot in making it the case that you and Ben can work together. So if you're baking and you can spend some money to go out for drinks with Ben, really chat with him about how he's going to bake tomorrow, find a way of gaining each other's trust that you're going to actually follow through on your actions, you should be willing to invest that. The idea is: if I have zero percent confidence that Ben will cooperate, then my expected reward from the baking competition is three hundred dollars. If I can move that to
100%, or in fact even if I can just shift it above the one in three, I start to gain money all the way up; but in the simplest case, I move it to 100% and I get $600. So the idea here is that in order to move my confidence in Ben from zero to 100%, I should be willing to spend up to several hundred dollars, because it's just worth it. Ideally it wouldn't require that; hopefully Ben is a good person. But I don't know Ben, I haven't gone out for these drinks with him yet, so maybe I'll spend $60 on those drinks. Okay. So in this kind of example, and I do think this generalizes, there are five key factors that affect the rationality of investing in system value even under this kind of competitive pressure. I'll talk a little bit more about things like market incentives and liability later, because they all lean on at least one of these factors. So: five factors that increase the rationality of cooperation. The first is having a higher expected value to mutual responsible development; I'll go through each of these and give some cases. This is the case where, if we suddenly scored out of ten, and you could get a ten-out-of-ten bake while the bad bake would still be a two out of ten, this would increase your incentive to cooperate with Ben; I'll show that in a second. The second is lowering the expected cost of unreciprocated responsible development: if I develop responsibly, but I don't expect that to harm me very much if the other person turns around and defects, then that's going to increase my incentive to cooperate. The third is lowering the expected value of not reciprocating responsible development: if the bake that I make is worse when I don't bake responsibly, or build AI responsibly, then this decreases my
reason to defect. The fourth is lowering the expected value of mutual quick development: if we think the cost of both of us building systems that don't work very well is really high and quite distributed, then that's another factor that means I should cooperate with other agents more. And having greater confidence that others will reciprocate responsible development is a final factor that's really important when it comes to increasing the expected value of your AI systems. I think it's worth emphasizing that in many cases companies are already competing with one another over how safe to make their systems. This competition is occurring among car manufacturers; it's occurring among plane manufacturers. It's just that we think the point to which they are competing is not a bad one. Plane safety is very good (maybe recent events undermine our confidence in that a little bit), and car safety is also extremely good. The idea is that we have created a system of incentives such that the point they're racing towards is ideally as close to socially optimal as possible. Basically, people don't want to spend more on safety; they're really happy with the safety of their cars, and the same with the safety of their planes. So when people hear something like "competitive pressure pushes safety down", we should only be worried if it pushes safety below a point that we think is acceptable, and by using these levers we can make sure that that point remains high. Now, to illustrate the factors I gave (I hope you just trust me on this; I did go through it, so hopefully these are correct): this was the original bake case, where the trust threshold was one-third.
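To make these thresholds concrete, here's a small Python sketch of the bake race as just described. The payoff encoding (CC, CD, DC, DD, in units of $100) and the closed-form threshold are my own framing of the numbers in the talk, and the factor-specific payoffs below are back-solved from the stated thresholds, so treat this as an illustration rather than the exact slide contents:

```python
from fractions import Fraction

# Expected payoffs for one baker, in units of $100, per the rules in the talk.
# "C" = take all the time, excellent bake; "D" = rush a mediocre bake.
CC = Fraction(6)          # both slow: 5/5 bake + a 50% shot at the +2 bonus
CD = Fraction(2)          # I'm slow, you rush: my unfinished bake scores 2, no bonus
DC = Fraction(4)          # I rush, you're slow: 2/5 bake + the +2 bonus
DD = Fraction(3)          # both rush: 2/5 bake + a 50% shot at the +2

def trust_threshold(cc, cd, dc, dd):
    """Minimum confidence p that the other cooperates for cooperating to be
    rational, solved from p*cc + (1-p)*cd >= p*dc + (1-p)*dd."""
    return (Fraction(dd) - cd) / ((cc - cd) - (dc - dd))

def expected_value(p, cc=CC, cd=CD, dc=DC, dd=DD):
    """Expected prize in dollars, given confidence p, playing the better strategy."""
    return 100 * max(p * cc + (1 - p) * cd,   # cooperate
                     p * dc + (1 - p) * dd)   # defect

print(trust_threshold(CC, CD, DC, DD))   # 1/3: the base-game threshold
print(expected_value(Fraction(0)))       # 300: zero trust, so I rush
print(expected_value(Fraction(1)))       # 600: full trust, so trust is worth up to $300

# The four payoff levers, each applied on its own to the base game:
print(trust_threshold(10, CD, DC, DD))              # raise mutual-C payoff      -> 1/7
print(trust_threshold(CC, Fraction(5, 2), DC, DD))  # soften unreciprocated C    -> 1/5
print(trust_threshold(CC, CD, 3, DD))               # shrink defector's gain     -> 1/4
print(trust_threshold(CC, CD, DC, 2))               # worsen mutual defection    -> 0
```

Each lever changes only one cell of the payoff matrix, and the threshold formula makes the trade-off explicit: the more favorable the payoffs to cooperation, the less confidence in the other developer you need before cooperating is rational.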
I had to have greater than one-third confidence that the other person would reciprocate. Now, factor one, raising the expected value of mutual responsible development: the trust threshold has gone down to one in seven. If I think there's a greater than one-in-seven chance that he will reciprocate, or in this case build a beneficial system, then so will I. The second is lowering the expected cost of unreciprocated responsible development, making it a little bit better for me if I build the really safe system and the other person defects on me and builds the bad system; in that case the trust threshold is lowered to one-fifth. The third factor (the slide title at the top repeats the second factor; ignore that, I knew there'd be an error in these, there always is) is lowering the expected value of not reciprocating, that is, reducing the benefits of defecting against the other person. My bake isn't actually going to be that good if I defect against you; you're a sufficiently good baker that if I try to bake quickly against you, I get a one out of five, and so the plus two only brings me to a three. Here the trust threshold is down from a third to a quarter. Finally, lowering the expected value of mutual quick development: if we both defect against one another, suppose we each therefore make a much worse bake, scored at one out of five, so we each expect only two. The trust threshold now is zero, so it just doesn't matter how confident I am: Ben has no interest in defecting, and neither do I. We should both just try to do the best we can in the baking contest, or in the AI contest we should build responsibly. So, excellent: we have these things that can affect responsible development. Cool. These factors can be leveraged to improve AI development. So before I
go into this: I've been showing you this abstract thing, here are these factors that make it more or less rational to develop something responsibly, and they apply pretty generally. I think sometimes people still get this instinct of, ah, but companies are just super cutthroat, they just want to outcompete one another, and this is really pie-in-the-sky thinking. To that, one thing worth noting is that I'm talking about collective action problems with negative externalities here: this is a collective action problem where failure reduces the safety of systems in expectation. There are some collective action problems that actually have positive externalities, where, when people fail to collectively act, it's actually to the social benefit of everyone. A really standard example of this is a price war between companies. Suppose I'm a large drinks corporation and you are the other major large drinks corporation, and there's basically no competition in our field, and I say: look, people want to buy this can of... I don't want to say any names, because, oh gosh, I'm not implying that anyone is colluding... but I want to increase the cost of a can of soda to ten dollars a can; would you like to also do this? I can only do this if you also raise the cost of yours, and then we'll still get about half the market each, but we'll each be making ten dollars a can. Now, that's a case where we are cooperating in order to get more money from consumers, and doing that is in violation of antitrust law, because it massively raises costs for consumers. And the idea here is that failing in that collective action problem, i.e.
defecting against one another, is the thing that drives prices down. In that case, if I raise my price to ten dollars, you have a pretty good incentive to say, I will make mine $9.90 next month, and then I'll say, well, I'll make mine $9.80 next month, and that's exactly the kind of defection pattern you see in a price war. But the really interesting thing is that we have an entire legal framework to make sure that that kind of coordination doesn't happen. So if you think we should be really pessimistic about coordination within an industry to make sure it develops safely, which is what happens in a lot of self-regulating industries but also elsewhere, remember: when it's in people's interest to coordinate, the incentive is often so strong that, when the coordination has negative rather than positive externalities, we actually have to make it illegal. So that's the case that I'm not giving you a pie-in-the-sky thing; I think this is genuinely quite possible. But the AI industry is young, and so in many ways it still has to develop a lot of the mechanisms that other industries have already developed over time; I can talk a little bit about that later. Okay, so just to give an overview of some levers that would increase cooperation, if I'm correct. I think this list is huge; it's very under-explored, and I think this should be a major area of research. One is that you can try to distribute the benefits of AI among developers more, or just correct misperceptions about the long-term benefits of AI to everyone. Lots of industries don't actually have an incredibly strong first-mover advantage; in fact, in some cases there are sufficient first-mover disadvantages that it's actually better to be a
kind of second in the development race. In general this could involve agreements to distribute the benefits, or just ensuring that your systems are such that other developers can build on them, for example. Another lever: correct misinformation about system safety among the public and regulators. A large point of failure in the mechanisms I talked about earlier is the public not knowing how safe or secure the system in question is. If I have no way of verifying how secure you are being with my data, then I can't really vote with my dollar and decline to purchase your system; I just don't have the relevant information. And if I'm a regulator and I want to come in and look at how secure your system is, and I have no way of knowing, I don't have the expertise, or it's really hard to find out this information, then similarly I can't really enforce regulations that say you have to have a very secure system. Another: give access to shared resources conditional on responsible development. This would be a case where you're more willing to, say, work with people who agree to develop AI responsibly, who agree to sets of ethical principles and can demonstrate that they have done so; this is a kind of positive bonus, and I should say that all of these fall out of the five factors. Another: make it easier to verify that AI is being developed responsibly, which is a way of reducing the likelihood that someone is going to defect in this way. That might just involve finding ways of verifying that people have, say, a bunch of people employed on things like security work; it could involve more rigorous things than that, but this is just an example. Another: consider frontrunner mergers to reduce harmful race dynamics. So this is
something that OpenAI discusses in its charter. It's a slightly unusual principle, but it's interesting to me. The idea is that if being a responsible developer is a large advantage, then you massively reduce the cost of someone else defecting on you: you can say, look, people want to buy my safe, secure, awesome system; I'm not super worried about this person far behind in second place who is building a much worse system, because they're not going to get there first, and people are not going to want to consume it. So reducing these head-to-head competitions between frontrunner developers is one thing that can also help. Then (I think I might have a couple more): developers can make trades based on each of their concerns. If I'm really concerned about data privacy, and you're concerned about something else, like misinformation, then maybe we can make an agreement; it can be this thing where each person expresses their concerns and most developers try to cover most concerns. That's another example of cooperation between developers. Build trust between developers by increasing communication and collaboration; collaborating on socially beneficial projects is a good example of that, and there's less of a concern there, because that's not about the specific systems you're developing, it's more just about making sure the social impact is good. And create economic incentives to develop AI responsibly, which could come from industry or through a regulator. Cool. Oh, those slides failed; that was a mistake slide. Okay, so one thing worth noting at the very end here is that a lot of these things have really interesting feedback mechanisms. So in the case of trust,
if my confidence that you will cooperate with me increases, that increases your confidence that I will cooperate with you, because it increases my incentive to cooperate with you. But similarly, you can get negative spirals: if my confidence that you will cooperate decreases, then your confidence that I will cooperate decreases. I think this is really worth noting because, historically, we've seen this kind of thing play out through sheer misperception. If you look at the Cold War, for example, the US and the USSR in some cases simply misperceived the actions of the other in ways that caused these negative spirals. In global politics this can obviously be extremely dangerous, but it can also just be wasteful: it's really unfortunate if one party does something that merely seems to indicate non-cooperation and the other says, ah, you're not cooperating, so I will cease to cooperate. Forgiveness is a really important principle in these sorts of games: you forgive, and then you get the positive spiral. So that's the importance of trust. I've talked about a lot of mechanisms here, and there were things I wanted to cover that I didn't, so hopefully some of those will come out in the questions. But that's the end of my talk; I think we can open it up to questions and I'll pass around the mic.

Right, so there was a lot that happened in this talk; it's packed full of information. I was wondering: it's a little fuzzy what the concrete outcome is here. Are you talking about AI development while we're still attempting to reach some sort of AGI, or are you talking about the final end product, where through these game-theoretic principles we understand that the final AGI will be a safe AGI?

Yeah, so there's a question there; let me know if I've
kind of misunderstood, but when you think about development races, this could mean one of two things. One is a race to reach some endpoint; maybe AGI is a good example of an endpoint, though I don't think it necessarily is, but let's suppose it is. Or you could just be racing to stay ahead and develop ever more advanced systems. I think a lot of this should apply to either case. In the AGI-style case, where you're thinking of some extremely powerful general system, the concern is that in a race to stay ahead you can become increasingly incentivized to defect towards the end. The reason I'm making this argument is that if, throughout the development process, you've put into place the mechanisms that make sure development is safe, then defecting at the end is less costly to everyone. So this is about what you do over the course of development, with the hope that, regardless of what happens at the very end, you've made sure things are safe. That's not completely guaranteed, but that's the motivation behind this sort of approach.

Cool, so my follow-up question, then, is: how do you deal with the fact that companies are sometimes not playing the same game, although they may perceive themselves to be doing so? There's some sort of non-compliance that is unknown, and it seems to me that... yeah, I guess that's probably it.

No, I think that's right. This is one of those things where you want to make
sure that you bring all developers in and, ideally, make it transparent what game they are playing. A lot of this can come down to people thinking the incentives are different from what they actually are. If I perceive the benefits of advanced AI systems to be really great, then I am playing a more cooperative game than you are if you think there are some benefits but they're not that high. In that case, breaking those information asymmetries is really important, and ideally also finding ways of verifying. Trust is confidence that the other agent is going to cooperate with you, and you can really reduce the need for trust if you can simply verify whether they are cooperating with you. Some of the mechanisms I didn't talk about, or didn't explore as much, are things like being able to see that the other party is in fact doing what they say they're doing, and the more you can find mechanisms for doing that, the better I think things are going to be.

So Apple and Google are both producing AI technology, but one is incentivized through an advertising platform and the other is not; they're incentivized through hardware sales. They both develop AI technologies and potentially compete in a game that looks fairly similar, but they're incentivized in different ways. How do you take that into account?

Yeah, I think it's going to depend a lot on the particularities of the game, and there are so many different components here. One is that we have shared incentives to cooperate if, for example, there are shared reputational effects: if a huge data leak at Apple is bad for Google, then in many ways it's costly for one major
entity if the other entity is not developing systems that are secure. But it also depends a lot on whether you're competing over the same domain. If I'm trying to build something in advertising and you're trying to build something in an entirely different market, then we're not even competitors in this sense, because we're not competing over the same market share; it has to be the case that we're incentivized over the same sort of product. So I don't think that fully answers your question, because I think you have a specific scenario in mind, but I'm totally happy to work through that later.

Other questions? Jesse?

I go to Columbia for machine learning. I was hoping you could go into the idea of it not being a one-shot game, especially an iterated game with a fixed endpoint of one person going to market.

Yeah, so the case of an ongoing game is in some ways better. Part of the reason I focused a lot on the worst-case scenarios is to try to show that the worst-case scenarios are potentially not as bad as people think. In an iterated game, I can get a lot more information about the intentions of the other developer and what they're doing, and I have this long run of reputational effects. An example is that iterated prisoner's dilemmas are famously much better than one-shot prisoner's dilemmas. In a one-shot prisoner's dilemma I have a complete incentive to defect and no incentive to cooperate, but it's not clear that the game structure of a prisoner's dilemma played repeatedly is the same as that of a one-shot prisoner's dilemma. In many cases there's actually an incentive to cooperate, because I give you information
when I cooperate, namely that I'm going to cooperate with you, and you can give me reciprocal information, so we can build this trust component over time. So I think there's actually reason to be more optimistic in the case of an iterated game. As for the case where you're not going to market, I think the dynamics are going to be fairly similar if you are racing to stay ahead, because you can think of that as a series of points you're trying to reach, where you're trying to make sure you continue to develop before someone else, and in that case there's still going to be an ongoing incentive to do so securely. But I'm not sure if you were more interested in that point or in the iteration point.

Yeah, I think the idea of ensuring that there's not a fixed endpoint seems like the way out of the problem, I hope.

Yeah, the problem can get a lot easier in that case, although I still think it's interesting to look at these dynamics, both because a fixed endpoint makes things a little bit worse and because you might have points along the way. Even when developers are racing to stay ahead, it's not clear that they don't end up competing over having some slightly better system of some form than the other, so I think it's still useful to think in terms of endpoints in this progressive sense. But yes, you basically improve the game by making it not a game with a super fixed endpoint, because a fixed endpoint incentivizes me to defect at the very end, whereas in an ongoing game I probably have less of an incentive to do that, and I also get these good reputational effects. So I do think this is a worse kind of
game than the iterated game, but all of the factors that apply to it also apply to the iterated game, which is kind of interesting, or helpful. Maybe an example of this: in the iterated game, imagine you and I are competing to stay ahead, but we can make more or less secure systems, and those are more or less valuable. If I continually get a better payoff from my ongoing development of more secure systems, then this gives me a reason to cooperate with you, and if that payoff can be increased, I have more of an incentive to keep developing those systems with a high level of security. So that's an example of where the first factor can still increase the overall security of the systems even though we're not competing to a specific endpoint.

I might throw out a question. My name's Nicole; I'm staff at the Berkman Klein Center. We host a lot of different talks here about ethics and governance, and not often do we go down the route of games, which I think is really interesting, but I'm unfamiliar with it, so I'm trying to situate it in relation to some of the discussions going on about how to think about this space. Something I would love to hear is how you relate this to concerns that are hard to situate relative to the incentive structures you're describing between these competitive games. We've heard from Virginia Eubanks talking about the development of systems for allocating public resources, and we've heard from others talking about the ways that data can contain bias, or contain artifacts that have impacts in the world, where it would be hard to see how they relate to market competition around AI. You've painted a kind of optimistic picture, but I'm wondering which concerns you're optimistic
about, and whether the structure you're offering really addresses some of those others.

Yeah. In many ways it's optimistic because I'm saying, hey, there are mechanisms. But the first thing to say is that there are potential failures here that there might not be in other industries. A really good example is bias in datasets. In many cases, when you develop AI systems, the people who would be harmed by those systems are not the same as the people purchasing them, and this can lead to the kind of asymmetry where market mechanisms don't cause you to bear the cost of the harm. Imagine I release a system that recommends to judges how much jail time someone should get at sentencing, given all the facts made available to it, and that system turns out to be hugely biased. The agents using that system, the local governance structures, whoever decides to implement it at whatever level, are just not the same as the people who will be harmed by it. And in a lot of cases the people harmed by that system are fairly disenfranchised: if there turns out to be bias in the system and I'm given an overly punitive jail sentence, it's not easy for me to take the system builder to court and sue them. That's the kind of case where you can see a real failure in some of these mechanisms. So the main thing I'm arguing, and a lot of my work is on this, is that AI is a special case where a lot of the standard mechanisms either don't apply or fail. Currently it's a fairly unregulated market, so you're not
looking at a case where there's an external regulator. There's a huge asymmetry between who purchases the systems and who is potentially harmed by them, and a huge asymmetry in information between regulators and developers and between consumers and developers; that's another case where you shouldn't expect market mechanisms to really be able to help you. And while we do have the liability framework, it's not always super clear how it will apply to systems where we can't know how they're making their decisions, for example. So in many ways there is actually this huge problem, which is that all the incentives I gave you at the beginning apply a little less well to AI. The key thing I'm pointing to is that we actually have an incentive to build those structures, and that's the important factor. AI is a novel industry in many respects, and I think novel industries often make mistakes of this form until they learn to coordinate. Maybe I can give a couple of examples. One is the electric scooters in the Bay Area. I could be getting the facts wrong, but I think what happened was that companies were originally working with the city about where to put them, making sure they were safe, et cetera, and then one company just put its scooters on the street without the city having given approval. The city responded by banning electric scooters outright, which was no good for anyone; essentially that was an act of defection that then harmed everyone. Then some of the companies worked with the city and are now allowed back in. So that's an example
of a novel industry basically making a mistake. A lot of industries, when I've done case studies on them, make mistakes initially that are in fact harmful, and then they learn from that and improve. In the case of AI, these systems are often so general and important that I don't like that dynamic. I don't think that in AI development we should be producing systems, learning from mistakes in the world, and only then saying, oh, actually we should coordinate to make sure we're all developing safely; I just think it's too costly. That's why I'm trying to preempt it: let's not behave like a typical novel industry; let's bypass that stage, start building these mechanisms now, and learn from other industries instead. You see this in other cases too. Take the pharmaceutical industry: patent-medicine sellers came along when the pharmaceutical industry already existed and started directly selling medicines that didn't work to consumers, while the established industry was selling to doctors, and I think the industry actually ended up supporting some of the legislation that eventually became current drug law, with the FDA as the ultimate entity there. You're seeing this in AI as well: a lot of AI companies are now coming out in support of certain forms of regulation of things like facial recognition technology. So what I'm trying to say is that there are big problems here, but as an industry, AI developers can in fact bypass some of the mistaken, bad phases and move straight to building the mechanisms that prevent things from going wrong. Hopefully that ties it in
a little bit to the case you were thinking of, and explains why I think there is an initial problem but am maybe a bit more optimistic about finding solutions.

Yep, I had a question about the five principles you put up.

Yep, I can go back to them if that's helpful. And I realize this is not the most effective slide... they're here somewhere... there we are.

So the first four of them corresponded to the four states of the game that you described, and the fifth one seemed like it was an outcome of the first four. I was wondering if I'm thinking about that correctly, or if there are ways to intervene on that greater-confidence factor that are not through those first four levers.

Yeah, so I did a bit of a cheat here, because in many ways it looks like I'm treating trust as this external thing. It was useful for me to do that in order to talk specifically about mechanisms that can improve trust, but if you look at a lot of those mechanisms, they really apply via these incentives, and if you look closely at the incentives, they affect each other quite a lot. I'm just going to reveal all of my cheats here: notice that it's expected value, which means you can adjust these things not only by changing the actual value of the outcomes but also by changing their probabilities. So there's a lot going on here that I've mushed together. But yes, trust is highly integrated with the other factors, and in some cases you might think it just falls out of them. For that to be the case, though, you also need a premise about the agents you're working with. If I think you only respond to your self-interested incentives, then the way
that I would get greater confidence that you're going to reciprocate is by changing your incentives, and I agree that's one way I can do this; it does in fact give me evidence that you're trustworthy if it's simply not in your interest to defect. But we should also think about ways of increasing confidence that others will cooperate that don't seem purely incentive-driven. In many ways, if I just get to know someone a little better, get to know the kind of person they are, that's a way I gain more trust that they're going to reciprocate my responsible development, or, in the case of cakes, my excellent baking. That's maybe not as well covered by the factors here. So yes, each of these relates to the others, but we should not model agents purely as homo economicus; I also just get information about you as a person, and that kind of thing can increase trust too.

Hi, I'm Stefan. I have two questions. The first is how much you think your analysis applies to nation states as opposed to commercial developers, and the second is how confident you are that game theory in general is the right framework for thinking about AI development and modeling the relationships between actors in that space.

Yeah. On the first question: at the moment, most of the key developers in the AI space are companies; that's why I start there, and a lot of the literature has explored these dynamics with companies, because companies care about making money and it's probably a good source of funding. With nation states, it depends on what the state is doing: on the one hand it
could be facilitating AI development, and on the other it could be developing AI itself as part of some program. I think a lot of the same incentives apply; there's the same potential for collective action problems between states. There are a few interesting disanalogies, though. Companies are much more driven by absolute gains: the absolute gains they care about are things like making money, and I don't really care if my competitor is doing better than me as long as I am doing better overall than I would be otherwise. If I get $40 million of a market and my competitor gets $60 million, but we developed in a way that grew the market to $100 million, I would rather have that than a situation where the market is $50 million and I have $30 million and my competitor has $20 million; I just want to make more revenue. States, at least on a lot of standard views, are much more compelled by relative gains, like having a relative economic advantage over other states, and the more you're motivated by relative gains on this framework, the more adversarial and the less willing to cooperate you're going to be. So one factor is how incentivized states are by relative gains. I actually think this is a really interesting question, because for states there are large bundles of relative gains and large bundles of absolute gains. It's easy for people to say that all states care about is power, but I don't think that's accurate: states care a lot about many factors, like general economic development, the prosperity of their people, and their relationships with other states; they have all of these mixed incentives. So it does
apply, but there's an asymmetry: what are their incentives, and how much of them are relative versus absolute? A second key issue is that for private developers there is very often this third party, the state, which has the ability to come in and regulate you. If you're doing things really terribly, a regulator can come in and say, you need to stop; I'm just not going to let you continue. That ability to appeal to third-party oversight is really useful, and it's not something states have as much of. There are international bodies, but international oversight is arguably a bit less robust than a state that can simply come in and halt your operations; there isn't really an international body that does that to states. Still, if you look at a lot of the mechanisms, they are very similar: arms control, for example, fits really nicely into this framework. So that's a long way of saying that when I think about states, there are a few asymmetries, but there are also symmetries that suggest we should want to appeal to similar structures.

On the second question, how relevant game theory is in general: the way I think about what I'm doing, and maybe this is kind of wrong, is that, as you saw, I gave a one-shot game, and I've revealed all my cheats. In many ways I'm not doing much more than a super basic analysis of what the rough incentives are. My concern is that the narrative is sometimes stuck at this really early stage where everyone is just fighting against one another, and that's going to mean that we're
all going to develop things that are really terrible. And I'm saying: let's move that one step further. When you start to analyze these really simple games, that's not actually true; you do in fact have pretty strong incentives to cooperate with other developers to make sure you're developing responsibly. You can also take this to the further stage of doing a full game-theoretic analysis, and I like game theory, so I would love for people to do those analyses, especially because they're more realistic: they work across time, they consider the number of developers, they have many more variables than I have here, and I do think that's useful. The main goal I've had, though, is to communicate these ideas to people and to communicate this less pessimistic picture. I'm not saying this game theory is super advanced and useful; it's just, hey, look, even on really simple models, or simple, somewhat naive ways of looking at what's going on here, we should be a bit more optimistic. So I'd love to see the full game theory done, but ultimately I want people to think about the things that are most important and to move one step beyond the naive, super adversarial framing.
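The simple games discussed above can be made concrete in a few lines of code. The following sketch is not from the talk itself; the payoff values and forgiveness rate are illustrative assumptions. It shows the two claims that recur in the Q&A: in a single round of a prisoner's-dilemma-style game, cutting corners (defecting) strictly dominates, while in a repeated game, reciprocal strategies can sustain cooperation, and a forgiving strategy can break the negative spirals that misperception would otherwise trigger.

```python
import random

# Payoffs for (my_move, their_move); C = cooperate (invest in safety), D = defect.
# Values are illustrative, with the standard prisoner's-dilemma ordering T > R > P > S.
PAYOFF = {
    ("C", "C"): 3,  # R: both invest in safe development
    ("C", "D"): 0,  # S: I invest, you cut corners
    ("D", "C"): 5,  # T: I cut corners, you invest
    ("D", "D"): 1,  # P: race to the bottom
}

def best_one_shot_response(their_move):
    """In a single round, defection pays more whatever the other developer does."""
    return max("CD", key=lambda m: PAYOFF[(m, their_move)])

def play(strategy_a, strategy_b, rounds, noise=0.0, seed=0):
    """Iterate the game; `noise` is the chance a move is misperceived as its
    opposite (the Cold-War-style misperception mentioned in the talk)."""
    rng = random.Random(seed)
    flip = lambda m: "D" if m == "C" else "C"
    history_a, history_b = [], []   # what each side *perceives* the other did
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(history_a), strategy_b(history_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(flip(move_b) if rng.random() < noise else move_b)
        history_b.append(flip(move_a) if rng.random() < noise else move_a)
    return score_a, score_b

def tit_for_tat(perceived):
    """Cooperate first, then copy the opponent's last (perceived) move."""
    return perceived[-1] if perceived else "C"

def generous_tit_for_tat(perceived, forgiveness=0.2, rng=random.Random(1)):
    """Like tit-for-tat, but sometimes forgives an apparent defection."""
    if perceived and perceived[-1] == "D" and rng.random() >= forgiveness:
        return "D"
    return "C"
```

With `noise=0` two tit-for-tat players cooperate forever and each earn the mutual-cooperation payoff every round. With `noise > 0`, a single misperceived move sends strict tit-for-tat pairs into retaliation cycles, which is exactly the negative spiral described above; the generous variant tends to recover from these, illustrating why forgiveness matters in such games.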