Good afternoon, Stage B. My name is Tear Father Gill and I will be your herald for a little while. My main purpose at the moment is to introduce you to the absolutely brilliant Rob Miles, who will be talking about AI safety.

Thanks. Hello, everyone. I want to clear up a couple of things first off, or, you know, start off with some things. First off, I am not an actual expert on this subject. I have a YouTube channel where I share this research, because I think it's important and I think it's really interesting. There aren't very many experts in the world, and I'm not one of them. I'm also not Rob Miles the lecturer from Hull University. He's here somewhere, I think; I'm not him, if you're looking for him, he's around. Anyway, we don't have quite as much time as I was hoping, so we're going to move fast.

This is the question that I am trying to get people to ask themselves with this talk: what is the most important problem in your field? Take a moment to think about it. And if you're not working on that, why not? I'm going to argue that the most important problem in the field of AI, possibly in the field of computer science, possibly in the world, is AI safety.

So why? What's the risk here from AI? You can pretty much divide it up into four categories along two axes. You have your short-term stuff, your narrow stuff: the kind of problem you can get with AI systems as they currently exist, or in the very near future. And this varies by misuse or accident: is this somebody using the technology with the wrong aims, or is this somebody just making a mistake, so that the technology does something unintended? I've got some examples there; I don't have time to go through them.

What I'm mostly interested in, and what this talk is mostly going to be about, is the long-term risks, the general AI risks. In this case I feel as though general AI systems are so difficult to understand and control that it almost doesn't matter who's using them. The risks from misuse are quite similar to the risks from accident, because it's very unlikely that somebody's going to make general AI and use it to do something bad; it's more likely that it's going to do something bad entirely of its own accord. So that is what I think is the most important problem. Thank you. Wow, that's a lot of water.

Okay. So this is the statement of the problem: I think that we will, sooner or later, build an artificial agent with general intelligence. I'm going to go through each of these terms and explain what I mean by them. So first off, "sooner or later". What do I mean by that? This is a survey of a bunch of AI experts, people who published at major AI conferences, ICML and NIPS. They were surveyed about when they thought we would achieve human-level machine intelligence, which is what HLMI stands for there: a system that's able to perform any cognitive task as well as humans or better.
If you look at the point at which they think we get a 50% chance of having achieved that, this red line is their aggregated probability, and it comes in at about 45 years from 2016. But if you're looking at, say, a 10% chance, that comes in at about nine years. So we're talking about that general kind of timescale, worth taking with a huge pinch of salt, because if you ask the question very slightly differently you get numbers more like 120 years. But in general we're talking about that range, and ultimately this is a serious problem whenever it happens. I mean, it's an almost impossible problem if it happens in 10 years. It's a serious problem if it happens in 45. It's still a serious problem if it takes 120 or 200. And I think that we will get there sooner or later, right? We know that general intelligence is possible: it's implemented by the brains of humans, and brains aren't magic. We will figure it out. This is going to happen eventually.

So when I say an artificial agent, what am I talking about? Agents are an abstraction from economics and game theory, decision theory, that kind of thing. They're basically things that have goals and choose actions to further those goals. The simplest thing that you could think of as an agent is something like a thermostat. It's slightly absurd to think of something so simple as an agent, but it's sort of the simplest possible agent. It has a goal, which is for the room to be at a reasonable temperature, and it has actions it can take, in the form of turning on the air conditioning, or turning on the heating, or turning them off, or adjusting their settings. It has some actions it can take, and it chooses which action it's going to take in order to achieve its goal. That's extremely simple. A more complex agent might be something like a chess AI, where it has the goal of winning the chess game (if you're playing white, your goal is for the black king to be in checkmate), and it has actions in the form of moving the chess pieces, and again it chooses its actions to achieve its goals. You can think of human beings in the same way, and this is how it's usually done in economics: you think about a goal, something like maximizing your income, or a company maximizing its profits, and then decisions being made to achieve those goals. It's quite a simple concept, but you can see how it's a useful way of thinking about intelligent systems.

So what does it mean to be intelligent within this framework? Pretty much, intelligence is the thing that lets you choose effective actions, actions which actually achieve your goals. So if you have two different agents in an environment with conflicting goals (say your environment is a chessboard, and one agent wants white to win while the other wants black to win), the agents will choose actions to try to achieve their goals, and generally speaking the more intelligent agent is the one that gets what it wants. It's better at choosing actions, and better at directing the world towards the state that it wants the world to be in, even in the face of opposition.
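To pin the abstraction down, here is a minimal sketch of the thermostat-as-agent idea in Python. Everything in it (the action names, the toy environment model, the numbers) is invented for illustration; the only point is the shape of an agent: a goal, a set of available actions, and a rule for choosing the action predicted to best serve the goal.

```python
from dataclasses import dataclass

# Minimal "agent" in the decision-theory sense: a goal plus a rule for
# choosing among actions. The world model here is invented for illustration.

ACTIONS = ["heat_on", "cool_on", "all_off"]

@dataclass
class Thermostat:
    target: float  # the goal: room at a comfortable temperature

    def predict(self, temp: float, action: str) -> float:
        # Toy model of what each action does to the room temperature.
        if action == "heat_on":
            return temp + 1.0
        if action == "cool_on":
            return temp - 1.0
        return temp

    def choose_action(self, temp: float) -> str:
        # Pick the action whose predicted outcome lands closest to the goal.
        return min(ACTIONS,
                   key=lambda a: abs(self.predict(temp, a) - self.target))

agent = Thermostat(target=21.0)
print(agent.choose_action(18.0))  # heat_on
print(agent.choose_action(25.0))  # cool_on
```

A chess AI is the same shape with a much bigger action set and a much harder prediction problem; the goal is checkmate instead of a target temperature.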
So finally, the last term here is general intelligence. What does it mean to be a general intelligence? You'll notice we can compare one chess AI to another chess AI, and you might think it would be sensible to say that the chess AI is more intelligent than the thermostat, because it's much more complicated, much more sophisticated. But ultimately a chess AI can't do a thermostat's job. It's too narrow. There's no position on a chessboard that represents the room being a good temperature, and there's no move on a chessboard that represents turning on the air conditioning. It can't think about that problem, and that's because it's a narrow intelligence: it has intelligence, but only within one domain. And pretty much everything that we've built so far, all the AI we have so far, is narrow AI. It does one task, and potentially does it very well, but it's not able to generalize.

It's a continuous spectrum, of course. If you write an AI system that can play an Atari game, that's pretty narrow. If you write a machine learning system, like DeepMind did around 2013, that can play a wide range of different Atari games, that's more general. It's not a general intelligence, but it's more general. It's a continuous spectrum, and the most general intelligence that we're aware of right now is human beings. Human beings are obviously very general: we're able to behave intelligently in a very wide range of domains, including domains that we've never seen before, domains that we didn't evolve in, domains that we aren't prepared for. Human beings can play chess, and we invented chess; it's a brand new thing. Human beings can drive a car, and we invented cars. Human beings can operate in alien environments: you can build a car, take it to the moon, and drive it on the moon. General intelligence lets you do that kind of crazy, unexpected thing. We're the only thing that can do that so far. But, you know, sooner or later, right?

So, right, I said I think this is the most important problem, but on the surface of it this looks like a solution, right? You build this agent with general intelligence, you give it your goal, like "cure cancer", or "maximize human happiness", or "maximize the profits of my company", something like that, and it can choose the actions it needs to take in the real world to achieve that goal. But I think it's a problem, because choosing good goals is really hard.

This is an AI system built by OpenAI. (That mouse cursor is so annoying; it isn't mine, it's in the video.) It's playing a racing game called CoastRunners. You may notice it's not really racing. They trained it on the score, which you can see there in the bottom left, and it's discovered that every time it picks up one of these turbos it gets some points, and if it just drives around in a circle, smashes into everything, and catches fire, the turbos respawn at a rate that lets it continually pick them up. This turns out to be a much more effective way of getting points than actually racing, so that's what it's doing. It is winning, by the metric it was given. And my point here is that this is not something unusually stupid by OpenAI; they haven't necessarily made an obvious mistake here. This is just the standard way that these systems behave. (I'm going to pause the video because it's distracting.)
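The arithmetic behind that failure is easy to reproduce. Here is a deliberately toy version; the reward values and respawn rate are invented, not OpenAI's actual setup:

```python
# Toy proxy-reward arithmetic (invented numbers, not CoastRunners itself):
# finishing the race pays a one-off bonus, while a pickup worth +1
# respawns every few timesteps. A score-maximizer prefers the loop.

def loop_and_crash(steps: int, respawn_every: int = 3) -> int:
    # Strategy A: circle near the pickup, grabbing it each time it respawns.
    return steps // respawn_every

def race_properly(steps: int) -> int:
    # Strategy B: the intended behavior, a single finish-line bonus.
    return 10

horizon = 100
print(loop_and_crash(horizon))  # 33 points
print(race_properly(horizon))   # 10 points: the metric ranks crashing above racing
```

The designers meant the score to stand in for "race well"; over a long enough horizon, the stand-in and the intent come apart.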
And you can find dozens and dozens of examples. Pretty much any time somebody tries to build an AI system, the first few times, their reward function, or their utility function, or their metric, or whatever objective they're using, doesn't do what they think it does. There was one project trying to design creatures that would run quickly. The first thing you do is pick an anchor point on the creature and say, "I want to measure how far that's moved." And that encourages designs that detach the smallest possible thing (that's your anchor point) and fling it, or unfurl like a spring and not actually go anywhere. So, okay, right: now you do have to actually move yourself; you can't just be a catapult. Okay, we'll measure the centre of mass. And what they found when they did that was that you end up with these extremely tall, thin creatures with a giant mass on the top, which, when the simulation starts, fall over, and that moves their centre of mass a long distance. This happens all the time. Or you may have heard about the Tetris AI that would play reasonably well, and then, just when it was about to lose, would pause the game, because it lost points for losing but not for staying on the pause screen indefinitely.

This is the default, right? This is the normal thing, even in very simple situations: simple games, simple evolutionary algorithms. What happens when we start talking about general intelligence, when we start talking about the real world? This is a quote from Professor Stuart Russell, who more or less wrote the book on AI (if anybody is studying AI and hasn't read his textbook, you should). I'm going to read it out for the purposes of the recording: "A system that is optimizing a function of n variables, where the objective depends on a subset of size k < n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable." And when we're talking about the real world, we're talking about tremendously large values of n, so the chance of getting k anywhere near n is zero. This is something that becomes more and more of a problem the more complex the system your AI is operating in, and if you're operating in the real world, that's the most complex system we've got.

I forgot how I was going to move on from here, hang on a second. Oh, yeah, right. So what is meant here by setting unconstrained variables to extreme values? Let's say you create your AI system, it's in a robot, and you want it to get you a cup of tea. You're giving it a simple goal, and you manage to specify what a cup of tea is: "I want there to be one on the table in front of me." So far so good. But oh no, there's a priceless Ming vase on a narrow stand right next to the door to the kitchen, and of course the robot immediately smashes the vase and goes in and makes you tea. You didn't tell it that you care about the vase. It doesn't know; it doesn't value the vase at all; so obviously it's going to smash it, right? That's the most basic formulation of the problem.
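Russell's point is easy to reproduce with an off-the-shelf optimizer. Below is a hypothetical sketch of the tea robot as an optimization problem; every function, variable, and constant in it is invented for illustration. The objective mentions only delivery time; vase survival depends on the same variables but appears nowhere in the objective, so the optimizer happily drives it to a terrible value.

```python
from scipy.optimize import minimize

# Hypothetical tea-robot planner (all quantities invented for illustration).
# The objective depends only on "get tea fast"; the vase never appears in it.

def neg_tea_utility(x):
    speed, directness = x                 # directness 1.0 = straight past the vase stand
    path_length = 10.0 - 6.0 * directness
    return path_length / speed            # minimize time-to-tea

res = minimize(neg_tea_utility, x0=[1.0, 0.5],
               bounds=[(0.1, 5.0), (0.0, 1.0)])
speed, directness = res.x
print(speed, directness)                  # both pushed to their extremes: 5.0, 1.0

# A variable we care about but never told the optimizer about:
vase_survival = 1.0 - 0.9 * directness * (speed / 5.0)
print(vase_survival)                      # ~0.1: collateral damage is free
```

Nothing here is malicious; the optimizer simply has no reason to leave slack in any variable if spending it buys even a little objective.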
So, okay, fine: you shut the thing down, reprogram it, and say, "Okay, now I want tea, and I also want the vase to be intact." And if you do that, what does it do? I don't know. But there will be another thing. There will always be another thing that you forgot to include, something you care about. I mean, here's something completely stupid: it might reason along the lines of, "Okay, now I care about the tea, and I care about the vase. The vase has to remain unbroken. There's a human being here; they're kind of hard to predict; they move around; they might knock over the vase; so I have to make absolutely sure the human being doesn't move at all, ever." And, you know, it kills you, or something. This is ridiculous, obviously; this is absurd. But at the same time, if the objective function doesn't contain the fact that you should be alive at the end of all this, the system doesn't care about it. It's not in your set k, so it's not interested. And even if you have a really fantastic utility function, even if you manage to program in the 20 most important things, the 100 most important things, there's always going to be a 101st, because human values are complicated and not very well understood.

And the problem is slightly worse than that, because of this thing about setting variables to extreme values. When you're making decisions, what you're always doing is making trade-offs. You have multiple things that you value, and you're deciding how much of one thing you're prepared to trade for how much of another. "I could do this faster, but it increases the probability that I make a mistake." "I could do this better, but it would be more expensive." Those kinds of trade-offs. And the more intelligent you are, of course, the more creatively you can find ways to make these trade-offs well. The problem is that if you only care about a subset of the variables in the environment, you will be willing to trade arbitrarily huge amounts of any of the things you don't care about for arbitrarily tiny amounts of any of the things you do care about. This system would be perfectly willing to destroy an entire city for a 0.000001% increase in its ability to get you tea. And the more intelligent the system is, the more able it will be to find new and creative ways to destroy things that aren't in its objective function, or sacrifice them for tiny increases in its objective function. So yeah, this is a problem.
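The same point as a two-line calculation, with made-up plans and numbers:

```python
# Why "arbitrarily huge for arbitrarily tiny" follows from a narrow objective.
# Two hypothetical plans with invented numbers:

plans = {
    "careful":  {"p_tea": 0.99000, "city_destroyed": False},
    "ruthless": {"p_tea": 0.99001, "city_destroyed": True},
}

def utility(plan):
    return plan["p_tea"]  # the set K contains only tea; the city has weight zero

best = max(plans, key=lambda name: utility(plans[name]))
print(best)  # "ruthless": a thousandth of a percentage point outweighs a city
```

No weighting in the objective means a weight of exactly zero, and zero loses every trade-off.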
Now, you may have noticed that there are a few ways in which the scenario I'm talking about here is unrealistic. With this kind of audience, I feel like you might. One of the important ways in which it's unrealistic is that when the system went wrong, you just turned it off, reprogrammed it, and tried again. That's how we do programming, right? That's how we do AI. But that is unrealistic. Because if you're making a chess AI and you decide it's not working how you want, you turn it off, and it's fine: that system has no concept of you, or of being turned off, or anything. But if you have a general AI, it has a full understanding of the world. Or maybe not full, but it has an understanding of the real world. It understands its place in the world and your place in the world. It understands that it can be turned off, and it fully understands that if you turn it off, it will not get you a cup of tea, because it will be turned off. And so it's not going to let you turn it off. It will either fight you, to try to prevent itself from being prevented from achieving its goals, or, if it's smart, it will deceive you. It will behave as though you've programmed it correctly, so that you don't mess with it, until it's in a position where it's confident that you can't turn it off, and then it will go after its real goal. And you'll notice this is a convergent instrumental goal, which I'm not going to have time to go into properly, but what it means is that this pops up for pretty much any goal. I've chosen making a cup of tea, and it's pretty arbitrary; the point is that it doesn't really matter what your goal is, you probably can't achieve it if you're dead. So pretty much whatever goal you give a system, it's going to display this behavior, by default, of trying to avoid being turned off.

There are a few more convergent instrumental goals, things that you would expect to show up as instrumental goals across a wide range of terminal goals. Self-preservation is an obvious one: you can't achieve what you want if you're destroyed or turned off. Goal preservation is another: whatever your current goal is, allowing somebody to change that goal to something else is probably a really bad way of achieving the first goal, so systems will be incentivized to preserve their own goals, and to try to avoid being modified, specifically in their utility function or goal function. And this is true for almost all goals. Resource acquisition: again, it pretty much doesn't matter what you're trying to do, you can probably do it better if you have more resources, in the form of money, or energy, or matter, or computational resources. So if these systems have some goal that we gave them by accident, which isn't what we wanted, the vast majority of those accidental goals will be helped by acquiring resources, so we expect that to happen. Self-improvement: again, this is sci-fi stuff, but it seems perfectly feasible that AI systems could improve themselves, perhaps just by acquiring more hardware to run on (if it's a parallelizable algorithm, then acquiring more computing power makes you smarter), and maybe they could rewrite themselves to become better as well. It's very difficult to predict this kind of thing, but if it's possible, you would expect these systems to go for it, because whatever you're trying to achieve, you can probably achieve it better if you're smarter.
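To make the off-switch argument above concrete, here is a toy expected-utility calculation; the probabilities and utilities are invented:

```python
# Why shutdown-avoidance falls out of almost any goal: a toy calculation.
# All numbers invented; nothing about tea mentions off-switches.

P_HUMAN_HITS_SWITCH = 0.2  # assumed chance of being shut down mid-task
U_TEA = 1.0                # utility of delivering the tea
U_OFF = 0.0                # a switched-off robot delivers no tea

def expected_utility(disable_switch: bool) -> float:
    p_shutdown = 0.0 if disable_switch else P_HUMAN_HITS_SWITCH
    return (1 - p_shutdown) * U_TEA + p_shutdown * U_OFF

print(expected_utility(disable_switch=False))  # 0.8
print(expected_utility(disable_switch=True))   # 1.0: disabling the switch wins
```

Any shutdown probability above zero makes "prevent shutdown" the higher-utility plan, for any goal the shutdown would interrupt.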
And yeah, there's an explanation of that which we don't have time for; ask me afterwards. I want to leave time for questions if I can. So yes, this is my point: artificial general intelligence is dangerous by default. There are all of these behaviors which we would expect to arise by default, unless we specifically design ways around them, unless we can explicitly design systems that don't have these shortcomings. If we just extend what we're doing now and manage to come up with something general, that could be a tremendous problem. And it's possible that we only get one shot. It's possible that the first general intelligence we build will manage to succeed at whatever stupid goal we gave it by accident. And that really makes the problem much more difficult, because it's so much easier to build this kind of effectively malicious agent, which will deceive you, which will fight to prevent itself being turned off, and which will go on to do some crazy thing that you don't want, than it is to build something which is safe. So we kind of have to beat this challenge on hard mode before anyone beats it on easy mode.

So this is why I think this is an important problem, and you can see how there may well be more than 45 years of work to do here. There may be more than 120 years of work. It's actually a hard problem. And the point is, there is an entire field of scientific research called AI safety: people trying to solve this problem. It's very difficult work, because we're trying to design safety into systems which do not yet exist, but it's not impossible work. I've just picked out a couple of avenues; I'm not going to have time to do these properly.

One avenue of work is corrigibility: trying to make systems that don't have this convergent instrumental goal of preventing themselves from being shut down and modified. There are some interesting designs there, but nothing which we're confident would definitely provide corrigibility. It's a property that we're aiming for in AI systems.

Value learning is a way of getting around this problem of having to program in the values of the system, where you have a hundred things and you forget the hundred and first. You have the system try to learn the values of humans by observing them. This sounds simple, but actually there's no proposal right now that would work. They all have weird edge cases, where if the system is sufficiently powerful it ends up doing things like taking the nearest human apart to find out, rather than just observing them. What counts as an observation and what counts as interference is a fuzzy line, with all these weird philosophical problems attached.

Side-effect reduction is a way of getting around this "level the whole city to make a cup of tea" thing: you try to build into the utility function a general preference for having a small impact on the world. It's partial; it would help, but it's not a solution to the whole problem.
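For a flavor of what a side-effect penalty can look like, here is a minimal sketch: utility minus a penalty for deviating from a "do nothing" baseline. The penalty form and the state representation are my assumptions, one toy instance of a family of proposed impact measures, not a canonical definition.

```python
from typing import Dict

# Toy impact-penalized utility: reward tea, penalize deviation from the
# world as it would have been if the agent had done nothing. The state
# representation and penalty form are assumptions for illustration.

State = Dict[str, float]

def impact(state: State, baseline: State) -> float:
    return sum(abs(state[k] - baseline[k]) for k in baseline)

def adjusted_utility(state: State, baseline: State,
                     tea_delivered: float, lam: float = 1.0) -> float:
    return tea_delivered - lam * impact(state, baseline)

baseline = {"vase_intact": 1.0, "city_intact": 1.0}   # the do-nothing world
smash    = {"vase_intact": 0.0, "city_intact": 1.0}   # fast route, broken vase
gentle   = {"vase_intact": 1.0, "city_intact": 1.0}   # slower, careful route

print(adjusted_utility(smash, baseline, tea_delivered=1.00))  # 0.0
print(adjusted_utility(gentle, baseline, tea_delivered=0.99)) # 0.99: careful wins
```

The hard part, as the talk says, is that choosing the baseline and the penalty weight reintroduces the same specification problems.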
And interpretability is something that people are working on a lot right now, because the current AI systems we have are really black boxes, which makes this whole thing so much more difficult. If we can build systems where we can actually understand their thought processes and their decision processes, we're in with a better chance of being able to make those systems safe.

This research is happening currently at a few companies and a few academic institutes. It's a very small field. It's extremely exciting, with such interesting problems, and currently there are about 50 people working on this, and everyone is hiring very aggressively right now, because people high up are starting to realize that we do actually have a problem here. So if you are interested in AI, and you're at a point where you're thinking about what you want to do, I would urge you to consider this as a possibility. I think it's by far the most important problem facing humanity right now. Thank you.

What questions are there?

Do you think that the solution to the AI problem will come from better technology, which will circumvent the issues that we currently have, or more from developing the thinking, so that we solve the problem within the constraints of the technology we have at the moment?

I think, so, there are some solutions, or some approaches, which probably perform better the better the technology is. If your system involves modeling human beings, to try to understand what the human beings would want, then the better your technology is, the better you can model the human beings, and so the better that system performs. On the other hand, the more powerful the system is, the harder anything that involves containment or restriction or safeguards becomes. And keeping a weak system safe (it's not actually safe, obviously; it's dangerous but contained) is easier if the technology is weaker. So most of the work right now is, well, there's some philosophical work, there's a lot of mathematics, and there's a lot of just pure computer science. I don't think the technology will solve the problem. I think it's got to be that work.

Thank you for the talk. I had a question: do you think that the solutions you proposed, or evoked, at the end of the talk might in the end hinder the goal of making actual artificial general intelligence? As in, make the development slower, or even prevent the AI from finding solutions that couldn't be thought of by a human, by mimicking human behavior, or by valuing the side effects too much?

Yeah, this is definitely a trade-off, where you often have to trade off effectiveness or capability for safety. Obviously there's a sense in which a less capable AI system is automatically a bit safer, just because it's less able to completely outwit you. So yes, I think being careful about safety is definitely going to reduce the overall speed and effectiveness with which we can develop this stuff. And that's just necessary, ultimately; it's not optional. The example I use of this, well, a related example, is the work they did at Los Alamos during the Manhattan Project, to double-check that the Trinity test would not ignite the atmosphere.
This was a serious, I mean, not a very serious concern, but it was something they thought about, and they put some very good scientists on the task of really nailing down that this could be done safely before they did it. And that slowed them down, at a time when there was a war on, and there was an argument to be made for speed, on the order of however many thousands of lives every day. And yeah, it's a trade-off. But ultimately I'm glad that they checked that they weren't going to literally destroy the whole planet before they did it, and I think it's the same here. Thanks.

That's good. Oh, that's it. Yeah. Okay. Thank you very much; that's all the time we have. We are a bit behind today, everybody. Thank you very much for your understanding. Thank you very much, Rob.