Hello everyone, it's an honor to be here, and thank you to the organizers. I want to talk to you today about fueling innovation by creating, collecting, and improving stepping stones.

There's an interesting paradox in life: if you try too hard to solve a really ambitious problem, which is to say you only take actions that advance you toward that goal, then you'll fail. However, if you ignore the objective completely, you're much more likely to succeed. A metaphor for this is a maze. Imagine a little robot that starts at the bottom and tries to reduce its distance to the goal. If it only takes actions that do that, it will go north and bang its head against the wall forever, which is exactly what an optimization algorithm does when it is incentivized to reduce distance to the goal. However, if you ignore the objective and just explore the set of possibilities, you can trivially solve the problem.

This is a key idea behind all innovative processes in the world, whether that's science, technological progress, or natural evolution. The general principle is that you cannot have one overarching goal that you myopically optimize and invest all your resources in. Instead, you have to invent your own sub-problems and your own goals as you go. And if you're working on one problem and you suddenly invent something that turns out to be helpful on another problem, you catch chance on the wing: you recognize that serendipitous moment, you goal-switch, and you start optimizing for that other problem too.

There are many examples of this in life. Consider the technology for cooking food. If you wanted to cook food faster and cleaner, and you only worked on things that cooked food faster, you would never invent the modern microwave. To invent the microwave, you had to be working on radar technology, notice that it melted a chocolate bar in your pocket, say "oh, that's interesting," and start adapting that technology to cooking food. Similarly, suppose you start out centuries ago with the abacus and say, "that's fantastic, it does computation for me; what I'd really like is orders of magnitude more computation." If you only invested in things that improve the amount of computation an abacus provides, you might get longer rods and more beads, but you would never invent the modern computer. To invent the modern computer, you needed to be working on vacuum tubes and electricity, which on their face have nothing to do with increasing computation. Likewise, to go from coal-fired furnaces to clean energy in the form of nuclear power, you have to be investing in the physics of space and time, in the form of Albert Einstein. And for a natural evolutionary example: if you want to go from dinosaurs to things that can fly, you shouldn't invest in things that get you higher off the ground. Instead, you have to start investing in a newfangled technology called feathers, which initially were really good for insulation and not at all good for getting you off the ground.

So the conjecture here is that the only way to solve really hard problems is to create new problems while you're trying to solve them, and then goal-switch between them frequently over time.
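To make that maze intuition concrete, here is a tiny toy example. It is my own illustration, not code from any of the work discussed here: a greedy agent that only accepts moves that reduce its distance to the goal stalls immediately, while objective-free exploration, plain breadth-first search, finds the goal easily.

```python
from collections import deque

MAZE = [
    "#######",
    "#..G..#",
    "#.###.#",
    "#.#S#.#",
    "#.#.#.#",
    "#.....#",
    "#######",
]  # 'S' = start, 'G' = goal, '#' = wall

def find(ch):
    for r, row in enumerate(MAZE):
        if ch in row:
            return (r, row.index(ch))

def neighbors(pos):
    r, c = pos
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if MAZE[r + dr][c + dc] != "#":
            yield (r + dr, c + dc)

start, goal = find("S"), find("G")

def dist(p):
    return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

# Objective-driven agent: only accept moves that get closer to the goal.
# The goal is just north of the start, behind a wall, so every legal
# move *increases* the distance and the agent stalls immediately.
pos = start
while True:
    nxt = min(neighbors(pos), key=dist)
    if dist(nxt) >= dist(pos):
        break
    pos = nxt
print("greedy reached the goal:", pos == goal)      # False

# Objective-free exploration: breadth-first search ignores the goal
# entirely and simply keeps expanding into unvisited cells.
frontier, seen = deque([start]), {start}
reached = False
while frontier:
    pos = frontier.popleft()
    if pos == goal:
        reached = True
        break
    for n in neighbors(pos):
        if n not in seen:
            seen.add(n)
            frontier.append(n)
print("exploration reached the goal:", reached)     # True
```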
The way we can do that algorithmically is to have our algorithms, when they're optimizing for one objective and suddenly recognize they've made progress on another objective, start optimizing for that other objective too. Imagine a scientist who wants to create a robot that learns to walk with a deep neural network. If that robot during training suddenly starts crawling really well, or balancing on one foot really well, we shouldn't throw that out as a failure because it's not walking. Instead, we should capture that serendipitous event and start optimizing for crawling and balancing too, because those may be essential stepping stones to ultimately producing a walking robot.

The ultimate goal here is to produce what we call open-ended algorithms. These are algorithms that can endlessly innovate, and I mean endlessly: if you gave these algorithms a billion years of planet-sized computation, would they continue to innovate? We have no algorithms like that today, but we think it's fascinating to try to figure out how we might build them. We know this can happen: it happened on Earth, which has been innovating for over a billion years. From the very simple origin of a single-celled creature, we now have jaguars and hawks and the human brain, an endless parade of engineering marvels unraveling from this one system. The same is true of human culture, which has been endlessly innovating since the dawn of civilization. So the question is: can we make algorithms that do this?

One thing we've noticed is that there's an abstract principle underlying all of these innovation engines: they start with a set of things they've already built, they generate permutations of those things, and if the results are interesting they keep them, add them to the set, and continue expanding outward with this ever-growing collection of stepping stones. So one of my goals today is to sketch out how we're trying to build algorithms that capture this innovation in a bottle, which we call open-ended AI algorithms, and then to draw some connections between those algorithms and the open-source community, which I hope you'll find interesting.

The first family of algorithms my colleagues and I have been building on the path toward open-ended algorithms is what we call quality diversity algorithms. The idea is that we want our algorithm to find and produce a diverse set of high-performing solutions. A canonical example is an algorithm I co-invented with Jean-Baptiste Mouret called MAP-Elites. The idea is that you choose dimensions of interest that you care about and then discretize them. For example, if you have a robot that you want to find survivors after an earthquake, you might also care about other things like safety and energy efficiency. You discretize those dimensions into a grid, and then you search for the highest-performing solution at every point in that grid.
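Here is a minimal sketch of that loop, not our released implementation: the one-dimensional toy task and the behavior descriptors are made up for illustration, but the bookkeeping, one elite per grid cell, mutate a random elite, keep the child wherever it wins, is the heart of MAP-Elites.

```python
import random

GRID = 10  # cells per behavior dimension

def evaluate(genome):
    """Return (performance, 2-D behavior descriptor in [0, 1))."""
    performance = -sum((g - 0.5) ** 2 for g in genome)        # toy objective
    behavior = (abs(genome[0]) % 1.0, abs(genome[1]) % 1.0)   # toy descriptor
    return performance, behavior

def cell_of(behavior):
    return tuple(min(int(b * GRID), GRID - 1) for b in behavior)

def mutate(genome):
    return [g + random.gauss(0, 0.1) for g in genome]

archive = {}  # cell -> (performance, genome): one elite per cell

for _ in range(50000):
    if archive and random.random() < 0.9:
        genome = mutate(random.choice(list(archive.values()))[1])
    else:
        genome = [random.uniform(-1, 1) for _ in range(5)]
    perf, behavior = evaluate(genome)
    cell = cell_of(behavior)
    if cell not in archive or perf > archive[cell][0]:
        # Goal switching happens here: a child bred from one cell's
        # elite can land in, and win, a completely different cell.
        archive[cell] = (perf, genome)

best = max(p for p, _ in archive.values())
print(f"filled {len(archive)}/{GRID * GRID} cells; best performance {best:.4f}")
```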
What's interesting about this algorithm is that it doesn't return one solution with an arbitrary trade-off between, say, safety and energy efficiency. Instead it returns an entire surface that illuminates the space of possibilities, telling you, for example, that you can do really well in the corners of this box but not in the center.

This is a qualitatively different kind of algorithm. If you run traditional machine learning on a problem with some dimensions of interest and you only optimize for performance, you get low-performing points and you don't explore much of the search space. If you add a diversity pressure that actually pushes you to explore more of the search space, you do get higher performance, and we've known that for decades, but in practice you still don't explore that much of the space. MAP-Elites, a quality diversity algorithm, is a qualitative difference, a sea change in what happens with search. With the exact same amount of compute, you completely fan out and understand what's possible in the search space. And what's really interesting is that we're finding, over and over again, that these algorithms actually find a better maximum-performing solution than algorithms whose sole job is to find the highest-performing solution, because they do a better job of expanding out and discovering what's possible.

We can also observe in these laboratory experiments that goal switching is absolutely essential for success. If you take one of the final solutions in the map, the best thing MAP-Elites ever found for one of these problems, and you look through the history of which problems were being worked on in the lineage that eventually led to that solution, you find that search was frequently not working on that particular version of the problem, or even nearby versions of it. Very often there are long, circuitous routes through the search space: you had to be working on this problem, then switch to that problem, then work on another, to ultimately solve one particular problem. You can see some of those traces through time here, and I propose to you that it would be virtually impossible for a human to come up with this curriculum ahead of time. What's cool and powerful is that the algorithm automatically, simultaneously explores many curricula through the space and finds the ones that ultimately work.

We also created an algorithm called innovation engines. We challenged a whole population of tiny neural networks to produce a picture that activates a neuron in a deep neural net trained to recognize coffee cups, or golden retrievers, or motorcycles. That lets us compare what happens when you optimize for a single objective, say "make me a coffee cup picture," versus having a thousand different objectives simultaneously with goal switching allowed between them. In the single-class case, trying to produce a water tower, the optimization landed on an early motif that doesn't look much like a water tower, got stuck on that local optimum, and forever riffed on that theme, making minor enhancements that improve the score but never producing a very high-performing picture.
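To show the mechanics of that thousand-objective setup, here is a stripped-down sketch. The "classifier" below is just random linear projections rather than a real deep net, and there are far fewer classes, but the goal-switching bookkeeping is the same: every child is scored on every class at once and kept wherever it is the new best.

```python
import random
random.seed(0)

N_CLASSES, GENOME_LEN = 100, 20
# Stand-in "classifier": one random linear projection per class.
W = [[random.gauss(0, 1) for _ in range(GENOME_LEN)] for _ in range(N_CLASSES)]

def class_scores(genome):
    """Stand-in for the confidence of every class at once."""
    return [sum(w * g for w, g in zip(row, genome)) for row in W]

elites = {}  # class index -> (score, genome): the best "image" per class

for step in range(5000):
    if elites and random.random() < 0.9:
        parent = random.choice(list(elites.values()))[1]
        child = [g + random.gauss(0, 0.05) for g in parent]
    else:
        child = [random.gauss(0, 1) for _ in range(GENOME_LEN)]
    # Score the child on every objective simultaneously. Goal switching:
    # a child bred to improve one class may dethrone the elite of a
    # completely different class, and it is kept wherever it wins.
    for cls, s in enumerate(class_scores(child)):
        if cls not in elites or s > elites[cls][0]:
            elites[cls] = (s, child)

print(f"{len(elites)} of {N_CLASSES} classes have an elite image")
```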
In contrast, MAP-Elites, with a thousand different classes and goal switching allowed, initially lands on that dome- or moon-like thing, which kind of looks like the top of a water tower but isn't very high-performing. Then something that was doing well on the beacon class swaps in, becomes a better water tower, and gets refined to look more like the water towers in the dataset and become much higher-performing. We can quantify this: with more objectives, you do better over time. The more objectives the merrier, because as you add objectives you get more goal switching and ultimately higher performance.

Goal switching also enables really good ideas to spread, which I love. It allows one innovation somewhere in the algorithm to spread to a whole lot of different problem types and become the technological foundation on which further innovation is built. This happens in nature too; biologists call these adaptive radiations. In one lake in Africa there might be an innovation in the efficiency of breathing, in a lung for example, and that can spread to all the other lakes in Africa, where the solution gets customized to the particularities of those different lakes. Similarly, Darwin's finches started out with one idea, which then spread and got adapted and customized to each of the different niches on the different Galapagos Islands. And finally, think about the computer: that amazing original invention has radiated out into society, been adapted to so many different use cases, and is the foundation upon which we all build.

We want that to happen in our algorithms, and what's amazing is that we're starting to see it. In the image-generation domain, the initial innovation of a black dome on a red background, which the deep neural net initially thought looked like an abaya, radiated out through all of these different classes to ultimately become the backbone of a volcano, a mosque, a water tower, a beacon, a yurt, a church, a planetarium, an obelisk, and a dome. It's really amazing to see this happen inside our search algorithms.

In 2015, we had a paper in Nature that showed some of the power of these ideas. Ahead of time, in simulation, we launched a quality diversity algorithm, specifically MAP-Elites, and it went out and gathered many different ways to walk, using the legs in different ways and combinations. Then, when a real robot became damaged in the real world, we used an efficient algorithm, Bayesian optimization, to search through that set of high-performing, diverse solutions and find a gait that worked despite the damage. In about thirty seconds to a minute and a half, the robot is able to get up, conduct a few experiments, and walk away to continue its mission.

We also used these same ideas to tackle a long-standing grand challenge in artificial intelligence: the video game Montezuma's Revenge, one of the games on which DeepMind's original paper scored exactly zero. As you can see from all the blue dots here, many different industrial and academic labs had been trying to make progress on this task for years, mostly without success, with scores between zero and about 10,000.
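Here is a compressed sketch of that damage-recovery loop, assuming scikit-learn is available for the Gaussian process. The map of gaits, the "damaged robot," and the acquisition constant below are toy stand-ins for the machinery in the paper: a Gaussian process models how reality differs from the simulated map, and an optimistic estimate picks the next of a handful of expensive physical trials.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Pre-computed map from simulation: each gait has a 2-D behavior
# descriptor and a performance predicted in simulation (toy values).
descriptors = rng.uniform(0, 1, size=(200, 2))
sim_perf = 0.5 + 0.5 * np.sin(3 * descriptors[:, 0])

def real_perf(i):
    """Stand-in for a physical trial: damage ruins gaits in one region."""
    damage = 0.8 if descriptors[i, 1] > 0.6 else 0.0
    return float(sim_perf[i] - damage + rng.normal(0, 0.02))

tried_x, tried_y, tried = [], [], set()
for trial in range(15):
    if tried_x:
        # Model the sim-vs-reality residual from the trials so far.
        gp = GaussianProcessRegressor().fit(np.array(tried_x), np.array(tried_y))
        mu, sigma = gp.predict(descriptors, return_std=True)
    else:
        mu, sigma = np.zeros(len(sim_perf)), np.ones(len(sim_perf))
    ucb = sim_perf + mu + 0.1 * sigma   # optimistic estimate of real performance
    for j in tried:
        ucb[j] = -np.inf                # never repeat a trial
    i = int(np.argmax(ucb))
    y = real_perf(i)
    tried.add(i)
    tried_x.append(descriptors[i])
    tried_y.append(y - sim_perf[i])
    if y > 0.8:
        print(f"found a working gait after {trial + 1} trials: score {y:.2f}")
        break
```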
Then we applied these ideas, enhanced a bit, to have the agents in the game deliberately go out and find the best way to reach this place, and that place, and do this activity. That exploration strategy produced a sea change that took scores from the zero-to-10,000 range all the way up to 660,000 on average, with our top-performing neural network scoring about 18 million and beating the human world record of 1.2 million.

So quality diversity algorithms are really exciting for all the reasons I've described. The question is: what's missing? The answer is that their ability to innovate is still ultimately constrained to whatever problem we run them on. This is true of almost all machine learning algorithms. No matter how long you run OpenAI's Dota agent in Dota, or DeepMind's AlphaZero in Go, what you end up with is a really good Dota agent or a really good Go-playing agent. What you won't have is a robot that can wake up and make you breakfast in the morning, which is what I actually want. We want algorithms that can break out of their sandbox and continue to innovate relentlessly.

So the intriguing possibility I want to raise with you today is: could we create algorithms that generate their own challenges and solve them, all at the same time? Just as nature invented the problem of leaves high up in trees and, simultaneously, the solutions in the form of caterpillars and giraffes that can eat those leaves.

Our most recent work here is the Paired Open-Ended Trailblazer, or POET, algorithm out of Uber AI Labs, with my wonderful colleagues here. The idea is to endlessly generate increasingly complex and diverse learning environments and their solutions simultaneously. POET pairs two populations, a population of environments and a population of agents. Periodically it adds a new environmental challenge, but only if that challenge is neither too difficult nor too easy for the current set of agents; it also optimizes the agents to do better on their challenges, and it allows goal switching between them, as sketched below. The agent is a deep neural network with a variety of sensors, and the degrees of freedom of the environment are listed there.

The algorithm has to learn on its own how to walk, and it starts off in easy environments: flat ground, little tiny stumps, slightly hilly terrain, a few steps, a few very small gaps. Over time, POET generates harder and harder challenges for this little deep neural network, which must learn to control the robot on each task. The environments increase in complexity: now we're starting to see bigger gaps, or taller stumps, but each appears separately, as if the system is learning one skill at a time. First it takes on big gaps, then it learns to climb big stumps, then rugged terrain. It isn't yet combining those things, but later the algorithm does start to combine those challenges. Now this poor little robot has to jump big gaps and climb over really big stumps, using its sensors to figure out how to run and leap over chasms that are about as big as possible for its body and motor torques. Here's another challenging environment: we didn't even think this one was possible, but POET invented it itself and figured out how to solve it.
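Here is a minimal sketch of that paired-populations loop. The "agents" are single skill numbers rather than neural networks, the "environments" are single difficulty numbers rather than terrains, and the minimal-criterion thresholds are illustrative, but the three steps, generate environments, optimize agents, check for transfers, mirror the structure just described.

```python
import random
random.seed(2)

def score(agent, env):
    """Stand-in evaluation, clipped to [0, 1]: an agent whose skill
    exceeds the terrain difficulty by 1 has fully solved it."""
    return min(1.0, max(0.0, agent - env["difficulty"]))

def optimize(agent, env, steps=20):
    """Stand-in for the real ES optimization step on one environment."""
    for _ in range(steps):
        candidate = agent + random.gauss(0, 0.02)
        if score(candidate, env) > score(agent, env):
            agent = candidate
    return agent

# Each entry pairs an environment with its champion agent.
pairs = [({"difficulty": 0.0}, 0.1)]

for iteration in range(200):
    # 1. Occasionally propose a child environment, keeping it only if it
    #    is neither too easy nor too hard for an existing agent
    #    (thresholds are illustrative).
    if iteration % 20 == 0:
        parent_env, parent_agent = random.choice(pairs)
        child = {"difficulty": parent_env["difficulty"] + random.uniform(0.0, 0.2)}
        if 0.5 < score(parent_agent, child) < 0.95:
            pairs.append((child, parent_agent))

    # 2. Optimize every agent on its own paired environment.
    pairs = [(env, optimize(agent, env)) for env, agent in pairs]

    # 3. Goal switching: if any agent in the population does better on an
    #    environment than its current champion, transfer it in.
    for i, (env, champion) in enumerate(pairs):
        best = max((agent for _, agent in pairs), key=lambda a: score(a, env))
        if score(best, env) > score(champion, env):
            pairs[i] = (env, best)

hardest, agent = max(pairs, key=lambda p: p[0]["difficulty"])
print(f"{len(pairs)} environments; hardest difficulty {hardest['difficulty']:.2f}, "
      f"champion score {score(agent, hardest):.2f}")
```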
What's interesting is that if you take the final environments POET invented and solved, and you directly drop a new deep neural net with optimization into one of those environments and try to solve it, it always fails: there's no gradient or curriculum for figuring out such a hard task. It's like being dropped into Calc 3 as a four-year-old. You might think, okay, that's too hard, so let's do what's intuitive: take the solutions POET invents and create a direct path, a series of stepping stones that linearly interpolate between the original simple environment and the hard one. We find that this also always fails. Somewhere along the line there's a jump it can't make, because the right stepping stone to solve the next environment in the chain is not the one right before it. Intuitively designing this curriculum never worked, and the problem gets worse the harder the task is. Goal switching is absolutely essential to the success of this algorithm.

Now I want to share one of my favorite anecdotes from this work. Here an agent is learning how to walk in the original simple environment, and if you notice, it's dragging its knee on the ground; for whatever reason it's not standing up. Eventually POET says, okay, you're good enough on that environment, and creates a new environment with little tiny stumps. The knee-dragging behavior no longer works there; the agent keeps getting tripped up on the stumps. So in that environment it has to stand up, learns to do so, gets over the stumps, and earns a better score. But POET is always checking for goal-switching opportunities, asking whether the best solution on one problem is a good solution for another, so it automatically swapped that standing solution back into the original simple environment, where it got an even higher score, and with further optimization reached a score of nearly 350. We ran the counterfactual: we took the original agent and gave it as much computation as was required to reach the score of 349, but let it run only in that original environment, and it never stands up and never gets a very good score. So think about this counterintuitive curriculum: to solve a simple problem, you actually had to go work on a harder problem for a while and come back to it. We can also test POET with and without goal switching, and we find that without goal switching it never produces and solves these really hard challenges, while with goal switching it always does.

In future work in this domain, we'd like to take this to much more interesting, complex 3D environments such as the one you see here. But I think it's more provocative to consider this algorithm in the context of the coming decades, as we get more computation. What will happen when an algorithm like this can explore extremely complicated environments with other agents you have to negotiate with, buy and sell goods from, maybe fight, maybe cooperate with, while climbing walls and evading aerial predators, et cetera? With a billion years of computation and environments like these, these algorithms might do really wonderful things.
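Going back to the knee-dragging anecdote for a moment, it is easy to render as a toy experiment. The reward functions below are invented for illustration: on "flat ground," hill-climbing never escapes the knee-dragging local optimum, but detouring through the "stumps" environment and transferring back does.

```python
import random
random.seed(3)

def flat_score(posture):
    """Toy flat-ground reward: knee-dragging (posture ~ 0) is a local
    optimum worth ~200; standing upright (posture ~ 1) is worth ~350,
    across a valley that local search will not cross."""
    dragging = 200 * max(0.0, 1 - abs(posture))
    upright = 350 * max(0.0, 1 - abs(posture - 1) / 0.3)
    return max(dragging, upright)

def stumps_score(posture):
    """Toy stumps reward: dragging scores zero, and the more upright
    the agent, the better it clears the stumps."""
    return 300 * max(0.0, 1 - abs(posture - 1))

def hill_climb(posture, score_fn, steps=3000):
    for _ in range(steps):
        candidate = posture + random.gauss(0, 0.05)
        if score_fn(candidate) > score_fn(posture):
            posture = candidate
    return posture

stuck = hill_climb(0.0, flat_score)
print(f"flat ground only:   score {flat_score(stuck):.0f}")       # stays ~200

via_stumps = hill_climb(stuck, stumps_score)   # forced to stand up
back_home = hill_climb(via_stumps, flat_score)  # goal switch back
print(f"via the stumps env: score {flat_score(back_home):.0f}")   # ~350
```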
Ultimately, I think this is a step toward what I call AI-generating algorithms, which, as I argue in the paper linked here, are potentially the fastest path to our most ambitious goal as an AI research community: producing human-level AI within computers. The idea is that we may need algorithms that learn as much as possible on their own, as opposed to the current dominant trend in machine learning of trying to hand-design the solution piece by piece. We could launch an algorithm that bootstraps itself from simple origins all the way up to producing human-level or even superhuman AI on its own. We know this is possible, because that algorithm has already run on Earth and produced all the people in this room; we can do it algorithmically, we just have to figure out how. The game plan is that we'd have to do three things simultaneously: meta-learn the architectures, meta-learn the learning algorithms, and automatically generate the learning environments and challenges, which is what I've focused on in this talk.

To conclude the academic portion of this talk: automatically generating environments and solutions is, I think, a really powerful ingredient for propelling innovation engines forward. They automatically invent their own curricula; they create, collect, and improve stepping stones; they harness goal switching and adaptive radiations; and they hedge their bets by simultaneously trying multiple overlapping curricula and finding the ones that ultimately work. I think these are the keys to algorithms that endlessly innovate, including evolution and human culture in the form of science, technology, and art, and therefore they may be the only way to solve the really ambitious problems we have in society and in research, and to discover the full gamut of what's possible in a search space. If you're interested in more on these subjects, we just put our ICML 2019 tutorial on population-based methods online.

Now, I promised I would draw some connections between this line of research and the open-source community. What I find really interesting is that the open-source community acts a lot like a quality diversity algorithm. Think about what the people here do: they create, collect, curate, and improve stepping stones, in the form of packages and repositories. In that sense, you, the people in this room and in this foundation, are doing exactly what is needed to catalyze progress in society. To the individual contributors in this room I would say: please keep creating, collecting, and improving your projects. And to the community organizers: keep enabling those wonderful activities.

I also want to mention that we're doing our part: all of the code for all of the algorithms I've introduced is online and available, and there are more open-source projects at Uber; we're really proud of our open-source activities there.

My final thoughts for this room and the open-source community are what I've learned working on these algorithms. If you find something interesting, even if you don't know why you find it interesting, or what its possible use in the world might be, go out there and build it, because you never know where working on that idea or project will take you. You may goal-switch from it to something else and
have a serendipitous discovery. Similarly, I would say: share it, because the more stepping stones the merrier. You never know how your idea, your repository, your code, or your hardware will be used by someone else; you just can't anticipate it. So put it out there into the world, and something wonderful might happen. I'd also say: just release it. Clean code is great, but please don't let the perfect be the enemy of the good. I've seen, time after time, people say "my code isn't good enough to release," and thereby deny civilization the opportunity to benefit from that code and that innovation. So put it out there even if you don't have time to clean it up; the least helpful repository out there is the one that doesn't exist. Finally, I just want to say: keep doing what you're doing, which is providing the rocket fuel for the open-ended algorithm that is technological progress. Thank you.