Boom, what's up everyone? Welcome to Simulation. I'm your host, Alan Sakyan. Super pumped to be talking about multi-agent environments and artificial intelligence. We have Todor Markov joining us on the show. Hello. Thank you for having me. Thank you so much for coming on. I'm really excited for this episode. There's so much to learn about this field that's permeating into every aspect of our lives. For those that don't know Todor's background, he is a machine learning researcher at OpenAI with degrees from Stanford in symbolic systems and statistics. His current focus is on multi-agent environments and transfer learning. You can find links below to openai.com, OpenAI's Twitter, as well as Todor Markov's GitHub, his website, and his LinkedIn profile. Check them out. Todor, let's start things off with one of our favorite questions to ask our guests. What are your thoughts on the direction of our world? Right. So I think today a lot of changes in the world are strongly driven by emerging new technologies, like artificial intelligence, but also genomics, software, and materials science. I think today is particularly exciting because we are seeing a significant speed-up and impressive new developments in some of these fields, like AI, but also in things like 3D printing. And I think a lot of the future of humanity is going to be driven by developments in these technologies and by how well we can use them to improve the human condition. And how do you think we can ensure that the technologies are used to enhance the human condition? I think that's a hard question. It's a question that a lot of scientists think deeply about, a lot of policy people think deeply about. I think that it would require just being careful with the technologies that we're developing, while at the same time realizing that it's not really an option to stand still and try to bring progress to a halt.
So I think doing this sort of careful balancing between continuing to move forward, continuing to develop new technologies, while being careful about safety considerations, about being inclusive and really thinking deeply about the effects that the technology is going to have on different people, is the way to go forward. Very careful. Slow, but thinking geopolitically about how to do it, while not standing still. Yes, yes, okay. Okay, let's get to the journey. So you were born in Bulgaria, and actually we were wondering, are you related to the Markov of Markov chains? And you're like, no, no, no, it's a common surname for Bulgarians. And so you left at 18 to go to Stanford, and that's where you did some statistics. I'm curious, who were you when you were young that got you passionate about science and technology? Yes, so when I was growing up in Bulgaria, I had the very good fortune of going to what's called a gymnasium, which is a specialized school that focuses on mathematics and natural science. So I went to a really great school in Bulgaria. I had the great fortune of having really great teachers. I did a lot of contests in math, both nationally and internationally. I got some great mentorship from the team leads for the national math team in Bulgaria. I got some great peers, great friends from that experience. And that's how I got really passionate about math and science and new technologies. And that's also how I decided that I wanted to study abroad for college, because a lot of other people who were competing had done it and had a great experience doing it. So I decided that, hey, I want to do that too. It seems cool. And here I am. Whoa, a gymnasium is what it's called. Yes. So was it like a combined middle school and high school? Yes. 5th to 12th grade. Nice, and powerful teachers and curriculum. Yes. Yeah. The mentorship is crucial.
And also you saw people before you going abroad, and you were able to also make the decision to go abroad. Now, when you got to Stanford and you started surrounding yourself with other really smarty-pants kids, what was that like? It was interesting. I think from a diversity perspective, there was much higher diversity than my peer group in high school. My peer group in high school had a lot of people who were extremely technically proficient in math and physics and computer science, but it had a much smaller number of people who would be really good at, say, history or philosophy, who would have really read about it and thought about it in depth, just because the educational system didn't reward it as much. Whereas at Stanford, there was a much higher variety in terms of people who were doing more humanities or artsy types of things. There were more people also who had worked on interesting side projects, like things in robotics or small startup ideas, that type of thing. At the same time, I actually found that from a purely technical perspective, the average or maybe even the top 20th-percentile people in math or physics were not quite as good as the national team in Bulgaria. So in that sense, it helped a lot. I think a lot of people coming into Stanford have a very strong intimidation factor of, oh my god, everyone here is so good. And for me, it was more like, yeah, people here are very good. They're very talented. They're all smart kids. But I've been in a similar situation before with people who are just as good, if not better. Nice. That's a cool way to arrive. Okay. And then, would you say that symbolic systems and statistics are two of the better fields to go into if you want to go into machine learning? Yes. I think they're both great options. For symbolic systems, let me first talk a little bit about what that is. Symbolic systems is a program at Stanford that encompasses several different disciplines.
So we studied linguistics, we studied philosophy, we studied a little bit of neuroscience, and then we studied math and computer science. So it's similar to cognitive science degrees at other universities, but it tends to be more technical and focuses a little bit more on the math side and on the computer science side. But at the same time, it's still very flexible, and it allows you to specialize in whatever you're interested in. So symbolic systems has specializations ranging from applied logic to neuroscience to artificial intelligence. So it gives you a really high degree of flexibility in terms of making it what you want it to be. And that is one of the main reasons that I was very attracted to it. Flexibility, baby. I agree. And then statistics as well. Yes. A lot of the theoretical models behind machine learning and deep learning, to the extent that they exist, are driven by statistical models and statistical thinking. So I think statistics is a very good way of getting into machine learning or deep learning, especially for people who are a little bit more mathematical, a little bit more theoretical, who want to build intuitions about, okay, I want to actually know what this thing is doing and why it's working, and are a little bit less satisfied by just, okay, it's not working, let me add three more layers to my neural net and try again. So to actually understand why it's working or why it isn't working, and know where to troubleshoot, to debug, that type of stuff. Yes. And right now there's relatively little rigorous theoretical understanding of why and how deep learning works. There is a little bit, and it's a very active area of research. There are lots of people in math and statistics who are starting to pay more attention to it.
But I think it's a really exciting field, and for more theoretical people who are interested in it, statistics is a great way to get into AI, get into deep learning, and attack the problems in the field from a bit more theoretical angle. And then you also spent a good amount of time doing EA stuff, logic, robotics, you like climbing, philosophy. So these are some of the fields that you're interested in. But you ended up doing some work in tech and finance: Bridgewater, the Stanford Artificial Intelligence Laboratory, Blend. And this is all pre-OpenAI. Yes. And what were some of the big core takeaways from that period for you? Lots of different things, and different things from all the different places. But I think the core takeaways were: communication is really important. You should try to have very clear lines of communication with everyone you're working with, and also in relationships in your life beyond work. You should strive very hard to be comfortable giving people feedback, even if it's critical, and also receiving critical feedback from people and acting on it. And on a more practical, project-based level, I think those experiences were very useful for me in terms of picking up skills like how to learn things on your own effectively, how to take a big, vague project, break it up into bite-sized chunks, and then execute on those chunks one by one, how to manage your time well, things like that. Yeah. Okay. Whoa. So it's actually quite important to go and pick up these skillsets you're talking about prior to doing entrepreneurship or joining other industries, learning those skills and applying them in the rest of your life practices. Good. Very good. Okay. Now, OpenAI stuff: give us a quick rundown of OpenAI's mission. So OpenAI's mission is to develop artificial general intelligence.
So, artificial intelligence that is human-level or above at any task of economic or intellectual interest to humans. And we want to do this in a way that's safe and in a way that is fair, such that the benefits from this artificial general intelligence are fairly distributed across all of humanity, and they're not overly concentrated in a certain set of countries or a certain set of shareholders to the exclusion of other people. Wow. Yeah. That's huge. This is why, as we were saying earlier, OpenAI has safety and fairness among its core ethics. Yes. And a big part of that is being inclusive of stakeholders. Yes. This is not just for the extremely wealthy who can rocket off; this is for all of us on the planet, down to the decentralized infrastructures that are coming in. I love that part. Okay, now let's start breaking down some of this work. You guys have been blowing everyone's minds by creating what right now are narrow artificial intelligences that are beating humans at things we used to think couldn't happen. Dota 2. Yes. StarCraft. And here we have some example videos where we're going to be talking about how these things work. And one of the core parts is the competitive self-play component. Yes. So competitive self-play is a key idea right now in how AI systems, especially in competitive games like Go or Dota or StarCraft, are developed. The first big example of competitive self-play was AlphaGo back in 2016, when it defeated the world champion, Lee Sedol. And everyone was super shocked, because many people did not expect that to happen for at least 10 years. So AlphaGo wasn't trained entirely with competitive self-play, but a lot of its training was self-play. And the key idea is very simple.
The idea is that if you have some sort of competitive two-player, or even more-than-two-player, game, you just have the same agent continuously play against itself and slowly get better that way. And the reason that works is that it provides what we call an automatic curriculum. It's always at the appropriate level, where it's difficult enough that you can learn new things, but not so hard that you have no idea what's going on and can't learn anything. So if I had to draw a comparison: if you were learning to play chess and you were a mid-ranked player, playing against, say, your six-year-old nephew, you probably wouldn't learn that much. And if you were playing against Bobby Fischer, you probably also wouldn't learn that much. But if you're playing against someone at your level, that's when a lot of learning is actually going to happen. And competitive self-play is one way to induce that sort of appropriate difficulty level, where it's hard enough that you can learn, but not so hard that you're just stuck and don't know what to do. Yeah, that part's critical. It's like when adults play with children: the adults bring their skill level down toward the children's level so that the children can learn, keeping the game at about the skill level they know plus a little more. But the crazy thing is that you can do that over and over again. So as it incrementally learns, it's continuously going up one level, incrementally learning, times millions of plays. And this is what's crazy: when we take millions of plays at Go or StarCraft, it takes us a lifetime. Yes. But millions of plays for you is a day. Yes, yeah. For these types of games, especially when you can perfectly simulate them and you have access to perfectly accurate simulation data, this is extremely strong, because you can just have it play a lot. And still, it's horribly inefficient compared to a human.
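The self-play loop described here can be sketched in a few lines. This is a toy illustration, not OpenAI's training code: the "policy" is just a single skill number, and `play_match` and `improve` are invented stand-ins for a full game rollout and a gradient update.

```python
import random

random.seed(0)

def play_match(skill_a, skill_b):
    """Stand-in for simulating a full game: returns True if player A wins.

    In a real system each player would be a neural-network policy and a
    match would be a complete episode of the game.
    """
    return random.random() < skill_a / (skill_a + skill_b)

def improve(skill, won):
    """Toy stand-in for a learning update: nudge skill after each match."""
    return skill * (1.05 if won else 0.99)

# Competitive self-play: the opponent is always a copy of the current agent,
# so every match is automatically at the right difficulty (roughly 50/50),
# which is the "automatic curriculum" described above.
agent = 1.0
for step in range(1000):
    opponent = agent              # snapshot of the current self
    won = play_match(agent, opponent)
    agent = improve(agent, won)
```

Because wins raise skill slightly more than losses lower it, the agent's skill drifts upward over the thousand matches even though every game is an even contest.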
This probably took hundreds of thousands of tries, and if you had to do it, you would do it way faster. And with Dota, OG, who are a top-ranked, world-class team, they learned to play over maybe 10 to 15 years, something like that. OpenAI's bot spent 45,000 years, which is insane. I don't know if the total human amount of time spent playing Dota is equal to 45,000 years. And 45,000 years was played in how much time? That was over seven or eight months, something like that. 45,000 years' worth of gameplay in seven to eight months. This is why the creative potential to learn things that even humans haven't tested, given the scenarios and circumstances, is just so much higher. Yes. And we did see that during the matches between OpenAI Five and OG, where the bot used some very unusual strategies that no one in Dota really plays. It played a very aggressive early game, but also made very aggressive use of buybacks during the mid game, which I don't know of any other team having used in the past. And from looking at interviews with OG after the match, and also just talking with the contractor teams we've had testing the bot for us, we found that playing against the bot really helped them tighten up their game and explore new things, get new ideas for how they can improve their own game. So I think that really highlights the potential for AI to act together with humans and help them learn, improve, and get better at the things they care about. The hybrid potential is very, very beautiful. I want to ask a couple of questions on the technical side. I think it's important to at least break this down. What are the environments you're designing? Is this CAD, computer-aided design?
This one, I'm guessing, is a re-skinned version of MuJoCo. MuJoCo stands for Multi-Joint dynamics with Contact, and it is a very good physics simulator that's very popular for AI research, especially for research in robotics, because it's a reasonably accurate simulator that models lots of different real-world physics. It's not perfect, clearly, but it does a very good job. It has good support, and lots of researchers really like it. I like it. I think it's good. And this is a physics simulator, MuJoCo. Yes. Interesting. And then in terms of how the actual math works behind the physics simulation component, how the designs are actually made, and how they're really quickly computing exactly how to, in this case, see if you can get the ball kicked past the goalkeeping opponent, or in the other one, the sumo example, where they were trying to push each other out of the ring. Yes. So how is the math working in the background, where it's trying to work out where the other object is and how to push it? How is that happening? Right. So actually, most of the math here is baked into the simulator. The physics part has a lot of math in it, and I actually don't know that many details about it. I've played a little bit with the internals of MuJoCo, but as we said, it's lots of complicated math. And on the agent side, in terms of how the agent is actually learning to do things: how is it learning to do sumo well, or how is it learning to stay upright when it's being pushed by wind? There is some math, but we don't explicitly tell the agent, hey, here's how you do sumo well, or hey, here's how you stand upright when there's wind. What we do is use something called reinforcement learning. The idea is that when the agent is interacting with the environment, it gets a certain reward if certain things happen.
So for example, in the goalkeeping task, you get a reward if you score a goal, and the keeper gets a reward if it stops the ball, if it stops a goal from being scored. In the sumo task, you get a reward if you trip your opponent. In the wind task, you get a reward if you don't fall down. So you have these rewards, and then you have an update rule, which varies depending on the specific algorithm you're using, but the general idea is that you tell your agent, hey, here's how you do a small learning step based on what you did in this episode and the reward that you got. And so the basics are somewhat simple. There's still some math involved, but I think the math there is relatively simple compared to the math in the physics simulator. The impressive part is that it's still able to learn these relatively complicated tasks, like in this case the sumo game and the wind and kicking the ball, but it can also learn extremely complicated things. It can learn how to play Go, it can learn how to play Dota, and at an extremely high level. And those are things where, if you tried to explicitly tell the agent, hey, here are the rules by which you play Go at a world-class level, here are the rules by which you play Dota at a world-class level, you would never be able to do that. Humans just don't know how to do that on an explicit level. If you ask Lee Sedol, hey, why is that move good? He can point at some general insights, but it's not something where you can write a checklist, and then by following the checklist, you can play Go well. It relies a lot on intuition and implicit representations. And I think it's very impressive that these simple learning rules can learn these very complicated intuitions, these very complicated internal representations.
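The reward-plus-update-rule idea just described can be sketched with REINFORCE, one common policy-gradient update rule, on a made-up two-action environment. The environment, rewards, and learning rate here are all illustrative, not the setup actually used for sumo or Dota.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: two actions; action 1 earns reward 1, action 0 earns 0.
def step(action):
    return float(action == 1)

# Policy: softmax over two action preferences (the parameters being learned).
theta = np.zeros(2)

def sample_action():
    probs = np.exp(theta) / np.exp(theta).sum()
    return rng.choice(2, p=probs), probs

# REINFORCE-style update: raise the log-probability of the chosen action
# in proportion to the reward it earned.
lr = 0.1
for episode in range(500):
    action, probs = sample_action()
    reward = step(action)
    grad = -probs                 # gradient of log pi(action) w.r.t. theta ...
    grad[action] += 1.0           # ... is indicator(chosen) minus probs
    theta += lr * reward * grad

# After training, theta[1] should clearly exceed theta[0]: the agent has
# learned, purely from rewards, to prefer the rewarded action.
```

The agent is never told which action is "right"; the preference emerges from many small reward-weighted updates, which is the same principle driving the sumo and goalkeeping agents.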
So you can create an agent in the physics simulator, and when it does something like prevent this person from running past that line, it gets rewarded and knows that that is a behavior it should continue. Yes. Very similar to us as agents in this world as well. When we do something like exercise or choose a healthy food option, or when we achieve a bunch of goals that we have set for ourselves, we get rewarded. We'll get to that question at the very end of the show. We did the wind attack one. Ron, did we do the second one? This is the wind attack, right? So this is applying a vector of wind at a certain strength and asking the agent to stay on the platform. The interesting part about this one is that this agent here is the same agent that was playing sumo before. We took the agent that was playing sumo out and placed it in this new environment with wind, which it has never seen before, and we actually didn't allow it to train. We told it: here is this new environment, here is this new thing that you have no idea what it is, you've never seen it before; figure it out. And the thing that we like is that it's actually reasonably good at figuring it out. It's doing reasonable things. It has this squatting stance, it's kind of stable, it's trying to stay in the center, and this is without any training on a new environment that it hasn't seen before. So this is an example of what we call transfer, or transfer learning, where things that you have learned in one environment that you've trained in then transfer to a different environment with somewhat different dynamics. So as we see here, it turns out that the things you learn for staying upright in sumo, for not getting tripped over, also help you if you don't have an opponent but instead have winds that you need to deal with. That's so cool.
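The evaluation protocol just described, taking a trained policy and running it with learning switched off in an environment it has never seen, can be sketched roughly as follows. `SquatPolicy` and `WindEnv` are hypothetical stand-ins for the pretrained sumo agent and the new wind environment, not real APIs.

```python
class SquatPolicy:
    """Stand-in for the pretrained sumo agent: its learned behavior here is
    reduced to a stable squatting stance. In practice this would be a frozen
    neural network whose weights were trained in the sumo environment."""
    def act(self, observation):
        return "squat"

class WindEnv:
    """Toy wind environment: the agent stays upright on a step only if it
    takes a stable action. The agent never trains in this environment."""
    def __init__(self, steps=100):
        self.steps = steps

    def evaluate(self, policy):
        upright_steps = 0
        for t in range(self.steps):
            # Only policy.act is called -- there is no learning update
            # anywhere, which is what makes this a zero-shot transfer test.
            if policy.act(observation=t) == "squat":
                upright_steps += 1
        return upright_steps / self.steps   # fraction of steps spent upright

score = WindEnv().evaluate(SquatPolicy())
```

The point of the pattern is the separation: all learning happened in the old environment, and the new environment only measures how well that learned behavior carries over.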
These are kind of the first incremental steps toward trying to generalize narrow intelligence. That's beautiful. Beautiful example. Okay, and then the next one is this. You guys had an ant, a bug, and a spider, with four legs, six legs, and eight legs. Yes. So this work was done by my team, the multi-agent team at OpenAI, about a year or two ago, around that time frame. The idea here is that you have two different algorithms. One is just a regular deep reinforcement learning algorithm, and the other is a meta-learning algorithm, which can learn and adapt within an episode based on what the other agent is doing. So for this work, we had a population of these different algorithms, which were also controlling these different robots. The four-legged robot we call an ant, the six-legged green robot we call a bug, and there was also an eight-legged robot called a spider. The bug is the strongest one in terms of physical strength, things like how much force you can apply at your joints, and the spider was the weakest one. And we ran these multiple algorithms, running on multiple robots, in a tournament league, where we ran evolution on that league. So agents and algorithms that won got to survive and replicate, and agents and algorithms that lost would be pruned and eliminated. That's the fifth asset, Ron, if you want to bring that one up. So that shows, like you were just describing, that over time you can actually see which agents are the winners in all of these different simulations that occur, and over time one of the models beats out the other ones, and that's the Model-Agnostic Meta-Learning algorithm. Yep, exactly.
So we see here that spiders don't do as well because they're weak, but also that the meta-learning algorithm does better than the regular algorithm. And there's also a difference here in terms of model architecture: whether the architecture of the neural net you're using is aware of time, of what's happened during the wrestling before, or whether it just treats every time step individually. Being aware of time also helps a lot, and agents who are aware of time end up doing much better in this tournament. But yes, something interesting here that we thought was important is that even with the ant, a weaker robot, the meta-learning algorithm was still able to beat a bug controlled by a regular reinforcement learning algorithm. And so, yes, the broad idea behind meta-learning algorithms is that they're learning how to learn. They can learn how to do different tasks quickly; they can adapt to new circumstances quickly. And this was one small, but I think still important, example of why this matters and how it helps you perform better in competition with other agents, but also potentially on other tasks that you might care about. Versus reinforcement learning? Well, meta-learning is still technically a reinforcement learning algorithm, but it's a specific approach that has advantages over regular algorithms. It also has some disadvantages: it can sometimes be slightly less stable, and it can be hard to get it to work. But I think it's an exciting area of research within reinforcement learning. So the edge of knowledge has been pushed in reinforcement learning, and now we're going even further and seeing which algorithms of reinforcement learning are good at which tasks within that field.
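Since Model-Agnostic Meta-Learning came up, here is a heavily simplified, one-dimensional sketch of the "learning to learn" idea: an inner loop adapts to each freshly sampled task with one gradient step, and an outer loop moves the shared initialisation so that *post-adaptation* performance is good. The quadratic tasks and step sizes are invented for illustration; real MAML operates on neural-network weights.

```python
import numpy as np

rng = np.random.default_rng(0)
inner_lr, outer_lr = 0.25, 0.05
w0 = 10.0  # the meta-learned initialisation

def grad(w, target):
    # Gradient of the per-task loss (w - target)**2
    return 2.0 * (w - target)

for meta_step in range(300):
    target = rng.uniform(-1.0, 1.0)                  # sample a new task
    w_adapted = w0 - inner_lr * grad(w0, target)     # inner loop: one adaptation step
    # Outer loop: differentiate the post-adaptation loss w.r.t. w0.
    # Since w_adapted = (1 - 2*inner_lr)*w0 + 2*inner_lr*target,
    # d loss(w_adapted)/d w0 = 2*(w_adapted - target)*(1 - 2*inner_lr).
    meta_grad = 2.0 * (w_adapted - target) * (1.0 - 2.0 * inner_lr)
    w0 -= outer_lr * meta_grad

# w0 ends up near the centre of the task distribution (0): the point from
# which a single inner gradient step gets close to any sampled target.
```

The meta-learner is never optimised to solve one task; it is optimised to be *one adaptation step away* from solving whatever task gets sampled, which is the sense in which it adapts quickly to new circumstances.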
Yes, or we're thinking about how, usually, most reinforcement learning algorithms are good for many different tasks. So the same algorithm that plays Dota can also learn to use a robotic hand to manipulate a cube, and it has no idea what environment it's working in. That's the same reinforcement learning algorithm? Yes, the same one. So that does then generalize. Sort of. It's not the same weights; the learned model itself is different. But if you start from a fresh, untrained model, the same algorithm that learns how to play Dota can also learn how to control the robotic hand. And it assigns weights to variables over time. Yes, that's correct. Based on their rewards. Yes. Okay. Yeah, there was an interesting component to this, which is right here: rewards for behaviors that aid exploration, like standing and moving forward, are eventually annealed to zero in favor of being rewarded for just winning or losing. I thought that part was really interesting. So at first you get rewarded for doing these interesting things, even just standing up, exploring, and then that goes to zero, because the most important reward is whether or not you can win the game of sumo by pushing the other agent off. Yes. So yeah, this is an example of something that is a hard problem in reinforcement learning and an active area of research: how to solve the exploration problem. And sparse versus dense rewards is one approach to it.
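The annealing scheme described here, dense shaping rewards that fade to zero in favor of the sparse win reward, might look roughly like this. The coefficients, schedule, and reward values are made up for illustration.

```python
def total_reward(step, total_steps, upright_bonus, centre_bonus, won):
    """Blend dense shaping rewards with the sparse win reward.

    The annealing coefficient starts at 1.0 and decays linearly to 0.0
    halfway through training; after that, only winning is rewarded.
    """
    anneal = max(0.0, 1.0 - step / (0.5 * total_steps))
    dense = upright_bonus + centre_bonus      # easy, frequent shaping rewards
    sparse = 10.0 if won else 0.0             # the behavior we actually want
    return anneal * dense + sparse

# Early in training, shaping rewards pay out even without a win:
early = total_reward(step=0, total_steps=1000,
                     upright_bonus=1.0, centre_bonus=0.5, won=False)
# Late in training, the same behavior earns nothing unless the agent wins:
late = total_reward(step=900, total_steps=1000,
                    upright_bonus=1.0, centre_bonus=0.5, won=False)
```

Early on, standing upright near the centre earns 1.5 per step, enough signal for a randomly flailing agent to latch onto; by the end, the only way to score is the win itself.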
So in general, in reinforcement learning, we say that you have a sparse reward if the reward you're getting is relatively rare: say, you only get a reward once you defeat the other agent in sumo. And we say that you're getting dense rewards if you're getting a reward much more frequently, for example a reward based on whether you are still upright; if you're still upright, you're getting rewards. And there's a trade-off between those two things. Sparse rewards help you be much more accurate in terms of targeting exactly the behavior that you want to learn. If you're playing chess, you care about checkmating the opponent; you don't necessarily care what type of opening or mid game you're playing, as long as you can checkmate your opponent. But when the reward is sparse, that makes it harder for the agent to learn, because in the beginning it's doing kind of random things, and if it's hard to get the reward, it may never figure out what is expected of it. So if I were to draw a parallel here: if you had a dog and you wanted to train it to do tricks, you would not start with very complicated things, because it would never be getting rewards and would never figure out what it needs to do. Like, fetch me orange juice; the dog would never get it. Yes, exactly. So you want to start with roll over, something like that. Yeah, you start with simple things and you gradually increase the complexity. And this is also what we did in the sumo game, where we started with simple things, like okay, stay upright, move toward the center of the ring, but eventually we wanted it to actually do the thing we care about, which is winning at sumo. So as time progresses, we start rewarding it less for the simple things and more for the actual victory, and at the end we're only rewarding it for the actual victory. Wow, okay. So you start with the dense rewards, and then over time the rewards become sparse. Yes. Wow, that's a really cool component of reinforcement
learning. Wow, competitive self-play is freaking crazy. Look at this, it's like running, you know, look at this. It's just nuts how quickly these agents can learn to be at their peak, at their absolute best. And the applicability of this extends into automobiles, manufacturing, airlines, and so many other simulations, in biotech and other fields. Just, how do you optimize for a given variable under constraints? You could literally test a part on a vehicle for 100,000 miles and see how well it does, stuff like that. Yes, though actually that's another open research problem in reinforcement learning that's very important and that people are very interested in solving right now. RL actually hasn't been applied directly in the real world that much, and the reason is that it is extremely data-hungry. So it requires these simulations, but a lot of simulations don't perfectly match the real world. And if you train directly in simulation and then try to transfer to the real world, oftentimes your thing is just not going to work, because the real world is different: there is a bunch of things happening there that are not in your simulation. And this is like potholes in the road or something. Yes. So this is called the sim-to-real transfer problem, and it's a very active area of research in robotics. People are super excited about it; lots of smart people are thinking very deeply about it and trying to solve it. Again, the name is the sim-to-real problem, simulation to reality. Whoa, that's a good one too, the sim-to-real problem, the simulation-to-reality problem. Yes. Okay, let's go into the Neural MMO. Wow, okay. So MMO: massively multiplayer online, and RPG, role-playing games, is usually how we see these things, like World of Warcraft, and Grand Theft Auto multiplayer, big sandbox games, these types of things. And this is crazy, because you have your
simulating agents that are playing here, and this is 128 concurrent agents playing over a hundred concurrent servers, running roughly 128 each, and then a hundred million lifetimes. When I was reading this, I was like, this is nuts. Okay, so, Neural MMO, teach us about this. Yes. So Neural MMO is a project that an intern on the multi-agent team, Joseph Suarez, worked on over the summer of 2018. And this ties into the broader mission of the multi-agent team, where we believe that complex environments with large-scale multi-agent interaction lead to the emergence of complex skills and complex behavior, and that this behavior is also robust and generalizable, so it's useful in many interesting tasks outside of the original environment that you're training in. So Neural MMO was kind of a proof of concept for this exact idea, where we wanted to test a relatively simple environment with simple dynamics and see, oh, if we have a massive number of agents interacting here, are we going to see cool stuff happening? And we do see some pretty interesting things. Like a lot of camping around the edge? There's so much camping, but also exploration into the middle and such. Yeah. So, some really cool things that for me were possibly the main takeaways from this work. One: agents which are trained with a larger number of other agents... oh, first, to explain a bit how the game works. It's relatively simple: you have all of these agents, they're walking around this map, they're trying to forage for food, and they can also fight with each other. And so we see, even from this simple an environment, with a large number of agents... Ron, go ahead and pull up the seventh asset as well; that will also help with this part of the description. Yeah, go ahead. So you have the agents, you can see them fighting here, and also, I think, there are food items lying around
They're trying to forage those, and we see some really interesting behavior, some really interesting things happening, even from a relatively simple environment like this. So one important thing we see is that if you're an agent that was trained competing with a large population, you end up being somewhat stronger. So if during your life you trained with a hundred other agents, and then you're put on a server with agents that in their life trained with ten other agents on their map, you end up being stronger, and you're able to outcompete them. If you're trained with more other agents, you also end up exploring more: you walk around the map more, and you're slightly better at foraging. And a third thing that's somewhat related: when you train different subpopulations of agents, you see these geographical niches form, where each subpopulation likes certain parts of the map, kind of stays there, and tries not to walk into other subpopulations' niches, so that it doesn't have to fight with them. Whereas if you only have one population, it tends to roam more widely and no individual niches form. Whoa. So each one of the characters is trying to weigh how close they want to be to other characters, because they can throw those items to take each other's health bars down, and they also have to forage for nutrients on the map, so they're trying to balance all these things out. A lot of them, you know, hang out by the edge, because in the center there's a larger surface area around you from which to be attacked; that's why some hang out around the edge, because there's only 180 degrees of exposure versus 360, et cetera. So when you compete on this map with, say, 128 of these, and you become the best here, do you then get moved up to where all of the best across all of the
other servers were competing against each other? I'm not sure if we did exactly that, we might have, but something we did do was take some agents that competed with 128, some that competed with 64, some with 32, some with 16, and put them all together and say: fight, we want to see who's the best. And it turns out we do see somewhat qualitatively different behavior, where in general, agents that were trained with more other agents end up performing better. And maybe a component of this, we're not sure if it's a causal factor, but one other thing they do is they just explore more, they roam around the map a little bit more, whereas agents that were trained with a smaller number of other agents tend to hang out closer to the initial spots they were spawned in, and they don't move as much. And that may have to do with how, at the 128 scale, there are just so many more pieces of knowledge to gain about how to play at the best level, versus if you're only playing against 12 or 24, potentially. Yeah, I think it has to do with the fact that there's just more competition with 128, so you need to roam further in order to not be fighting all the time, to avoid that kind of extensive fighting, whereas with 16 you can move five steps and you're fine. Oh, it's the same size terrain? Yes, it's the same size. Oh, because if the terrain doesn't scale, that makes sense. Because you can more freely walk around, you end up combating less. Okay. And then that last one, Ronnie, this one shows how they move on the terrains, this image. Yes, so this is an image that illustrates, zoomed out, the niche speciation that I was talking about.
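The mixed-population evaluation described above — agents trained against 128, 64, 32, or 16 others, dropped onto one server to see who wins — can be sketched as a simple tournament harness. Everything here is a toy stand-in: the "skill" numbers are invented for illustration, not measured results, and a real evaluation would roll out the actual trained policies instead:

```python
import random

def tournament(policies, n_bouts=10_000, seed=0):
    """Run pairwise bouts between agents from different training
    populations and tally wins. `policies` maps a label (here, the
    population size the agent was trained with) to a scalar 'skill',
    a toy stand-in for rolling out a real trained policy."""
    rng = random.Random(seed)
    wins = {label: 0 for label in policies}
    labels = list(policies)
    for _ in range(n_bouts):
        a, b = rng.sample(labels, 2)
        # noisy comparison: higher skill wins more often, but not always
        if policies[a] + rng.gauss(0, 1) > policies[b] + rng.gauss(0, 1):
            wins[a] += 1
        else:
            wins[b] += 1
    return wins

# hypothetical skills encoding the episode's finding that training with
# more agents tends to yield stronger play
skills = {128: 2.0, 64: 1.5, 32: 1.0, 16: 0.5}
wins = tournament(skills)
print(sorted(wins, key=wins.get, reverse=True))
```

With enough bouts, the win tallies recover the underlying skill ordering despite the per-bout noise, which is the point of evaluating by tournament rather than by single matches.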
So when you only have one species, like in the center image, you end up with just one relatively big piece of the map that the agents like to hang out in, and they're kind of uniformly distributed across this red stripe here. Whereas when you have more species, say eight species like in the left image, the different colors are the different subpopulations, and you see how the different subpopulations end up liking different pieces of the map. They tend to hang out in the same area as other agents within their subpopulation, and they don't intersect that much with agents from other subpopulations. We think this is mostly driven by just avoiding needless fighting, but we also think it's an interesting toy proof of concept for specialization and niche formation emerging out of multi-agent interaction, and we hope that in the future, with more complicated environments, we're going to see more interesting specialization and more interesting niche formation. So we're relatively happy, and this result makes us somewhat optimistic that we're going to see that in a more complicated environment. This leads us to the question that we must ask now: are we already in one of these massively multiplayer online role-playing game environments? Maybe. If we are, there haven't been any very direct signs of it. I think the counter-question I like to ask is: if you knew for sure that you were in a simulation, or if you knew for sure that you were not in a simulation, would that change your life in any way? Would it change the actions that you take on a day-to-day basis? Good counter-question. It changed Ron's. How did it change it, Ron? I don't fuck around now, that's all. Period. No fear. I call it like I see it. I stand corrected when I'm corrected. I live my life, I try to stand up for the oppressed. There's something among us that wants to control us, and they can go fuck themselves. Yeah,
Ron! Yeah, that's it. So the behavior did change. It did change, once you learned. Well, at first, don't get me wrong, it was frightening, when you became aware that there's something much larger than all of us trying to work through us, to manipulate us, to control us, to do its will. Nothing less than slavery, all across the board. And now that the most intelligent of us have realized, jeez, something's trying to enslave us, you know, let's try to beat it, let's try to speak out against it. And then when we do that, we become enslaved as well. You know, it's looking for ignorant people that speak confidently about shit they know nothing about. So this is something beyond three-dimensional reality taking its... Oh yeah, it's multi-dimensional, way beyond 3D reality. Something multi-dimensional comes in through Earth and through us, if we allow it. If we allow it. So this is the realization that Ron is talking about, and Todor's question seems like it was somewhat related. In terms of this already being a simulation, in terms of this already being a massively multiplayer online role-playing game: we're cracking the code, we're hacking it, and we're doing it for the best of humanity, all of us collectively. Ron and I, about a month ago or so, were talking about how, as we continue to make these MMORPG simulations ourselves, we gain a better understanding of how we could potentially already be in one that is being manipulated by its creators. So once you make really complex simulations using superintelligence, it's easier for us to look in the mirror and realize, well, what is this already? Yeah, these types of things, it's very hard, it's very abstract, we're still far away from all that stuff. But yeah, they don't like me talking the way I just talked, either. They get really upset. If I end up
dead in the street later today, you know why. There's also an interesting question around agency there. I think if you're thinking about a simulation, it usually implies something about whoever is running it; it in a sense anthropomorphizes them, or it assumes that they have a conscious experience or a certain set of goals. And it seems possible that there are ways the world might be where what we experience is kind of like a simulation, but there's no one necessarily running it, or the thing running it is closer to a natural process or a force of nature rather than an agent that has conscious experience. And I think that kind of blurs the lines and creates a weird middle ground: if that's what's happening, it's unclear whether that still counts as a simulation, or whether it's meaningfully different from this being the true reality. So in that sense, I think there are lots of interesting potential middle-ground positions which would be cool to explore. Yep, yep: the anthropomorphizing, or not, of whatever is potentially creating the process, the simulation. Now, what do you think happens pre-birth and post-death? Hmm. Probably nothing. Or, Occam's razor would lead me to believe probably nothing. Okay, the Occam's razor answer. Okay. And then, what do you think is the most beautiful thing in the world? Hmm, the most beautiful thing in the world... humanity. Tell us more. Why? The most beautiful thing in the world for me is humanity, and I like humanity because, well, I'm partly biased, but there is something aesthetically pleasing, in a sense, in the storyline of being confused apes stuck on a rock floating through space but still trying to do interesting things with your existence. I think that's cool, and it's kind of a rooting-for-the-underdog vibe. The world is big and complicated, and not very nice or gentle, but humanity is still trying to stick it out and flourish. So, rooting for
the underdog. I love it. We find ourselves as stewards of Earth: good luck, see if you can handle it, can you all play together, basically. Yeah, that's good. It is very beautiful, it is. So, this has been such an interesting episode. Thank you, thank you for coming on and teaching. Thank you for hosting me. We are very grateful, it's been such a pleasure. Holy cow, that was super fun. I mean, there's just so much still to understand about multi-agent environments, about competitive self-play, Neural MMO, all the stuff that you guys are doing. Everyone, thank you so much for tuning in. We would love to hear your thoughts on the episode in the comments below. Check out the links below to openai.com, the Twitter for OpenAI, and also Todor's links and his LinkedIn profile. Check all that out, and go and talk to more people, your friends, your co-workers, people online on social media, about things like multi-agent environments, competitive self-play, and Neural MMO, and share more of the questions around these concepts. Also, a huge shout-out to Ron Vagus for producing and directing. Thank you very much, Ron, we greatly appreciate it, and we did pretty well with all those videos. Good job, Ron, proud of us. And support the artists, entrepreneurs, and organizations around the world that you believe in. Support Simulation, our links are below, help us grow and prosper. Go and build the future, everyone. Manifest your dreams into the world. We love you very much, thank you for tuning in, and we will see you soon. Peace.