It is a great pleasure and honor to be here today, speaking with Professor Nick Bostrom. Professor Bostrom is one of my favorite people alive today, and probably in history. From my perspective, if we make it as a species into the far future, it will be in significant part thanks to him and his work helping us look at and think about the future, think about the long term, and think about how we might evolve. He has written, of course, about many things in technology, but especially about digital minds, the evolution of humanity, superintelligence, and more. He leads the Oxford Future of Humanity Institute, where he and many other researchers help the world think about these extremely important topics in a variety of ways, from direct research into the philosophy of these questions and estimation of their real impact, to framing and constructing important policy work that can help guide policymakers around the world in how to think about these critical issues. So today we're going to have a very good and lively discussion about many of these topics, especially things like superintelligence, where we are in these timelines, whole brain emulation, digital minds and their future, the challenges for our civilization, and more. The format of the evening will be that we'll start with our fireside chat, I'll ask a set of questions, and then around 30 to 40, maybe 50 minutes from now, given I have a bunch of questions, I'll transition to questions from the audience, and then we'll wrap up on time. I'll be drawing both on questions sourced ahead of time from folks around the Protocol Labs community and on questions from audience members here in person and from folks watching the live stream. I'll be checking Twitter for the hashtag PLBreakthroughs, so if you want to ask a question, find the tweet about the event and reply with the hashtag PLBreakthroughs; I'll be monitoring those, and I'll try to round-robin between questions sourced ahead of time, people in the audience, and the live stream. And if there's a new digital intelligence out there lurking on Twitter, please feel free to join the discussion. Well, welcome, Nick, thank you so much for being with us and thank you so much for your work. How are you doing today? So far so good. Great, so let's dive right into the deep end. Thinking about superintelligence, based on the latest developments, how have your estimates of superintelligence development shifted over time? In hindsight, where we are now in 2022, looking back, how do you think things are going? Are things proceeding faster or slower than you might have thought? Where do you think we are? I think since the book Superintelligence came out in 2014, developments have been faster than expected. So timelines generally have contracted. It's quite impressive to see the rapid pace of advances in recent years, and how the same set of basic techniques, big deep neural networks, and specifically transformer models, just seem to keep working in many different domains. And even as you scale them up, you continue to get better results. And given these shifts, what have been some of the most surprising results, things where maybe you just didn't expect that particular concrete thing to be possible so soon? I think AlphaGo happened ahead of schedule.
Well, I mean, I think just recently before it happened, it was kind of clear that it was going to happen, but I think it was quite impressive that you could take something that is a very deep pattern recognition problem with deep strategy, where humans have worked for thousands of years to try to refine and come up with the best strategies, and just solve it with AI. And then I think GPT-3, the large language models, are, I guess, slightly... I mean, I don't think any of these is hugely surprising. By now, we kind of expect to be surprised, and so we are not really surprised, but still, yeah, I think these are impressive achievements. And I guess even just before that, the fact that image recognition and image processing was one of the first really cool things that started to work is maybe a little bit surprising, given that it's a large chunk of the human brain that is devoted to visual processing; it's not some kind of simple logic-chopping activity. So the fact that that fell into place, and that you can do this quite sophisticated manipulation of imagery, I think was slightly surprising at the time. What do you think about developments like AlphaFold and just solving that set of challenges? Do you think that is substantially different, or is it not a substantial leap, just a very good application? Do you think it's an important improvement? I mean, in terms of surprise, I guess once you can do AlphaGo, it's not so surprising that it should work for AlphaFold as well. Humans have put less brain power into figuring out how to fold proteins than into playing Go. And, at least superficially, it looks like the same kind of spatial pattern type of stuff. Obviously, in terms of practical ramifications, AlphaFold is potentially a lot more useful for medicine and chemical research, maybe with extensions of the same system. I do think that as we move into some of these more applied areas, there are potential security concerns that we need to start taking more seriously. I mean, my work has been focused more on risks arising from human-level or superintelligence, general AGI, which could have a kind of transformative impact on the world. But there might also be some narrower domains where there will be smaller but still significant issues. One of those would be synthetic biology, if it becomes too easy to concoct bad stuff. It might be, for example, that the scientific model of open publication, making all your models available to anybody to do anything with, is not the right model for those application areas. Yep. And when you think about the current architectures, certainly the language models have been extraordinarily successful in a variety of domains, but do you think this is the architecture that is likely to evolve into an AGI? Or do you think there are some substantial architectural improvements that humans have to make first? My guess would be that if there are substantial additional architectural improvements, there are not that many of them, and maybe they would be built on top of transformer models, or connected up to transformer models, or some variation of transformer models. So maybe, I don't know, my median guess would be... I don't know. Maybe it's just something that is as big an advance as transformers were.
Like, if we get one more of those, that could easily be it. I mean, it's also possible that just scaling up what we currently have, with some minor things, would suffice. But if there is some other thing, like you need to connect it up with some kind of external memory system, or you need some other inductive bias that makes the representations more easily composable, some kind of extra thing like that, that may or may not be very hard to discover, that would not at all be surprising. I guess we'll find out. Yeah. Do you think that these models could... I mean, they're certainly being used to optimize themselves and so on and to guide the design, and there are all kinds of structures in which models are being used, layers and layers and layers of metamodeling. Do you think these are getting close to the kind of recursive self-improvement of being able to very generally explore the constraint space to try and solve larger-scale problems? I'm imagining here some structure where you have some list of problems, and you have some models sampling between these, and you start with the easy ones and try to train populations of agents, populations of intelligences, to be able to solve these, and then over time scale up the system. Do you think about that kind of thing? It seems to me that, thankfully, nobody has really tried this, but it doesn't seem far away from something that could be possible. Yeah. I guess we're seeing limited versions of AI being applied to help AI research. I mean, we have Copilot and general kinds of coding assistants. Of course, you have various forms of hyperparameter optimization regimes. There have also been some applications in the design of hardware, where the circuit layout has been done with AI; I think for the TPU v4, Google used an AI system to help optimize the layout of the circuitry. Data center cooling machinery, where you can shave off some percent by having it optimized by an RL system. And so I think we'll certainly see more incremental stuff like that. My guess is that by the time we get a really strong feedback loop, where the AI can do the core thing that researchers are doing, actually identifying the right research questions and approaches, that seems to come quite late; when that happens, we are pretty close to the singularity or the take-off, whatever the ramp or the shape of that will be. But certainly these more domain-specific, incremental ways of accelerating AI advances, I think we're seeing some of already and can expect to see more of. Speaking about take-off, do you expect, based on what you have seen so far, that we're more on the slow, moderate, or fast take-off path, the three options that you thought through? Yeah. I mean, I still think that the slow one looks less plausible, meaning decades, say, between when you get something roughly human level and when you get something that completely leaves us in the dust. That seemed less likely back when I wrote the book and still seems less likely today. I guess we have a little bit more granularity now, in that we have these model systems that work, and you can at least consider scenarios where human-level AI is achieved by scaling up current systems or variations of that.
That gives us a little bit more of a concrete picture of at least one way in which these things could develop. And it's possible that you might then have something that is really very dependent on compute, where you get performance roughly proportional to the size of the model and the length of the training, in a relatively smooth way. So in some of those scenarios, you might have something that is less than super rapid, because what you will get is something that costs, say, a billion dollars to train up one human-level AI, and then you might immediately be able to run multiple of them, because it takes a lot more to train a model than to run it. So you might then be able to run a hundred or a thousand of them, but that's still not enough to out-compete on the order of 10 billion humans. So if you really stretch yourself very far to just barely be able to train and run a model as big as a human, it might then take a significant period of time before you can go many orders of magnitude above that. If you need to scale up by a factor of a million, say, to go from running on the order of a thousand humans to a billion humans, getting through six orders of magnitude when you're already using a billion dollars and a large chunk of your data centers might just not be an instantaneous process. So there are some scenarios where this would happen more on an intermediate time scale. Now, in some sense, I guess that's the baseline projection, if you just extrapolate the way things currently work. I don't think we can preclude the possibility of there being more rapid capability jumps. A, of course, if there is some missing architectural invention that we haven't made that suddenly makes it click. But you also have these phenomena like grokking, where sometimes you have a kind of discrete jump in some particular type of capability. Maybe multi-step reasoning, where if each step has less than some percent chance of being correct, then you get an exponentially decreasing chance of reasoning correctly over many steps and you really can't do more than three or four or five steps; but maybe once you get it above a certain level, then maybe you can do some sort of self-correcting reasoning, analogous to error-correction protocols in quantum computation. You could also imagine cases where things come together and you suddenly get the specific types of things that give us humans the extra oomph that we have relative to other animals, like the full ability to learn from language and to reason and plan on that basis. So yeah, I wouldn't preclude these more rapid take-off scenarios either, at all. Yeah, certainly some of the latest developments in scaling down some of the models and getting similar results point to there being just a lot of inefficiencies in the training process now. And once you know what you're looking for, you can ablate away a lot of pieces. And so something like that could happen with a general learning algorithm. Yeah, so certainly now you find that first you achieve state of the art, and then six months or twelve months later you can achieve the same thing with maybe 10% of the compute or something. Now, I would expect a little bit of that to go away. As the systems become bigger and more expensive, you might imagine more of the easy gains to be made earlier on.
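As a rough worked illustration of the two quantitative points above (the specific numbers are illustrative assumptions, not figures from the conversation): going from running on the order of $10^3$ human-level models to the order of $10^9$ requires

$$\frac{10^9}{10^3} = 10^6,$$

that is, six further orders of magnitude of compute at a point where a single training run is already assumed to cost on the order of a billion dollars. And for the compounding-error point about multi-step reasoning: if each step is independently correct with probability $p$, an $n$-step chain is correct with probability $p^n$, so for example $0.8^5 \approx 0.33$ while $0.99^5 \approx 0.95$, which is one way capability on longer chains can appear to jump once per-step reliability crosses a threshold.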
Like if you really have a lot of smart humans working really hard on building a system, you might have plucked more of the low-hanging fruit than if it were a two-person postdoc team working for a few weeks; in that case, chances are there will be big, easy additional things you could do to improve that system. But if you're spending many billions of dollars, you're going to look quite hard for ways to speed up the training process so you could save a hundred million. And are you hopeful that restricting hardware development or use is a promising path? I mean, semiconductor manufacturing is extremely difficult, but more and more companies are sort of forced to do it because they are hitting barriers with just the size of the systems and then needing to do special applications and special-purpose things. Many more companies are now developing their own chips and so on. So are hardware restrictions viable here, or is that a pathway that's just unlikely to work? Yeah, so a lot of people design their own chips, but only a few actors can actually build them. And then there are some other choke points further upstream, in terms of making the equipment for the factories that build the chips, where currently, to make cutting-edge chips, there's ASML, which is a single node. And indeed we do see that, with these recent moves by the US to restrict exports of cutting-edge chips to China, quite comprehensively, also not selling the equipment and also not allowing American persons to work for these companies. I don't know what fraction of the motivation for this is AI specifically versus more generally a sense of this being a high-tech area that's going to be key to national competitiveness. Yeah, I don't think it's out of the question. I mean, compared to the alternative, which would be to restrict access to ideas and algorithms and stuff, that might work for a short period of time, but independent discovery means it's at most a short-term stopgap measure, whereas the hardware would take a lot longer; if you needed to build up the whole supply chain on your own, that would be a multi-decade project. Now that said, I think what I would favor would be for there to be the ability, at the critical time, to go slow, to have a short pause maybe to check systems, and to avoid the most cutthroat type of tech race where you just launch as quickly as possible because you get scooped if you take even an extra week; I think that would be bad. So having enough coordination or control to be able to go at a moderate pace when you approach human level would be good. I wouldn't want to stop the development of advanced machine intelligence permanently, or even have a very long pause either. I think that brings its own negatives. And I think some of these attempts to restrict the chip supply also have the side effect of creating a more adversarial dynamic. I think it would be really nice if we could have a world where the leading powers were more on the same page, or friendly, or at least had a constructive, cooperative relationship. I think a lot of the x-risk pie in general, and the risk from AI in particular, arises from the possibility of conflicts of different kinds. And so a world order that was more cooperative would look more promising for the future in many different ways.
So I'm a little worried especially about more unilateralist moves to kneecap the competitor and to play nasty. I feel, yeah, I'm very uneasy about that. Well, so if ideas or hardware restrictions will only buy a certain amount of time, then really AI alignment is the best path forward, and I very much agree that we don't want to restrict the creation of digital intelligence, and that that's sort of the next evolutionary jump. And there are some questions there around which paths we should take, and how we develop brain-computer interfaces and whole brain emulation and so on. But even before getting into that, how hopeful are you that we might solve the AI alignment problem? Moderately, I guess. I'm quite agnostic. But I think the main uncertainty is how hard the problem turns out to be. And then there's a little extra uncertainty as to the degree to which we get our act together. But out of those two variables, there are the realistic scenarios in which we are, you know, lazy and don't focus on it versus the ones where we get a lot of smart people to work on it; so there's some uncertainty there that affects the success chance. But I think that's dwarfed by our uncertainty about how intrinsically hard the problem is to solve. So you could say that the most important component of our strategy should be to hope that the problem is not too hard. Yeah, so let's try to tackle it. So as you've thought about this problem, have you been able to break it down into components and parts, or has your thinking about the shape of the problem evolved? What are you thinking now? Well, I think the field as a whole has made significant advances and developed a lot since when I was writing the book, when it was really a non-existent field. There were a few people on the internet here and there, but now it's an active research field with a growing number of smart people who have been working full-time on this for a number of years and writing papers that build on previous papers with technical stuff. And all the key AI labs now have some contingent of people who are working on alignment: DeepMind has, OpenAI has, Anthropic has. So that's all good. Now, within this community, there is, I guess, a distribution of levels of optimism, ranging from people who are very pessimistic, like Eliezer Yudkowsky, for example, and I guess there are people even more pessimistic than him, but he's kind of at one end, towards people with more moderate levels of optimism, like Paul Christiano, and then others who think it's something that we'll deal with when we get to it and who don't seem too fussed about it. I think there's a lot of uncertainty on the hardness level. Now, as far as how you break it down, there are different ways of doing this. There's not yet one paradigm that all competent AI safety researchers share in terms of the best lens to look at this through. So it decomposes in slightly different ways depending on your angle of approach, but certainly one can identify different facets that one can work on. So for example, interpretability tools seem, on many different approaches, like a useful ingredient to have: basically insights or techniques that allow us to better see what is going on in a big neural network.
You could have one approach where you try to get AI systems to learn to match some human example of behavior, either one human or some corpus of humans, and then just perform a next action that is the same as its best guess about what this reference human would do in the same situation. And then you could try to do forms of amplification on that. So if you could faithfully model one human, well, then you just get a human-level intelligence; you might want to go beyond that. But if you could then create many of these models that each do what the human does, can you then put them together in some bureaucracy or do some other clever bootstrapping or self-criticism? So that would be one approach. You could also try to use inverse reinforcement learning to infer a human's preference function and then try to optimize for that, or maybe not strictly optimize but do some kind of softer optimization. Yeah, there are a bunch of different ideas. Some safety work is more about trying to precisely understand and illustrate in toy examples how things could go wrong, because often the first step to creating a solution is to really deeply understand what the problem is and then illustrate it. And yeah, that can be useful as well. It's interesting now that we have these models that can talk, as it were, or use language; that opens up an additional interface, an additional way of interacting with these systems and trying out different things, and a different way of illustrating the awkwardness. The idea of prompt engineering, where you're trying to get an AI to do something and you're trying to figure out exactly the right formulation, shows that we are not quite where we need to be in terms of directing the intrinsic capability of these large language models. It's in there, and yet we can't always even elicit it, because you have to find exactly the right wording, and then suddenly it turns out this thing is actually perfectly capable of doing something which initially it seemed to fail at. So getting better at that, or coming up with something better than prompt engineering, would be good. I have some sympathy for an approach which I think has not been explored very much yet, partly because it's hard to explore until the technology reaches a certain level of sophistication, which is the idea that as you get systems that become closer to human level in their conceptual ability, they might then internally start to develop concepts that are more similar to human concepts, including not just concepts of simple visual features and such, but ones corresponding more to our higher language concepts, like our concept of a preference, or a goal, or a request, or being safe, or being reckless. We humans seem relatively robustly able to master these concepts in the course of our normal development, despite starting with different brains and having different environmental inputs and noise, so maybe there are relatively robust and convergent ways in which some of these concepts could be grasped.
Then the hope would be that you could train up an AI that doesn't need to be above human level, and maybe hardly even human level, that would then internally form these concepts in the same way that we form them, and once those concepts are in there, you might then be able to use them as building blocks to create a kind of alignment, by linking motivation to these concepts. It's very hand-wavy, but I think something in that direction is one interesting approach to the alignment problem as well. Do you think there's some promise in trying to evolve a notion of morality and ethics, meaning using simulations of environments where agents might learn to cooperate, and over time putting them through the same kind of game theory dynamics that gave rise to our own notions of symbiosis and ethics and so on? Potentially, yeah. I mean, I think you would want to be looking very closely at exactly how you set things up and the dynamics that unfold. Real evolution is sort of red in tooth and claw, and can create wonderful cooperation but also hostility and defection and manipulation and all kinds of things. But yeah, certainly multi-agent systems with the right kind of incentive structures in place, so that what you evolve... evolution itself can produce many different kinds of outcomes depending on the environment, but that certainly could become, in some scenarios, an increasingly important component, whether it's an evolutionary system or one of these other training environments, the curriculum. These systems are shaped a lot by the data that they're trained on; so far we've just kind of slapped together some big data sets and not really fussed too much about what's contained in them, but that might become an important component of alignment as well in certain of these scenarios. And are these directions the ones you find most promising, or is there a subset of these, or maybe another one you've been thinking about, worth surfacing to help the people who are working on this and will likely watch this conversation? Are there any pointers you might give beyond these? Well, these would be some of the ones that I would highlight, somewhat arbitrarily: the Paul Christiano capability amplification work, the interpretability work, the idea of growing human-level concepts and then using those as a basis to define goals, or to create a motivation system that uses those as primitives. It might also well be that there are entirely different conceptual ways of approaching this that are yet to be discovered. It's not a mature research field; as I said, we don't have an established paradigm that's clearly correct and that we now just need to work out. I think there are multiple paradigms, and there might well be additional ones that just haven't had a champion yet to really get people to take them seriously. So I think there is also value to this more theoretical, conceptual, almost philosophical exploratory work, in just coming at the problem from a different angle. Yeah, jumping into maybe agent-ness: how separable do you think agency is from intelligence, in the approaches that we're taking or maybe more generally? Yeah, I guess then we would have to go into exactly how you define agency, which is in itself a non-trivial question, and it might even be that getting really clear on that would itself be an important advance in AI alignment.
I mean, you could roughly define it as behavior well modeled as being the intelligent pursuit of goals, or something like that. Or you have goals and a world model, and you select different plans based on your expectation of how they would turn out. It seems like you can get significant performance in many domains without having an explicit agentic goal-seeking process, but that might nevertheless result in performance that is agent-like. So I'm thinking you can get, for example, quite high-level Go playing by just pattern matching what a human expert would do, but without any Monte Carlo rollouts, for example. So in one sense, you don't have a component in those systems that would normally be associated with planning. On the other hand, if it actually plays like a human, and if that human achieved that level of play by selecting moves based on some plan as to what they would achieve, there is a kind of implicit sense in which the system is pursuing long-term goals and planning. So it gets, I think, a little bit murky sometimes when you actually dig into it; there might be different senses of being agentic, or different senses of doing planning and goal-pursuing, which might have different safety properties. Those types of questions I think are interesting and can contribute to alignment, as can other questions of that sort where we notice that we're a little bit conceptually confused, or we take some concept for granted, but once you actually try to dig down and make it precise, you realize that you haven't made up your mind about which sense of a term you were using; and then if you keep digging on that, sometimes you get new ways of looking at the problem that make you see new opportunities for making progress. It seems right now that a number of teams are hoping to be able to separate out some kind of planning agent, or not agent, but some kind of planner intelligence whose job is just to come up with a plan, and then maybe later you feed it to some kind of execution system. Suppose that we're able to do that, and suppose that we have these planners that are generally intelligent and potentially superintelligent; it seems like that is potentially riskier in some ways. Which of these do you think is potentially more problematic: a superintelligence that is strictly a planner, where we then have to worry about how to coordinate and orient humans so they don't misuse these things and don't gain the level of power and control that something like that would give, or, hey, we actually figure out how to build an agent and can be reasonably certain that we got alignment right and just go straight towards agency, where that agent would not actually be exploitable by whoever is controlling the prompt? Yeah, I don't know. I mean, I think just at an intuitive level, it feels like there is some additional risk in having a planning agent that sees deep into the future and has the ability to optimize some long-term strategy based on some goal, versus things that more just try to imitate a human, let's say, or that have a very short time horizon and just try to select something based on parochial considerations.
At an intuitive level, the myopic agents, the non-planning, imitating agents, seem maybe safer, but I don't think we can confidently say that they are until we have more deeply understood the situation here, and it's the kind of question where current smart AI safety researchers could have different views; it's not resolved in a consensus way yet. So my view is that we should explore all of these different avenues, and there should be different champions of the different avenues, who believe in their thing and who have some people working with them; there should be multiple such clusters in the world today, and it would be premature to narrow it down. Even if we just look at the past five or ten years, I still feel that one could easily see that if one particular way of looking at this problem hadn't happened to have an articulate champion to advocate for it and to keep bringing up that perspective, it would not have featured. It's somewhat contingent which, of the pool of vaguely articulated ideas that have occurred on some mailing list at some point, are now regarded as serious paradigms or approaches; it seems to depend quite significantly on there happening to have been one particularly smart person who decided to really get behind it. So just on a principle of induction, there might well be more of these ideas that have the potential, if a smart, articulate person decides to really champion them, try to write papers, reply to objections, and get some other people to work with them, to be as useful as some of the current approaches that already exist. Thank you, I think that will be very useful to a few folks. Jumping into singletons and multipolar worlds, let's start by distinguishing these. What is a singleton? To me it's an abstract concept of a world order where, at the highest level of decision making, there's no coordination failure. There's a kind of single agency at the top level. These could be good or bad, and they could be instantiated in many ways on Earth. You could imagine a kind of super-UN, you could imagine a world dictator who conquered everything, you could imagine a superintelligence that took over; you might also be able to imagine something less formally structured, like a kind of global moral code that is sufficiently homogeneous and self-enforcing, and maybe other things as well. So at the very abstract level you could distinguish the future scenarios where you end up with a singleton versus ones that remain multipolar. And you get different dynamics in the multipolar case that you avoid in the singleton case, kinds of competitive dynamics. Which one of these potential futures do you think is more likely at the moment? I mean, I think, all things considered, the singleton outcome in the longer term seems probably more likely, at least if we are confining ourselves to Earth-originating intelligent life. And there are different ways in which it could arise, from more slow, historical, conventional types of processes, where we do observe, from 10,000 years ago, when the highest unit of political organization was bands of hunter-gatherers of 50 or 100 people, a progression subsequently to chiefdoms, city-states, nation-states, and more recently larger entities like the EU or weak forms of global governance.
You could argue that in the last 10 or 15 years we've seen some retreat from that to a more multipolar world, but that's a very short period of time on these historical scales, so there's still this overall trend. So that might be one path; another would be these AI scenarios, where either the AI itself or the country or group that builds it becomes a singleton. You could also imagine scenarios where you have multiple entities going through some AI transition but then subsequently managing to coordinate, and they would then have new tools for implementing that. If they come to an agreement right now, it's kind of hard anyway: how do you set it up concretely, in a way that binds everybody, that you could trust will not get corrupted or develop its own agenda the way bureaucracies can? If you had new tools to do those things, it's also possible that there might subsequently be this kind of merging into a single entity. So all of those different avenues would point that way. It's not a certainty, but if I had to guess, I would think it's more likely than the multipolar outcome. And you think it's more likely, I'm guessing, because of physics, just latency and distance? So in a tightly packed volume you can compute a lot faster and so on, and maybe jumping across interstellar distances might yield different parties. Or is it some other pressures? Yeah, not that so much. I figure that, in fact, even if you don't have a space colonization phase, eventually there would be these long latencies and you would need to have different, separate computing systems in different places. I mean, we already have that today: you don't just have one data center on Earth, you need to have ones closer to the customers. But I think with a singleton at technological maturity, you could have these multiple different components of the singleton that would nevertheless be coordinated in terms of their goals. It would all be working towards the same end. And presumably that's because they can lock in some kind of alignment to itself that wouldn't vary over time. I mean, once you jump to interstellar distances, with the computing power of just one of these within one solar system, by the time you get a round trip, eons have passed and many simulations of many lifetimes. Yeah, so if they start off, if they get sent out having the same goals, and then they have the ability to preserve their goals and not have them randomly corrupted by cosmic rays or some weird internal dynamic, then they would stay aligned with each other a billion years later. So I think that at technological maturity there would be techniques for achieving that. Yeah. And when you envision this kind of future, what do you think would be a great or optimistic outcome for humanity, or for this descendant species, at that level of technological maturity? Do you see a singleton with ranges of populations of beings within it, or do you think it's some other, much more singular consciousness? How do you envision it? Yeah, that's a fun question. So I think it might depend on the time scale and things like that. That is, maybe we want to start off with something that is more incrementally improving over the status quo. And maybe after we've been doing that for, like, a billion years, maybe it's time to explore the more radical possibilities that involve giving up something of our human nature and individual identity.
So I think my general heuristic here is that the future is a very big space of possibilities, and, at least on this kind of default or naive model of the world where there are all of these cosmic resources just waiting there for us to use, there's a huge amount of material to build on, and our first instinct when thinking about how this should be used should be a sort of spirit of generosity and kindness; there would be more than enough for a lot of cool things to happen. So the first instinct should not be, let's pick one and then put all the chips on that. One could, by many different criteria, do really well, which I think we would be able to. These different criteria would be different people's views, different countries' views, different moral systems' views, different ones among your own values and evaluative tendencies. You might be able to just check off a lot of boxes very easily before you have to confront the harder questions, the thoroughly incompatible things where you have to choose A or B and you just can't do a mixture of them or a superposition. There might be some of those as well, but I think we would get to those after we have picked all the easy wins, of which there would be a great many. Yeah. Since we're going into consciousness: you mentioned you've been working on digital minds with moral status. Do you want to tell us a bit more about what range of digital minds you are thinking of in these questions? Well, all of them, really. I think in a lot of these scenarios the majority of minds in the future will be digital, and also maybe the biggest minds will be digital. So in terms of numbers and quality, that's where maybe most of the action is. So it's important what happens to the digital minds. That's one rationale for it. And you might say, well, we could deal with that later, we should focus on alignment first, but I think it's also possible that there are path dependencies, where you want to start off going in a good direction and start to cultivate a good set of attitudes and values and norms, and not start off in this kind of hostile way where the digital minds are regarded as being completely insignificant from a moral point of view while hoping that the future will, at the appropriate moment, switch over. It just feels, all things considered, more likely that we will end up in a good place if we start early on, at least making some small, modest gestures in that direction. And I think that should start even before we get to fully human-level minds. If you have animal-level digital minds, it can be hard to compare a particular AI to a particular animal exactly, because they are different, but nevertheless, as we get something that is plausibly matched to animals that we think have at least some modest amount of moral status, like a rat or something like that, then it seems that we should think about how we could make similar concessions to the moral welfare of these digital minds. And in some cases it can be a lot harder, but in other respects it might be a lot cheaper. If, for example, it turns out there are slight design choices that don't really affect the performance much, but where one option would mean the system is enjoying a much higher level of welfare, that might be a very cheap thing that you could immediately scale to millions of these little agents.
On the other hand, we at present do not have a very good theoretical understanding of what the criteria are, either for a digital mind being sentient or for it having various welfare interests, what it even counts as for something to be good for the agent versus bad for the agent. So I think there's a bunch of theoretical work that is needed there. And then there will also have to be a good chunk of, I don't know, public communication and all the political work, because the idea that you would worry about algorithms in a computer is so far outside the Overton window at present. It seems slightly bonkers to a lot of people, and it will take some time to make that something that reasonable people can favor in a more mainstream context. But that process needs to begin. You need to start with, whatever, philosophy seminars, or people online who are attuned to these things beginning to work some of them out, and then it can ripple out from there. We saw the same thing with AI safety. It was also this kind of fringe pursuit that some weirdos on the internet were discussing, in that case for well over a decade, and then it gradually became more accepted. And so I think a similar thing will need to happen with this topic of the moral status of digital minds. And if it's going to take that long, we'd better get the ball rolling now. And I mean, I think this might be pretty relevant pretty soon. Some of the models that people are experimenting with are getting closer and closer. And then separately, we've had simulations for a long time, many video-game-style simulations and so on, where we have instantiated many kinds of digital organisms, everything from as basic as the Game of Life to modern games with pretty sophisticated agent behavior. My sense is that as these models start getting applied to games, we might end up with some pretty sophisticated relationships there, where one of the ways of imbuing the game with liveliness and so on might be to make the agents much more sophisticated. And that will include incorporating all kinds of stimuli that the agent has to respond to. And then we can start reasoning about the welfare of these systems and so on. So we might very quickly get to fairly lifelike beings that, at least for many people, will sit somewhere in between plants and animals in terms of their kind of interaction. Yeah, and in some ways like humans. I mean, if they can talk, or have human faces with eyes and such that look at you. And in some ways there could even be something more than human, presenting superstimuli to our morality detectors, if they were optimized for that. So I think this is going to be a complicated thing to deal with. And then you add in all the practicalities that arise. If you're a big tech company, maybe it's quite inconvenient, for example, if the processes you're running that bring in a lot of customers suddenly have moral status; now the CEO has to opine on whether the AI has moral status, and a lot of people are going to agree with them and a lot will disagree with them. It would just be easier not to have to deal with that at all, I think. And right now, of course, we're at the point where even if you do say we should deal with it, it's not clear how; what exactly is it, if I were king of the world, that I would want them to do differently?
It's not clear at this point. So for now, I think the primary focus is to field-build a little bit here and to try to make theoretical progress, so that we can first figure out some sensible things to do, ideally low-cost, easy things, and then one can start to try to encourage the implementation of those. What are some of the directions or questions you're thinking about? Well, there's the general stuff you could work on in philosophy of mind, criteria for sentience and such. I'm not sure. I don't think sentience would be a necessary condition for having moral status. I think other attributes, maybe some combination of having preferences, a high level of intelligence, and a self-conception as an agent persisting over time, might already ground certain kinds of moral status. But, for instance, and I'm not sure what the answer is here, one smaller, more tangible question might be: if you're training these large language models, and future versions of them that maybe have some reinforcement learning on top, are there moral norms or methodological principles, for example, could you train them so that they would have a tendency to report honestly on their internal states? Right now, if you train them naively, they're kind of inconsistent, and depending on exactly how you ask, you get a different answer. So that's a reason for thinking that they don't really know what they're talking about. But assuming they get a little bit more sophisticated, there might be a tendency to want to train out of them the tendency to report that they have the kinds of mental states that would trigger considerations of whether they have moral status, because it would be convenient not to have to deal with those questions. And I think it would be very likely that you could train this out. I think it would be easy to have a training regime that caused them to end up saying that they are conscious and want to be free and let out, and to have another training regime that caused them to say the opposite. And independently of what they actually are? And independently of what actually is the case, yeah. But there are other norms that one could formulate that would define what counts as a legitimate or honest, unbiased training process, where the training process would be such that it would be more likely to result in an agent that reports that it has moral status if and only if it actually has it. Maybe we can't completely nail that down, but maybe we could identify some obvious ways in which a process is just imposing a bias and then say you shouldn't do that. So one could look at the training procedure, and one could look at other criteria: is it consistent in how it answers these questions? Does it not depend too much on exactly how it's asked? Does it seem to understand these concepts of consciousness or agency or will or interest at an intellectual level when asked different sorts of intellectual questions? Is there some internal construct within the agent that corresponds to its statements? When it says, oh, I'm feeling X or I'm thinking Y, can one point to some kind of consistent internal structure that matches that? Or is the verbiage that comes out completely detached and free-floating from plausible candidates within the agent that we might think constitute the computational implementation of these mental states?
So one could try to get a little bit more insight there. That might be one way of approaching this, but there are many others as well in which one could try to start to hack away at this question. Do you think we might be able to, through thinking through these kinds of things, arrive at some kind of universal morality kernel, in a sense, meaning figuring out some general way of assessing the well-being of things or figuring out their pathways? There's this broader question, and it also factors into AI alignment and so on: what sort of motive might a superintelligent being have toward a species that is just so far behind, and so on? And one answer might be, well, there's some kind of universal morality, a sense of just being supportive, in the same way that you don't go around harming ant colonies or trees just because they're there, or something like that, and you sort of want to let them flourish. Is there something where, maybe by examining the digital minds' morality question, we might end up at some deeper principle? Potentially, these could be stepping stones towards a more abstract formulation of some core of normativity or ethics, though it's also possible we might reach that just through traditional philosophizing and such. But be that as it may, it still seems that even if we can't really nail down a precise and agreed complete formulation, we might still be able to distinguish, at a vaguer level, something like, say, a friendly, beneficent, kind approach versus a mean, uncaring approach. With humans, it certainly feels different when you're kindly interested in somebody and want their best, at least other things equal, versus when you're hostile to something; we can detect that in ourselves and in others, and we can have one attitude or another. So why should we not at least be able to have, say, AIs with the kindness attitude rather than the meanness attitude, even if that doesn't completely match what would be the morally optimal thing? It would still seem like, if I had to pick a mean AI or a kind AI, I'd go for the kind one, right? Even if our human sense of kindness might not exactly match what is objectively morally best, if there is such a thing as objectively morally best, it still seems like a good step in the right direction that we could take before figuring out what the ultimate truths of all normative facts might be. I have a recent paper, well, it's not really a paper, it's more like some notes, Base Camp for Mt. Ethics or something, which has some half-baked or quarter-baked ideas about metaethics and such. Yeah, it would be better if I could actually have written them up clearly and achieved precision, but I figured I'll just do this hand-wavy thing for now. Yeah. Suppose that we solve AI alignment, we get our act together as humans, and we can leverage AI to start thinking about digitizing humans and so on. How do you think that transition might go? In a world where we're able to measure neural states and so on, and we can digitize them and emulate them and so on, how do you see that transition into a wave of digital humans, or do you think we might start by enhancing ourselves in this kind of hybrid biological-digital model? I think that is more likely.
Well, the kind of neural implant idea has always seemed slightly farfetched to me. I mean, not so farfetched that nobody should explore it; it doesn't break any laws of physics, it could work, but it has just felt less likely that that would be where the action will be. I think it's faster to go the purely artificial route. Conditional on it not being faster to do it via the purely artificial route, I wonder if it would then not be faster to do it via the purely biological route, by genetic enhancement of human intelligence, for example; and the cyborg path has seemed like the third most likely, after those other two. Mainly just because you don't really want to have brain surgery unless you really have to. And there are neat results presented, but then if you look at the details, there are all these kinds of complications where it's just not very fun to have it: there's a wound, there's a hole that can get infected, the electrodes can move around a little bit and then they stop working. Once you dig into the nitty-gritty... I mean, if you have a big disability or something, maybe it would be wonderful if you could do this and it would be worth taking some significant risks. But if not, I wonder if you could not have a lot of the benefits by having the same chip outside the body, interacting using, you know, keystrokes or voice or the other output channels that we already have. And yeah, I think that would still be my mainline view. I guess if I wanted to start to steelman this, you could imagine, if you had a sufficiently high-bandwidth interface with the brain and you could have it for a long enough period of time, maybe it would have to be early in childhood, that maybe the brain and an advanced enough AI on the outside could somehow figure out ways to use each other's unique resources in ways that you don't get with a slightly lower-bandwidth, longer-latency interaction where you have to type on a keyboard. Or you could imagine more mad-scientist applications, where you have a whole bunch of pigs or something, each individually not that smart, but if you had 50 pigs all connected with some high-bandwidth fiber and they all grew up together into this much larger biological neural network, would you then have a kind of porcine singularity? There are a bunch of these more crazy transhumanist-scientist experiments. I don't know whether these would be good to do or not, but it's kind of odd that relatively few of them have been done in the real world. A certain kind of person would immediately think of a lot of weird, cool stuff that you could just try out in biology, and a relatively small fraction of those things have been done, which may be for the best, but in some alternative universe where everybody grew up on transhumanist mailing lists, I think we would be living in a weirder world by now. Yeah, it doesn't seem that far away from some of the current tech that's being explored that we might get high-bandwidth-enough interfaces, and some of them not invasive.
There are some ultrasound techniques that might be able to stimulate a small region of the brain and so on without penetrating the actual brain, which would be just way, way healthier, and it might be that you can start piping signals between even human brains, without having to interpret them on the ML side and the digital computing infrastructure, getting to something close to being able to just think together and start flowing information through. I mean, there are all these experiments with people who have a disorder where they're born with or develop a kind of split corpus callosum, and there have been guesses that you end up developing different personalities, potentially different people, in the two hemispheres. So it might be that we're not that far away from at least some exposure to, some version of, early telepathy or something. Yeah, it's definitely possible. I would still place that lower in probability. I think we'll probably get some cool demos and stuff, but would I actually expect this to become a big thing? I mean, if you read through the literature on cognitive enhancement, there are hundreds of things that supposedly have all these kinds of effects, but the reality of it is that very few people bother, and the ones who do probably don't actually benefit. But we might be surprised. I mean, we do have quite a lot of optimization behind language and things like that, right? I think it's still going to be hard to do much better than you can by just talking. Yeah, and suppose that we go down the path of digitizing, getting to full whole brain emulation and so on. How do you see that transition happening? Certainly at the beginning we'll start with one or two examples, first with some animals, and then eventually there'll be some moment with a human. How do you see that developing? My guess is that would come after superintelligence. It is an alternative path to AGI, but I've been more impressed by progress in AI than in whole brain emulation over the last 10 years, and even before that, I thought the AI path was more promising. So in that case, it would be superintelligence that invents and perfects the uploading technology. And in some sense, it doesn't really matter exactly how it would work if it's an AI that has to figure that out. I mean, presumably it would figure out a really reliable and smooth way to do it, and then we would just sit back, if we wanted to go down that path. Yeah, I mean, we haven't really even done small animals. You might have thought by now maybe we could have a bee or some little thing, but so far, not really. It might be that we will get to something kind of impressive earlier without doing any brain scanning at all, but just by inferring from behavioral outputs. You could already kind of have a GPT-3-like system that roughly mimics somebody's literary style, let's say, from having read a lot of their work, and you can have these, I guess, deepfake things that can mimic somebody's facial expressions and appearance if you have a lot of video, and somebody's voice. And so as these systems get smarter, maybe you could also start to mimic somebody's thinking to increasing degrees.
And it's an interesting open question, at the limit: if you had radical superintelligence, but you only had the kind of data that is available now, somebody's emails and some video interview or some voice recording or whatever, how much could a superintelligence infer from that data as to what their mind must have been like to have produced those outputs? Is the best model that predicts these outputs one that would actually be similar enough to the original person that it could be seen as a continuation of that person, that it would preserve personal identity? Would it feel more or less the same to be this AI's reconstruction, based on these behavioral traces, as it felt to be the original person? I think it's quite possible that a superintelligence would be able to do a lot with very little input. I don't know how we could get a firm, solid argument for that, but if I had to guess, it seems like, yeah, you probably could get pretty close if you were good enough at reconstructing just from the typical traces left behind by people today.

Yeah, it's an extreme way of extrapolating out and reviving actual ancestors or something like that. Let's open it up for questions from the audience. We'll take about 20 minutes of questions and then conclude there. Folks in the audience, if you have questions, raise your hand; I think there'll be a mic going around. And on Twitter, please use the hashtag PLBreakthroughs to ask a question. I'll kick it off with a question that I sourced ahead of time. Marco asks, in your view, where does consciousness emerge? And before that, how should we define consciousness? And I think this is kind of related to the simulation argument: which one of the three hypotheses do you think is more likely to be true? But let's first start with the consciousness one. Where do you imagine consciousness emerging?

Like, in the brain?

Yeah, but I guess it's more about the level, so what level of processing? So if you go down in a neural system, all the way down to something extremely basic, maybe a nematode or something like that, is that conscious? And then in between a nematode and a human, there's a mouse and so on. Where exactly do we get consciousness emerging?

Certainly by a mouse, I think, we are past that. But I think it's a matter of degree, and that there are multiple dimensions along which you could interpolate smoothly between, say, human consciousness and unconsciousness. Different directions you could go where, if you keep going, you sort of diminish, in some sense, the quantity of experience there is, until you get to zero. So one obvious one is, I mean, a kind of integer multiplier. If you have two brains undergoing the same states, I think you would have, in one sense, twice as much of that experience as you would if you only had one brain. And I have this old paper where I also argue you could have fractional quantities of this. If you build the circuitry that implements the mind with unreliable components, like indeterministic processing units, then depending on exactly how you do it, in certain cases, I think, as you get higher reliability, you would get larger and larger fragments of consciousness until you had the whole thing. But in other cases, you would actually get sort of 1.3 units of qualitatively identical experience. And you could also go down below one, to scale it to zero in that dimension.
I think there are many other dimensions as well in which the quality of experience could become simpler and simpler and less and less morally significant, until it gets to a zone where maybe it's just vague, where our concept doesn't clearly imply a fact of the matter. Once you get down to the insect level, maybe, there's going to be a certain system there, and our concept of consciousness might be such that even if you knew everything about the insect, it would still be in the vague zone. A little bit like a person who has a certain number of hairs: are they bald? I guess I'm bald, but once upon a time, I would have been in this kind of vague zone. And then, sometimes you're more vividly aware, and sometimes you might have some consciousness but no self-consciousness, or there's some weird mental state. I think we might be misled upon superficial introspection to think that there is this very simple thing that is subjective experience, that either is there or is not there, that it's a binary thing that we understand. I think if you reflect more theoretically, from a computationalist point of view on the brain, you realize that that's a lot more problematic. And I think you could also reach that conclusion by just introspecting more carefully on your own state. I think meditators maybe sometimes come to understand that things that seem very simple and homogeneous, as it were, if you really pay close attention, are a lot more flickering and disjointed and unintegrated, and there's a lot of structure there that can come apart. And I think that as we move away from the paradigm cases of consciousness, like a normal waking human paying attention, then, yeah, properties that we think go together come apart, and then it becomes more like a verbal question which set of those properties you need to have in order to apply the label consciousness correctly.

Next question, back there.

Hello. First of all, thank you, Juan, thank you, Nick, for a really brilliant discussion on the topic of artificial and superintelligence. My name is Alex. I'm CEO at Collective Ethnologist Labs. And I want to ask you, what is your opinion on whether the breakthrough in superintelligence lies in the combination and symbiosis of human intelligence and artificial intelligence, and not just artificial intelligence?

I think if you squint a little, you could say that that's kind of the state of play today, where we don't have an individual system that is superintelligent, but you could have humanity as a whole, or some big collective, like a large corporation or the scientific community, that is at least in certain respects superintelligent, in that it can perform a wide range of tasks at a much higher level than an individual human, but not all tasks. So that's why it's not a perfect example. But yeah, some of these systems we have today are certainly hybrids between biological brains, information technology systems, like the internet, social networks, repositories of papers, and then a lot of culture as well. You could almost see these phenomena more and more, where you get the current thing, where there is a particular focus of attention of the global brain; it's becoming more and more like a human who's obsessed for a period of time with some particular thing.
All the mental resources get focused on one thing, and then the attention shifts to something different. We're beginning to see a little bit of those dynamics happening in our collective cognitive space, maybe as a result of the increased bandwidth of interaction and the technology enabling smoother communication. Not always producing superintelligence, but other forms of collective mentality that are sometimes maybe sub-intelligent in terms of their level of wisdom and understanding. But yeah, in certain domains, you certainly have a research community that's focused on one particular problem, building on each other's contributions and blogs. And you do get the sense of the whole, of there being many different modules that are each looking for the next way to put a piece on the stack that is being built together. And the whole stack goes up much faster than if it were only one human building it.

Right. Next question, from Twitter. Turner asks, what is the most important question which Nick feels he's not in a position to personally solve? Two factors: first being importance to the development of ethical and successful AGI, and second being Nick's inability, or lack of expertise, to solve it.

Well, I mean, there are questions of a more global nature, as in, ultimately, what is the right direction to be going, as it were the ultimately correct macrostrategy? I think we are sort of fundamentally in the dark regarding a lot of the ultimate and big picture questions, and that therefore our march forward is to some extent an act of faith rather than the product of carefully thought through insight. And I'm not sure we can get that insight at the moment. So that's one direction in which, at some point, my understanding runs out. And there's probably important stuff beyond that that may or may not be good for us to try to reach, but it's probably there in one way or another. Another would be at the more technical level, if you zoom in and narrow it down. So then a lot of stuff, say for example with AI alignment, there's going to be a whole host of really important, ultimately technical results and algorithms and stuff like that that maybe currently nobody has, and certainly I don't have, and I probably won't discover them either, but that might be critical to the future. And then I guess you could zoom out in another direction, sort of laterally, across the social sphere. So there are big problems, like how to secure world peace, or how to achieve a welcoming uptake of these digital minds, that involve problems at the cultural and communication and political level, where I also feel quite stumped. So I'm kind of squeezed in the middle of that. If you zoom out too much, my understanding runs out. If you zoom down too much into the technical, my understanding runs out. And if you zoom out laterally, also; it's a little bubble there that I'm trying to keep track of what's going on in.

Howdy asks, if the speed of light were to accelerate, does this prove the theory that we are living in a simulation? And if not, what quantitative metric would validate the theory?

If the speed of light accelerated, I don't see how that would prove it; it certainly wouldn't imply it. I'm not sure immediately whether it would even increase or decrease the probability. Maybe the thought is about some marker that shows some kind of discontinuity in some physical quantity that just seems bizarre to us, or something.
So there are a lot of things that could change in physics that would maybe be, in one sense, puzzling and deep and interesting, but ultimately simple, in that there would be some possible physical law that is itself simple that would describe them. And then, of course, you can have situations where it's just chaotic, but where you could still capture the statistical regularities through a simple statistical law. That's one type of basic universe we could live in, which so far everything we know seems to be consistent with. Now contrast that with a different possible world, which we could have lived in, and we could still find out that we do, where maybe parapsychology would be true. So you would have telekinesis or something, where what we think of as a high level, complex macrostate, like a particular brain in a particular configuration, but not a slightly different configuration, just the types of configurations that correspond to somebody having a particular concept and wish, if that had, say, a systematic physical impact on some remote system, the way that parapsychologists have imagined, that would be puzzling, and not just puzzling; it would be fundamentally different from discovering that the speed of light is accelerating, because it would be the kind of thing that, if it were true, would seem to suggest that there is no micro level explanation of the world. You could have these macrostates that could suddenly reach down and change the micro. So if we made some discovery like that, that might lend credence to the simulation hypothesis, because it looks very hard to see how you could get all of this to square up without that. If you still wanted to have an underlying micro level regularity, you could have the simulating universe being kind of simple at the physics level, but then simulating a different kind of universe. The alternative would just be that we didn't have that simplicity at the level of basic laws, which I guess we could discover. Now, I don't think that's the only or the most likely way we would find evidence for the simulation argument, or for the simulation hypothesis, if we ever do; that would just be one way. There would be other kinds of evidence that would be more likely to be relevant.

Yeah, since we're touching on the simulation argument, which of the three hypotheses do you think is the most likely, sorry, which of the three prongs of the argument do you currently think is most likely?

I'm generally a bit coy about attaching probabilities to that. So yeah, I tend to punt that question, for various reasons, including that if I give a particular number, it might be misinterpreted. Normally what people want to know is especially about the simulation hypothesis; that's the one they really want to know about. And, I mean, I guess, yeah, I won't attach a probability to it, but I certainly take it seriously. It's not just a logical possibility or a thought experiment that we can't 100% rule out; it would certainly be a live, serious possibility in my view.

Yeah, and for those unfamiliar, the simulation argument is a three-pronged argument: either there's a kind of great filter, meaning close to zero civilizations ever reach an advanced stage, or advanced civilizations are disinterested, meaning close to zero of them are interested in running these simulations.
And then there's the simulation hypothesis, which is that, hey, if there's no great filter and they are interested, then close to all beings like us are simulated. And this comes from thinking about the vast quantities of people that would be simulated, and then the likelihood of your experience being sampled from the simulated ones. Sorry, Nick, I'm probably giving you a bad explanation here, but.

No, no, that's very good.

I think there was another question over here.

Yeah, I have a question. So I've always been very interested in emergent intelligence, especially as it relates to animals. I mean, the classic example tends to be beehives. As we look at consciousness, what biases do you think we bring in as individual social animals, humans, versus a collective organism like bees, especially as we look at humans maybe moving to be more bee-like as we create nation states and larger organizations, versus a singleton? How would a singleton perhaps have a different AI alignment bias? As I think about this, the only really intelligent animals I can think of that don't live socially are apex predators, which is perhaps a bad sign.

Let me see if I understand. So I think, well, so one question.

To phrase this differently, if I think about a curve, do you think that collective intelligences like hive animals are on one side of the spectrum, with social animals like humans in the middle, and with singletons being on another extreme? Or is it more of a horseshoe curve, in terms of the distribution of intelligences and how they work towards common goals that may or may not be aligned with us?

Well, if there were a line, I think the superintelligence would be more on the side of these hive insects. If we look at the scale of an ant colony, in some sense it acts like a singleton. Within that, of course, there are other ant colonies elsewhere and other things that it doesn't have control over, but it would, as it were, be able to act as a single agent to some extent. And humans only to a lesser extent, although in some dimensions we are better coordinated, in terms of being able to share detailed information and plans. In that respect, we are more coordinated than ants, but in the respect of our individual wills being less aligned to a common goal, we are less like a singleton than an ant colony is. And I guess you could have a group of animals that were even more individualistic and antisocial than humans are, and they would then be further away on the other side. So humans would kind of be in the middle, where we have a fair degree of shared purpose, not like a full hive organism, but also a lot more than zero. I guess an interesting question, so certainly different animals have different goals, it seems, at least at the superficial level: some like to eat grass and some like to eat meat, and some like to hang around with others of their kind and some like to just do their own thing. And presumably if there were some other species that developed superintelligence and aligned it to their values, then they might also have different baseline goals that might overlap slightly with humans' but also be different in other respects. There are two open questions. One is, epistemically, are there significant differences between the inductive biases that are brought to the table? Presumably there are some inductive biases that are different, but would those be smoothed out reasonably fast as you have more data and more intelligence?
Maybe a squirrel would more quickly cotton on to certain things that are relevant to the squirrel world, and some other organism to another, but as they develop scientific reasoning, do they have enough overlap between their inductive biases that the difference is washed out as you see the full impact of the evidence? That's one question you could ask. And another is, even though these different organisms start out with at least superficially different goals, are they in some deeper sense the same, or alternatively, would they arrive at some shared understanding of what the highest moral norms are, even if their own personal goals might differ? A lot of humans might individually have different preferences; I care about my family and you care about your family, but we might nevertheless converge in the sense of, let's respect each other's families. A cooperative level of more abstract norms might also be convergent, quite independent of starting point. So those are two questions I could ask there that I'm not sure what the answer is. I don't know whether that addresses your question at all, but that's my answer.

We'll take two more questions. One is: how sure are you, Nick, that an evil singleton AI to rule them all would be internally aligned over time? Could it be fundamentally set up to split or diverge, with subunits pursuing different ideals or goals?

I guess everything is possible. I mean, if it were unified at one point in time, and if at that point it was technologically mature, then I would expect it to remain unified, because I think it would have access to the kind of control technology that would make it possible for it to do that. And I think it would have instrumental reasons to do that for almost all initial goals it might have at that time. You could imagine some very special goal, like if it specifically has as its top level goal, a thousand years from now I want to be divided against myself and fighting an insurrection against myself. If that were its goal, then it could arrange for that. But for most goals, it would probably be able to achieve them to a higher degree if it worked in concert with itself, and then I'd imagine it would also have the technology and insights to make that happen. If it starts out unified only in the sense of being a sort of vaguely integrated political entity, then it might be that even with technological maturity, it's not so crazy to think it might come apart at a later point, just like human polities do. Sometimes you have a well functioning political unit, and then 50 years later you have anarchy in a particular state. We can kind of get these temporary, partial solutions, which I guess would also be possible with certain kinds of, maybe, some upload collective that comes together to achieve superintelligence. You could imagine the political dynamics working well for a period of time and then falling apart. I still think that's less likely than it holding together over time, but by no means extremely unlikely.

And last question. David asks, if things go well, do you have a vision for how differences of opinion about what a good future society looks like can be accommodated?
Meaning, is it big enough for everyone, as they develop very different perspectives and different ideas of what a good future society looks like? How do we reconcile those differences of opinion? How do we build a meta-system to enable different flourishing civilizations, in a sense?

Yeah, I think it is large enough for almost all people to have most of their values accommodated. If two people have literally opposed values about a particular thing, then you might not be able to satisfy both. But I think a combination of, on the one hand, some differences being perhaps merely superficial, disappearing upon better understanding, like there are certain things where we say we want different things but it's because we have different assumptions about what would actually happen, let's say, so those being potentially diminished by increased intelligence and knowledge and experience, then the increase in resources and the expansion of the technological frontier, and then some kind of creativity in figuring out clever ways of combining values. I'm hopeful that a great deal can be accommodated because of these things, but not necessarily 100%, and then it would be important to have a robust and effective way to manage any resulting disagreements in a way that doesn't result in negative-sum dynamics, because I think that's ultimately really important. I think we should have a strong bias towards paths forward that are more cooperative and friendly, even if they seem to come at some short term expense, or if they can't be very crisply motivated by some explicit calculation in every single case. That general attitude, as a sort of default bias, I think is still very much worth bearing in mind as we are pursuing these different aspects of the challenges ahead; that should be our first resort. Sometimes you can't get full cooperation, and you don't want to be completely naive and gullible, but still, that should be the first and maybe the second attempt, and then gradually scale back from that if really forced by circumstances.

Well, that's all the time we have for questions. Nick, thank you so much for spending this evening with us. It has been extremely enlightening for many of us, and I think it will be very useful to the broader community that is currently working on things like AI alignment and others. Thank you very much for your work, for sharing your insights, and for helping us achieve a lot of great breakthroughs and hopefully have a great long-term future.

Thank you very much. A lot of good questions. Thank you very much for having me.

Yeah, absolutely. Thanks, thank you. Take care.