All right, so today I want to follow up on the talk I gave yesterday. I'll give a little bit of a repeat, because a lot of folks weren't here, but I'm going to orient it much more towards, you know, let's nerd out and get serious about it. As always, I start with the end, because once I get on a roll I have no idea how long it's going to take me, so it's best to let you know the takeaway up front. The takeaway is: we are computing wrong. Our computer architecture, by which I mean CPU and RAM, is a beautiful, simple, stupid, dangerous, obsolete horror. Even the guy who invented it said it would be gone soon, and he said that in 1950. And we're still doing it. For scalable computing, for securable computing, number one, we have to give up on this idea of hardware determinism. We have to give up on this idea that the computer hardware guarantees to do the same thing twice. And I'll talk about all of these more. Hardware determinism is for the fragile, centralization is for the small, and that's the real problem. The whole idea of a CPU, a central processing unit, means one guy is doing everything. Well, what's the problem with that idea? The problem is he can only go so fast, and if you get four guys, multi-core, now they have to coordinate. Did you change that? No, he changed that. That doesn't solve the problem either. Synchronization I'll come back around to, and then cybersecurity, yeah, this is the thing. I mean, basically, the idea of RAM, random access memory, is that every location in memory is the same distance from every other location in memory. That's what random access means. That's the whole point. If you've got a pointer, you can go anywhere. That's insane. It's like I could stick out a finger and poke you in the gallbladder. You wouldn't really want to let me do that, and it should be impossible for me to do that unless I have gallbladder-doctor powers or something. But the way we've designed computers, everybody who's running on the same CPU, every instruction, from our most secret private medical information to the scum of the internet, executes on the exact same square centimeter of silicon. It all goes to the exact same place, and only by the grace of God does nothing manage to divert the control flow. It's crazy. It's insane. Fifty years from now, I hope nobody will believe this is how we computed back in the stone age of computing, which is where we are now. All right, that's my basic takeaway. How are we going to get from here to 50 years from now? We're going to get there with this idea of indefinitely scalable architecture, and that's what I want to talk about today. I want to say what the idea is, and the idea, unlike most of my ideas, is actually quite crisp. It's actually quite specific. You might not like it, or you might think it's a silly idea, but hopefully it should be clear what the idea is. Here's the high-order bit. The idea is that an indefinitely scalable computer architecture can be grown. You can add an arbitrary amount of hardware to it and never run out of anything. As long as you have real estate, money, power, and cooling, you can make the machine bigger by plugging in more basic tiles, basic units, a computer from here to the horizon, a computer from here to Jupiter if you want, and never run into a 32-bit address space limit, never run into clock distribution tree problems. Okay, that's the idea.
Once we do this, once we accept that the only thing that's admissible as a computer architecture is something that has this property of indefinite scalability, certain other things happen. Number one, hardware determinism departs. No matter how much we are willing to spend on a given tile to make it as reliable as we can, there will still be some failure rate, one in a million, one in a billion, one in a billion per year, whatever it is. And that means if we have more than a billion tiles and we run them for more than a year, there will be failures, undetected by the hardware, delivered to the software. And that's the violation of hardware determinism. Once we have indefinite scalability, we are going to have to deal with undetected hardware errors at the software layer. Are we having fun yet? What that means is we've been pretending. When we teach algorithms and we sort numbers, we pretend that we guarantee these numbers are going to be correctly sorted. I swear on my mother's grave that these things are sorted. And how can I do that? I can do that because the hardware swore to me that it would be exactly deterministic, that it would exactly follow the laws of logic without fail. And if it couldn't do that for any reason, it would crash the entire machine, and then we all agree everybody's dead and so there's no sin. The only sin is to make a mistake and not crash. Those are the rules we have now. Once we can no longer do that, once we can no longer actually give an ironclad guarantee that we're going to do the right thing, we have to admit there's going to be some chance we're going to get a wrong answer. And that's when we say software becomes best effort. All you can do is say, well, if everything else is okay, if the hardware is good and there's enough time and the data isn't changing too rapidly, then my answer will settle down to the right answer and we're all good. But I cannot guarantee you, I'm sorry, I cannot guarantee that these numbers are in the correct order. And in fact, nobody else could either, but we just used to pretend when we were children. Now we're grown up, okay? Once we can no longer have hardware be deterministic, once we admit that software and hardware are best-effort engineered, that's all we can do. Then what I suggest is this is the way we're going to build software: software is going to look like artificial life, meaning software will automatically compete with other software and reproduce itself, not just for more performance, but for robustness. So in case there are some errors in one of them, that's okay, I've got cousins that are doing this too. As long as enough of us can get the answer collectively, we're still good. So in the future, where we're heading, this is the story. Software is going to be alive in that literal sense. It's consuming energy, reproducing for performance, possibly for evolution. Well, that terrifies me wearing my engineer hat. Wearing the scientist hat, let's put random things in and see what happens; the engineer goes, are you nuts? These things are real. They're consuming energy. They're taking up space. Eventually they're driving your car. We don't want evolution. Well, maybe we do. Fundamentally, what I'm saying is this is a whole other way to compute than what we're doing now.
Eventually we'll start seeing similarities, so eventually we'll say, oh well, once you have this thing made out of cells that are competing and blah, blah, blah, you know, you can kind of implement sort of a little teeny finite state machine here. You only used 20 billion gates to implement a three-state machine, but man, it's a really robust three-state machine; you can taser it and it still says, I'm in state two, blah, blah, that kind of thing. That's the whole story. So if you don't want to stay, you've got the main point. Now let's unpack it a little bit in our remaining time. Okay, so what I want to talk about today is this idea of the two attractors: that there's a fundamentally different way to compute than the way we're doing it now. Those are the two attractors of computing. The consequence of that is we have to give up on this idea of strict correctness. We have to give up on this idea that we have cherished, that numbers must be exactly guaranteed 100% correct or else they're crap. We have to say there's an intermediate: these numbers are kind of sorted, you're welcome. Is that good enough for you? Well, it's up to you. This is my best effort, take them or leave them. I joke, but this is how society works, right? You never get these actual proof guarantees. This is not wheat, this is white. No, no, it's wheat. Okay, I'll eat it anyway. My head doesn't explode, hopefully. Out with strict correctness, in with strict indefinite scalability, and this is what I've been talking about, unpacking this idea, and then trying to actually get into, well, once we are talking about computation not as an abstract PRAM model, not as some abstraction of bits and RAM and so forth, but as a physical device, we can now characterize that physical device like any physical device. How much does it weigh? How many watts does it dissipate? And so on. And we can compare computers not in terms of MIPS, but in terms of MIPS per watt or MIPS per gram and so forth. Now, of course, an instruction on machine A and an instruction on machine B could be very different creatures, but we're going to blame that on the first person who talked about MIPS as if it meant something, and we'll live with that same flaw. Still, a billion MIPS is better than one, almost surely. All right, let's talk about the first one, the two attractors. And again, I did talk about this yesterday in the context of life, so I'm going to try to cruise over it a little bit quicker. If anybody wants to jump in, if anybody is bold enough to try to stop the train and say, what the heck was that word? I would love to have questions if you want to jump in. Otherwise, I just go faster and faster and faster until we all crash. All right, so I've already talked about this, so just to sum it up. Hardware determinism. This was the fundamental contract of computing. Hardware turns physics into logic. Software turns logic into money. That's the idea. And the money has to be enough to pay for the software and the hardware together, plus 10% for management, or whatever. And if that works, if the logic from the physics leads to enough desirability so that there's enough money to stabilize the whole system, then computing works, digital computing works. And it's been great. Digital computing: Negroponte said six or seven years ago that something like 10% of the world economy is now tied to computers. It's been an unbelievable success story.
But from the beginning, and again, we learned this yesterday in the lecture if we didn't know it before, one guy, one lonely voice, shouted out that this is wrong, this is not the future for us. And it was von Neumann. In the future, the actual lengths of the chains of operations will have to be considered. You cannot use billions and billions of instructions just to put the queen on the king in solitaire. Because that's stressing this idea of determinism. We need to keep the programs short so that there's a reasonable probability that none of the instructions will screw up before we're done. And number two, we're going to have to allow the idea that operations may fail. We may ask, is one less than two, and get the answer no. Just every once in a while. I'm trying, I really am, but there was a cosmic ray or something, and I got it wrong. So what it comes down to is, I suggest to you, there are these complementary, largely complementary approaches to computing, and you can line them up across all of these dimensions. I call the first one finitely scalable and the other one indefinitely scalable, finite versus indefinite scalability. And they have different language, different emphases, all the way through. Finite scalability is traditional computing. It focuses on algorithms. You give me an input and then you wait. You stop, hold the world constant, and I compute. I am the computer. And when I'm done, and you just wait, I give you the output. This is guaranteed correct, you're welcome. That's the algorithmic approach. In indefinite scalability, the corresponding part is the process. It never begins, it never ends. It's sitting there responding to inputs, getting messages, sending outputs. It has some state. It might respond differently this time than it did last time. Who knows? But the last thing it wants to do is exit. The whole point is to never exit. Algorithm versus process. In finite scalability, we say the number one thing is it must be correct. If your output isn't correct, we can't even talk about it. You're in some state of logical sin unless you have a proof that your algorithm works, that your output is correct. In indefinite scalability, that's not the requirement. The requirement is you give some answer. The requirement is don't die. The requirement is be there to say something. And once you say something, the goal is to say a true thing, to say the right answer. But that's secondary. And then finally, if you're definitely there and you're probably right, your constraint is you'd like to be efficient about it. All other things being equal, you'd rather save your energy for a rainy day, just in case. Robust, then correct, then efficient, versus what we know and love, where we say it's not even admissible as an algorithm until it's correct. If it's not correct, we don't even know what you're saying. You're just babbling. And once we know it's correct, then the goal is to be as efficient as possible. And then, given that it's as efficient as possible, you may then have to make it as robust as necessary. You go back and say, well, it turns out there might be an error in one of these bits, the comparison might be wrong, let's do the comparison three times and vote on whether we want to go true or false. You armor where you need to armor, okay? All the way down, two approaches, almost duals in how they work, all right? And the takeaway, the short story, the outrageous concept I'm trying to suggest to you, is that we are in the wrong attractor.
We are in the finite attractor and we want to be in the indefinite attractor. And the current state of computer hardware development, as was mentioned in the abstract for this talk: increases in clock speed for CPUs are slowing down, or you have to go to seriously heroic engineering, like liquid nitrogen, to get your CPUs to go five gigahertz or whatever it is. And on the flip side, we cannot for the life of us make a secure computer. Both of those come from the fact that we are in the finitely scalable attractor for computing, and we have massively overscaled it. We've taken it far beyond where it wants to be. It's like we had this cute little one-celled animal 50 years ago, and now we have this hideous blob, still one-celled, but gigantic. And it's just not working so well. And we're saying, let's let the blob drive the bus. That's where we are today. It's crazy. It's really crazy. Okay? So that's it. That's my mission, my research mission, my career: to make this point as compellingly as I can, as broadly as I can, and to make it credible that it might actually be possible to compute on the other side, and to try to get people interested, and to build frameworks and mechanisms that people can start to contribute to, learning the engineering, building the body of knowledge that we need to understand how to do useful computations on indefinite scalability. That's the story. So that quote from von Neumann, that we'll have short programs and they'll admit failures, that's 65 years ago, 68 years ago, something like that now. Why is he still wrong after all that time? Why are we still in the finite attractor? Fundamentally, it's because we do what we know. We train our students with what we know. And for the longest time, we were able to get times 10 in the clock speed, times 10 in the memory, relatively cheaply. Design lock-in and network effects, I suggest, account for why, even though von Neumann said we were going to have to give up on hardware determinism when we got to 10,000 gates in a chip, and it wasn't even chips then, it was tubes, we still haven't come close to switching. We're still trying to make the central story go. When you say design lock-in, it's easy to say, but what it really means is there are all these assumptions that go into how we build computers, how we program them, how we judge them as good or bad, that mutually support each other and make it difficult to move away from this approach. And here are several of them. The idea of strict correctness, I'll talk about that a bit more in a second. Random access memory, I already talked about that. Random access memory is great, because in step one, you completely ignore physical space. You say it doesn't matter in the least where this thing lives. Here, here, here, here, all you need is the address and it's equally good. So in step one of computer architecture, we eliminate space. And then in step two, we go, oh, a buffer overflow! How did that happen? Well, we didn't really eliminate space. We just hid it from ourselves. And in fact, at the end of the object, there's another object. It's all still living in space. You overflow one object, you're in another object. But we don't know that. Why? Because we threw it away. We thought random access was such a great idea. See how it works? And you never even think it through. You think that reliability is a hardware problem. When you add up a bunch of numbers, do you add them up twice and make sure the answer comes out the same? That's what your elementary school teacher told you to do.
Does your computer program do that? No, that would be stupid. Why? Because reliability is a hardware problem. If you were to write software that says, oh, let's add them up three times and make sure we got the same answer, everybody would know you were a fool and you were being inefficient. As long as you have the hardware determinism guarantee in your pocket, you are covered. If it ever turns out that you get the wrong answer, it's his fault. Sue him. Bankrupt his company. I am covered. And that's exactly what best-effort hardware takes away. Best-effort hardware says, oh, sir, I really tried. Is one greater than two? I mean, no, it's less. No, it's greater. That's it. And once you say that, we've changed the deal; now the pressures on software are totally different. Software often doesn't even need to respond for 250 milliseconds, because it's responding to human input, a click of a button. It could add up the numbers 10,000 times and nobody would ever notice. And now it would matter; we would do it, if we knew we had time to do it. And we would only do the magic, super-efficient, never-do-anything-twice, one-little-pass thing if we were desperate, if we were clawing our way through the desert trying to survive. What is the biggest number? It's 12. If you absolutely have to be efficient, then you'd be efficient. But you expect you're at death's door when you get to that point. Living systems will be 30 times redundant, 50 times redundant, most of the time, except in those moments of great drama. And we'll focus on those in the stories, but we're not going to engineer for them. Okay. All right. Okay, here's an example. This is one of the first things I did; it got written up in a Viewpoint in the Communications of the ACM several years ago. Sorting. Let's consider pairwise sorting, but suppose the comparison operator is only right 90% of the time. Okay. And you're sorting a deck of cards, a standard deck of cards. Are you going to get the right answer? What do you think? Are the odds good? No, the odds are not good. I mean, are you going to get the right answer one in 100 times, maybe? It's going to be pretty damn small. Okay. And the important point is, if we're insisting on strict correctness, if we're saying you must be exactly right or you're just being incoherent, that's all we can say: you're not going to get the right answer, go away, there's nothing more to say. But if we go beyond that, if we take a robustness approach, we go further. What we have to do is give up on the idea of strict correctness, meaning yes or no, 100% correct or 0% correct, and come up with something, I don't know, let's call it partial credit. You got the answer wrong, but you showed good effort. So here's an example. Suppose here are some numbers that are allegedly ordered. What we do is count how many positions they are out of order. This guy's the smallest; he's supposed to be in the first position, so his positional error is 0. This guy is where he's supposed to be. These two guys are switched, so the first one is off by 1, he wanted to be here, and the second one is off by 1, he wanted to be there. We add them all up. A positional error of 0 means you got it right. A positional error of 2 means two adjacent guys were swapped. A positional error of 8, in this case, is the worst case: it means you got them exactly backwards. There are a couple of ways to get the maximum positional error; this is one of them. All right? So now, if the positional error is 0, you're definitely correct, and if it's not 0, you're definitely incorrect, but now we can distinguish a little incorrect from a lot incorrect.
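A minimal sketch of that setup, in Python: the comparator's 90% figure and the positional-error score follow the description above, while the function names and the choice of bubble sort for the demo are illustrative, not from the talk.

```python
import random

def noisy_less_than(a, b, p_correct=0.9):
    """A comparator that reports whether a < b, but is only right
    p_correct of the time; otherwise it returns the opposite answer."""
    truth = a < b
    return truth if random.random() < p_correct else not truth

def positional_error(seq):
    """Sum, over every item, of how far it sits from the position it
    would occupy in a correctly sorted sequence (0 means fully sorted;
    four items exactly backwards score 8, as in the example above)."""
    rank = {v: i for i, v in enumerate(sorted(seq))}  # assumes distinct values
    return sum(abs(i - rank[v]) for i, v in enumerate(seq))

def bubble_sort(seq, less_than):
    """Ordinary bubble sort, except every comparison goes through the
    unreliable comparator handed in."""
    a = list(seq)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if less_than(a[j + 1], a[j]):
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

deck = list(range(52))                    # stand-in for a deck of cards
random.shuffle(deck)
print(positional_error(bubble_sort(deck, noisy_less_than)))  # 0 = perfect
```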
So we took traditional sorting algorithms, sorted a deck of cards using a comparator that was right 90% of the time, and graded the result with positional error. Oh, this is hard to see. The Y axis is the average total positional error in the deck, over 52 cards. If the deck were just left shuffled, the average would be up here around 900 or something like that, I don't remember exactly. With quicksort, we got an average positional error of maybe 250, something like that. Mergesort was actually under 200. And bubble sort, incredible, ridiculous, black-sheep-hated bubble sort, did the best of all. Yes? Question: how do you know the comparator would make a mistake? Because the mistakes that were made here, we programmed them. No, I mean in the real world... Oh, not in the simulation. Right. Yeah, yeah. No, it's a great question. And fundamentally, when you start to say, I'm going to add extra checks, I'm going to add up multiple results, now you have an additional question: what if the guy who's doing the majority rule gets it wrong? And this, again, just like almost everything, goes all the way back to von Neumann, who showed that the additional hardware required to do the voting is basically logarithmic in the amount of information that you're merging. So as long as each individual gate has greater than a two-thirds chance of being right, or 50% plus something, I don't remember the details, they actually show that it will converge. So even though the majority vote might fail as well, there's basically always this big trick, right, that whatever you do at the very last minute before you send the output out, if that fails, you're dead. No matter how many times you did it, if you tried to write six on the answer key and the guy next to you bumped you and you wrote nine, all your great work is gone, well, unless there's partial credit. Okay, so here's the point. Everybody knows bubble sort sucks. Why does it suck? Because it's inefficient. Well, so what? It took no noticeable time to bubble sort a deck of 52 cards. Now, again, if we were in a circumstance where it was happening in the inner loop of something else, blah, blah, blah, bubble sort could be completely unacceptable. And that's the case where I'm pushing the machine to its absolute limits, and so I'm doing all these dangerous, dangerous things, like not checking my work. And in traditional computing, we think about that all the time. We do worst-case thinking. That's what it means: worst case. But most of the time is not the worst case. Most of the time we have tons of time. We could do bubble sort and be much, much, much happier. This is a striking result that you cannot see if you're thinking strict correctness. If there's no partial credit, these are all infinity, these are all max. Okay, that's the idea. All right, so, bottom line. Efficiency is not a good in itself. It's not an unalloyed virtue. It's a regrettable, sometimes-necessary thing. When computers were young, in von Neumann's time, they were incredibly expensive and they really sucked. You spent hundreds of thousands of dollars, millions of dollars, I don't even know, to build these things, and then you wanted to get as much computation out of them as you possibly could, so you turned yourself inside out to write efficient code. But we are in a completely different universe now. We've got computers coming out of our ears. Take a guess how many computers are in the room here.
Hundreds, way more than the number of people. And it's only going to get worse. And they have more cycles than they need for almost everything they're doing. Except, you know, yes, yes, you're watching videos and playing a game and... okay. Question? Shoot. Yeah, right. So suppose I ran this ten times; I'd still be doing fewer comparisons than one bubble sort. Would that include the... It absolutely would. Here's another approach. Why don't I just take quicksort, but every time I compare a pair, just punch it into the comparator three times and take a majority vote, and have that be the result of the comparison, right? When we're thinking with our traditional computing head, that's exactly the kind of thinking we want to do, and then we say, no problem. Only 90% correct? Well, you know, let's do best of seven. That'll drive the probability down, it's the binomial and everything, and then we're right back in business, because now the chance that more than half of those 90%-correct comparisons come out wrong is down to a level where it's likely not to happen even once during the time it takes to do the sort. But that made one key assumption. It assumed the failures were IID. It assumed the failures were independent, identically distributed random variables, which is a perfectly reasonable assumption to make. But did I ever say that? Did the universe ever say that? Suppose you did that and you made everything work great, but then it turned out there was a weird failure in the comparator, which is actually this kind of steampunk mechanical thing, and in addition to having a 10% chance of giving the wrong answer, it also has a 50% chance of just returning the same answer it did last time. The little output lever just gets stuck and it says true, true, true, no matter what you put in. Now your brilliant idea of punching it in seven times goes out the window, because now you have bursty errors, you have runs of errors. And then you say to yourself, wait, wait, I can figure that out. I will dovetail all of the comparisons together. I'll do Reed-Solomon coding on the thing. And yeah, you can do that. But think of what you're doing. You're attempting to badger the universe into saying: you must tell me what the possible errors are. And if I can get you to tell me what the possible errors are, I will compensate for them with the exact minimum amount of redundancy, and I'll get my determinism back and I'll go home. So, strict indefinite scalability, where is it here? All right, let's get on to that. Do we have a slide for it? Well, it'll be coming up in a minute, but the key point is, in best-effort computing, do we actually see the best-effort computing slide? I can't even remember. Best-effort computing says you try to get the right answer, but you reserve the right to give the wrong answer, and furthermore, your errors are uncertain. If anybody asks you, what is the distribution of your errors? The answer is, I don't know. It could be anything. It could be an oracle out to mess you up. And the reason for that is expressly to defeat this kind of thinking, where we say, ah, what is the error model? You tell me the error model and it's all good. But we don't know what the error model is, and in the real world we can't know what the error model is, because in the real world, if nothing else, there is malice. There is somebody who is as good as an oracle, who is going to come up with that run of seven failures in a row just to mess you up. Okay. All right.
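Here's a toy illustration of that failure of the IID assumption. Under independent errors, a quick binomial check says a best-of-seven vote over a 90%-correct comparator is wrong only about 0.3% of the time; the sticky-lever failure model below (the class, probabilities, and names are my own illustrative assumptions, not from the talk) wrecks that analysis.

```python
import random

class StickyComparator:
    """Illustrative failure model: wrong about 10% of the time, and with
    probability p_stick the output lever just repeats whatever it said
    last time, producing bursty runs of identical answers."""
    def __init__(self, p_correct=0.9, p_stick=0.5):
        self.p_correct, self.p_stick = p_correct, p_stick
        self.last = None

    def less_than(self, a, b):
        if self.last is not None and random.random() < self.p_stick:
            return self.last                    # stuck: repeat the last output
        truth = a < b
        answer = truth if random.random() < self.p_correct else not truth
        self.last = answer
        return answer

def voted_less_than(cmp, a, b, trials=7):
    """Best-of-seven majority vote; a fine repair only if failures are IID."""
    votes = sum(cmp.less_than(a, b) for _ in range(trials))
    return votes > trials // 2

cmp = StickyComparator()
wrong = sum(voted_less_than(cmp, 1, 2) is False for _ in range(10_000))
print(wrong / 10_000)   # far worse than the ~0.3% the IID analysis promised
```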
So here is strict indefinite scalability. This is the precise idea. Given an indefinite supply of real estate, power, cooling, and money: invent a computer architecture, based on hardware tiles that are completely identical, to the degree that we can make them so by manufacturing processes, that can be usefully deployed at any scale. Any number of tiles can be plugged together, as long as we have real estate to lay them out, power and cooling to run them, and money to buy them, without ever re-engineering. And then there's one little footnote that says you're not allowed to change the mass of a tile; a tile's mass must be constant, and the cost of storing one bit is greater than zero. Okay. That's it. That's the whole game. So you try to think about how you would make computing work like that. All right. Yeah, this... I love... It's not PowerPoint, but I love it anyway. All right. I do this talk and I say, so tell me what you're thinking of. Supercomputers? Not even close. Supercomputers, in general, have been scaled just about as big as they possibly can at whatever level of reliability and cost their fundamental tile allows, and if that weren't so, they would have made them bigger. So in fact, every time you want to double the size of a supercomputer, or multiply it by a thousand, you have to reinvent the whole thing, basically. It's finite scalability on steroids. The internet, that's the number one answer up on the board. Internet, internet. Is the internet indefinitely scalable? You can keep adding more nodes, keep adding more nodes. What do you think? Is it indefinitely scalable? Yes or no? No? Why not? Routing. What about it? There's a problem with figuring out where everything is at any given time, and if you add things and remove things, you have to update that table. But there are methods; BGP will do it. Does it actually get slower as the thing gets bigger? But it's slow anyway. Well... We have to come up with some notion of what we believe the cycle time of the internet is, because if it's going to grind to a halt, then we're going to be unhappy with this. But there's a bigger problem. There's a more obvious problem. Absolutely. Who said it? Yeah. IPv4 is basically already blown out, I mean, long since blown out. IPv6 is merely 128 bits, 10 to the 38th addresses. How long would it take to use that up, filling the space from the sun to Alpha Centauri with concentric spheres of computers? Not long. So, the internet is only finitely scalable because, number one, it assumes a global address per node, and number two, that address is finite width. And even if you thought, well, suppose I had a self-terminating string so the addresses could get bigger and bigger and bigger: whoops, our tiles are finite mass. You can only have so many bits in them. Eventually, all of the bits will be tied up remembering what this guy's address is, and you're still dead. Okay. So, for indefinite scalability, you have to give up on the idea of having unique IDs, globally unique IDs. Turing machines? You know, you just keep buying more toilet paper to feed into the thing, and so on, assuming it were actually real. Same sort of problems. And then finally, the one that's relevant for the, oh boy, remaining zero time of the talk: cellular automata. People familiar with cellular automata? Some people refuse to raise their hand. Is anybody in the room? I said, okay, we have one guy in the room. That was an honesty check. The first answer wasn't reliable. Yeah. But actually, it was closer to one minus the answer. But so the takeaway from all of this is, boy, you're being a real jerk. You know, 10 to the 38th, that's really big. That's more than there are atoms in the entire planet. You know, you could assign an IP address to every single atom on the planet, and he's still not happy.
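For what it's worth, here's my own back-of-the-envelope on how far those 2^128 addresses actually go, under the purely illustrative assumption of one addressable tile per cubic millimetre packed into a solid sphere; it isn't a calculation from the talk.

```python
from math import pi

ADDRESSES = 2 ** 128        # the full IPv6 address space, about 3.4e38
TILE_VOLUME_M3 = 1e-9       # assume one addressable tile per cubic millimetre

volume_m3 = ADDRESSES * TILE_VOLUME_M3          # total volume we can name
radius_m = (3 * volume_m3 / (4 * pi)) ** (1 / 3)

print(f"{radius_m:.2e} m")  # roughly 4e9 m: a few million kilometres,
                            # versus about 4e16 m to Alpha Centauri
```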
Well, that's the whole point of strict indefinite scalability. It's a theoretical notion. And it forces you to say, well, what would you have to do if you couldn't have unique node names? It doesn't solve any of the problems of networking. It doesn't solve any of the distributed networking problems, but it forces them all onto the table. You can't hide them. You can't pretend you have global routing. You can't pretend you have constant-latency links between any pair of nodes. Those are not physical. You can't have them. It's not indefinitely scalable. Okay? That's the way it works. And by doing this, by being strict, it will lead you home. It will lead you out of the correct-and-efficient attractor and force you to start thinking about indefinite scalability and all that other stuff. Robustness. Things may fail. Dealing with incomplete information about who the neighbors are. Not even knowing the neighbors. Neighbors don't have names. That guy, his name is West. His name is East. We could start talking and we could agree on a name, but that's part of the computation, not part of the hardware. That's the idea. So, what are we doing? We're giving up on CPU scaling. This is the history of computing. The clock goes faster. The bus gets wider. The memory gets bigger. All right? This is 1940 to 1970 or '80, something like that, whatever. And then, you know, this clock is now going 3.6 gigahertz and it's just too expensive to go any faster, or the pipeline just gets turned into mush. So now we're doing this multi-core thing and we're all excited about that. You know, the stupidest phone has eight cores now. Is that ridiculous? I mean, it's ridiculous, but I love it. And the problem is, now we have eight different caches, and each of these guys is reading and writing at the same time, and we have to coordinate all this information. Did you change it? Did you change it? Did you change it? And there is a little bit of research trying to make cache coherency cheaper, but you read between the lines and it doesn't actually work, not quite yet. So at the moment, cache coherency is basically quadratic as you get more and more cores. It's a hack. It's a nice hack, but it's a hack nonetheless. And where we have to go is network scaling. Now, instead of making the single CPU bigger and bigger and bigger, we say the CPU is not C anymore. There is no central processor. But we have endless, endless numbers of these guys. And the goal is not to make each one as big as possible. The goal is to balance computation and communication in a way that's price competitive, that's economically advantageous. All right? So the goal is to switch to network scaling. So it's kind of like the internet. It has some feelings that are like the internet. But it's a personal internet in a box. It's your internet, dedicated to you. And the processing units out here that touch the internet, you know, we reboot them every 10 minutes just on general principles, that kind of thing. And they have to go from here to here to here to here to here, talking protocols, talking languages, making requests, not doing shared memory, boy, all the way across the whole thing, because we don't have that, only local connections.
Before they actually get to my medical information, before God forbid they get to my screen and could affect me, because I'm the most important thing in the internet-in-a-box universe. See? We could have this. We'll have to get started. We have to start springing the design bear traps to do it. All right, I don't have time to do this at all. Right, here's where it is. So what we're doing is renegotiating the contract between hardware and software. Instead of hardware guarantees reliability and software guarantees desirability, now hardware is best effort. It's going to try, as directed by what the software says to do, but it doesn't guarantee to do so, and even its failure patterns are uncertain. That's to fill the hole, so you can't say, tell me the error model. Sorry, there is no error model. Have a nice day. Software, on the other hand, will make its best effort to make progress and accomplish whatever the user wants, with a minimum number of failures and a minimum of damage when failures occur, while once again reserving the right to fail if it can't do anything else. And strict correctness is abandoned; degree of correctness becomes a quality rather than a requirement. Okay, and that's the story. Am I supposed to stop at ten of? No, no, no, no. Thirteen minutes to the end of the slot, depending on how much you want for questions. Shall I try a little bit more? Okay, a little bit more. Oh yeah, and I promised we were going to talk about this: that if we give up on things like, well, everybody knows it's x86 or ARM7TDMI sub-3 version 2, because ARM is always compatible, and if instead we just fundamentally talk about physical devices performing computations, we can compare them on first principles. So here, I don't know if this is really readable, are some basic, basic physical computation metrics that I made up to sort of study what we've got. The peak computational density, that's rho sub s; the sub s means indefinitely scalable, meaning that when you're measuring these things, you can't just measure one tile. You have to measure three by three, and five by five, and seven by seven, and amortize until you get down to the per-tile figure, assuming it can be as big as you like. If you just measured off one tile, you would not have a scalable notion of peak density, and so forth. And so it's the tile compute speed divided by the tile mass, measured in instructions per gram per second, which is the obvious SI-style unit, but it's pretty big. Has anybody ever read a science fiction book called Accelerando? A couple of folks. If you can tolerate science fiction, it's really very cool, and it's on the internet. In that book there's a unit, a slogan: how many MIPS per milligram? And a MIPS per milligram is just 10 to the ninth instructions per gram per second, so it's a more reasonably sized unit. The peak power efficiency is rho divided by watts: how many instructions per gram per second you get per watt of dissipated power. The peak communication velocity, this is a very important one, is, in terms of the physics that you're simulating on the thing, the cellular automata that you're running on this thing, how long does it take for the first bit to get from the center of one tile to the center of the next tile? So it's a measure of latency, but it actually matters how big the tile is.
Smaller tiles will have a higher velocity, literal bit velocity, than bigger tiles. And then finally the average event rate: the number of events per site per second that the cellular automaton sees on a tile, all in, with the communication costs, the locking, whatever it is. And this is what we want to see. When we go out to buy our indefinitely scalable hardware, we're going to want to know: what's the AER, the average event rate, on this benchmark, on that benchmark? Question? So is that useful if you don't know the error rate? I could give you a great event rate number, right? Yeah, yeah, right. My best effort and your best effort might be quite different. Yeah, and the honest answer is that we have to nail down the contract between hardware and software to say what counts as a legitimate error rate. So there's some number: one in ten to the ninth events will fail, or something like that. And yeah, you're absolutely right. And again, it has all the same problems as an instruction does: these might be different instruction sets, different widths, how do you even compare instructions? To first order, though, we don't care, right? Anything that could credibly be called best effort has got to be doing hundreds or thousands of instructions per error, or else everybody is going to go on Twitter and rag on whoever claimed a great AER with a success rate of 0.1. Okay? And just to say that we've got these things: here are a few hardware tiles for which we can measure these parameters, or at least estimate them. I didn't really do it. So here, the HP-65: you'd make an indefinitely scalable grid out of it by laying a whole bunch of HP-65s next to each other in space, and then you measure it. And you say, rho s, what's that? Oh yeah, the instructions per second per gram. Well, this thing weighs 215 grams or something, and it runs a couple thousand instructions a second, maybe. So rho is about 10. The peak communication velocity is 0: it has no actual way to talk from one calculator to another, and that's in fact a reason why it's not a very good tile. But that's a success story for our metrics, because the metrics revealed that this was not a very good tile. This is called the Illuminato X Machina, the IXM. This is the tile that I developed with a company called Liquidware in 2009. It was briefly for sale. These four connectors connect laterally to four neighbors, indefinitely scalable, as far as you want. We get numbers around 10 to the sixth, 10 to the seventh, 10 to the fourth. And this is the XK1; it does a little better. The weight of the thing is what matters. So the point is, we can already start drawing lines in the sand and comparing potential solutions for indefinitely scalable computer architectures. What we can't say here is: what's the AER, what's the average event rate? Because the software never got running on these; this guy doesn't even run it, and so on. We're just at the beginning.
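To pin down the unit arithmetic behind those numbers, here's the HP-65 estimate redone explicitly. The instruction rate and mass are the rough figures quoted above; the function names, and the fact that these are per-tile estimates rather than properly amortized grid measurements, are my own simplifications.

```python
# rho_s, peak computational density: instructions per gram per second.
# One MIPS per milligram is 1e6 instr/s per 1e-3 g, i.e. 1e9 instr/(g*s).
MIPS_PER_MILLIGRAM = 1e9

def rho_s(instructions_per_second, mass_grams):
    """Compute speed divided by mass (here for a single tile)."""
    return instructions_per_second / mass_grams

def peak_power_efficiency(rho, watts):
    """rho divided by dissipated power: instr/(g*s) per watt."""
    return rho / watts

# Rough HP-65 figures quoted above: ~2000 instructions/s, ~215 grams.
hp65 = rho_s(2000, 215)
print(hp65)                        # about 10 instructions per gram-second
print(hp65 / MIPS_PER_MILLIGRAM)   # about 1e-8 MIPS per milligram
```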
Question: what about the network, this one here? Sure, as long as we don't pretend there's wraparound or toroidal connections or anything like that. Question: why didn't you use the existing many-core processors that are already out there? Yes, absolutely. How is it different? Technologically they could be indefinitely scalable, but nobody wants to pay for millions of tiles. I don't exactly want to pay for it either, but maybe I can get a grant. That's a great question, and yes, two responses. Number one, it works a lot like those chips, if we just filled this out as a five by five. Two things, though. The bandwidth off-chip is substantially less than the bandwidth on-chip, right? So whatever we get from these guys talking to these guys is not going to be the same as what we get from this guy going down through the solder balls and over the board. And it's latency, latency more than bandwidth, that matters for our peak communication velocity, because we need these guys to agree on who is going to do an event on the edge. So bandwidth is a little important, but latency is a lot important. That's point number one. And point number two, the practical problem, I mean, this is really an engineering question: in the cores on those chips, they ditched the instruction cache, and the per-core memories are really small, whereas for the machine that I'm talking about, which I really didn't get to show you very much because we nerded out so hard, these guys are seriously mingy. They're doing very, very different things, and to do that, those cores all have to go out on the shared memory bus to pull in the instructions they're going to execute. So as a practical matter, they made the wrong decision; they needed to make each individual core a little bigger, a little realer. But there's plenty of other stuff coming down the pike, the Adapteva people, the Parallella, made a similar mistake, and soon, especially if we know what we want, we'll get there. Other questions? Question: so, on the software side of things, in embedded real-time systems you have what are called anytime algorithms, algorithms where, if you give me more resources, I'll do a better job, but when you say time's up, I'll tell you what I know, even though I'm just not done with it. Is that sort of self-stabilizing in terms of the answers? Well, yes, the idea is that you give me more and I'll do better, but that "time's up" signal is not something this architecture currently grants you. I would love, and again this goes back to, if we had a secondary channel, if we had an urgency channel, then you'd be able to say time's up, give me an answer: when I call time, you give me whatever you've got, your best answer. Right now we have to implement that purely as software; we have to make a protocol message saying time's up, and so forth. And I would like to push it lower. I would like to actually have signal lines signaling urgency between modules somehow. Yes, it absolutely would work that way. And you know, you think about things like how to compute the max in an anytime way: just pick a random element of the array and hold it as your answer, and if you have more time, pick another random element and compare it to the one you've got, and if it's bigger, keep it instead. As long as you have plenty of time, you'll end up returning the right answer, except when the one you're holding gets corrupted. So even something as simple as "find the max, given however much time you happen to have" gets interesting.
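That anytime max is simple enough to sketch. A minimal version follows; the function names and the wall-clock deadline mechanism are mine, and the comparator here is assumed reliable, unlike in the sorting example earlier.

```python
import random
import time

def anytime_max(values, deadline_seconds):
    """Grab a random element as the answer immediately, then keep
    sampling and comparing until time runs out.  There is always some
    answer to give; with enough time it settles on the true max, unless
    the value being held gets corrupted along the way."""
    best = random.choice(values)              # an answer from the first instant
    stop = time.monotonic() + deadline_seconds
    while time.monotonic() < stop:
        candidate = random.choice(values)
        if candidate > best:                  # a noisy comparator could go here
            best = candidate
    return best

data = [random.randint(0, 10**6) for _ in range(100_000)]
print(anytime_max(data, deadline_seconds=0.01))
```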
Other questions? Question: I'm cheating, I was there yesterday so I saw it, but if these things are not addressed, if you don't globally address them, how do you initiate the program? Not how do you write the code, I can sort of imagine that, but my brain hurts: how do you get the code in there? There are two questions there. Number one is, how do you get the sort of table of elements in there that gives the properties of the fundamental types, and that's the laws of physics. And then, how do you get the initial condition in there to actually get it to run? And there are separate answers for both. The idea is, for the laws of physics, what you know is they really shouldn't change very often, so you have a background neighbor-to-neighbor thing where you're saying, I'm running version 32 of the physics, what are you running? Oh, you're running 31? Here, copy mine. And the laws of physics opportunistically spread through the grid, and that is part of the fundamental model. In the 2009 grid, they would update themselves dynamically, on the fly, and when they finished updating they'd reboot into the new software and everything would just keep going, and you could see the wave moving through the grid like that. To get the initial condition into the thing: in an indefinitely scalable architecture there's no boot time, there's no power-on time. Things are going on and off, the whole south forty just got hit and it's back up in the next few minutes. There is no initial condition, there is only ongoing operation, and that is however you make it happen. We are hoping, we don't know, it depends on the bill of materials, to have an accelerometer on our next-generation tile, so, for example, you could just tap on the tile, and that could be programmed to inject some atoms into the local cellular automaton, which the processing could then grind up and say, oh, I think there was a tap, like that. And finally, that's going to be a system question of what you are doing with these guys. Last question. Question: so I see what you're trying to say with the vastness and all that, but one thing, when you're trying to solve practical problems, is also the computation time. Are we doing all this at the expense of time, or do you think it actually comes out ahead? Well, once again, time is only urgent when it's urgent. Lots of times, you know, you have a system control clock of 1000 hertz or something like that, and all the modules have to report in comfortably faster than that, but that's it; there's no actual win in going way faster than that. And again, at the system level, that's where we would like to push back. And the idea is, this particular model, and this is what people hate the most, is they say energy, energy, energy: you've got all of these tiles, they've got clocks spun up, they're doing events over and over and over, even if the grid is mostly empty; you're wasting energy. And the answer is, yes indeed. Why? Because living systems, and this is my biological inspiration, flourish when there's ample free energy. Living systems can survive when energy is tight, but they don't flourish. So here we're saying energy is a sunk cost. The cost to run all the tiles is already paid for. So in this framework, a cycle saved is a cycle wasted. So instead of saying, oh, I should stop the clock and save energy, we say, I should check my work. And then later on we can say, okay, well, the fact is the voltage is now down to 3.7 volts or whatever it is, so now you've got to start pinching. But the design wants to be done where there's plenty.
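Going back to the physics-version answer a moment ago, here's a toy sketch of that neighbor-to-neighbor spreading. The grid shape, the synchronous sweep loop, and the names are all illustrative assumptions of mine; the real grid updates asynchronously and reboots tiles into the new physics as it goes.

```python
import random

GRID_W, GRID_H = 8, 8

# Every tile holds a physics version; one tile has just been handed
# version 32, everybody else is still running 31.
version = {(x, y): 31 for x in range(GRID_W) for y in range(GRID_H)}
version[(0, 0)] = 32

def neighbors(x, y):
    """Only local connections: north, south, east, west; no global names."""
    for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID_W and 0 <= ny < GRID_H:
            yield nx, ny

sweeps = 0
while any(v < 32 for v in version.values()):
    sweeps += 1
    for tile in list(version):
        other = random.choice(list(neighbors(*tile)))
        # "I'm running 32, you're running 31: here, copy mine."
        newer = max(version[tile], version[other])
        version[tile] = version[other] = newer

print(f"whole grid on version 32 after {sweeps} sweeps")
```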
Last one. I'm struggling with my question, all right: so if you're insisting on this, you're still doing general-purpose computing, is that correct? What I'm really insisting on is taking a system-level view. We come in with "what is the system doing" and work it down. We would like the architecture, the hardware, to be quite general with respect to the systems you could conceivably want to build, and then, basically, looking at the type of problem, you dedicate the system that fits. Yeah, certainly as a practical matter, for initial testing, demonstrations, and so on, we sort of imagine playing with little robots, that kind of thing, and just doing sort of basic system control in real time, because that's a natural fit for this sort of thing. But the hope is that once we say, wow, this is really cool, and look, you can chop the robot in half and both halves go driving off, like that, we say, well, I could take that architecture, I could apply it to this and this and this, and then it can specialize and optimize, rather than everything being serial determinism. We've got to give this guy a chance, folks. Thanks so much for coming, thanks so much for staying. Thank you.