I want to talk about robust-first computing, which is what I've been working on, and show a demo of the Demon Horde Sort, a particular approach to sorting that exemplifies this robust-first idea. This is a research notebook video, meaning it's kind of rough. I'll try to make it quick, but it'll probably go on a little bit.

All right. This is not the topic for today; this is just the setup, as quickly as possible. We invented computers. They were incredibly expensive and very weak, and that created a culture of focusing only on getting the program right and making it as efficient as possible. Now we have computers that are extremely cheap and fairly powerful, and that culture of correctness and efficiency only, of CEO software, is creating incredible security problems: the security nightmares we live with today, which we've come to treat as normal when in fact they're crazy. In addition, the fundamental architecture of a CPU, a central processing unit, one thing that directs everything, attached to a vast random access memory that does nothing except remember stuff, is running out of ability to scale. We scaled it by speeding up the clock, and clock speeds are getting about as fast as we can conveniently make them. So we need a new approach, and the new approach I'm suggesting is robust-first computing. I take that so seriously that it's meant literally: it is more important to be robust than it is to be correct. It would be nice to give a correct answer if you can, but it's more important to give some kind of answer and have it be close; if it's not exact, it'll be close.

With CEO, with correctness and efficiency only, the idea of getting close doesn't even come up. The way correctness and efficiency works is that later steps trust that the earlier steps got it right, and that nothing changed between the time the earlier step did whatever it did and the time the later step uses it. But if there's a bug, if there's an attacker, if there's some hardware unreliability, anything that could cause that assumption to be violated, then we have to ask: how bad could this computation go? If this step is assuming that this thing is good, and it's not, how bad off might we be? With the CEO software approach, that question never comes up, and that question needs to come up. So if we think robust first, we say: let's make sure we're in the ballpark, and then try to get it dead on.

All right. The Movable Feast Machine is the architecture I've been working on. If we're saying we're not going to use a central processing unit, then we're going to have lots and lots of processing units spread out in space. The architectural goal is to take this computational model and scale it indefinitely large: to keep adding more and more computing power as long as we have money to buy the nodes, power to run them, and real estate to plug them all together. To do that, we need to focus on robustness. This computer could potentially be so big that we cannot finish it before we start using it; we're going to be plugging in more stuff while it's running. It's going to be so big that some fraction of it will be getting repaired or crapping out one way or another as we go. The computations we build have to be inherently robust, so that we can slap this stuff together and make it work.

Okay, let's see what it looks like. I don't know if this is visible or not. This is a grid here.
The Movable Feast Machine, in this version, is a two-dimensional grid of individual sites. Each site can hold a small amount of information, on the order of dozens of bits, perhaps hundreds, but not kilobytes or megabytes. In addition, each site has a certain amount of processing power, so we can have a bunch of things going on at once. Each site can either be empty or hold one atom. Here's an atom. It's called DREG, which stands for Dynamic Regulator. When this guy takes his turn, whenever he wakes up and is told to compute, he looks at his neighborhood and decides what to do according to the behavior rules of DREGs. The way we program this thing is by building elements: elements of type DREG behave this way, elements of type REZ behave that way, and so on. We make up elements, we write their behavior functions, and we come up with configurations that do some useful computation.

When a DREG atom runs, when any atom runs, it can't access the entire grid. There is no global origin on this grid; there's no zero, zero. That's why the grid fades out at the edges. From the point of view of a given atom, it is zero, zero. Indexing is all relative to the self, and it's very short range. This diamond shape is called the event window. That is the limit of the sites that the atom in the middle can access when it executes. There are 41 sites total, all within a distance of four, counting in city-block distance: four steps straight, or any combination of steps adding up to four. While the atom in the center is executing, it can read and write the event window arbitrarily. Once it gives up control at the end of its event, some other atom gets picked, and it can read and write its own event window, which might overlap the first or might be completely separate. The Movable Feast Machine as a whole can be executing many events simultaneously, as long as none of their event windows overlap. This guy is all alone in this picture; here's perhaps a more realistic little scene, with a few other atoms of some sort.

Now, here's how DREG specifically works, because we're going to look at it in a second. When it behaves, it looks north, south, east, or west, one of those locations picked at random, to see whether it's empty or occupied. Suppose it looks north and finds it empty. It picks a random number, and with some low odds it creates a resource atom, called REZ, in that spot; with a much lower probability, instead of creating a REZ, it creates another DREG. So DREG can reproduce into empty spots, but at very low odds. On the other hand, if it looks in a direction, say west, and finds it occupied, it picks a different random number, with a different probability, and maybe just erases the thing, whatever it is. And that's basically its rule. Then, whatever it did, at the end of the event it diffuses: it considers moving randomly one square north, south, east, or west, assuming it can, assuming it's not all boxed in. In general, diffusion respects whatever constraints there are on the local layout. Okay? That's it. That's the DREG rule. So can you imagine, if you started with a piece of grid with one DREG in it and you let it start to have events, what would happen?
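To make that rule concrete, here's a minimal sketch in Python of a DREG event as just described. The specific probabilities, the grid bookkeeping, and the names (`dreg_behave`, `in_bounds`) are my own illustrative assumptions, not the actual simulator code.

```python
import random

EMPTY, DREG, REZ = "Empty", "DREG", "REZ"

P_CREATE = 0.01        # odds of creating into an empty neighbor (assumed value)
P_DREG_SHARE = 0.1     # of those creations, fraction that are DREGs rather than REZ (assumed)
P_DESTROY = 0.02       # odds of erasing an occupied neighbor (about twice P_CREATE, per the demo)

def in_bounds(grid, x, y):
    return 0 <= x < len(grid) and 0 <= y < len(grid[0])

def dreg_behave(grid, x, y):
    """One event for the DREG at (x, y): look one way, maybe create or erase, then diffuse."""
    # Look north, south, east, or west, picked at random.
    dx, dy = random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
    nx, ny = x + dx, y + dy
    if in_bounds(grid, nx, ny):
        if grid[nx][ny] == EMPTY:
            if random.random() < P_CREATE:
                # Mostly make REZ; very occasionally reproduce another DREG.
                grid[nx][ny] = DREG if random.random() < P_DREG_SHARE else REZ
        elif random.random() < P_DESTROY:
            grid[nx][ny] = EMPTY   # erase whatever was there
    # Finally, diffuse: consider one random step, taken only if the spot is empty.
    dx, dy = random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
    nx, ny = x + dx, y + dy
    if in_bounds(grid, nx, ny) and grid[nx][ny] == EMPTY:
        grid[nx][ny], grid[x][y] = grid[x][y], EMPTY
```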
Let's take a look. Here's a grid. It's a simulator, so it's a finite grid. We've got one DREG in it, and we'll start it up. There: it's just diffusing around. Eventually it will get lucky and decide to make something. Now, it can't destroy anything yet, because it's the only thing in this little universe. It made a REZ. The REZ does nothing; it just floats around, diffusing the way everything does unless something actively stops it from diffusing. And this is all fairly boring, but it's just warming up. Let's get rid of the grid lines, because they get visually a little crowded, and we're artificially slowing this down so we can get some clue what's going on; let's let it run at its natural speed, and you can see what starts to happen. If you can read these numbers over here, the counts keep changing: a couple of DREGs, fifty or sixty REZ, and so on. Overall, 18 or 19 percent of this little grid is occupied by something.

If we let it run, given the particular probabilities I picked for creating stuff versus destroying stuff, a DREG is basically twice as likely to erase something when it sees an occupied spot as it is to create something when it sees an empty spot: twice as likely to destroy as to create. And that tends to make the system float at around a third of the sites in the grid occupied. So now we're at 32 percent; it'll end up settling a little above a third, 35 percent or something, when it reaches equilibrium. Okay, so is this what you predicted? We get a world full of REZ, and a few DREG that are creating and deleting them as everybody diffuses around.

Well, so what? The idea is that this is a basic regulation of free space. One of the essential things about parallel computing is that you have to deal with deadlock, crowding, queues filling up, and so on. In this robust-first approach, in this particular demonstration, we're saying: let's just deal with that by putting in this lower level that keeps the world about one third full. If it gets much more than one third full, the DREGs will tend to see occupied spots, so they'll be taking their erase shots more often than their create shots; if the world gets emptier, they'll take more create shots, and so on.

But we can do more with this. Suppose I put in another element. This is a guy we'll talk about more in a second: he's a sorter, part of the Demon Horde Sort, and one of the behaviors he's got is that when it's his turn to go, he looks around himself, and if he sees REZ, he converts them into more sorters. So if we let this run just for a brief bit, the REZ have been taken over. Now if we let it keep running, we essentially never see any REZ: they're still getting created by DREG, but then they're getting converted into sorters essentially immediately. And this is the idea. We can take a DREG-and-REZ base and build things on top of it that compete for the available resources, and their total populations stay under control automatically, because they can only grow as large as the supply of resources allows.
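A quick back-of-the-envelope check on that one-third figure (this little model is my own, not from the demo): if a DREG creates into an empty neighbor with probability p and erases an occupied neighbor with probability 2p, then at occupancy rho the expected creations per event go as p(1 - rho) and the deletions as 2p times rho, and they balance at rho = 1/3.

```python
# Back-of-the-envelope check of the ~1/3 equilibrium occupancy.
# Assumes the destroy odds are exactly twice the create odds, as in the demo.
p_create = 0.01           # chance of creating into an empty neighbor (assumed value)
p_destroy = 2 * p_create  # chance of erasing an occupied neighbor

# At occupancy rho: creations per event ~ p_create * (1 - rho),
# deletions per event ~ p_destroy * rho.  Setting them equal:
rho = p_create / (p_create + p_destroy)
print(rho)   # 0.333..., in line with the 32-35 percent the simulator settles at
```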
Okay, let's see a little bit more about what sorters do. This is a cartoon of the world of a sorter. We've got a sorter guy in the middle, we've got the event window around him, and it's got these four areas, these little two-by-two regions, that are just part of the sorter's world view. What the sorter is trying to do is move data items, which are these blue atoms. For purposes of this demo, the data items just carry a number, say from zero to a million, or zero to 100 in this picture. What the sorter does, in addition to looking around for REZ to convert into more sorters, is compare the data near it against a threshold number it carries. We're going to sort smaller numbers up and bigger numbers down, and in all cases from right to left. So if he sees a number above him and to the right that's bigger than his threshold, he'd like to move it down and to the left; if he sees a smaller number below him and to the right, he'd like to move it up and to the left. In this case, the 79 is bigger than the threshold, so he'd be interested in moving it down; the 24 is smaller than the threshold, so he'd be interested in moving it up, but the destinations for moving it up are all currently occupied, while there's an empty spot down and to the left. So what the sorter decides to do is take the 79 atom, copy it down to that empty spot, and erase it from where it used to be, in effect moving the 79 from right to left and downward: a little quantum, a little increment, of sorting. Then, as the final step, the sorter updates its threshold to the value of the atom it just moved. At the end of this event, the 43 will have changed to a 79, and from then on he'll be comparing other things to 79.

So it's a little like a bucket brigade, where you're not trying to do the entire sorting job yourself; you're just trying to make things a little better. By making the spatial assumptions, that we're sorting from right to left, small up, big down, the individual sorting element can make things better. It can say: given our base assumptions about the geometry, I can say with some confidence that this guy should be lower than he is. Why? Because he's bigger than my threshold. And what does that mean? My threshold was set by a datum that went through here recently, so this guy is bigger than the one that went through immediately before him. That's how the sorting works.

Let's take a look at it. Here we've got a much bigger grid than last time; it doesn't look bigger because the atoms are all drawn smaller. On the field there are these little eggs at regular intervals, each consisting of two sorters and two DREGs. If we look on down there are more, and more here in the middle, another egg. In addition, there's this guy, who's an output element. The idea is we're sorting right to left, so eventually the data will reach the left, and these output guys will suck them out of the system and do whatever the rest of the computation needs with them. In our case, it's just going to score them: it knows what numbers ought to be arriving where, and it will check how well the sorting is going. Similarly, over at the other end on the right, we've got an input element. He's going to generate data at random, for purposes of this demo, to be sorted by the Demon Horde Sort. The guys in the middle are the demon horde: demon in the sense of Maxwell's demon, which makes its distinctions based on temperature, while here we're making them based on data.
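Before starting it up, here's a minimal sketch of that sorter rule, in the same style as before. The event-window offsets, the dictionary representation, and the name `sorter_behave` are illustrative assumptions; data atoms are treated as plain integers for simplicity.

```python
import random

def sorter_behave(window, threshold):
    """One sorter event. `window` maps (dx, dy) offsets to data values; (0, 0) is
    the sorter itself, positive dx is toward the input (right), positive dy is down.
    Returns the possibly-updated threshold."""
    # Candidate moves: (source offset, destination offset, must datum exceed threshold?)
    moves = [
        ((+2, -2), (-2, +2), True),    # bigger datum up-right: move it down and to the left
        ((+2, +2), (-2, -2), False),   # smaller datum down-right: move it up and to the left
    ]
    random.shuffle(moves)              # no preferred order between the two checks
    for src, dst, want_bigger in moves:
        datum = window.get(src)
        if datum is None or dst in window:
            continue                   # nothing to move, or the destination is occupied
        if (datum > threshold) == want_bigger:
            window[dst] = datum        # copy the datum toward the output side
            del window[src]            # erase it from where it was
            return datum               # the value just moved becomes the new threshold
    return threshold                   # nothing moved; threshold unchanged

# Example with the numbers from the cartoon: a sorter holding 43 sees a 79 up-right
# and a 24 down-right, but the 24's destination is occupied.
w = {(+2, -2): 79, (+2, +2): 24, (-2, -2): 55}   # hypothetical window contents
print(sorter_behave(w, 43))   # prints 79: the 24 can't move, so the 79 moves down-left
```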
Now, if we just had the one input guy and the one output guy, number one, everything would get incredibly clogged up around them, and number two, it would be a single point of failure if something went wrong. So when I start this up in a second, it'll be a little confusing, because a bunch of stuff will be happening at once. The first thing you'll see is the input grid spreading: in addition to creating inputs, the input elements are designed to look above and below, and if a spot where there should be another input element is empty, they go ahead and put a new input element there. The output grid builds itself out the same way, into a denser column. So we'll see the grids grow; as the input grid grows it will start generating data, the blue atoms, and then the sorters will do their diffusion, and the DREG will do their diffusion and create REZ, and so on. Let's take a look at its behavior.

All right. Pretty quickly, we've now got a full stack of green input-grid guys. They're spaced every other spot, which leaves the maximum amount of room in the grid for data to be added. You can still see some of the yellow guys; those are REZ, because the density of sorters is still so low that it takes a little while before a REZ gets into the event window of a sorter. Now, we're not doing very well. If we look at these numbers, I guess we haven't even gotten any data scored here yet; the scoring works in periods of 500 steps, and we haven't gotten through one yet. The input grid is completely clogged with data and very few have actually gotten to the output yet. But it takes a while for the machine to warm up. We've got 17 percent of the entire grid occupied; once again, because this is DREG-controlled underneath, the machine will settle in at around 35 percent occupied, plus or minus a bit.

All right, so now we've gotten some data. The sorting error is seven positions on average, which is pretty bad. Each of these output guys can score that because it knows the range of numbers coming in, so it expects to receive its proportionate slice of that range: if a datum should have gone to the neighboring output, that's a sorting error of one, off by two positions is an error of two, and so on. So a sorting error of, well, now it's down to five, is better, but still not very good. The categorical error is the fraction of data items that arrive more than eight positions off in either direction, sort of like not even getting high, medium, or low correct, and we'd like that number to be very small.

All right, let's let this run quicker and start to equilibrate a little bit. The overall occupied-site density will rise. We can also look at the delivery rate, which compares the number of data items injected by the input grid with the number removed by the output side during a given period. That number can be bigger than 100 percent, because a bunch of guys who were buffered up in the grid might end up getting drained out in the next period; but if the machine is working well, the delivery rate will average essentially 100 percent. Now we're doing a little better: the sorting error is down to about three and a half positions on average, plus or minus, which is not too bad, and the categorical error is a tenth of a percent, sometimes zero over the course of a 500-step period. Not too bad.

Now, the point of all of this, well, it's still not done equilibrating yet. We've got a sorter where, by its very nature, the data comes in whenever it comes in and gets hauled through and out, so it's not even clear what 'correct' would mean.
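Concretely, here's a sketch of how each output element might score what it receives, based on the description above: it assumes values span 0 to max_value, that the num_outputs output positions split that range proportionately, and that more than eight positions off counts as a categorical error. The function names and the example output count are hypothetical.

```python
def expected_output_index(value, max_value, num_outputs):
    """Which output position a value 'should' arrive at if sorting were perfect."""
    return min(num_outputs - 1, value * num_outputs // (max_value + 1))

def score_arrival(value, arrived_at, max_value, num_outputs):
    """Return (sorting_error, is_categorical_error) for one delivered datum."""
    err = abs(arrived_at - expected_output_index(value, max_value, num_outputs))
    return err, err > 8   # more than eight positions off in either direction

# Example: values 0..999,999 spread over 38 output positions (an assumed count);
# a datum of 500,000 arriving at position 21 instead of 19 is two positions off.
print(score_arrival(500_000, 21, 999_999, 38))   # -> (2, False)
```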
I mean, the way the output grid is scoring this, it's relying on its knowledge of zero to a million as the possible range of numbers. The first output guy assumes he should be getting zero up to some value, the last guy assumes he should be getting everything from some value up to a million, and so on. But it could be the case that the input grid, because it's generating random numbers, produces a whole run of values that are all below 800,000, say, and none bigger. What are the sorters in the grid to do? Well, they're gradually going to send those 800,000-ish guys all the way to the bottom, because they're the biggest thing they're seeing, and that's going to end up scoring as error from the point of view of the output grid. So the entire formulation isn't even exactly amenable to a perfectly correct solution. But it's a very natural way to judge: are we getting closer, or are we getting further away?

And this thing is embarrassingly robust. Unlike the correct-first, CEO approach, where you do something once, remember it, and then trust it from then on, which leaves you vulnerable if whatever you did was wrong or got corrupted, here we are continually in the process of recreating ourselves. So if something gets screwed up, say we hit it with some x-rays, which here just means flipping random bits, a bunch of them, something like a third of a percent of all the bits in the machine, I don't remember exactly. If we blast it, you can see there's a bunch of hits. What happens to the sorting error when we get to the end of the next period? It barely noticed; the categorical error was a bit higher. Actually, the occupied-site density dropped, because we can be corrupting bits that cause sorters to no longer be sorters, or data to no longer be data, turning them into illegal types of atoms, and those just get erased. Doesn't matter; the system recovers.

Bleach doesn't really apply here, because we don't have any bonded molecules, but we can nuke, which just resets a bunch of bits. There we go: blow a hole in the corner. What happens is that the output grid reestablishes itself, because that's part of its normal business. The data all sort of collect at the leading edge of the damage, because they don't have any sorters to pull them through anymore. But very quickly the sorters diffuse in from elsewhere, the DREG come back, and the machine rebuilds itself. We can do something even more radical: blow away like two thirds of the entire machine. And again, now the scores definitely suffer; the categorical error is up to a third, the sorting error is terrible, and the thing is getting all jammed up here, so it takes a little longer to recover. We can look at this a variety of ways. So, this is fairly well recovered again; it's a little bit anemic, the sorting grid hasn't fully reestablished itself, but it's doing a pretty good job considering that was a pretty major insult to the system.
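Just to make the x-ray idea concrete, here's an illustrative sketch of that kind of fault injection: flip a small fraction of the bits in every atom, then erase any atom whose type field no longer decodes to a legal element. The bit layout, the rate, and the legal-type codes are all assumptions of mine, not the simulator's actual representation.

```python
import random

LEGAL_TYPES = {0, 1, 2, 3, 4}   # e.g. Empty, DREG, REZ, Sorter, Data (assumed codes)
BITS_PER_ATOM = 96              # assumed atom width
XRAY_RATE = 0.003               # roughly a third of a percent of all bits

def xray(grid):
    """Flip random bits everywhere; atoms left with an illegal type are erased."""
    for row in grid:
        for i, bits in enumerate(row):
            for b in range(BITS_PER_ATOM):
                if random.random() < XRAY_RATE:
                    bits ^= (1 << b)                    # corrupt this bit
            if (bits & 0xFF) not in LEGAL_TYPES:        # type field taken as the low 8 bits (assumed)
                bits = 0                                # illegal type: erase the atom to Empty
            row[i] = bits
```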
All right, one last point. I've made this grid about twice as wide, plus some, as it is high, because if a really big datum starts at the upper right, he's got to cross all the way down to the bottom before he can end up sorted correctly, and similarly the low values need to work their way up. So the amount of width affects how well we can sort. And actually, if we look here, this is a picture showing just the sorters, but instead of drawing them as red, it colors them by their threshold. We can see that near the input grid there are lots of different colors, but as we move from right to left, the colors get more and more laminar, more and more uniform. It looks like there isn't a whole lot happening in the last third of this grid, because the colors seem fairly uniform, but those sorters are making increasingly fine distinctions, between, you know, 805K and 830K, whatever it is, down at the bottom. So if we want, we can start stealing resources from this sorting thing. And again, this isn't how you'd really do it; a cooler thing would be, instead of having the input and output grids nailed to the floor like this, to have some kind of movable bag that all the sorting happens inside. But just for purposes of this demo, I've hacked this up. If we start stealing space from this thing, the machine adjusts. Eventually its sorting performance starts to degrade. But now we've got all of this space over here where events are still happening; there's just nothing there doing anything. And if we want, we can put it back.

All right, so that's it: a sorting machine that essentially can't not sort. It's got no real beginning, and it's continually rebuilding itself at this basic level. Maybe we should turn it off. Okay, so that's it. This is obviously a different way to compute than the way we've gotten used to, but in a way, it's a lot more like how society computes. All of the stuff that we do in the world tolerates these little errors. The idea is that each computing element, instead of assuming that everything else is absolutely perfect, should assume that everything else is trying but might not be perfect: computational elements that give a little better than they get. The sorter doesn't try to get the correct answer; it just tries to give a little better than it gets. Thanks for watching.