For the safety of society, and to let us build really big computers, we should pursue robust-first computing. A system is robust if it tends to do something sensible even when its design assumptions are under stress or violated. That sounds like a good thing, and you see "robust" on feature checklists. But at the same time, a lot of computer science is about using computer hardware efficiently. That also sounds like a good thing, but efficiency and robustness are at odds over redundancy, which robustness requires and efficiency eliminates. So for computers out in the real world, for systems meant to carry actual responsibility, we should put robustness first and view maximizing efficiency not as an inherent virtue but as a regrettable, sometimes-necessary compromise. That's the core message of an essay I have in the October issue of CACM, and this is a little video introduction.

I want to start with a question: why do computers crash? I've been asking that question around the country over the last year and a half, and people say bad hardware, bugs, user errors, malware. The deeper reason is that we made them that way. We designed computers to freeze up at the first surprise. Why did we do that? Well, as programmers, we were counting on there being no surprises, on everything being completely reliable, so that we could just put all the steps we needed in a row, and we basically never gave a second thought to what would happen if any of those steps went wrong.

That attitude goes all the way back to the earliest days of digital computing. At the very beginning, there was in effect a contract between the computer engineers and manufacturers on one side and the computer scientists and programmers on the other, though they weren't really called that yet. The idea was: hardware shall provide reliability. It takes the unruly physical world and turns it into logic.
Software's job is to take that logic and turn it into functions that are valuable: something desirable enough to generate the revenue that pays for the hardware, the software, and everything together. That's how the deal worked. At first, hardware was incredibly expensive and really weak, which created tremendous pressure on software to extract as much desirability as possible from the sunk cost of whatever hardware you had. So there was tremendous pressure to be correct, for whatever "desirability" meant, and then to be as efficient as possible. And this CEO idea, that software should be Correct and Efficient Only, is still the backbone of computer science, algorithm design, and so forth.

The world has changed since then. Hardware is now much, much cheaper, and often more powerful than we need for a given task. There are bugs everywhere, and in most cases it's not even clear what correctness even means. So the world has changed, but in many ways we're still carrying that original CEO software attitude with us. For the safety of society, and to let us build really big computers, we should change that attitude: put robustness first, figure out what that would mean, and build a body of science and engineering knowledge around computing robust-first. So that's the pitch. Let's do a demo and then wrap up.

The example I use is sorting, specifically pairwise comparison sorting. Let's see what that is, quickly. I've got a deck of cards here; they're sort of magic cards. We can shuffle them, and we can square them up like that, okay? So if we take some of these cards... whoops. The thing that makes these cards weird is that they're like alien cards, from Roswell, New Mexico, something like that. So if we were to try to put them in order, how would we do it? Is the two-suns card better than the green-and-red one? Who knows?
Well, it turns out that in addition to this alien card deck, we have an alien card comparison device, and that's what this thing up here is. The way it works is we stick a couple of cards into it, it crunches over them, and it decides: okay, this one is higher than that one. Then we can stick this one in and compare it to that one. All right, this one's higher, and so on; we'll keep doing this... all right, this one's higher. So in fact that card has just worked its way down to the bottom. Now, do we know these cards are all in order yet? Not exactly, because we haven't compared all of them against all of them. But we can try that; it's easier to move the comparison device. All right, those two seem fine. These two... see, they also wanted to swap. And finally these two... all right, they're happy the way they are. So is that it? Are we done? Hmm, now it says they're both equal. Well, let's say that's all right. Those are still in order, and those are still in order. Okay, so there it is.

What we've done here is what's called bubble sort, or at least part of the bubble sort algorithm. Bubble sort is kind of the black sheep of sorting algorithms. All we do is take our comparison device, compare adjacent cards, and if they come out of order, swap them; then we go back to the beginning and do it again, enough times that everyone gets handled. That way, if the card on the right end actually belonged all the way at the other end, it has time to bubble, one position at a time, over to the other side. That's bubble sort. Now, computer scientists love to hate bubble sort because it's so inefficient. But here's the twist: what if our alien card comparison device fails sometimes, sometimes gets it wrong? I think I just saw that happen: it said those two were equal when they weren't.
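The repeated compare-and-swap passes described above can be sketched in a few lines (an illustrative sketch, not code from the essay; the `compare` parameter stands in for the alien comparison device, and the names are my own):

```python
def bubble_sort(items, compare):
    """Repeatedly sweep the list, swapping adjacent out-of-order pairs.

    `compare(a, b)` plays the role of the comparison device: negative
    means a belongs before b, positive means b belongs before a,
    and zero means they are tied.
    """
    items = list(items)
    n = len(items)
    for _ in range(n - 1):  # enough passes for any item to bubble across
        for i in range(n - 1):
            if compare(items[i], items[i + 1]) > 0:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

# With a perfectly reliable comparator, this is ordinary (slow) sorting:
print(bubble_sort([4, 1, 3, 2], lambda a, b: a - b))  # [1, 2, 3, 4]
```

Note that each comparison only ever moves an item one position; that slowness is exactly what turns out to matter later.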
If our comparison routine is not guaranteed to be right, we can't guarantee that we'll actually get the correct answer in the strict sense. So we need some way of asking: if you got it wrong, did you get it really wrong? For that, we made up a measure we call positional error. The idea is this: if we're sorting just a few numbers, say one, two, three, four, and they're in correct order, then one is where it's supposed to be, as are two, three, and so on. One is off by zero positions, two is off by zero positions, and the offsets all sum to zero. Now suppose two and one have been swapped. Two is one position off from where it wants to be, one is also one position off, and the others are fine, so the offsets sum to two. If we get the numbers exactly backwards, that's one of the worst possible cases for positional error: the four wants to move three spots, the one wants to move three spots, and the two and three each want to move one, for a total of eight.

So we could take our alien comparison device, which did in fact act a little strange (we had that case where it seemed to score the same pair differently on different tries, although it seems fine now), and suppose that instead of bubble sort we ran a more efficient sorting algorithm. The one I want to look at is called merge sort. Merge sort is one of a class of very efficient algorithms that computer science has developed over the decades, and it's used a lot; in fact, it's still getting more popular lately. If you have a general-purpose computer or a phone in your house, there's almost surely an example of merge sort in your software someplace. The idea is that you take whatever it is you're sorting, break it into two groups, break those into two groups, and so on, until you get down to tiny groups of two, which you sort directly, just by swapping if need be.
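The positional-error measure just described can be written down directly (a minimal sketch; the function name is my own, and it assumes the items are distinct, as the cards are):

```python
def positional_error(items):
    """Total distance, in positions, between where each item sits
    and where it would sit in the correctly sorted list.
    Assumes the items are distinct."""
    correct = sorted(items)
    return sum(abs(i - correct.index(v)) for i, v in enumerate(items))

print(positional_error([1, 2, 3, 4]))  # 0: everything in place
print(positional_error([2, 1, 3, 4]))  # 2: two items each off by one
print(positional_error([4, 3, 2, 1]))  # 8: exactly backwards, 3+1+1+3
```

Zero means a perfect sort; small values mean "wrong, but not very wrong", which is the distinction strict correctness cannot make.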
And then you start merging the little groups of two into groups of four, the groups of four into groups of eight, and so on, until you get all the way to the end. Now, this is a clever idea for at least two reasons. Efficiency win number one: once you have these sorted groups that you're putting together, you only have to check the lead element of each. Is this one less than that one? You know all the rest behind it are bigger, so the front element serves as a proxy for comparing against the whole rest of its group. Very clever. The second win is that, especially toward the end of the algorithm, when we're merging big groups, deciding to take this element, then this one, then this one moves items long distances in the array. Thinking in terms of positional error: if an item is way out of position, a long move reduces the positional error a lot, and that's what makes the algorithm efficient. All of the efficient pairwise-comparison sorting algorithms are based on that idea. But the flip side of that efficiency is: what if something doesn't work right? What if there's a failure in the comparison?

Here's a case where it works. We split the items, and now we're comparing: seven is less than eight; two is less than four, so it goes first. That ends the round of twos. Now we merge twos into fours: two twos into a four, another two twos into a four, and finally we take the two fours and merge them into an eight. Especially in this last round, the two travels a long distance across the array. The increased efficiency comes from the fact that merge sort is designed to make these long-distance moves: it does one comparison and then moves items a long way. It has large leverage, and increased efficiency implies increased leverage.
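A minimal merge sort with a pluggable comparator might look like this (my own sketch, not the essay's code; note how the merge loop only ever compares the two lead elements, and how one comparison in a late merge can place an element far from where it started):

```python
def merge_sort(items, compare):
    """Split, recursively sort each half, then merge.

    `compare(a, b)` plays the comparison-device role: negative means
    a belongs before b. One comparison in a late merge can move an
    element a long distance -- the 'leverage' of the algorithm.
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], compare)
    right = merge_sort(items[mid:], compare)

    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Only the lead elements are compared; everything behind a
        # front element rides along on the strength of that one answer.
        if compare(left[i], right[j]) <= 0:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([7, 2, 8, 4, 6, 1, 5, 3], lambda a, b: a - b))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

With a reliable comparator, each element is consulted only about log n times, which is the efficiency win; the same scarcity of comparisons is what leaves no slack when a comparison lies.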
And so if the comparison routine might fail sometimes, then the leverage that just helped us so much can come around and hurt us just as much. Here's the data we got. Imagine shuffling a deck of 52 cards and then trying to sort them using the alien comparison device, which gets the comparison (less than, greater than, or equal) right 90% of the time, but in this example gives a crazy random answer the other 10% of the time. Merge sort, that's the one in the middle, and quicksort, another of these efficient algorithms, get pretty well slaughtered by this. On the y-axis we have the total card positional error. Depending on how the shuffling actually works out, sometimes it's worse and sometimes it's better, so these are averages. But look at bubble sort. People see this and think, well, this is easy to fix: just try each comparison a bunch of times before deciding whether to swap, and so on. There are more details in the essay. The point I want to make is just this: because we have the CEO software mindset that says reliability is someone else's problem, we feel free to screw the efficiency down as hard as we can and call it a good day's work. Maybe that's not the best approach anymore. Maybe what we're really looking for is not the high-strung prima donna software but the team-player software that gives better than it gets.

So it comes down to a question about what we imagine for the future of computing. Not to put too fine a point on it, but what we have now inside a single computer, a CPU and RAM, is essentially a centrally planned dictatorship. The CPU is there controlling everything, and all the peons are just cowering in memory, doing nothing until they're told exactly what to do, and then they do it, or they don't. It's easy to understand how to make such a thing.
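The experiment described above can be sketched end to end (my own illustrative reconstruction, not the essay's actual code or data; the failure rate, trial count, and seed are assumptions). A comparator that lies 10% of the time barely dents bubble sort's repeated adjacent passes, while merge sort's long-distance moves amplify each lie:

```python
import random

def positional_error(items):
    """Total distance of each item from its correct sorted position."""
    correct = sorted(items)
    return sum(abs(i - correct.index(v)) for i, v in enumerate(items))

def make_noisy_compare(rng, p_fail=0.10):
    """Comparator that is right 90% of the time and gives a random
    verdict the other 10%, like the flaky alien device."""
    def compare(a, b):
        if rng.random() < p_fail:
            return rng.choice([-1, 0, 1])
        return (a > b) - (a < b)
    return compare

def bubble_sort(items, compare):
    items = list(items)
    n = len(items)
    for _ in range(n - 1):
        for i in range(n - 1):
            if compare(items[i], items[i + 1]) > 0:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

def merge_sort(items, compare):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], compare)
    right = merge_sort(items[mid:], compare)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]) <= 0:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def average_error(sort_fn, trials=200, n=52, seed=1):
    """Shuffle a 52-'card' deck, sort with the noisy comparator,
    and average the resulting positional error over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        deck = list(range(n))
        rng.shuffle(deck)
        total += positional_error(sort_fn(deck, make_noisy_compare(rng)))
    return total / trials

bubble_avg = average_error(bubble_sort)
merge_avg = average_error(merge_sort)
print(f"bubble sort avg positional error: {bubble_avg:.1f}")
print(f"merge sort  avg positional error: {merge_avg:.1f}")
```

The redundancy that makes bubble sort inefficient, comparing the same neighbors over and over, is precisely what lets later passes repair earlier mistakes; merge sort asks each question once and has no way to recover.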
It seems logical from one point of view, but it's very fragile, and it only gets so big: eventually that one guy in the center telling everyone what to do can't go any faster. The alternative is something more like free-market capitalism, something more like democracy. I mean, if we look out in the world and there are three coffee shops on every block, that's clearly redundant. Most of those coffee shops are not operating at the peak efficiency they could reach if we scheduled exactly when everybody was supposed to go get their cup of coffee. But on the other hand, if one of those coffee shops goes out of business, or poisons some of its customers, or whatever, the other coffee shops are more than happy to pick up the slack. The redundancy makes it robust. And if we think about organizing computation on that sort of democratic, capitalist, individual-autonomy basis, we can potentially make a computer system that scales indefinitely. All right, that's it. That's the robust-first computing concept. If you get it, you got it. Thanks for watching.