Welcome everyone. Let's start our next session. Let me welcome Mr. McKenney, who will give a talk about parallel programming. One remark, please, for questions or for the Q&A session: wait for me to bring around the microphone, as we are recording this session. Thank you. It works. So we'll have a Q&A session at the end, but please, if you have questions at any point, wave your hand, get the microphone, and just ask them. It works a lot better if we get the questions asked when they come up, as opposed to trying to remember them for 30 minutes until then. So don't be shy. So parallel programming is becoming more familiar, as near as I can tell, but it has had a reputation that's not entirely positive over the years. In the Linux kernel, though, things changed over about three years; there was a big change, and things are still changing. As you saw from this morning, if you went to Jim Zemlin's talk, there's a lot of change in the Linux kernel in all sorts of ways. But from my viewpoint, there was a really big change from about 2003 to 2006. And in 2006, Linus noted this. There was an Intel scalability event, I believe it was in November of 2006, and he said that in the years prior to 2003, if there was some patch involving concurrency, it was usually totally broken to start with. It would take lots of review, lots of rework, lots of testing, and eventually they would get something resembling right. Whereas in 2006, it was not unusual for a patch involving straightforward concurrency to be reasonably close to correct on first submission. Now, there are a lot of reasons why people say that concurrency is hard. I actually am not of the opinion that concurrency is hard, but rather that programming, and by extension concurrent programming, is where the challenge is. My experience leading me to believe that concurrency was natural happened when I was a teenager. 
So it was one Saturday morning, I'm minding my own business, and there's a knock on the door. Idiot that I was, I answered it. I opened the door, and suddenly five toddlers about this high come streaming across the threshold and going all over the place in the house, and our house isn't childproofed. So each of these five kids is going somewhere where you really don't want a two-year-old going, but I was just standing there, because I couldn't figure out which one I was supposed to chase down, or what I was supposed to do when I caught one. About that time, their older brother and sister and their two parents came across the threshold, laughing like hyenas because I was totally discombobulated. It turned out to be the Anderson quintuplets. They were making their way from Washington State, where they lived, on some vacation or some kind of trip to California. For whatever reason, they had a strange aversion to hotels, or maybe it was the other way around. And so they were stopping off at friends and friends of friends, and we were in the latter category. So we were just one of the places they stopped on their way down from central Washington to California. And it was really interesting watching them deal with the quintuplets that evening. The parents seemed quite capable of spinning up as many threads of consciousness as were needed to deal with the situation at hand. With seven kids, five of them two-year-olds, it was almost never only one. All right? If you think about that, or some experiences later, look at team sports. There was a big football game that the locals were really excited about last night. You've got 22 players on the field, and then you've got the referees and you've got the coaches. A given player might only have to keep track of four or five of these people, but a given player probably fairly rarely has to keep track of only one. Plus the ball's in the air; that's something else. 
It's moving too. All right. So with that heresy out of the way, let's assume that the improvement from 2003 to 2006 was real, that it wasn't a figment of Linus Torvalds's and my imaginations. Well, there are people who claim that to do parallel programming you have to have the right language, and by the way, C is not the right language, right? Usually this is uttered by people who are inventing their own language. Well, you know, we were doing C before, during, and after. And, I mean, the Linux kernel dialect of C is kind of unusual; there are a lot of interesting macros and other things in there, a lot of stuff layered on top of C. But in those three years, it didn't change that much. Okay? New synchronization primitives? Well, I'd like to claim credit for RCU, but unfortunately it went in in 2002, and it wasn't used heavily until after 2006. So we were pretty much talking about locking before, during, and after. This is the locking that, if you read the academic papers, is evil: you can't possibly make it work, it deadlocks, it's broken, it just doesn't do anything for you. Well, sorry, but that was what we were using. It wasn't a change in personnel, either. We had relatively low turnover during that time. We added a bunch of new people, but a lot of the people who became proficient at it were there beforehand. So, you know, the main maintainers were there in 2003, and the same main top-level maintainers were there in 2006. A few additions, a few changes, but for the most part, unchanged. Now, the Linux kernel community has a lot of really smart people in it. It's a really wonderful community to work with because of that. But I'm here to tell you they're not born parallel programmers. I was working with them in 2001 and 2002, and I'm sorry, but no. Okay? They had to learn it, just like everybody else. So it wasn't any of these four popular reasons that people usually give for why parallelism became easy. 
And clearly, I mean, you could say that it didn't get easier, but the results say differently. We had a period in the early 2000s where it was really difficult for people to get these patches right, and later on it got better. So the results indicate that it did get easier somehow. So what caused it? Any thoughts? Good examples to follow? That's not a bad one. When did lockdep come out? That's another interesting one. There was another one back there: patterns. In other words, not just the examples and the code to follow, but the fact that there were abstractions people came up with: if you want to do this, do it kind of like this. That's a good one. Other ones? You guys have some ideas too, I'm sure. Experience. You actually do it a few times and bust your head on it and figure out, oh wait, it hurts if I do this; if I do it the other way, it doesn't hurt as much. That's not a bad one either. Other thoughts? Okay. So let me see if I've got that right. What happened is that from 2003 to 2006, the clock frequencies hit the wall. If the clock frequencies had kept increasing, we'd be at what, 200 gigahertz or something like that by now? And so they leveled off, and it was like, you know, if you want it to go faster, you're going to have to go parallel. Sorry, there's no choice, get over it. Is that fair? Okay. Okay, yeah. So, necessity. Okay. Other ones? Going once? There were a lot of talks. I don't know if any one of them was a tipping point sort of a thing, but there certainly were a lot of people just breaking their heads, figuring out ways to do it, and then giving talks about how they did it and propagating things out. That's certainly true. And articles and LKML discussions and a lot of things. That is a good point. Yes. Commodity hardware. That's very good. I have a little bit of a slide talking about that too. 
Because before, let's see, in 1985, you try getting a parallel processor, period. In 2013, try getting something that's not a parallel processor, at least if it's a desktop or even a smartphone-class machine. That's an excellent point. Okay. Another one? All played out? Okay. We're getting tired of this slide anyway; it's time to go to a new one. I'd say all of the above, probably plus some more. And I think if we go through those, you can fit most of them into these three boxes. So the gentleman in the back who said, look, if you want to go faster past 2003, you've got to go parallel, there's no choice, that's kind of acculturation. Expectations changed. In the 1990s, you could be a performance programmer and not know about parallelism and still be able to look at yourself in the mirror in the morning. After 2000, maybe not so much. Now, there was a comment over there about availability of systems. In the 1990s, there weren't a whole lot of systems, and that's where economics fits in. Let's take a closer look at this. Okay. So, you know, time is money, and so is hardware availability. How much are you going to invest in something if there are only 6,000 copies of it in the world? If there are only a few tens of thousands of parallel processors in the world, how much sense does it make to build anything that helps do parallel processing? And the answer is, it doesn't make much. We'll take a closer look at that a little bit later. And one of the things that economics leads to is tooling. If you have, as we have now, hundreds of millions of parallel processors in the world, then if you can do something that makes a 10% difference in each of those, that's a big value, and it's worth making an investment for it. And the lockdep example is possibly one example of that. 
So the more developers and the more machines you have, the more sense it makes to put time and effort into things that make the developers' lives easier and make the machines work better. And that means that it's easier to do it, which means that more people can be expected to do it, and this loop goes around. I think the key thing out of this is that a virtuous cycle like this can make a really big difference in a connected, closely communicating community like the Linux kernel community; you can do that very quickly. Back in the 80s and 90s, the parallel community was kind of split up among a bunch of proprietary projects. I was in one of them, DYNIX/ptx, which was Sequent's operating system, a Unix-like operating system that was parallel in the 90s; that was where I was. But we didn't talk very much with the Solaris guys, or with the HP-UX guys, or with the Digital Unix guys, or with the Oracle guys at the application level, because they were all on different little proprietary islands. With the Linux kernel community and other parallel open source projects, we have communication among all of us. And that means if somebody comes up with an idea over in, I don't know, PostgreSQL that turns out to be applicable to the Linux kernel, there's a reasonable chance that after a little while it'll actually make it here and we'll be able to act on it. Whereas earlier, if something showed up in Oracle, it might have helped out DYNIX/ptx, but it wasn't going to be communicated. So to add a little more to it: as you increase the acculturation, you get more developers and better productivity. That means it makes more sense to have more parallel systems and use them more, which means that the economics improve. It makes sense to put more investment into things that make the world better for parallel programming. And that means you've got more tools, which again enables more people to make things work. 
So this is one of the things I think made a big difference in those three years, and I think it's still making a difference. We do have some challenges still; in a shameless plug, I'll be talking about some of them in my validation talk on Wednesday at Plumbers. It's not perfect. Nothing ever is perfect, as far as I can tell. But if you at least move things in the right direction, you make things better, and that's better than making them worse. So we talked about sample code and patterns, and that's one of the things that feeds in there. And lockdep we heard about earlier, and I would argue that Sparse and Coccinelle helped as well, to some extent. Lockdep is more heavily oriented towards parallelism, but Sparse really does have some lock-matching capability. And if you have a parallel bug that has a pattern in source code, you can make Coccinelle go look for it. So these are just some examples, and some of them you guys gave; you did a good job earlier on how you can make this stuff work better. I'm going to give an example of economics, and the gentleman over on the side talked about this. So in the 1990s, it was like 1991 or something like that, we were running a benchmark at the Sequent Benchmarking Center. Sequent was a little company that made database server machines back in the 90s. And the benchmark was running, and it wasn't getting the number they wanted. They wanted it to go faster, but the disks were not overloaded, the bus had plenty of excess bandwidth, there weren't any lock contention problems, but the CPUs were 100% saturated. And if you're benchmarking, what that means is you need more CPUs. And so there I was, carrying a stack of five of these boards, and these are huge things, they're like this big. They had two 80486s on them. And this was before CPUs had their cache on the chip, so they had this big block of cache memory for each CPU, and a bus interface and everything. So these were just the CPU boards and a little bit of cache. 
There was no Ethernet, no disk, no disk controller, no nothing. It's just two CPUs, this big. So I had a stack of boxes with five of these things. I get about halfway across the parking lot and I realize I'm carrying three times the purchase price of my house in my arms. All right. And yes, I did walk a little more carefully after that, in case you were wondering. The thing was, if you wanted to mess with parallel processing back then, you didn't have a whole lot of options. These things cost many hundreds of thousands of dollars for, like, a two-CPU system with almost nothing on it. If you wanted a system that could actually do something, you were talking millions of dollars. There weren't that many hobbyists who could afford them, and I didn't know any hobbyists who were that rich. Maybe you guys did, I don't know. So what that meant was there were very few people who had any chance at all of playing with a parallel processor. There were guys like myself: I worked for a company that sold parallel hardware, and they had no choice but to pay people like myself to play with the software so they had something they could sell. Our customers might have a test machine sitting off to the side. Now, these things were horribly expensive, so the tendency was to keep them fully utilized. But sometimes, it's just the way things work, you needed to have a machine dedicated for testing, and there might be times when it wasn't being used, in which case some of the employees of that company might be able to play with parallel processing at that time as well. The third group was a few college students. You might have a university that had a research grant on parallelism that bought such a machine and then would use it for classes afterwards. But even if you add all that up, you've got this many people in the field at that time and about this many who had any chance to get any experience at all with parallel programming. I don't care what kind of system it is. 
I kind of doubt that you're going to be able to get very good at it without actually using it, or an emulation of it. Flight simulators, I'll grant you; those do work, at least to some extent. But people were complaining about parallel programming being hard. Well, if you don't get a chance to try parallel programming, how can you possibly get good at it? In contrast, in 2004 I was collaborating with a kid from the University of Toronto, on RCU, of course. And we got a paper into a conference, and he went out and bought a dual-core Mac for the sole purpose of being able to say, yes, I'm doing this presentation on a multiprocessor. In less than 15 years, the price of a reasonable multiprocessor went from multiples of a house to a fraction of a used car. And of course, that was almost 10 years ago. Now you could probably get one for almost free, if you didn't mind locking yourself into a two-year plan or something like that. Of course, you'd have to root the thing to be able to do anything on it besides use it as a telephone and a place to drop apps. But still, if you're in the developed world, you can afford a multiprocessor. Even in the developing world, if you really wanted it badly enough, you could probably work your way around to getting it, at least if you're living in a city. The guys in rural India, China, or some parts of Africa might still have a little problem. Regardless, there are now a lot of people who have access to multiprocessors: hundreds of millions, billions, somewhere around that range. And that's kind of cool, and it might be a little scary as well. We'll probably see some very strange ideas coming up about how you can program multiprocessors with that many people. There are a lot of crazy people out there, and some of them come up with crazy ideas. And if you come up with enough crazy ideas, you'll find one that actually works really well. And so we may see some interesting changes in how we do these things. 
On top of the changes coming from things like GPGPUs, FPGAs, and other hardware accelerators. So we had lockdep and Sparse and so on. Sparse was motivated, as far as I can tell, by user/kernel pointer errors, where you have a pointer to a user-space address and you try to use it directly from the kernel, which can cause you to get page faults in places you don't want page faults, among other things. But in 2004, the idea of static analysis was really exciting; there was a lot of excitement about how you could do great things with it. And so it was extended to a bunch of things, including a little bit of concurrency, and it did some approximate analysis of locking. It'll yell at you if you acquire a lock and then don't release it, for example, or release a lock you didn't acquire. Give or take: it only goes function by function, so if you have a function that acquires a lock and then returns, you have to put a marking on there saying this acquires a lock, and so on. But still, it can find things like that. I think lockdep is the real workhorse. I think it's one of the things that's really made a big difference in people's ability to program the Linux kernel, and also in Linux's reliability. It was actually motivated, as far as I can tell, by some real-time work that was going on; I was part of that. And one of the problems we had was that increasing the degree of preemptibility made concurrency bugs more probable. If you have a deadlock, but one condition of the deadlock shows up for a few microseconds every few milliseconds, the probability of that deadlock is quite low, and you might find that it's almost impossible to hit it in a real workload. When you introduce preemptibility into those few microseconds, you might get a preemption there. If you get a preemption there, suddenly the lock's held for a very long time, many milliseconds, which greatly increases the probability of actually seeing the deadlock. 
And so the real-time guys were just getting killed by deadlocks of various sorts. Some were their fault, but others were just there in the kernel. Now, the lockdep tool is not perfect; you can still get deadlocks that lockdep won't detect. Can anybody tell me a deadlock that lockdep won't detect? Yeah, something back there. I'm sorry. Oh, okay. So the purpose of lockdep is to detect deadlocks, and it does a pretty good job, but there are some deadlocks that it won't detect. Can you tell me one? Yeah, Alan. What kind of device? Okay, so there's one. Why don't you tell us why? Okay, so let me rephrase that a little bit. What lockdep does is an approximate analysis: rather than treating every instance of a lock separately, it groups them together. So if you have a given field in a given structure that's a lock, by default all those locks will be treated the same. And so if you acquire one lock from that device and then acquire another one, normally it'd yell at you about deadlock. Now, it's possible to tell it to separate them into classes. You can say, look, I know you think these are all one lock, but this is one group of them and this is another group of them. But if the locking is just kind of random in a tree-structured thing, it's really hard to get lockdep to understand that. So in some cases, lockdep has been turned off for given locks, and therefore you can see deadlocks involving those locks that lockdep won't detect. So that's an excellent example. Another example is if you have a deadlock that involves waiting as well. So you've got somebody doing a wait_event or something like that, and they're holding a lock while doing it. If you set it up just right, lockdep won't realize that there's this dependency between the wake-up event and the lock it's holding. So there are some, but nonetheless, it handles a huge number of cases and does a pretty good job of it. 
And it's made it so that rather than these things popping up in production on some really busy machine, they show up at test time. For example, by Wu Fengguang's kbuild test robot, or in linux-next testing by Stephen Rothwell, or in any number of other places, or just by your own tests if you're a developer. So, Coccinelle is a really cool thing. The thing that's really different about Coccinelle: lockdep and Sparse were pushed by people who were part of the community. Sparse was started by Linus Torvalds, and lockdep was started by Ingo Molnar, Peter Zijlstra, and that group. Coccinelle was an academic project. If you pay really close attention to the top-20 developer lists that come out with every release, you'll have seen the name Julia Lawall on them several times, and this is her baby. You can think of it as sort of a version of sed that understands C syntax, so you can write these sort-of sed scripts except that they manipulate C specifically, and you can make them so that instead of just changing things, they generate patches. So you can do something like this: for example, if you have a function that's used all over the place in the kernel and you need to add another argument to it, and by default that argument is just a null pointer, you can make a little Coccinelle script that goes through the entire Linux source tree and produces a patch that converts all the old-style calls to the new-style calls. It takes as long as it takes for the thing to read through the kernel source, and it's a heck of a lot easier than doing it manually. The other thing is that instead of just making a pattern that creates a change, you can make a pattern that looks for some type of bug. So if there's a bug report that involves a specific pattern that's known to be almost always wrong, what Julia and her students and colleagues will do is write a Coccinelle script that looks for that and just scan the whole kernel tree. 
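The add-an-argument transformation described above looks roughly like this as a Coccinelle semantic patch. The function name `old_register` here is made up purely for illustration; the `@@ ... @@` header declares metavariables that match any expressions at the call sites:

```
// Hypothetical semantic patch: add a third argument, defaulting to
// NULL, to every call of a (made-up) function old_register().
@@
expression dev, flags;
@@
- old_register(dev, flags)
+ old_register(dev, flags, NULL)
```

Running `spatch` with a script like this over the source tree emits a patch touching every call site, which is exactly the "sed that understands C" workflow the talk describes.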
And there are people who, as part of regression tests, run these things on each kernel version, or sometimes on each change that comes out. So it helps us find these things much more quickly. All right. So one question, then, is what's left to work on? I mean, if we've got all these great tools and everything's working wonderfully, do we just pack up and say parallel programming is a solved problem, and that's it? I don't think so. I haven't been able to contribute too much myself, but there's been some very interesting work going on in the dcache system to try to make it so that we can handle filesystems in a much more parallel manner. And that's partly because of SMP systems having hundreds or thousands of CPUs. Now, Linux has been able to handle large systems for quite some time; the first bug report I got against RCU for a 512-CPU system was in 2004. SGI has been pushing this for a long time. But there are actually several vendors now that will sell you systems of 1,000 or more CPUs in one shared-memory instance, running one instance of the Linux kernel. Another big thing coming forward is special-purpose hardware accelerators. GPGPUs are a big thing right now: NVIDIA's and AMD's offerings, and of course Intel's Xeon Phi, plus a bunch of other people as well. FPGAs, crypto units, everything else. What happened back in the 80s and 90s, when single-threaded performance was increasing exponentially, doubling every 18 months roughly? If you were crazy enough to make a hardware accelerator, by the time you got done with it, the CPUs probably were going faster than your accelerator was, all right? Because, I mean, think about it: if it was gonna take you three years to get the thing done, how fast were you gonna have to make it go if you wanted a 10x performance advantage? Because these things are inconvenient to use, right? 
You have this funny thing: you gotta buy an extra one, you have to have software that uses it, you have the drivers, and so you can't just offer equal performance. You wanna offer like 10x. Well, it'll take you three years to get the hardware and software done before you start selling it, and you wanna sell it for another year, so that's four years, all right? That means you gotta go 40x faster to begin with in order to still have a 10x advantage by the time you're still selling it. The amazing thing was that there were some accelerators that actually made headway during that time, including GPUs. Since 2003, that exponential increase in single-threaded performance has either ended or died down a lot, depending on who you talk to, and so there's one reason why hardware accelerators might not be stupid. There are a lot of other reasons why they might be stupid, like the fact that you have to do drivers and special things for them and train people about them and so on, but we're probably going to see a lot more people trying these things again, and probably a lot more of them succeeding than did in the 80s and 90s. One thing that's been keeping me entertained for about the past eight years has been parallel systems with both real-time and energy-efficiency constraints. It's one thing to make something scale and perform well. It's quite another thing to make it do that while you're getting real-time response and not wasting energy. There are a lot of applications that you can parallelize straightforwardly, and we've done a lot of those already, but there are some that are harder, and there's a lot of work left in making those go faster. Now, that might not mean parallelizing them. It might mean special-purpose hardware. It might mean that you come up with a clever trick that allows you to do it still sequentially but a lot faster; that's happened multiple times. Or any number of other things might happen, but parallelization is one possibility. 
Another one is just the unit volume of multicore embedded systems. There are at least 100 million of those things out there, and as smartphones migrate to multicore, we're gonna be seeing numbers over a billion. I made myself really unpopular at Sequent by telling people that no, we could not allow 10,000-year bugs into the code base. What a 10,000-year bug means is: if you have a system running for 10,000 years, you have a 50% probability of that bug occurring. The problem was, we had something like 6,000 to 10,000 installations, and the thing is, you don't have just one of those bugs; if you allow them in, you get a bunch of them. And if you've got a bunch of systems, then you start seeing them, and they're kind of strange, because they happen once and they don't happen again. They're irreproducible. But even then, if somebody told me they had a million-year bug, I probably would have said, yeah, it might not be worth fixing. As near as I can tell, if you include the single-processor systems, the installed base of Linux OS instances right now is well in excess of a billion. Well in excess of a billion. If you've got yourself one, just one, million-year bug out there, it's happening almost three times a day. Three times a day. So this is gonna put some serious emphasis on validation. Back in the Sequent days, I could construct tests that put as much stress on the code in a weekend or so as our customers would put on it in a year. Well, if you've got a billion of them out there, that doesn't work quite as well, right? Either you need a lot more machines, or you need better techniques, and I think it's gonna be a case of both. One saving grace: the machines are a lot cheaper, so you can imagine having a test rack of 10,000 machines. I can't pay for that myself, but if you look at what that would cost for low-end machines, some people could actually afford it. 
And there are some things coming along in the academic world that look pretty interesting for doing fairly brute-force, really fast validation. The other thing, and this was the point about patterns, is taking things that are currently expert-only and coming up with ways of expressing them, or ways of constraining them, that allow people who aren't so much experts to be able to use them. And we've had this over and over again. My favorite example actually doesn't involve programming. In the 1500s, there was a German nobleman, more a businessman than a classic nobleman, who had a son, and he wanted his son to be able to take over the business and continue it and grow it and everything. And so he asked a local college professor what kind of education he needed to get for his son. The college professor gave this answer: well, if your business is such that he only needs addition and subtraction, any of the universities in Germany will be good enough. If he needs to do multiplication and division, you need to send him to Italy. And I think part of the problem was that Germany was still using Roman numerals at that time, and Italy had figured out Arabic numerals. But nonetheless, 500 years later, I was taught multiplication and division in third and fourth grade, as was everybody else I knew. So what went from an advanced-studies, you-can-get-this-in-one-country-in-the-world university topic became something we did in elementary school over that period of time. And if you think the same thing won't happen with parallel programming, I think the burden of proof is on your shoulders, because this has happened with a lot of different things. It used to be that the guys driving cars were sort of combination chauffeurs and mechanics; they needed to be able to fix the thing if it broke down, which it did fairly frequently. Now we take teenagers and throw them in the car. 
You know, you've got two tons of steel, the equivalent of three sticks of dynamite in the gas tank, and, yeah, you know, go, right? The other thing is that in the last few years, in my opinion, and feel free to quote me on this, I'm already very unpopular in that area anyway and you can't make me any more unpopular, a lot of the theoreticians have been stuck in the 1980s. They have this nice model laid out: we have sequential consistency, everything's perfectly ordered, and if you do that, it'll work. Never mind the fact that no hardware is strongly ordered. Not even x86 or the mainframe; they're close, but they are not strongly ordered. The reason is that it's very inefficient to do that; you really want a store buffer. In the last few years, though, there have been some people who are becoming much more aggressive. I was at a workshop a couple of years ago at Schloss Dagstuhl in western Germany, whose name I probably mangled. And I was again engaged in not making myself popular with theoreticians. What happened was, I described some of the formal methods I've used; I've used Promela and spin and things like that to do mechanical proofs of various things surrounding RCU. And I said, well, look, the problem is that every time I do that, a couple of years later I come up with a different way of getting the problem solved that's so much simpler that I don't need a mechanical proof anymore. At which point one of the formal theoreticians put her hand up kind of nervously: well, then what are we formal analysis people supposed to do? I said, you know, look, I've done formal analysis for 20 years, but I'm not gonna do much of anything formal by hand on the Linux kernel, which at that time was only 13 million lines of code. I suggested that if they had techniques that could run on that much code, that would be a really great contribution. Most of them didn't look particularly reassured by that statement. 
There's a guy named Richard Bornat. He's a UK guy, he's got a Wikipedia page, and he's been doing formal methods since the 60s; he was a professor doing formal methods in computing back then. He got up there a little bit after I did and said: our job as formal-verification people is to verify the code that the developers write, in the language they write it in, in the environment they write it in. And he and his students had written this thing that generates preconditions and postconditions. For a block of code, the precondition says what has to be true before you enter it. For example, if you're going to divide, you need the denominator not to be zero. And the postcondition is what happens afterward; the postcondition of a division is that you get the quotient, right? So they had this thing that automatically generated preconditions and postconditions when fed the code. Oh, I guess I'm boring the laptop to death. Hope I'm not boring you guys to death, but the laptop, definitely. There we go. Anyway, they'd fed all the Linux kernel code through this thing, and they were able to generate valid pre- and postconditions for a million lines of the code. Now, I'm not sure what I would do with a million lines' worth of pre- and postconditions, but the fact that they're willing to take that on is pretty impressive. More recently, some people at Cambridge have come up with theoretical models of weak memory ordering. And, and I'll talk about this more on Wednesday, there have been some people using, of all things, Boolean satisfiability to build efficient verification methods for parallel programs. So there's some really cool stuff happening, but we're going to need all of it to deal with our billion or more instances of parallel Linux out there. This slide was sponsored by IBM Legal; I do appreciate them keeping me out of trouble at various times in the past 10 years. And if there are questions or comments, I'd be happy to take them.
Yeah, we've got a mic coming for you, that way I don't have to try to repeat what you said. So we've sort of moved to where everyone now has parallel processors, so we can all do parallel things on our phones. Do you see any future where we will use everyone's parallel processors and do more parallel processing across different machines? You mean like SETI@smartphone? Yeah, yeah. That's an interesting question. The reason that SETI@home worked was because everything was hooked to AC power, and of course your smartphone is on battery. So I would hope at some point that they would recognize when they're connected to power and maybe make themselves available in that situation. Of course, it'd be nice if they also checked your plan and made sure you had unlimited data, or unlimited something. But that sounds like it could be a really good thing to me. One of the benefits of having a billion instances of parallel programs out there is that there's a huge amount of computational capacity. So hopefully that does happen. Good, more questions? Got one in the back over here. We've seen your work on CPU isolation, where you use one CPU for a specific task. Do you think that's going to increase to the point where we sort of automatically schedule dozens of CPUs and assign tasks to them? That's a really good question, and I don't know the answer. I can speculate a little bit. My guess is that a lot of that, initially anyway, has to happen at the application level, because the application really understands what's going on. I do have one counterexample, which maybe can be made more general, and that's from the late 1990s, in DYNIX/ptx. DYNIX/ptx validation was simpler because there was one application, and if that application ran, you were done, all right?
And that application had an interesting structure where it had pairs of processes. So we made a new system call: the application told the operating system that these two processes go together. And by taking advantage of that, a guy named Phil Krieger did the scheduler mods, and he was able to make the benchmark numbers come out 10% faster than the best a person could do by hand. So the potential's there. On the other hand, I'd hate to have a separate system call for every application out there. Hopefully, after a few rounds, we'd be able to generalize it somehow and come up with something that worked across a broad range of applications. So in the short term, I think it'll be the applications doing it, because they know what's going on. And maybe longer term, we'll be able to learn from that and make something that allows the application to tell the scheduler enough information, in a compact and canonical form, that it can do something useful with it. That's a good question as well. Just to add to that point: we're one of those people that use that quite heavily; we carve everything up with cpusets down to single CPUs. But yes, it would help if, say, hyperthreads cooperated. Oh, that would be neat. I thought the general strategy for hyperthreads in heavy real-time and HPC applications was to turn them off. But it's an interesting thought; I'll keep it in the back of my head. Now that simple parallelization, as it were, is in that virtuous cycle, I'm wondering if maybe the next step is moving out from the CPU-centric view and making things more memory-centric, especially with big data and so forth. That feels like it's now the difficult part of the problem. Well, it is a timely question in one sense.
Just in the last couple of weeks, I ended up with a bug in the RCU CPU stall warning stuff that appeared if you had more than six terabytes of memory in some configurations. So yes, scalability is more than just CPUs. And there's a bunch of things people have done to try to take advantage of locality, because the von Neumann bottleneck is something we've been talking about for a long, long time. There was this big thing back in the 80s: the architecture of computers has to change, because right now we've got this single bottleneck between the CPU and memory and it's limiting everything. And so they came up with all these non-von Neumann architectures, each of which had non-von Neumann bottlenecks that were more severe than the von Neumann bottleneck. At which point somebody said that the main bottleneck in computing wasn't the von Neumann bottleneck, but that fewer than 10% of the people had more than half of the capability of John von Neumann, I think was the way they put it. But the thing is that computers are really multifaceted things. You've got GPUs, you've got the CPUs, you've got memory, you've got I/O interconnects of various types, you've got all sorts of buses with different bandwidths, you've got caches. And trying to make one of those go full out, I mean, you take a specific instance, you don't try to be portable, and you wring every drop of performance you can out of it, that's a really challenging thing. And you'll do that for one application, and then as soon as you get the next rev of the hardware, it won't work as well. And in doing that, you have to take into account all of the hardware. So people have rarely done that. Even back when I was the average age in this room, you tended to try to avoid doing that, even though portability wasn't as big a concern then. But perhaps we'll be able to automate more of that.
But yeah, memory is going to be a big issue. SSDs are causing shockwaves going all the way up through the kernel, and then through the hardware, because suddenly the performance characteristics of mass storage are way different now, or might be. We'll still have rotating mass storage around for a long time, as far as I can tell. So yeah, there's a huge amount going on. I was mostly focusing on CPU scalability, you're right, but there's everything else in the world too. If there are no more questions, I think it's about lunchtime. But I'm certainly happy to take more questions if people have them. If not, thank you very much for your time and attention, and have a great conference.