There we go. I get to do a whole talk in icons. This is a follow-on from last year, where I gave a talk about Spectre and Meltdown and what happened then; now let's talk about what's happened in the year since. A lot of things. There are new icons. There are new logos. And there's more fun. I am, again, going to vastly oversimplify everything here. There's a whole bunch of good technical background here. Full details, resources, go to that link. It's all online. The whole talk is there, notes, everything, if you want to know more.
So that being said, let's just dive into the technical stuff. Earlier this year, we had something come out that was reported as MDS. That was the common name for it. All of these bugs are bugs in hardware. Not bugs in software, but bugs that the software, the kernel (or the operating system, for other operating systems, because these affect everybody), has to fix because they're bugs in hardware. It's the same family that Spectre and Meltdown were in, so it's kind of the same idea. Everything here exploits bugs in the hardware when the chip is trying to look into the future, trying to speculate as to what's going to happen. Chips these days go down multiple paths, trying to figure out what's going to happen in the future. When the code doesn't go down that path, they throw that work away and keep going the other way. It turns out that when they throw things away, they weren't always throwing everything away.
There are all these different variants, all these different issues involved, and different types of this stuff. Let's go through some of these types, because this is what's been announced, and what keeps being announced, this past year. And like I said last year, these problems are going to be with us for a very long time. They're just not going away.
We've got more coming. So MDS: it's all the same type of thing. There's RIDL, Fallout, ZombieLoad, and others. Again, these are CPU bugs. All the same problem. Again, speculation. But the cool thing is that they're being found by different research teams. We have researchers all around the world now, a bunch in the Netherlands, a bunch at other universities, who are finding these bugs. And the fun thing is they're finding them by reading patents. They're doing the research, reading the patents of what Intel and IBM and other people have done in the past in designing CPUs, and they're finding areas to look for problems there. So open patents are leading to security problems being found. It's kind of funny.
To solve these issues, you have to update your kernel and you have to update your BIOS. These are all microcode updates. Inside a CPU is a whole other world. We've learned more than I ever wanted to know about how a CPU works inside, and I'm still learning more and more. There are all sorts of fun terms. I'm going to get to some of them now. But CPUs inside are a whole different state machine. You think you know what they look like on the outside and how registers work. It's something totally different inside. And now that inside, the way CPUs really work internally, is being exposed outside that barrier. And that's the problem. That barrier is becoming porous.
So let's talk about MDS. With MDS, one program can read another program's data. All these issues are not severe in that you can't modify what somebody else is doing, but you can read secrets from somebody else. And that's a bad thing when you're running in a shared environment: cloud computing, even one browser tab to another browser tab. You can cross virtual machine boundaries with a lot of these. With MDS you can. And MDS exploits the fact that CPUs like Intel CPUs are hyperthreaded. They have multiple logical cores on the same physical core that share caches.
When you share caches and the TLBs, you can actually detect what the other logical core was doing. And by doing that, you exploit this issue. And I really want to call out OpenBSD. A year ago, they said, disable hyperthreading. There are going to be lots of problems here. We don't think we've found them all. And so they said, stop doing this. And everybody was like, yeah, that's going to slow things down. But they were right. They were right for the wrong reasons, but they were right. They chose security over performance at an earlier stage than anybody else did. Today, all the Linux distros, everybody, says disable hyperthreading. And that's the only way you can solve some of these issues, along with kernel changes and BIOS updates. So OpenBSD, by disabling hyperthreading, didn't solve all the problems, but they solved a bunch of them earlier, before everybody else. So I really want to give props to them. They did a good job. They told people to disable it in June. They told people to disable it in August. All the kernel distros said, beginning of last year, disable it now. So all new installs, basically Linux distros, will disable hyperthreading. Microsoft did the same. Apple, I'm not sure, but they don't really affect it here.
So RIDL was the first one that was published. It stands for Rogue In-Flight Data Load. Again, we're going to talk about how CPUs work inside. There are things called line fill buffers and load ports. This is how data comes into the CPU and where it sits while it's being used, before it gets passed on to another area. Because of the way these work, you can steal data across applications. You can steal it across virtual machines and secure enclaves, which is really funny because these are supposed to be secure. Inside Intel chips, there's something called, I think it's SGX, a secure enclave where you can run code that nobody outside of it can see. It's really porous. You can see right through this thing.
To solve this, it takes a BIOS update, but in the kernel we also fix it by flushing the buffers every time we switch context. So we stop, we flush the buffers, we keep on going. Solves the problem, everybody's happy.
Fallout, another cool logo, came out at the same time. Those were load buffers; these are store buffers inside CPUs, which hold what the CPU is writing before the write actually goes out. This one's a little more dangerous in that you can read kernel data from user space. It doesn't cross the virtual machine boundary, but you have secrets inside the kernel, keys, other fun things, and you can now read them. And this totally broke the fact that the kernel randomizes its addresses when it starts up (KASLR). So it actually made things worse: the random stuff just makes it easier to exploit. The Meltdown mitigation, when we added that to the kernel, made this easier to exploit, so we tried to fix one thing and made something else easier. And again, we fix this by flushing the buffers and moving on. Again, BIOS update, fix it.
Another one, ZombieLoad, best logo ever. These guys had a cool demo. So go to their website, look at the demo, run it. You can steal data across applications, across JavaScript, across browser tabs. Really, really scary stuff. Good logo, because it's scary. Just like RIDL, it's line fill buffers, another little tiny thing inside the CPU. Again, you can steal data across applications and secure enclaves. We fix this by flushing the CPU buffers when we context switch, plus a BIOS update.
And then there's a whole bunch of other variants that came out at the beginning of the year: Store-to-Leak Forwarding, Meltdown UC. Again, with all these things you can steal data across secure boundaries. We fix it by flushing the buffers on every context switch. And in the middle of this year, after I started giving this talk, another one came out, SWAPGS, another one like Spectre. Again, found by reading Intel patents, which was really funny.
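As a rough illustration of checking whether those buffer-flushing mitigations are in place, here is a minimal Python sketch. It assumes a Linux kernel recent enough to expose the `/sys/devices/system/cpu/vulnerabilities` directory, where each file (for example `mds`) holds a one-line status such as "Not affected", "Vulnerable", or "Mitigation: ...". The function names `read_vulnerabilities` and `needs_attention` are hypothetical helpers made up for this example, not part of any kernel tooling.

```python
from pathlib import Path

# Directory where kernels carrying these mitigations report per-issue status.
# Assumption: older kernels or non-Linux systems will not have it.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerabilities(vuln_dir=VULN_DIR):
    """Return {issue_name: status_text} for every entry in vuln_dir."""
    vuln_dir = Path(vuln_dir)
    report = {}
    if not vuln_dir.is_dir():
        return report  # nothing to report on this system
    for entry in sorted(vuln_dir.iterdir()):
        try:
            report[entry.name] = entry.read_text().strip()
        except OSError:
            report[entry.name] = "unreadable"
    return report


def needs_attention(report):
    """Hypothetical helper: names whose status line starts with 'Vulnerable'."""
    return [name for name, status in report.items()
            if status.startswith("Vulnerable")]


if __name__ == "__main__":
    report = read_vulnerabilities()
    for name, status in report.items():
        print(f"{name:24} {status}")
    print("needs attention:", needs_attention(report) or "none reported")
```

A status that still says "Vulnerable" after a kernel update usually points at the other half of the fix described above, the BIOS or microcode update.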
And again, when we flush these buffers, it takes time. This one was a 1 to 5% performance hit; the others have their own performance hits. We flush the buffers every time we hit this stuff, and that's the only way to solve this problem. Well, flushing buffers takes time. Every single one of these mitigations for these hardware bugs slows down your machine.
People have tried coming up with ideas, called gang scheduling, to only schedule certain applications together on certain cores. It isn't there yet. Other operating systems have tried this as well. There are patches on LKML to try to do this well. The performance is horrible. It's much, much faster to just disable hyperthreading than to try gang scheduling. Gang scheduling is something that's been in academia forever. It isn't ready for the real world just yet. So the only way you can solve all these problems is to disable hyperthreading and do kernel updates. You can't just do one or the other. You have to do them both. Even OpenBSD had to do the other thing. We are slowing down your processor. We're slowing down your workloads. And that's the only way you can solve these problems. Sorry, but that is it.
But how much it slows down depends on exactly what you do. I do two things day to day. I read lots of email and write lots of email, and I bundle up a whole bunch of patches and send them off to a build machine. That bundling up of things and sending it all off is pretty much all I/O bound. With all these mitigations in effect, it's only about 2% slower. So not really a huge difference. But another thing I do all the time is build kernels. My build machine is building kernels all the time. With all that stuff, I see a slowdown of about 15... no, I didn't update the numbers: about a 20% slowdown. And that's real. That's real and that's noticeable.
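To make "it depends on your workload" concrete, the following is a minimal Python sketch for timing a syscall-heavy loop against a compute-only loop, so the two can be compared on the same machine with and without mitigations enabled. It is an illustrative micro-benchmark under stated assumptions, not the measurement method used for the numbers above; `time_loop`, `syscall_heavy`, `compute_heavy`, and `slowdown_percent` are names invented for this example.

```python
import os
import time


def time_loop(fn, iterations):
    """Return wall-clock seconds taken to call fn() `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return time.perf_counter() - start


def syscall_heavy():
    # Each call enters the kernel once (a stat() system call), so this loop
    # pays the user/kernel transition cost that the mitigations increase.
    os.stat(".")


def compute_heavy():
    # Pure user-space arithmetic: no system calls inside the loop body.
    sum(i * i for i in range(200))


def slowdown_percent(before_seconds, after_seconds):
    """Percentage slowdown going from `before_seconds` to `after_seconds`."""
    return (after_seconds - before_seconds) / before_seconds * 100.0


if __name__ == "__main__":
    n = 200_000
    print(f"syscall-heavy: {time_loop(syscall_heavy, n):.3f} s")
    print(f"compute-heavy: {time_loop(compute_heavy, n):.3f} s")
    # Run once per configuration, then compare the two runs, e.g.:
    # slowdown_percent(seconds_without_mitigations, seconds_with_mitigations)
```

The expectation, consistent with the talk, is that the syscall-heavy loop moves much more between configurations than the compute-only loop, but only a measurement on the actual workload says how much.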
In order to solve these security problems, you will slow down. But again, it depends on your workload. Some workloads are fine. Some workloads are not. The scary thing is that we kernel developers fight for a 1% speed increase, a 2% speed increase. People updating to newer kernels should always go faster and better, and that's normally true. We put these security mitigations in and we go back like a year in performance. It's sad. If you're using a year-old kernel, you go back even farther. Use the latest kernel. You might come out equal. But it is a problem. We do slow things down. Now, syscalls in the kernel are expensive. Linux used to have the fastest syscalls available. Now syscalls have slowed down. So you have to be aware of that in your application, in your workload, and you have to take that into consideration. But everybody's workload is different. You need to test to see what's going on.
Now you have to choose between performance and security. And that's something you should never have to choose. You should be able to buy some hardware and expect it to work the way it's supposed to work. But now you need to choose, and you need to figure out what your cloud provider chose. I've been giving this talk pretty much all year. Up until two weeks ago, my cloud provider chose speed over security. They finally switched, and now they're choosing security over speed. And my builds slowed down. But it's now secure. Check your cloud provider. Check what your system is running on. Everybody did it differently. Hopefully, they've gotten it right. But it does affect you.
And, kind of tongue-in-cheek, "make Linux fast again" gives you the kernel command line to disable all this stuff. I did that, and my kernel builds on my laptop speed up by 20%. If you know what you're running and you know your environment, you can disable all this stuff. If you are running in a secure environment where you do trust all the applications and you do trust your users, then do this.
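For reference, a small Python sketch of how one might check what a running system was actually booted with. It assumes Linux, where `/proc/cmdline` holds the kernel command line, and it only looks for the documented `mitigations=off` option; the helper names (`parse_cmdline`, `mitigations_disabled`, `read_cmdline`) are invented for this example, and the parser deliberately ignores quoted values containing spaces.

```python
def parse_cmdline(text):
    """Split a kernel command line into {option: value or None}.

    Simplification: splits on whitespace only, so quoted values that
    contain spaces are not handled.
    """
    options = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options


def mitigations_disabled(text):
    """True if the command line carries the global 'mitigations=off' switch."""
    return parse_cmdline(text).get("mitigations") == "off"


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError:
        print("no /proc/cmdline here; is this Linux?")
    else:
        state = "DISABLED" if mitigations_disabled(cmdline) else "not globally disabled"
        print(f"CPU mitigations are {state} on the kernel command line")
```

Individual mitigations can also be switched off one by one with their own options, which this sketch does not try to enumerate.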
And you get the speed back. But otherwise, if you're running in a shared environment, running untrusted code like JavaScript or even a browser, don't do this. You need to be secure about this stuff. But it's a good list of all the command line options.
Last year, I talked about Linux, Spectre, and Meltdown, and how we were involved, or really not involved, in all this stuff. This time, we were better. For the stuff that came out, we had patches available on the announcement date. Intel notified most of us in advance, in time. We worked with all the other operating system developers. We have a process in place in which to talk in a secure way, share patches, and work together. It's really, really good. But it isn't working really well yet, because Debian was only notified about all this stuff 48 hours in advance, because they weren't considered a company. Turns out 80% of the world runs either Debian or kernel.org kernels. The world has moved on. The number of processors out there running Linux, and this is not including Android, because the rest of the world is a rounding error next to Android, which is 2.5 billion devices. Crazy. 80% run Debian or kernel.org. So that is very important to keep in mind. It's not just Red Hat. It's not SUSE. It's not Canonical. It's kernel.org and Debian. Those are still big users, but the rest of the world has gotten even bigger. So to ignore the community distros and whatnot is a big, big mistake. Intel has now learned. They're finally talking with Debian, kernel.org, me. So it's getting better, but it still has some way to go. I just had a meeting with Intel. All the companies, all the distros and kernel developers got together a couple of weeks ago, and we yelled at each other, and it was all fun. We're all working together.
More fixes. So the big problem is everybody sees the cool logos. It gets press.
Patches go out, and everybody updates their machines, and they think they're fine. But we kernel developers find problems, because we can't test this in public, so we don't see all the real workloads and all the weird hardware out there. So then we fix more problems, and then we find more bugs and we fix them, and we fix them and fix them. We're still fixing Spectre v1 issues two years later. You have to keep updating your kernel. So always update your kernel, always take the latest kernel, and always take the latest BIOS update. Those latest BIOS updates and those microcode updates are being pushed by Intel and the vendors. They're testing them; take them. They're there for a reason, not just for fun. Take them, reboot the machines, reboot your world, reboot the cloud, it's fine. You have to keep updating your kernel and you have to keep updating your BIOS.
Everybody's like, we don't want to update our kernel, we just want certain patches and whatnot, but that's not how the Linux kernel security model works. We do kernel security fixes at least once a week that I know of. There are a lot I don't know about, because a lot of bugs that are fixed in the kernel aren't determined to be security issues until years later, when somebody looks deeply at them. We're applying 22 to 25 changes a day to the stable Linux kernel. To the old kernel that is four years old, 4.4, we're still applying 10 patches a day. These are bugs being fixed in the kernel; take these fixes. To the kernel, a bug is a bug is a bug: security or not, we fix it and we move on. So don't just worry about the times when we have special press releases and whatnot. We find problems and we fix them every single day. Jim talked about how syzbot is finding so many bugs, fuzzing and testing these things. The fixes are going to the stable kernels, they're getting backported, and everybody's happy. You need to keep taking these fixes. And don't look for CVEs.
CVEs mean nothing for the kernel. Very, very few CVEs ever get assigned for the kernel. Again, I'm applying 20 patches a day. I could create a CVE for every single one of them. I was told not to, because it would burn the world down. It's kind of funny, but I gave a whole talk about this in Paris a couple of weeks ago. We are talking about ways to track this a little bit better, but just take all the security updates. Again, CVEs do not work very well for the kernel. Look at Spectre. Spectre v1 had one CVE, and it's taken something like a couple thousand patches over two years. They aren't listed anywhere, so you don't realize it, but you need to keep updating to solve these problems. There's a link in my talk on how the kernel security team works, the reasons behind how we do what we do, and how patches get out to you. Just take all the stable updates all the time. In fact, Android, Google, last year... oh, I'll talk about that in a minute.
So look at this. Yeah, Google did this work too. They looked at all the CVEs for the kernel for the past 12 years. How many thousand CVEs? That's ridiculously low. The majority of all fixes have a negative date: on average, a CVE is asked for something that was fixed 100 days earlier. That shows you the CVEs really don't mean anything. It shows you that people use CVEs in order to grease the wheels of their internal engineering processes. 88 of them were fixed in a week. The standard deviation for these numbers is over a year. The biggest one is a CVE that was asked for something we had fixed 12 years earlier. The longest one is one we haven't fixed in 10 years. Maybe CVEs don't really matter. They don't, I'll tell you that. So don't think of CVEs as kernel issues.
But look at what we actually fixed. The Android, Google security team comes to the Pixel team and says, hey, take these fixes, because we found these problems. We found them, they were reported, we dug through the code.
And last year, for 2018, 92% of all the bugs that they asked about were already fixed in the LTS kernel before they asked for them. I had fixed it and it was out there for people to use. And the tiny percentage of fixes that were not there was only due to code that they had added to the kernel that was not upstream, or that they had backported in an incorrect way. Every single thing was fixed before they knew it. That's what we do. So much so that Google is now requiring newer Android devices to take all the LTS releases. When the new Android update comes, they say you have to update to a newer LTS release, and maybe you should start taking them as time goes on. I'll call out some vendors. Sony has been very, very good. Their phones update with the latest LTS release every couple of months. They've been doing this for over a year, no problems. Sony has good phones. Essential is also doing that. Pixel is starting to do that. There's a whole bunch of others like that. Make sure you take the LTS fixes.
So much so that I now publicly say: if you're not using a supported distro, and I'll qualify it that way, or a stable long-term kernel, you have an insecure system. It's that simple. Sadly, it is that simple. So if you're trying to do this on your own and not take LTS updates, you have an insecure system. If you have a system supported by a company or project that maintains a secure kernel, like Red Hat, SUSE, Canonical, Debian (Debian is great), you're secure. If not, you have a problem. So all your embedded devices out there that are not updated are totally easy to break.
When I first gave this talk in China, I got this response from somebody in the room: "This is a sad talk." It is. All hardware always has bugs. It's the job of a kernel to paper over the bugs in hardware and make it look like a unified system to user space. That's the job of a kernel. The problem is when the hardware has security bugs that we have to fix, and we have to slow things down to do it.
That's when you actually see these, because otherwise we fix bugs all the time: we fix bugs in our own code, we fix bugs in hardware. Hardware has bugs just like software does. So you have to update the BIOS and you have to update the kernel. We are fixing these bugs before you realize it. Again, Google publicly documented this fact. We fixed things before they knew it was even a problem. That's good. So, as I keep saying, in conclusion: disable hyperthreading. You'll go slower, I'm so sorry, not my problem. And, oh, OpenBSD was right. And always update your kernel and your BIOS, and everything will be okay. Thank you very much.
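As a companion to the closing advice, here is a minimal Python sketch for checking whether hyperthreading (SMT) is currently off. It assumes a Linux kernel that exposes `/sys/devices/system/cpu/smt/control` (values such as `on`, `off`, `forceoff`, `notsupported`) and `/sys/devices/system/cpu/smt/active` (`0` or `1`); the helper names `smt_status` and `smt_is_off` are made up for this example.

```python
from pathlib import Path

# Assumption: a kernel new enough to expose the SMT control files in sysfs.
SMT_DIR = Path("/sys/devices/system/cpu/smt")


def smt_status(smt_dir=SMT_DIR):
    """Return {'control': str or None, 'active': str or None} from sysfs."""
    smt_dir = Path(smt_dir)
    status = {}
    for name in ("control", "active"):
        try:
            status[name] = (smt_dir / name).read_text().strip()
        except OSError:
            status[name] = None  # file missing: old kernel or not Linux
    return status


def smt_is_off(status):
    """True only when the kernel reports that no SMT siblings are active."""
    return status.get("active") == "0"


if __name__ == "__main__":
    status = smt_status()
    print("smt control:", status["control"])
    print("smt active: ", status["active"])
    print("hyperthreading off:", smt_is_off(status))
```

On systems where the files are missing, the sketch reports `None` rather than guessing, so a missing answer is not mistaken for "off".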