Hi, I'm Greg. I'm one of the rare kernel people in the world, it sounds like. I tried to do my title in all icons: that's MDS, Fallout, ZombieLoad, and Linux. My talk is a follow-on to last year's talk about Spectre and Meltdown, and it's about how you can make sure you use Linux in a safe way. That's the main overall goal here. So, again, the same disclaimer: I'm vastly oversimplifying everything. See the notes I wrote up; they detail how you can do this. See the link. It's all there. Let's talk about MDS. MDS is the latest big security problem, released about a month or so ago, for Intel processors. It's the same family of problems that Spectre and Meltdown were. These are bugs in your CPU, bugs in the way the hardware works. And the whole goal of an operating system is to make user space, all the programs, run well and not care about what's underneath. So our job in the kernel is to fix all the hardware bugs. That's our role. Spectre and MDS exploit the fact that CPUs look ahead into the future. They try to figure out what's going to happen next, and then they roll things back. When they don't roll things back properly, you can leak information. There are lots of different variants of this, lots of different ways, because CPUs are very complex things. Like I said last year, this is going to be with us for a long time. I'll say it again, looking into the future: we will be dealing with this even longer. MDS is the generic name for this whole class of bugs, this whole category. There's one called RIDL, Fallout, ZombieLoad, and a few other names. These are all CPU hardware bugs again, variants of the same basic idea. Again, only Intel CPUs this time. They were the only ones found to have these problems. Researchers tested a lot of different chips, and a lot of different researchers found these all at the same time.
Just like Dan said, evolution causes different groups of people to look at the same areas and find the same things independently. I think there were five different research groups that found this same problem within weeks of each other, which is amazing. Luckily, they all worked together in the end. That was very, very rare, but it's also continuing to happen: researchers keep looking in this area. In order to fix this, you need to do two things. You have to update your operating system kernel, and you have to update your BIOS. If you just do one or the other, you will not be safe. You have to update your BIOS, and that means you have to power-cycle your machine. I'm sorry, I can't do anything about that one. In the security world, none of these are as big a deal as they might sound, because you can't modify other people's execution. All you can do is read other people's data. That's not a problem if you control all the code running on a system: you don't care if you can read your own data. But when you're running on a shared resource, like a processor in a cloud that's shared with other, unrelated people, or just on a desktop where different programs run at the same time, like browser windows where you don't want one window to be able to read from another, then it matters. You can read data across the virtual machine barrier, and you can read data from other processes on the system. And this exploits the fact that CPUs do something called hyperthreading, or SMT. That's the way CPUs try to share resources. The hyperthreads share the TLBs, which are the low-level way CPUs look up memory addresses, and they share the CPU cache, the first-level cache. By sharing those types of things, it turns out that data can leak.
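On Linux you can see which half of that two-part fix is missing: kernels that know about MDS report its status in the sysfs file /sys/devices/system/cpu/vulnerabilities/mds. The minimal Python sketch below assumes that file and the status strings described in the kernel's MDS admin guide, and separates "kernel patched but no microcode" from "mitigated". The function names are hypothetical helpers, not part of any library.

```python
from pathlib import Path
from typing import Optional

# Status file exposed by kernels that know about MDS (assumed path; older
# kernels simply do not have it).
MDS_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def parse_mds_status(text: str) -> dict:
    """Split a status line such as
    'Mitigation: Clear CPU buffers; SMT vulnerable' into its parts."""
    main, _, smt = text.strip().partition(";")
    main = main.strip()
    return {
        "affected": main != "Not affected",
        "mitigated": main.startswith("Mitigation:"),
        # Patched kernel, but the BIOS/microcode update is missing, so the
        # kernel's request to clear CPU buffers has no effect.
        "needs_microcode": "no microcode" in main,
        # e.g. 'SMT vulnerable' or 'SMT disabled'; None if not reported.
        "smt": smt.strip() or None,
    }


def read_mds_status(path: Path = MDS_STATUS) -> Optional[dict]:
    """Return the parsed status, or None if this kernel does not report MDS."""
    try:
        return parse_mds_status(path.read_text())
    except FileNotFoundError:
        return None
```

On a patched kernel running old firmware, read_mds_status() should report needs_microcode as True; a result of None most likely means the kernel itself predates the MDS fixes. Note that "mitigated" can still come with "SMT vulnerable", which is the hyperthreading problem described above.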
We thought this might be an issue a long time ago, and it turns out OpenBSD was right. I give these people huge credit. They were right for the wrong reason, but they were right. They said over a year ago: disable hyperthreading. People laughed at them: why would you ever do that? And again, in August, they said, please disable this; we think there might be problems in this area. And they were right. If you had disabled hyperthreading, almost all of these issues would be gone. Almost all of them. Without any updates to the operating system or any updates to the BIOS, all these issues would be gone. They chose security over performance, because when you disable every other CPU in your system, you will have performance issues. I have huge respect for them. They got this right. Good job. Listen to what they're doing. Other operating systems, though, didn't do this. So let's talk about the bugs themselves. RIDL was the first one, and it's the most popular one of the MDS family. It stands for Rogue In-Flight Data Load. All of these issues are about how CPUs actually work underneath the operating system, underneath the assembly language: how chips work inside themselves. They have things called line fill buffers and load ports, which are how data moves around inside the CPU. These exploits let code up at the application level exploit the way the hardware works inside. RIDL steals data across applications, across virtual machines, and, crazily enough, into and out of secure enclaves, which Intel had created saying, this is a secure place you can never get in or out of. Turns out it's very porous. Luckily, nobody really uses enclaves yet, so it wasn't a big deal in practice, but from a researcher's point of view it was a huge, huge deal. We fix this in the kernel by telling the CPU to flush these buffers every time we switch context. And of course you have to update your BIOS, so the CPU gets the microcode it needs to take advantage of that flushing. Fallout: another good icon, another good name.
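Linux can do what OpenBSD recommended. On kernels with the SMT control interface, /sys/devices/system/cpu/smt/active reports whether sibling threads are online, and writing off to /sys/devices/system/cpu/smt/control takes them offline until the next boot; the nosmt boot parameter makes that permanent. The Python sketch below assumes those two files; the helper names are hypothetical, and on a real machine the write needs root.

```python
from pathlib import Path

# Assumed location of the kernel's SMT control interface.
SMT_DIR = Path("/sys/devices/system/cpu/smt")

# Values of smt/control for which writing "off" would change nothing.
ALREADY_OFF_OR_FIXED = {"off", "forceoff", "notsupported", "notimplemented"}


def smt_active(smt_dir: Path = SMT_DIR) -> bool:
    """True if sibling hyperthreads are currently online."""
    return (smt_dir / "active").read_text().strip() == "1"


def disable_smt(smt_dir: Path = SMT_DIR) -> None:
    """Take sibling threads offline until reboot (root required on real sysfs)."""
    control = smt_dir / "control"
    if control.read_text().strip() in ALREADY_OFF_OR_FIXED:
        return  # already off, forced off, or not something the kernel controls
    control.write_text("off")
```

Passing a different directory is only there so the logic can be tried out without touching the real system.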
It exploits the store buffers in the CPU. This one is a little different. You can't read from application to application or across virtual machines, but you can read from user space into the kernel. And the kernel stores lots of secrets that you don't want applications to read. It stores keys, and it stores memory locations that let you see what other applications are doing. You don't want an application to be able to read into the kernel, and this breaks that. As part of the way the kernel tries to protect itself, we do something called randomizing the addresses (kernel address space layout randomization, KASLR). This totally breaks that. And our fixes for Meltdown, the last big security issue, actually made Fallout easier, because we thought we were keeping things in separate memory spaces. Fallout exploits that and makes it easier to figure out what's going on. The researchers credit us for making their job a lot easier. Again, you fix this by flushing the buffers in the kernel and updating your BIOS. ZombieLoad: great logo, great marketing. It's really the same thing as RIDL: you can steal data from application to application. But they had a great logo and a wonderful, wonderful demo. Go look at the website and watch the video for this. You browse in one window, open another window running their program, and you can see what the first window is doing. Huge, huge marketing win. Great name. We fix it in the kernel by flushing the buffers on every context switch, and we move on. And again, there are other variants. These are all little tweaks on the same general idea, in the same areas of the processor. Again, you can steal data: you can read data across a security boundary. Again, we fix the problem by flushing buffers. Flushing buffers is slow. There's a reason we never did this before, and the reason the BIOS never did this before: it slows things down.
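Why does flushing on every context switch help, and why does hyperthreading still matter? The toy Python model below is only an illustration under loose assumptions: it treats a CPU's internal buffer as a list shared by whatever runs on a core, and ToyCore and its methods are made-up names. Clearing the list on each switch, which stands in for the kernel's buffer-clearing step, hides the previous task's data from the next one.

```python
class ToyCore:
    """Toy model of one CPU core whose internal buffer is shared by every
    task that runs on it. Real line fill and store buffers cannot be read
    like a list; this only illustrates the clearing policy."""

    def __init__(self, clear_on_switch: bool) -> None:
        self.clear_on_switch = clear_on_switch
        self.buffer = []  # (owner, value) pairs left behind by earlier tasks
        self.current = None

    def switch_to(self, task: str) -> None:
        # Stands in for the kernel clearing CPU buffers on a context switch.
        if self.clear_on_switch and task != self.current:
            self.buffer.clear()
        self.current = task

    def store(self, value: str) -> None:
        self.buffer.append((self.current, value))

    def leaked_to(self, task: str) -> list:
        """Values belonging to other tasks that are still visible to `task`."""
        return [value for owner, value in self.buffer if owner != task]


if __name__ == "__main__":
    for clear in (False, True):
        core = ToyCore(clear_on_switch=clear)
        core.switch_to("browser-window-1")
        core.store("password")
        core.switch_to("other-window")
        print(clear, core.leaked_to("other-window"))
    # prints: False ['password'] then True []
```

The model also shows the limit of this fix: two hyperthreads run at the same time on one core, so there is no switch between them at which to clear anything, which is why disabling SMT still matters.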
You don't want to slow things down. Traditionally, system calls into and out of the kernel have been fast. Now, every time you call into the kernel and back out, we have to flush these buffers, and that slows things down. One way around this is to make sure that the logical CPUs sharing a core only ever run processes from the same security domain, the same process here and over there. It's a really hard problem. Academically, they've solved it; in reality, they haven't yet. It's called gang scheduling. There are kernel patches out there to try to do this, but it really is slow. It's getting better. Microsoft actually has this option for Windows; you can enable it if you care about it. It slows your machine down, but it is one potential way to mitigate these issues. It's not ready for prime time yet. The best way is to disable hyperthreading and update the kernel. That's the only way to solve all these issues. You can't do one or the other; you have to do both. A lot of Linux distributions recommend just disabling hyperthreading and moving on. But doing that slows things down. For my development, I read lots of e-mail, I write lots of e-mail, and I really do two other things. I build kernels a lot, and I create kernel releases locally and send them off to another machine. Building kernels is very, very CPU-intensive: the more processors I have, the better it goes. This slows my workload down. If I don't disable hyperthreading, it's about a 2% slowdown, which is kind of in the noise. If I do disable it, it's noticeable: a 15% decrease. That's a real performance hit. I notice it takes an extra couple of minutes, and I don't like it. My kernel release creation is single-threaded, though. I take a git tree, clone it, apply a whole bunch of patches to it, tar it up, and send it off to another machine. All single-threaded. With SMT disabled, it takes identically the same amount of time. So run the test yourself. Test your own workload. Everybody's workload is different.
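Testing your own workload can be as simple as timing it before and after a change. A minimal Python sketch, assuming you run the same workload once with SMT enabled and once after rebooting with SMT off (the function names are hypothetical): take the median of a few runs to damp noise, then express the difference as a percentage.

```python
import statistics
import time
from typing import Callable


def median_runtime(workload: Callable[[], None], runs: int = 5) -> float:
    """Run `workload` several times and return the median wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def slowdown_percent(baseline: float, measured: float) -> float:
    """Percent slowdown of `measured` relative to `baseline` (negative = faster)."""
    return (measured - baseline) * 100 / baseline
```

With the numbers from the talk, a build that took 100 seconds and now takes 115 seconds gives slowdown_percent(100.0, 115.0) == 15.0. The workload callable could wrap, say, a build command started with subprocess.run; the median is used instead of the mean so a single slow outlier run does not dominate.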
We all use Linux in different ways to solve our own problems. This might not affect you at all, or it might affect you a lot. We don't know; it all depends. Again, syscalls are now very expensive. Some programs make a lot of them, and some programs don't. If your workload depends on I/O, that hasn't changed. Test your own workload. The bad part about this is that now you have to choose between performance and security, and that's not a choice anybody ever wants to make. You are also relying on your cloud provider to make that choice. I have a number of test machines in the cloud, and my cloud provider chose performance over security. Look and see what yours did. I'm now switching cloud providers. You need to see what your provider does. What did they choose? What do they care most about? There's a website called Make Linux Fast Again. It's kind of a joke, but it gives you the kernel boot command line that rips out all the security things we've done that make things slow, and makes things fast again. My kernel builds are 15% faster if I use it. That's how much things have slowed down in the last year just from dealing with all these security issues. It's a real performance impact on real people. Take a look; it's kind of funny. So how did we deal with this? Last year, with Spectre and Meltdown, Linux was brought in really, really late in the process. Intel siloed us all and wouldn't let us talk to each other. We reacted after it was already public. It was a total and complete nightmare on our part, and for you as a user. Intel did better this time. They worked with us and brought most of us in really early, though not all of us. And we got it done. When the researchers announced it, we had patches ready. They went out to the world, all the distributions were updated, and everybody was happy.
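To see what choice was made on a machine you use, including a cloud guest, two places are worth checking: /proc/cmdline shows whether the kernel was booted with mitigation-disabling options such as mitigations=off (the kind of command line Make Linux Fast Again hands out), and /sys/devices/system/cpu/vulnerabilities/ lists each known issue with its status. The Python sketch below assumes those paths; the option set is a partial selection of upstream kernel parameters, not an exhaustive list, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Some upstream boot options that switch Spectre/Meltdown/MDS mitigations
# off. Partial list; the kernel's kernel-parameters documentation has more.
MITIGATION_OFF_OPTIONS = {
    "mitigations=off", "nopti", "pti=off", "nospectre_v1", "nospectre_v2",
    "spectre_v2=off", "l1tf=off", "mds=off", "nospec_store_bypass_disable",
}

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def disabled_mitigations(cmdline: str) -> List[str]:
    """Return the mitigation-disabling options found on a kernel command line."""
    return sorted(opt for opt in cmdline.split() if opt in MITIGATION_OFF_OPTIONS)


def vulnerable_entries(vuln_dir: Path = VULN_DIR) -> Dict[str, str]:
    """Map each reported issue whose status starts with 'Vulnerable' to its status."""
    report = {}
    for entry in sorted(vuln_dir.iterdir()):
        status = entry.read_text().strip()
        if status.startswith("Vulnerable"):
            report[entry.name] = status
    return report
```

For example, disabled_mitigations(Path("/proc/cmdline").read_text()) lists what was switched off at boot. Inside a virtual machine this only shows what the guest kernel reports; the host's own settings may be invisible to you.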
Except that Debian, which the majority of the world runs, wasn't brought in until 48 hours before the release. I will credit the Debian developers: they had to scramble at the very last minute, they ended up doing more work with all the other distributions to get this right, and they did a great job. If you rely on Debian, and most of the world does, you were affected, and they did a wonderful job. That needs to be fixed: Intel needs to bring Debian in earlier. As always with private disclosures, not all the developers get to look at the fixes. We don't see all the weird workloads or all the odd CPUs out there. So we fixed things right afterwards, as reports came in. Just yesterday we saw patches that make things go a little bit faster, because we didn't need to do some of the things we did. Fixes keep coming all the time, every week. You have to keep updating your kernel to make sure you're correct. It isn't just the original announcement; it's all the updates that come after it that you also have to take. And update your BIOS: it will not work without that. So update every week. I do a new kernel release every week. We're running about 22 changes per day in these kernels, so every week a whole bunch of patches come out, and at least one of the fixes every week is a security issue. Sometimes we don't know that until years later, because in the kernel, a bug is a bug is a bug. We don't know whether it's a security issue or not, and we don't care. We fix it, we get it out to the world, and we move on. It's up to you to take the whole thing and update your machines, and then you know you're secure. And people look at CVEs. CVEs are the way the world has been tagging security issues for a while. You'll notice very few kernel fixes get CVEs. It's not a thing. Do not count on CVEs to mean anything for the kernel.
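Those later updates are often traceable: kernel commits that repair an earlier commit usually carry a trailer of the form `Fixes: <abbreviated sha> ("subject")`. The Python sketch below, assuming you already have (sha, message) pairs, for example parsed from git log output, follows those trailers transitively to list every later fix of a given commit. It is an illustration with hypothetical names, not a kernel tool, and it only finds fixes whose authors added the tag.

```python
import re
from typing import Iterable, List, Tuple

# Kernel commits that repair an earlier commit carry a trailer such as
#   Fixes: 1234567890ab ("subject of the broken commit")
FIXES_TAG = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.MULTILINE | re.IGNORECASE)


def follow_on_fixes(target: str, commits: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the SHAs of commits that fix `target`, directly or via a chain
    of fixes-of-fixes. `commits` holds (sha, full commit message) pairs."""
    target = target.lower()
    history = [(sha.lower(), message) for sha, message in commits]
    found: List[str] = []
    pending = [target]
    while pending:
        broken = pending.pop()
        for sha, message in history:
            if sha in found or sha == target:
                continue
            for ref in FIXES_TAG.findall(message):
                ref = ref.lower()
                # Tags use abbreviated SHAs, so match on a shared prefix.
                if broken.startswith(ref) or ref.startswith(broken):
                    found.append(sha)
                    pending.append(sha)
                    break
    return found
```

Backporting only the original commit while ignoring what this returns is exactly the cherry-picking trap the talk warns about.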
There you go. Only a very small number of the fixes we do in the kernel ever get a CVE. If you try to cherry-pick just the fixes that have a CVE and backport them, you will get it wrong, because there are follow-on fixes. Look at Spectre and Meltdown: that was something like 80 patches, and then another 80 or so after that, and we never tagged them all with CVEs. We didn't realize it until later. This is not how Linux works with security. I have a link there to a long article about how the kernel security team works. You can download the presentation, go to the link, and read that. This is how we work. Again, very, very few kernel issues actually get real CVEs. One security researcher looked at all the past work we had done: in 10 or 12 years, we only had 1,000 CVEs for the kernel, while we apply about 20 fixes a day. That is hugely disproportionate to what is really going on. The weird thing about kernel CVEs is that the average time between when a CVE is requested and when the bug is fixed is negative 100 days. That means that, on average, somebody comes along 100 days after the fix and says, oh, look, that issue you fixed a long time ago really was a security issue; let's go get a number so we can track it more easily. So for those 100 days, if you hadn't updated your machine, you were vulnerable to that issue. And the data is skewed all over the place: 40% are negative, and the standard deviation of these numbers is 400 days. That's over a year as a standard deviation. CVEs for the kernel mean nothing. It's crazily weird. So I worked with the Google security team to figure out the best way to do this. They looked at all of last year. They go through all their research and find all the problems, both the public ones and the ones that aren't public, and they tell the Pixel security team, for Google phones: take this patch, take this patch, take this patch.
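The CVE timing figures above are simple statistics over pairs of dates. The Python sketch below shows how such numbers could be computed, assuming you have collected, for each kernel CVE, the date the fix landed and the date the CVE was requested; the function name is hypothetical and no real data is included. Lag is fix date minus request date, so a negative value means the CVE was requested only after the fix had already shipped.

```python
import statistics
from datetime import date
from typing import Iterable, Tuple


def cve_lag_stats(pairs: Iterable[Tuple[date, date]]) -> dict:
    """Summarise (fix_date, cve_request_date) pairs.

    lag = fix_date - cve_request_date in days; negative means the fix came first.
    """
    lags = [(fixed - requested).days for fixed, requested in pairs]
    if not lags:
        raise ValueError("need at least one (fix_date, cve_request_date) pair")
    return {
        "mean_days": statistics.mean(lags),
        # Sample standard deviation needs at least two points.
        "stdev_days": statistics.stdev(lags) if len(lags) > 1 else 0.0,
        "fraction_negative": sum(lag < 0 for lag in lags) / len(lags),
    }
```

A large standard deviation relative to the mean, like the 400 days quoted in the talk, is what makes "wait for the CVE" a poor update policy.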
We want to make sure our phones are up to date, and so on. Last year they identified 218 different issues and said, hey, go take this patch. It turned out that 92% of those issues were already fixed and released in a public LTS kernel, a stable kernel. They were already there. The only ones that weren't were in code they had imported into their tree from outside the upstream kernel, or in things they had backported incorrectly. Every upstream issue they identified was already fixed. Their phones were already secure before they realized it. So now Google's Android recommendation is that you take the LTS kernel updates every three months. I'd be happy with every month; that would be wonderful. Some manufacturers are doing it every month. There's one manufacturer that is very good: they push new kernel updates every month. Some do it every three months, and some not at all. Watch this: look at the kernel version number on your phone if you want to see how secure your device is. This is real proof that taking new kernels is the only way you can be secure. You can use these kernels. In fact, I said this a couple of years ago at one of these other conferences: the only way you can ensure that you're running a secure machine is to use the latest LTS or stable kernel, or to work with a distribution that does this themselves. Debian, wonderful. SUSE, wonderful. Red Hat, wonderful. Rely on them; they're very good at this. If you're not relying on one of them, use the LTS kernels. Use the kernels we're releasing, because everything's already fixed in them. That's the best way to be secure. Thank you very much.
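Checking how far behind a kernel is can be as simple as comparing its version with the newest release in the same stable or LTS series. The Python sketch below parses the leading major.minor.patch of a `uname -r` style string (Python's platform.release() returns that string on Linux); you supply the latest release yourself, for example from kernel.org, and the helper names are hypothetical. The number only reflects which stable releases a vendor has merged; it says nothing about vendor-specific code outside the upstream tree.

```python
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' of a `uname -r` style string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def stable_updates_behind(running: str, latest_in_series: str) -> int:
    """How many stable releases the running kernel lags its own series.

    `latest_in_series` must be supplied by the caller; nothing is fetched.
    """
    run, latest = kernel_version(running), kernel_version(latest_in_series)
    if run[:2] != latest[:2]:
        raise ValueError("compare against the newest release of the same series")
    return max(latest[2] - run[2], 0)
```

For example, a phone reporting 4.14.100-perf against a hypothetical newest 4.14.130 would be 30 stable releases behind.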