Hi, I'm Greg. I can do a whole talk in just icons. Last year, I gave a talk about what happened with Spectre and Meltdown. This year, I'm talking about what's happened in the past year. It wasn't as publicized, but it's just as important and just as nasty in a way. I gave this disclaimer last year, and I'll give it again: this is vastly oversimplifying all of this. I don't actually have code in this talk anymore. There's a whole bunch of really, really good details in the presentation: links to technical papers, links to the academic papers that were written about this, links to everything else. So go download the presentation.

So let's talk about this. MDS. MDS was announced at the beginning of this year as yet another type of Spectre and Meltdown issue. These are, again, hardware bugs. These are bugs in the CPUs themselves that the kernel is then responsible for fixing, because it's hard to fix hardware: you have to buy new hardware. CPU life cycles are years long. It'll take a while.

All these issues exploit how processors see into the future. In order to go faster, you have to guess what's gonna happen next. The problem is that when you guess what's gonna happen next and you guess wrong, you have to unwind. And it turns out a lot of CPUs don't really unwind everything. If they did something and it affected the system, you can take advantage of that through a side channel and get some information. There's a whole bunch of different variants. A lot of them got lumped under this name of MDS. And like I said last year, this is gonna happen for a very long time. I've been giving this talk for a number of months already, and another issue was publicly released just two weeks ago, which I'll talk about, and there will be more. This is not going away.

So again, all these are hardware issues. MDS really only affects Intel chips. No other processors are affected, no other hardware companies are affected.
Spectre and Meltdown did hit the other companies: PowerPC was hit, and AMD was only half affected. But all of these are for Intel. And they're all, again, variants of the same problem of going ahead into the future and unwinding. The only way you can fix this is if you fix it in the kernel and you update your BIOS. So you have to update your BIOS. We have to work hand in hand. They have to do things in the processor to fix this. And if you don't update your BIOS, you are still at risk. So please update your BIOS. It'll ask you to do so. Don't ignore that. You have to do that. And if your vendor is not providing you a BIOS update, you are insecure. So push back on your vendor. They have to have this.

So MDS is, again, the same type of thing, in that these are not a really extreme security issue. You can't write something to somebody else. You can't cause something else to happen in the machine. All you can do is read what somebody else did. But the interesting thing about this is that it'll cross the virtual machine boundary, and in cloud computing you're running untrusted things in different virtual machines. You don't know who else is running on your machine. So this can be a real issue. I can read somebody else's data. Somebody else can read your data. And that's not a good thing. It can also cross boundaries in user space. You can run a web browser and read other browser tabs and things like that. So it's exploitable that way as well.

The big thing about these MDS issues is that they exploit hyperthreading. Hyperthreading is when you put multiple hardware threads on a core and they share a CPU cache. So you share some caches, and that's the way Intel and a number of other processor companies have found to give you a fake second CPU. But when you start sharing resources, it turns out you can see the resources that the other people are using. Again, you share the TLBs and the level one cache, sometimes the level two cache, depending on your chip. These are the problems that are issues here. I wanna call this out.
OpenBSD was right. When Spectre and Meltdown first came out, they told people: disable hyperthreading, we think there's gonna be more issues in here. They were right for the wrong reasons. They didn't guess what was really gonna happen, but they knew something could happen here. And they took the heat. They took the heat from a lot of people, because when you disable hyperthreading on a CPU, you are removing a processor from your machine and it can affect performance big time. OpenBSD repeated it again in August: please disable hyperthreading, because people weren't doing it. And because of that, they were not vulnerable, or almost not vulnerable. They had a little bit of vulnerability, but they took away almost all of it because they disabled hyperthreading. They chose security over performance, and you have to give them credit for that. They were ahead of the curve that way. Now all the Linux distros say disable hyperthreading. All the other operating systems say disable hyperthreading: FreeBSD, NetBSD, even Windows. You have to do that, but OpenBSD was first. Gotta give them credit for that.

So that was MDS. Here are the specifics. One was called RIDL, Rogue In-Flight Data Load. There's a cool research paper about this. Inside CPUs, there are things called line fill buffers and load ports. There's a whole nother machine way down underneath there that does all sorts of other complex things that I'm learning way too much about. I never wanted to know. And with these, again, you can steal data across applications, across virtual machines, and out of a secure enclave. And this is really unique. Intel processors have this thing called a secure enclave that was supposed to be trusted. Nothing could ever get out of it at all. It turns out everything can get out of these. They were totally and completely broken. We fixed this by flushing some buffers in the CPU when you do a context switch and, again, a BIOS fix.
So we flush buffers and it fixes it.

There's another paper that came out called Fallout. This one had a cool logo. Again, it's the CPU store buffers, not load buffers. Same idea. With this one, from user space, you can read what the kernel is doing. Normally that's not a big deal if you own your machine, but sometimes you can read secrets, like secrets from somebody else's container if you're in a container. Or you can read secrets in the kernel, where there are keys and other things like that. Not good at all. It totally and completely broke kernel address randomization. And a nice side effect of Meltdown is that it made this much easier to exploit. Side effects are bad. So we fixed this by flushing the buffers in the CPU on every context switch and a BIOS fix.

ZombieLoad, best logo ever. These guys have a really, really cool demo. A nice little demo in the browser. You can read another browser's tab. Good at marketing. These academic people are really cool. Just like RIDL, again, you steal data across applications, out of virtual machines, out of secure enclaves. Another slightly different variant. We fixed it by flushing the buffers on the context switch. That's fun.

And then there's a number of other ones. All of these got announced at the same time. Store-to-Leak, Meltdown UC. They are all, again, reading data across security boundaries. Places that you should not be able to break, you can read data across them. We, again, fixed it by flushing the buffers on the context switch. No fun.

And then two weeks ago, another one came out: SWAPGS. This one's a little bit different. It's more like Spectre was in the beginning. And the best thing about this is that Intel has documented all of this stuff in their patents. There's a great tweet from one of the academics, who I think published this one, showing all the patents they printed out from Intel and how they're finding these issues. It's really funny.
So now you have all these professors out there reading patents. There's going to be more. Again, they found it reading patents. So this is public knowledge. It's a 1 to 5% performance hit depending on your workload. And we fixed it by flushing the buffers. Fun stuff.

So all this flushing of buffers is slow. All of a sudden, the kernel has to do a whole bunch more work that it never used to have to do. We can't do anything about this. There's a way, in theory, that we could do something by scheduling things in different ways and different processes and whatnot. It's a wonderful academic theory. Nobody has actually been able to do this in a way that works. We have some cool patches out there for it if people want to try it and work on it. But so far it is not a viable solution. The best way to do it is to disable hyperthreading. Then almost all these issues go away. So all the distros, all the processor companies, all the operating system companies are saying disable hyperthreading. Do it. That's the only way to solve this issue. So if you're worried about this, you have to do this. Not good. But that's the only way to do this.

But how slow is this really? It all depends on your workload. I do two things all the time besides read email. I build kernels, and then I make a kernel and ship it off somewhere else. When I build a kernel, which is highly I/O and CPU bound, there are huge issues: it is a 15% decrease on my workload. That's noticeable. That's real. If I leave hyperthreading on, with some of the performance issues, we're still flushing some buffers and there's still a little bit of a security issue. It's only 2% down. But that's not very good. But when I create a kernel, it's all one thread. It's all I/O bound, with no impact on my performance at all. So it all depends on your workload whether you'll see this or not. Run your workload. Everybody has a different workload. Run it and test it.
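Before measuring a workload with and without hyperthreading, it helps to know which state the machine is actually in. The sketch below is one way to check that from Python. It assumes a reasonably recent Linux kernel that exposes SMT state through the sysfs files `/sys/devices/system/cpu/smt/control` and `/sys/devices/system/cpu/smt/active`. The helper names `smt_state` and `describe` are made up for this illustration and are not part of any kernel or library API.

```python
from pathlib import Path

# Assumed location of the kernel's SMT (hyperthreading) controls on Linux.
SMT_DIR = Path("/sys/devices/system/cpu/smt")


def smt_state(smt_dir=SMT_DIR):
    """Return (control, active) as reported by the kernel.

    control is a string such as "on" or "off"; active is True when sibling
    threads are currently in use. Returns (None, None) if the files cannot
    be read, for example on an older kernel or a non-Linux system.
    """
    smt_dir = Path(smt_dir)
    try:
        control = (smt_dir / "control").read_text().strip()
        active = (smt_dir / "active").read_text().strip() == "1"
    except OSError:
        return None, None
    return control, active


def describe(control, active):
    """Turn the raw state into a one-line human-readable summary."""
    if control is None:
        return "SMT state is not reported by this kernel"
    if active:
        return f"hyperthreading is ACTIVE (control={control})"
    return f"hyperthreading is not active (control={control})"


if __name__ == "__main__":
    print(describe(*smt_state()))
```

On kernels that support it, hyperthreading can be turned off at runtime by writing `off` to the `control` file as root, or at boot with the `nosmt` kernel parameter. Either way, time the real workload in both states rather than trusting somebody else's numbers.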
Because now the problem when using Linux is that syscalls actually matter. Before, Linux syscalls were the fastest thing out there. Now they're slow. And now you need to worry about that. It depends on which syscalls you're doing. If you're doing an I/O syscall, the slowdown doesn't matter as much, because you're doing I/O and that's the slowest thing in the world. But if you're doing other syscalls, they can be slow. And that's a real problem. Everybody's workload is going to be different. There's nothing we can do about it. So test your workload. It's all different.

So now you have a choice. You have a choice whether you want to go for performance or whether you want to go for security. And everybody's choosing differently. My cloud provider chose performance. I don't know if they're going to be my cloud provider much longer. The cool thing about the kernel is that we export all this information. You can see what is happening below you, in the virtual machine and on the physical machine: whether it has these updates, whether the BIOS has been updated, whether the patches have been applied in the layers below your virtual machine. Your kernel will report that information to you. Look at that and make your own decision about what you want to do.

And, tongue in cheek, Make Linux Fast Again shows you the command line you can give to the kernel to disable all this stuff. All my kernel builds get that 15% back. Actually, all my performance numbers were from before SWAPGS, so add another 5% to my workload. So now my kernel builds can be 20% faster, back to what they were two years ago. With the kernel, all new releases are usually faster. Now we're actually having to go slower to fix hardware bugs. It's not fun. But look at what your cloud provider did. Everybody did something different. Hopefully everybody will make the right decision. For cloud providers at least, if you're running untrusted virtual machines, I hope the decision is to be secure.
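The information the kernel exports can be read straight out of sysfs. The following is a minimal sketch, assuming a Linux kernel that provides the directory `/sys/devices/system/cpu/vulnerabilities`, where each file is named after an issue and holds a one-line status such as `Not affected`, `Vulnerable`, or `Mitigation: ...`. The function names `read_vulnerabilities` and `still_vulnerable` are illustrative only.

```python
from pathlib import Path

# Assumed location where the Linux kernel reports CPU vulnerability status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerabilities(vuln_dir=VULN_DIR):
    """Return {issue_name: status_line} for every file in vuln_dir.

    Returns an empty dict when the directory does not exist, for example
    on an old kernel or a non-Linux system.
    """
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    report = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            report[entry.name] = entry.read_text().strip()
    return report


def still_vulnerable(report):
    """List the issues whose status line starts with 'Vulnerable'."""
    return sorted(
        name for name, status in report.items()
        if status.startswith("Vulnerable")
    )


if __name__ == "__main__":
    report = read_vulnerabilities()
    for name, status in report.items():
        print(f"{name:24} {status}")
    exposed = still_vulnerable(report)
    if exposed:
        print("Still vulnerable to:", ", ".join(exposed))
```

Running this inside a virtual machine shows what the guest kernel can see about its own mitigations and the hardware underneath it. The status lines are free-form text meant for people to read, so treat the `startswith("Vulnerable")` check as a rough filter and read the full output.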
So, as I talked about before, Spectre and Meltdown with Linux was a total and complete nightmare in the way it was handled and whatnot. For this, we were much better. We've gotten a lot better. We had kernel fixes for everybody available on the announcement date. Intel did talk to us and let us work together in a nice way. It was much, much better. We worked with the other OS vendors, we worked with the BSDs, we worked with Windows, we worked with Xen, and we worked really well together. Much, much better.

It still needs some work. I will call out that Debian was only notified two days in advance. That was not acceptable. For SWAPGS, Debian wasn't notified at all. So Intel actually regressed there. They've since fixed that, and Debian now knows what's coming up next. But Debian worked their butts off and had things ready. I gotta call that out as a very good job. Because it turns out 80% of the cloud systems out there are running either a kernel.org kernel or Debian. And companies don't realize that anymore. They think of the old model of the enterprise distros, whose market has grown and which are still important, but that's not the majority anymore. The majority is Debian and kernel.org-based kernels. AWS has a kernel.org-based kernel that they use on their systems, things like that. You cannot ignore Debian. That is not acceptable, and Intel is getting better about it.

The thing is, once we release these things, a larger number of kernel people can see them, and we fix another thing a week later. So again, keep updating. We're still doing fixes. We're still finding problems. We're still fixing other issues. It is not a matter of patch once and you're done. You have to take these kernel updates every week. Always update your kernel and always update your BIOS. That's the best way to do this.

Along these lines, let me talk about security fixes overall for Linux. You have to do this. We update and push out a security fix at least once a week, whether we know it or not.
We're doing about 22 patches a day in the stable kernel releases. Those are all known bug fixes. You need to take those. Sometimes we don't know if a bug fix is a security fix or not until years later. Infamously, there was a bug in code that I wrote and then fixed, and it wasn't figured out until three years later that it was a security fix. I didn't even know it. So take these fixes. The kernel community mantra is: a bug is a bug is a bug. We fix it, we push it out, and we go.

And we don't get CVEs. CVEs for the kernel mean nothing. We don't assign them. We don't do anything about them. You cannot rely on a CVE to say, oh, I got the CVE fix, I'm okay for the kernel. It doesn't matter, because only a small fraction of the fixes for the kernel get a CVE at all. I will never apply for one. There are issues with trying to revoke them when they're invalid. It's just a nightmare. For the kernel, it doesn't matter. It does not work well, because we have fixes like Spectre and Meltdown, which had one CVE number, but we had 80 patches, and Gustavo fixed more Spectre issues last week. He's been writing Spectre patches for about two years now. You can't say it's one CVE and, here, here's the 200 patches. It does not work that well.

So there's a link. You can look at my talk on how the kernel security team works, why this matters, and how we do this type of stuff. Read that document. It's a big, long white paper I wrote a long time ago. It's really, really relevant.

Another thing is that we've looked at what CVEs are actually issued for the kernel. Kees Cook did this research. We looked at 12 years of kernel CVEs. The best part is the negative fix dates. An average fix date of negative 100 days. Think about that for a second. Somebody says, I want a CVE for this bug fix. So I go, okay, great. They assign a CVE. And it's like, when was it fixed? It was fixed four months ago. What? People are abusing CVEs to get around management problems in their companies.
They get a CVE assigned because, once the CVE is there, they have to get the fix in instantly, right now. Engineers are routing around the crazy policies that management has put in place. That's what this is proof of. I mean, look at the negative 3000 days. I don't even know why that CVE was assigned. The standard deviation is what, 400? That's over a year. This data is all over the place. It means nothing. CVEs and the kernel do not work.

I've sat down with the MITRE people. We're trying to figure out what to do. They've said CVEs for the kernel do not matter at all. For a lot of open source projects, CVEs are not relevant. That's not what CVEs were created for. We're trying to come up with a better type of thing. I talked about it earlier this week at a conference here, and we're talking about it in Paris and at another conference in a couple of weeks. We're coming up with some ways to handle this. For now, do not rely on CVEs for this. There is a better way to do this.

And Google has actually documented this stuff. They've proven this. We always said that the stable kernel releases have all these fixes in them before you need them. And Google's like, yeah, yeah, we don't believe you. So they tracked the numbers. The kernel security team at Google goes around and finds all the problems that are reported, that are found, that are told to them, that are announced to them through their bug bounty program, and then they tell Android, hey, go take these fixes. For all of 2018, 92% of all the problems, out of 201, were already fixed before they knew about them. Every single one of them was already fixed. The only ones that were not fixed were in code that they had added to their kernel that was not upstream, or that was there because they backported features incorrectly and got it wrong. So every single problem that was reported to them or that they dug up was already fixed before they knew it.
And because of this, Google is now mandating that Android devices take the LTS releases. Pixel phones update to the LTS release. I'll call out Sony. Sony's been doing this for the past year and a half, at least. They update the kernel every two months or so with all the latest LTS. They are safe and they're good. Essential, another phone company, is really good about this. They're doing the right work. I've seen the numbers so far from Google; they're not public yet. At a bit over halfway through the year, they're already in the 200s. So they're actually finding more and more problems. And again, we're on track to be at the same percentage again. So take the LTS releases and it'll be fine.

So I will say this, and I've said this at other conferences. Please get this through your head. If you're not using a distro kernel that you trust (and you can trust some distros more than others) or a stable or long-term kernel, you have an insecure system. Those are the facts. If you can't update the kernel on a device, it's insecure. I'll say that right about the time.

When I gave this talk, somebody in this room told me this quote, and somebody else said this needs to be the title of my talk: it's sad, and you should feel bad. We feel bad. The goal of a kernel is to paper over the bugs in hardware and make it look like a unified system to user space. You can run any kernel, run any hardware. That's the job of Linux. We do that really well. We've been doing that for 20 years. The problem is when the hardware has bugs that break the model of how we thought it worked. And you can't really fix it. We have to do things to work around those problems that directly impact you. That's the only way we can fix this stuff. Hardware has bugs. We've always known about it, but now we have to fix this, and we're fixing this before you know about it. Spectre, Meltdown, ZombieLoad, MDS, they're all issues. They're not as important as the whole security situation.
We push out fixes for bugs that can cause your machine to crash, fixes for handling nasty network packets that do bad things. We fix those bugs every single week, we push them out to the world, and people take them. Update your kernel. Disable your hyperthreading. You have to do that. We are now on board with OpenBSD. Again, they were right. Fully give them credit. Disable hyperthreading. That's the only way you can have a secure system. And update your kernel and update your BIOS. And yeah, it's sad. I'm sorry. Thank you very much.