Hi. Sorry for the delay. My laptop hates me. Anyway, this is about the Kernel Self-Protection Project. I'm just going to switch pages on both so I can see my notes. So, we talk about security, and that's a pretty loaded word, so I'm trying to describe what this project is. It's more than access control, more than attack surface reduction, bug fixing, or protecting user space. This is mainly about the kernel protecting itself from attack. Integrity alone doesn't cover the ideas around stopping the methods attackers use to gain control, for example. Not a lot of people in this room need convincing as to why this is important, but a lot of people do. I think about having two and a half billion devices running Linux as pretty terrifying. The one aspect I try to talk about a lot here is the lifetime of the flaws we're dealing with. The time between when a flaw was introduced and released on some machine and when it actually gets fixed is very long, even just from the perspective of software maintenance within the Linux kernel project itself: oh, we fixed it in however many months. But that doesn't tell you anything about the deployed systems. So coming up with defensive strategies to minimize that damage, and to make it so fewer flaws actually become security vulnerabilities, is pretty important, and it shouldn't be considered "not our problem" by the kernel development community. I was inspired by Jon Corbet's analysis of CVEs, looking at the history and trying to say when something was introduced and when it was fixed. That's the lifetime I was talking about. He'd come up with about five years between when something was introduced and when it was fixed, which seemed insanely long to me, but it turns out that's pretty stable.
I've been using the Ubuntu CVE tracker to look at kernel security flaws, since their kernel team tends to find when something was introduced so they can figure out whether they need to backport the fix to whatever kernel versions they've got. I was concerned for a while because the average was starting to grow closer to six years, and I started to think it was because this kind of analysis only looks back to the beginning of git history. The averages kept growing because flaws introduced prior to git history get pinned there, so the further away you get from it, the longer the average time-to-fix appears. Luckily, this has gotten better, and I have some graphs to demonstrate that. Kernel versions are on the side: the beginning of git history is at the bottom, the current release is at the top, and the bar is how long things were exposed for. I'm only paying attention to critical and high CVEs, because I would go blind trying to analyze all the mediums and lows; the data is pretty noisy for those, but this is pretty decent. This also, I should note, ignores the mitigations for CPU issues, because those aren't bugs in the Linux kernel software; otherwise things would look completely insane. It's also worth noting that this is a subset of flaws: things that got a CVE identifier, and not all flaws actually end up with one. So this is what I would think of as the best-case view of lifetimes, because you don't have data on the things that were fixed in stable updates without ever getting an identifier. A lot of people have said in the past, well, this is all theoretical, no one noticed that this flaw was open for five years. That's just not true, and thankfully we've had some cases where black hats would happily boast about finding a flaw the day it was committed and exploiting it for a couple of years.
This linked one, for example: they found something in 2008 and used it until it was fixed in 2010. But as it turns out, most people doing this kind of attack work are not interested in telling you that they have a flaw, so that's only a small look into it. Bug fixing continues: we're finding a lot of bugs faster and faster, but we're also introducing them faster and faster, so it's a weird steady state. We have a lot of tools; static checkers and dynamic checkers are doing a good job. You could ask Greg KH how many patches end up going into the stable kernels, and you can ask me too, because I looked it up for this talk: as of 4.19.67, the most recent stable release on an LTS series, there are 6,610 fix commits, which is about 98 fixes per stable release, and those come weekly or sometimes more frequently than that. A great analogy from Konstantin a while back compared this to the car industry in the 1960s: cars, like computers now, had been designed to run, not to fail. The idea being that you can comfortably drive down the highway and everything's great and things work correctly, but then you blow a tire and everyone dies. We want to get past the 1960s-car stage and get caught up. The picture here is a 2009 Chevy Malibu in a head-on collision with a 1959 Bel Air: a 50-year difference in car technology, and the entire front cabin of the '59 car has been obliterated while the 2009 is looking pretty decent. So we should get caught up, because yes, we know how to run, but we really, really do not know how to fail correctly. We need to fail safely. And some people do not necessarily think about where Linux is used and how it affects people's actual lives.
An example: there was a futex kernel bug in 2014 that was turned into Towelroot for rooting Android phones, which was co-opted by Hacking Team, who sold it as a weapon to oppressive regimes, who in turn targeted activists and their families. So it really is about more than "oh no, my server crashed"; it can get kind of dicey, and we want to do our best. Getting back to it: yes, it's good to kill bugs, and we have to do it, but we need to do a little better than that. Moving up a level, we could say, let's kill entire bug classes, so that none of those bugs ever come back. We will not reintroduce them, because we've stopped ourselves from ever having that problem to begin with. This also means out-of-tree code can't hit those bugs, because the infrastructure or usage pattern for creating that bug has gone away completely. Better still is stopping exploitation methods. Since we're always going to have bugs, make sure the kernel itself is not easy for an attacker to take advantage of. There's a lot of infrastructure inside the kernel that is weak for no good reason; it's just designed the way it is because, well, it runs fine, but it doesn't fail safely. One issue that gets a lot of pushback is that sometimes these changes make the lives of developers harder. Going back to the car analogy, it's kind of like: well, I put a titanium plate along the door, but now there isn't room for the window to roll down, so we've got a whole new engineering problem. It's about getting people to accept the responsibility of working around the defenses we're putting in place. Actually, the best thing would probably be not using C at all, but while I'd like to move the kernel to the galaxy-brain level of memory-safe languages like, say, Rust, that's realistically going to take a very long time.
So, in the meantime, there's the Kernel Self-Protection Project, and this is aimed at protecting the kernel from attack. There are really a lot of people involved. Some people spend all of their time working on it; some people just show up and do things on the weekends. It's individuals and organizations. I used to say our progress was slow and steady, but I liked Alexander's comment that instead we should be considered flexible and persistent; he had a great version of that. The number of people involved is kind of hard to count, but that's why I'm up here: to represent all of that work. So, without further ado, this is roughly a year's worth of kernel releases and some details about what was going on. VLAs, variable-length arrays sized on the stack at runtime, have been a source of a lot of problems, so there was an effort to remove them, and in 2019 we're getting very, very close to getting rid of them completely. We're making conversions from atomic types to refcount_t, a type that is still atomic but actually checks for overflows and underflows, and those conversions are slowly trickling in. I've also been looking at places where bugs have been caught precisely because refcount_t is in use in the kernel, and we just keep getting them: more and more bugs keep showing up where we actually tripped the underflow and overflow checks. Another systemic issue was implicit fall-through in C's switch statement. In a switch you have a series of cases; when you want to leave you say break, and you have the option of falling through to the next case to continue processing in some way.
However, there is no semantic information to distinguish "I meant to fall through here" from "oops, I forgot to write break"; there is no fall-through marking of any kind, and this has led to a lot of interesting, hard-to-notice bugs. So we've been trying to mark all these implicit fall-throughs. As an example, putting in 120 markings found, in the process, three different situations where a break was missing: legitimate bugs. We also got shift overflow helpers to extend the multiplication and addition overflow helpers, CPU vulnerability mitigations, further restrictions on opening FIFOs and regular files in world-writable temp directories, and clearing of unused registers on syscall entry so they can't be used for leaking things into speculation flaws. In 4.20, VLAs were completely removed. I like to talk about the time span it took to get rid of all the VLAs, because the effort started much earlier than 4.19; I talked about it last year. systemd, I think, had two exploitable vulnerabilities due to using VLAs. This is just not a safe feature to use in C. More refcount_t conversions, more implicit fall-through markings. The stack leak plugin clears the stack on syscall exit, so you don't have uninitialized state; Elena talked about this a little earlier this week. Stack canaries on some other architectures weren't per-task: there would be one stack canary for all tasks, per boot, which was not great for leaking information, because if you could leak the canary from one task, you could attack another task. More read-only memory areas. And then, to quickly talk about the bottom one: a recent refactoring of waitid introduced a trivially exploitable, attacker-controlled kernel memory overwrite flaw, and none of the fuzzers noticed it.
That's because faults through copy_to_user() and copy_from_user(), if they end up targeting kernel memory, and that kernel memory happens to be unmapped because it sits in a hole somewhere, would look just like faults on unmapped user-space memory. There was no distinction being made, so everything would either fail quietly or terribly corrupt the system and be found much later. Actually generating distinct faults for this means really obvious bugs can now be found there. In 5.0, we've got more conversions. arm64 gained a bunch of features on the linear mapping, which used to be read-write: even though other portions of memory had proper permission markings, you could still manipulate the direct map and write through it as a target, so that's been made read-only. More per-task canaries. Some top-byte-ignore work to help pave the way for memory tagging, which I'll talk about briefly. Pointer authentication on arm64, with a kernel-only keyring that you can't get at from user space. In 5.1, we've got more refcount_t conversions, more bugs found. And you're starting to see that the ratio of implicit fall-throughs found to be real bugs is about 10%: in 10% of the places where there was no break, a break was supposed to be there, which is, to me, an extremely high bug ratio, so it will be nice to soon be rid of this. We got pidfd, a stable interface for talking about processes, being introduced slowly, one interface at a time. At LCA there was some discussion on validating how heap memory gets mapped into user space; there were some ideas about cases we don't check for that should be invalid, and that change was made and immediately found two bugs. So clearly this was an area that needed help.
The first pass at LSM stacking landed in 5.1, which got talked about earlier. The SafeSetID LSM scopes the capability that lets you change to any user: it lets you define a mapping of which users you're actually allowed to transition between, because if you hand that capability out, suddenly you can become anybody on the system, which seemed a bit strong when all you want is specific UID transitions. For getting rid of uninitialized stack variables, the GCC plugin for that learned how to initialize scalars, because before it was only initializing things passed by reference. So if you were ignoring warnings when building a kernel, and the warning said, hey, you forgot to initialize this scalar on the stack, you would have an uninitialized variable; now, with that plugin, it will still warn, but you will actually get it initialized. In 5.2: more conversions, more pidfd work. We'd had the freelist randomized for the slab allocator but not the page allocator, and that's been added now. Stack variable auto-initialization is now natively available in Clang, so it's not a GCC plugin but a Clang option if you're building with Clang. PowerPC gained protections similar to SMAP. More speculation things. userfaultfd gained a sysctl knob to make it a privileged-only operation, because you could use it to get the kernel to arbitrarily stall on user-space accesses, which would give you a lot of temporal control when mounting attacks; it was really only KVM that needed it, so you now have the option of turning it off. And there's a temporary memory map for doing kernel text poking. My hope is to use that to turn more of the kernel area read-only at rest: similar to how we update the text segment, we could go in and update the data segment in the same fashion.
Expectations for 5.3: we're at RC5 now, so a couple more weeks and we'll have 5.3, and I'm pretty sure all these things are going to happen. The implicit fall-through markings are now done, so we're actually building with the warning enabled, and again, about 10% of the cases found were actually missing a break; that ratio held all the way through. We've fixed a lot of bugs, and more conversions, more pidfd. On x86, a common way of trying to bypass SMAP and SMEP was to manipulate CR4 and CR0 to turn them off, so now we pin those bits on. If you're attempting a simple ROP chain of some kind, you can't just reuse the native functions that exist in the kernel to do that anymore; you'd have to build your own, which makes it more difficult. It's no longer trivial. We've also gained heap auto-initialization to go with the stack version: you can boot a 5.3 kernel and say, I want to treat every kmalloc() as if it were a kzalloc(), so that everything you get out of the allocator has been zeroed. The performance of this has not been optimized yet; the idea was to get the feature landed and then ask, where are the obviously redundant cases, where you allocate some memory, the allocator zeroes it, and then you immediately fill it completely or zero it again? Finding those cases and getting rid of that redundancy is work for the future. We're also adding more kfree() sanity checking, because you could hand kfree() any kind of random pointer and it would attempt to do things with it. For 5.4: more pidfd, hopefully the kernel lockdown LSM will go in, and we've gained some helpers for string manipulation, because frequently people pass the wrong size to various string copies when they're dealing with an array whose size is actually known at compile time, so you can use a helper instead. Various features are coming.
There's an open patch set for improving path-name resolution, because there are all kinds of weird, esoteric ways to bypass and escape containers and namespaces and whatever else, and that's going to address a lot of those. I'd like to split out integer overflow detection so we can start getting rid of that as an entire class of problem. Link-time optimization and control-flow integrity are working on Android's arm64 builds with Clang, and that's slowly getting into upstream as we trickle in bits and pieces and fight bugs in LLVM and elsewhere. There's a lot of other stuff here being worked on. Moving on to the challenges: there's conservatism, people just not wanting to make changes. It took 16 years to get symlink restrictions upstream, so that defense can drive in the United States now. There are responsibility issues: as kernel developers we have to accept the need for these changes, accept the technical burden, make that sacrifice, and have patience explaining to out-of-tree developers how the kernel is developed; it's a very evolutionary process. And we've obviously got technical and resource challenges, so if you want to help out with anything here, please let me know. That's it. You can get a copy of the slides on the wiki page for the project. Any questions? I'm not sure how far over time I am. Question: you mentioned CR4 and CR0 pinning on x86. What would that mean in terms of, say, unlocking write protect if I wanted to do that within a kernel module? Presumably I would be able to do it? Answer: you'd write your own routine. Disabling write protect is explicitly not supported, as declared by the x86 maintainers.
But I, too, have a kernel module that does that, because when I boot up, I load that module, turn off all kinds of things in the kernel that I can't get at normally, and then I unload the module and turn off module loading. So, yeah. It will be out-of-tree, just so you know; your module will stay out-of-tree if you do that. I'll talk about it in about 45 minutes. Question: thinking specifically about the memory corruption exploit mitigation piece, one of the things that's been observed many times is that anybody can build a lock that they themselves cannot pick. What is the Kernel Self-Protection Project's vulnerability-researcher strategy for making sure these things really work? Answer: I write really simplistic tests for all these mitigations. The LKDTM tool basically just tries to do all the wrong things and makes sure the kernel freaks out about them. At the same time, I try to work with people who actually do the research. I work with Project Zero internally at Google, and, you know, Jann comes and says to me, hi, I noticed there's a problem over here, and I go, oh no. Having a small test framework to actually make sure these things do what they're supposed to do seemed obvious to me. I can turn the config on and it doesn't tell me anything; I can look at the config and it doesn't say anything. But if I actually go and say, all right, I'm going to set my reference counter to one below INT_MAX and increment it twice: did I get the warning I was expecting out of the kernel? Can I continue to increment it? Actually do those tests. So the LKDTM module in the kernel, which no one should use on a production system, will break the kernel in all kinds of interesting ways, and if you see other ways it's not covering, which I'm sure there are lots, send me a patch. Blow things up.
Yeah, it would be good to have more people doing research on breaking the kernel and letting us know, for exactly that purpose. My goal is to make all the books on Linux exploitation irrelevant. Then someone's got to write a new book, and I'll try to fix those things too. And then we can rest. Anyway, thank you.