Thanks. So this, as I said, is about the Kernel Self-Protection Project, which is sort of the umbrella for lots and lots of other people doing work. If you want to follow along with slides, they're at that URL, or they should be. And I'll just dive in. So for this presentation, you know, this is the security summit, and context is kind of important. I define security, from my narrow view, as more than just access control, more than attack surface reduction, more than bug fixing, more than protecting user space. All of these things are important, but this is primarily about protecting the kernel from attack. And then, of course, you have to ask: what are we actually protecting? What are we trying to secure? What is the justification for the work that we're doing? Well, we've got two billion Android devices. There's now a Linux supercomputer on the International Space Station. We've got a lot of stuff running Linux, as everyone here is well aware. A scary note on those two billion Android devices is that the vast majority of them are running a 3.10 kernel, which is extraordinarily old. Luckily for us, 3.18 is quickly catching up, so we've gone from ancient to merely very old. But this underscores a problem with bugs: the security problems that kernel self-protection tends to focus on are flaws that violate the trust boundary between user space and kernel space. So think about the lifetime of a bug: it was introduced in 3.10 — has it been fixed in your phone? I don't know. And then another question is, well, how is this our problem — isn't that the vendor's problem? In a way, yes, it's the vendor's problem for not fixing your phone. On the other hand, maybe we could have designed the kernel better to begin with, so we'd have fewer of these flaws, or they'd have a shorter lifetime. Which raises the question: what does bug lifetime look like in the upstream kernel?
And Jon Corbet took a look at this in 2010 with the exact same question: how long does it take us to find and fix a bug? He went through a handful of CVEs — and I should say CVEs certainly don't represent all security flaws in the kernel, but almost all CVEs are legitimate security flaws, so at least you have a minimum baseline to look at. So Jon found in 2010 that the average lifetime between when a flaw was introduced in the kernel and when it got fixed was about five years, which is kind of a giant time span. I went and looked later, and I continually update this now: the Ubuntu CVE tracker includes when bugs were introduced as well as when they were fixed. The hard work Jon did in 2010 was going through every CVE and trying to figure out when the flaw was introduced; I am cheating and leaning on the Ubuntu kernel and security teams, who do that research now. Anyway, I was starting to get worried with each presentation I gave on bug lifetime that the average was slowly creeping up and had reached six years, though luckily it has started to come back down a little, and I can sort of show that in a hard-to-read graph. You've got kernel version on one side, so this is the past; that's the beginning of Git history, and the top is 4.19. The three red ones over here are the really nasty CVEs, the critical CVEs that we've had. The rest are high, and I left off the mediums and the lows because they make the graph even harder to read. As for the lifetime: the bottom of each bar is when a flaw was introduced, and the top is when it was fixed, so a bug with a long lifetime has a long bar. And the problem was, in doing this research, you'd usually stop at the beginning of Git history and go, eh, it's been there forever.
So all these bars that go plummeting to the bottom here were introduced at or before the beginning of Git history, and what I've actually been enjoying seeing is that we're starting to see the tails of these lift up out of Git history — which I hope is real progress and not just an artifact of when the measurement was taken. So I'm hopeful that we're actually making progress here. We still have bugs with a long tail, but we're starting to see the averages come down again, which is nice. And people ask me: so what if it's five years or six years or whatever — no one saw it until right before it got fixed. Well, that doesn't seem to be true. People whose job it is to find these flaws and exploit them tend to find them well before the upstream community does. And we have proof of this when, occasionally, an attacker boasts about having found a flaw forever ago and posts something about how they've been using it — haha, we're all stupid. OK. But most attackers are not publicly talking about how they found their 0-day, so it's a real issue, but not one that we can gauge particularly well. In some doom and gloom, somewhere back around 4.8 or 4.7, I said: everyone here has a major flaw in their Linux, and you don't know where it is — and I generally still say that. It turned out that one was Dirty COW, and so we just don't know what's next, right? There is some other thing not yet on this graph, and I'd like to help us defend against it. The good news — and hopefully why I'm seeing the graph shrink — is that our bug fixing continues. We're finding bugs. We've got a lot of static checkers. We've got a lot of dynamic checkers. I include the kernel itself in that, because it looks for operational behaviors that it doesn't like. And we're fixing them.
When people don't believe me, I say: go ask Greg Kroah-Hartman how many patches are landing in the Linux kernel stable trees. But instead of going and asking him in person, I went and looked at 4.14.78, which is an LTS release. There are 7,529 commits in that stable series, which works out to about 96 bug fixes per stable release. Those are general bug fixes; some subset of them are known to have security effects, but it's possible that there are unknown security effects fixed in there as well. So we keep making bugs, and they exist whether or not we know about them — which is the real problem. So whack-a-mole, while important, is not a particularly complete solution. Konstantin talked about an analogy at the 2015 Linux Security Summit, using the 1960s car industry as an example: things are designed to run well. You're driving down the highway and you're not getting sprayed in the face with gas and oil. So that's great — he was talking generally, but even in Linux, things run really nicely for us. But failure modes are not particularly well handled. So we want to try to handle those failure cases better. With user space getting more and more locked down, we're starting to see containers and other things painting quite a target on the kernel itself — attacking the boundary. And then I always like to remind people: hey, lives depend on Linux. It's a little bit scary, but we've seen cases where flaws get used by oppressive regimes to go hunting for dissidents, crazy things like that. Most of the time we just have to worry about phones and other devices, but sometimes those effects are far-reaching. So I want to get the kernel into as defensive a posture as we reasonably can. As I've said, killing bugs is nice. There is some truth to saying that security bugs are just normal bugs, and we should fix them like everything else.
That's underscored by the fact that some security flaw may not affect me the way it affects someone else — they're using NFS and I'm not, and the flaw was in NFS, stuff like that. And we don't have a great idea of which bugs attackers are going to use; some are way better than others. And bugs might be in out-of-tree code, where again there is sort of an echo of "not our problem." But if we could provide an infrastructure that was safe, those bugs wouldn't be a problem. So the focus is more on killing bug classes. If we can stop an entire class of bug from happening, we just don't have to worry about it ever again. And in fact, if we provide an infrastructure that has a bug class removed, out-of-tree code can't hit it either. A good example of this is format strings, with printf: the kernel doesn't support %n, so if you get a format string abuse of some kind, you can't turn it into a write primitive. And that stays true even if you have a million lines of out-of-tree code — none of that code can call into the printf functions and become a flaw. But like bugs, we're always introducing new classes of bugs, so we can't kill all bug classes either. Killing exploitation is really great too. If we can remove all of the systems in the kernel that provide attackers with easy-to-use ways to do their attacks, it would be nice to get rid of those. The tricky part here, of course, comes with the caveat that sometimes when we do this work, we make development in the kernel more difficult. So accepting the need for this means accepting that there will be overhead.
Going back to the car analogy: if you've put a titanium bar in the door so that when you get hit on the side you're safer, the person designing the window and all the electronics is sitting there going, well, it would be a lot easier to do this if I didn't have this bar across the side of my door. It's the same thing in software development: sometimes we complicate things in an effort to make them more secure, and my hope is to convince people that it is OK to do that. So as part of that, I started the Kernel Self-Protection Project to collect people into one place, centralize discussion, and give people a place to focus on what work there was to do. I can't do even a fraction of it all on my own, but there are a lot of people out there interested in this stuff, so I tried to get them together in one place. And I wanted to focus us initially on kernel self-protection. There's a lot of work in the kernel to protect user space from itself, and that's important work — I want to get to it, and if someone's got time and interest to work on it, sure, let's do it. But I have been trying to focus on the self-protection bit just because I felt the kernel was pretty far behind in that work. I used to say we were slow and steady — this is upstream kernel development; it's not revolutionary, it's evolutionary. But Alexander Popov, in his slides on Monday, had a significantly better motto: flexible and persistent. That's much better; I like that. So that's the justification for why we're doing what we're doing. Now I can go through what we have done in the last year's worth of kernel releases. Starting with 4.14: we've been doing reference counting protections, which need opt-in conversions replacing uses of atomic_t with refcount_t. We had a brief stall around 4.14 where we were bikeshedding about how to document it and deal with it.
So only three conversions went in, but more are coming. The randstruct plugin from grsecurity got an automatic mode: if a structure contains only function pointers, its layout gets randomized. We got freelist obfuscation, so as an attacker, if you manage to get an overflow into the heap freelist area, the attack becomes much more unreliable. The structleak plugin automatically initializes variables that haven't been initialized before they are passed by reference into a function. For example, you declare some buffer and pass it to a device driver that is supposed to fill it in, and then you copy it out to user space — well, if the driver didn't fill it in correctly, you're just copying kernel memory out. So this looks for all the pass-by-reference variables and initializes them first. VMAP_STACK, similar to what we had on x86, landed on arm64; that gets us unallocated guard pages between kernel stacks, so if you try to run off the end of the stack, it faults — solves a bunch of things. Another issue was set_fs(): the address limit it sets is what bounds-checks how you copy memory in and out of the kernel, and the kernel would move that limit to include all of kernel memory for doing internal work. This has been a source of bugs in the past, so now we just check on return to user space: did we forget to reset it? We got that coverage on a number of architectures. And we picked up speed again with the reference count conversions. 4.15, to jog your memory, was when we got Spectre and Meltdown protections with PTI and retpoline. We also replaced a whole infrastructure in the kernel: the timer_list implementation changed. This doesn't affect the kernel's operation at all — everything works almost exactly as it did before — but the removal of a field from the structure removes a target that attackers were using in several real-life exploits.
So this is a good example of the kernel just being designed in a way that made an attack easier, where the design didn't actually matter to the kernel: all we had to do was redesign things a little, and everything was, in fact, simpler. We also gained fast reference count overflow protection: the refcount protections going in had two implementations, and this was the faster one, based on what grsecurity had been doing. And in a surprise move, Linus decided that %p, which prints out pointers to dmesg and sysfs files and other things, should just never actually print raw pointers anymore. This was certainly on my list of things to do, but not one I was going to fight for for quite some time — and he decided that we should not be info-leaking quite as hard as the kernel was. So now %p just prints a hashed version: you can compare two values as being the same, but you have no idea what the address is. It still works for debugging, but it's significantly less useful to an attacker. More refcount conversions. We got PTI on arm64. We already had hardened usercopy, ported from grsecurity, which kept us from copying things out of heap memory that were outside a slab allocation, or outside the stack. And the whitelisting work narrows that even further: you say, I only want this specific field of the larger slab allocation to be copied in and out of user space. That narrowed how much of the kernel's memory would be exposed to usercopy flaws when those bugs happen. And around this time the BlueBorne attack came out — a Bluetooth attack against user space and the kernel. It was a straight stack overflow, a solved problem, and the BlueBorne write-up says, yep, here's how I attacked it on all these devices that just didn't have the stack protector enabled. Like — ah, why?
So looking at that, it was once again an infrastructure problem in the kernel. The kernel's build system didn't know how to query which compiler was being used early enough to determine what sort of protections were available, so the default was to build kernels with the stack protector off. But that's insane. So this got fixed — and fixed better in a following version. Now, when you build a kernel, you just get the stack protector if your compiler supports it, or you can specifically turn it off. Going from default-off to default-on was simply a matter of fixing all the logic and dealing with the weird bugs around compilers and build systems and everything else. But that doesn't change anything about the kernel's internals, just the build system, really. In 4.17, we started VLA removal. Variable-length arrays lead to a number of problems on the stack: we can have stack exhaustion problems, and they also have poor performance characteristics. So we started grinding through lots and lots of VLAs to remove, which was difficult in certain situations. Also: clearing the stack on fork. When you start a new process, your stack is allocated from memory that wasn't cleared. So if you had an info leak — especially something deep in the stack, where you could arrange for an info leak out of somewhere on the stack — you could see what was in that location earlier. It makes much more sense to just wipe the stack when you initially fork. This existed before, but behind a debug flag, and people said, well, this is clearly going to be expensive: every time we fork, we're going to write zeros to multiple pages of kernel memory; that's too expensive to turn on by default. So I turned it on by default, and it seemed to speed things up — though it looks like that may have just been cache priming.
Like, you brought all this memory into the cache — a part of memory that you're constantly writing to as you run and do things. So maybe my performance metrics are terrible, I don't know, but it looked like a slight win, and that's nice: I really, really like security fixes that improve performance. There were more fixes dealing with RLIMIT_STACK on exec and how the stack is laid out, as part of the Stack Clash fixes. Also related to Stack Clash was MAP_FIXED_NOREPLACE. The kernel uses this internally when laying out an ELF file during exec, but you can also get at it from user space. Normally when you say MAP_FIXED, you're saying: I want to put this mapping at one specific location in memory. And there wasn't a clean way to say, but don't do it if there's already something there — don't overwrite an existing allocation, an existing VMA. This was a problem in the ELF loader: if your libraries or other parts landed really close to the stack, you'd just start clobbering the existing VMAs and run things together. So this solves that problem: if you get into a situation where you've somehow convinced the machine to misbehave and the VMAs would overlap, it will actually fail now. Another interesting thing in 4.17 was on syscall entry. One of the ways to do strange things with speculation and the cache was to find gadgets using registers that weren't part of the syscall: as the attacker, you could populate the register contents from user space, make a syscall, and that register would sort of walk along into the kernel and be available for speculation gadgets. The idea here was to effectively clear all the registers that aren't used during a syscall — and since XORing a register with itself is incredibly fast, there was no measurable performance change. It just kills that whole avenue of speculation gadget manipulation.
And then we got more speculation control fixes for Speculative Store Bypass (SSB). In 4.18, we continued grinding away at VLAs. There's a whole class of multiplication overflows in memory allocators, where you try to allocate something so large that the math wraps around, you get a very small allocation, and then you have linear overflows and other problems. So we added a bunch of overflow detection helpers and did a big pass over allocators using open-coded multiplication. We certainly didn't fix all of them, but we did have a couple of reports come in to security@kernel.org about exploitable overflows of exactly this kind that overlapped with the fixes going in. So it was a nice validation of this massive tree-wide change: we were, in fact, catching real bugs that other people were finding. Then we got the SSB mitigation on arm64. In 4.19, 33 more VLAs were removed; all the rest are going to be gone in 4.20 — we got them all done. We had add, subtract, multiply, and divide overflow helpers, but we didn't have a shift-left overflow helper, so that got added in 4.19. And the L1TF defenses — you can read up on those; they're fun. There's also a user space defense for /tmp. We solved the /tmp races with symlinks — and hardlink races too — quite some time ago: we killed a large class of temp-file bugs by not allowing symlink following to happen if the users didn't match. But that shifted attackers to finding software that would open a file in /tmp with O_CREAT but without O_EXCL, which meant it would happily open someone else's file — one they had made world-writable for you — and then your program would write all sorts of information into it, however you were using it, and the person who so helpfully created the file could then read it. So this new protection applies rules similar to the symlink restrictions that have been in for a while — a sort of implicit O_EXCL.
If the users don't match, treat the open as if you had asked for it to be exclusive when you're in a world-writable directory like /tmp. This should hopefully close the last of the temp file races that still crop up. We also got the unused register clearing on arm64. Coming in 4.20: as far as I can see, all the VLAs are going to be removed, and we're going to add warnings — not errors — to the kernel build. Maybe in a couple of years we can turn it up to a full error, but adding -Wvla now, since we strive to make sure the kernel builds without warnings, will keep new VLAs from appearing. That'll be good. And, fingers crossed, the STACKLEAK plugin that Alexander Popov has been working on, ported from grsecurity, will land for x86 and arm64. Linus has yelled at us several times about it in past pull attempts; we think we're good again, so hopefully it'll go in. That's a poisoning mechanism: on syscall exit, whatever portion of the kernel stack was used gets wiped with a poison value. The idea is that when you come back in for your next syscall, uninitialized values are no longer under the control of whatever the prior syscall left on the stack — they've all been wiped, so your uninitialized values now hold a known poison value. This matches the poisoning we've had in the heap and the buddy allocator for a while. And then we've got lots and lots of soon-coming and not-so-soon-coming features. I talked a bunch about uninitialized variables; we're going to see that handled from the compiler side. Integer overflow detection: this exists in the kernel address sanitizer — or rather in one of its cousins, the undefined behavior sanitizer — and we want to find and get rid of those overflows. Link-time optimization, so we can get a view of the entire kernel build at once and do things like control flow integrity — which is now in the Pixel 3 phone.
So forward-edge CFI protects against function pointer overwrites, things like that. Gustavo has been working on getting the switch fallthrough markings finished so we can turn on no-implicit-fallthrough and not end up with busted switch statements. There's also getting a per-task stack canary on non-x86. And those six pieces are all very compiler-dependent, you'll notice. Then we've got a whole bunch of other things that are less dependent on compilers and somewhat dependent on hardware and other research. Memory tagging is hardware support; that leads to string and memcpy size allocation checks. Exclusive page frame ownership tends to be mostly a performance issue, but it would solve an entire class of exploits: while you can't execute user space memory from the kernel on most hardware, you can still execute the kernel's own mapping of that user space memory — if you know its physical location. That's a real attack as well. Then there's finer-grained KASLR, SMAP emulation — anyway, you can read the whole list here. If anything in here is interesting to you, please join us. We're not done yet. It's nice to get other people's help, even if it's testing, documentation, any piece — anything you're interested in, or just telling us we're doing it wrong. Some people show up and do that too; we get a lot of that. So our challenges are really the same challenges I've talked about before. We've got a lot of cultural conservatism: please leave the code alone, it's good enough. And people not wanting to accept responsibility: oh yeah, but that phone has lots of out-of-tree code running on it — yeah, but it's still Linux. The sacrifice of needing to deal with the new overheads, and the patience of dealing with how long it sometimes takes to convince people or get stuff in, and vice versa.
And then there's the technical complexity: innovating in spaces where there is no obvious solution, and collaborating with people. So yeah, we just need developers, reviewers, testers, backporters — whatever you've got. You can reach me there; those are the slides again. If you want to read more about KSPP, that's the wiki link. Join our mailing list and say hi, and if you want to hang out on IRC, there's #linux-hardened on Freenode. Any questions?

Q: So there's another major operating system out there that uses a hypervisor to offer kernel protection and integrity. Do you see that being in Linux's future at all? Is the hypervisor the magic bullet?

A: I do — that is the hypervisor magic bullet. So one of the things that we're getting to the point where people are really making a lot more noise about — sorry, I went too far — is wanting to be able to check the state of the kernel from somewhere else, because they want yet another boundary. Things like Samsung Knox would check the UID fields of the cred structs from above, or keep them mapped out. And getting a general way to do that — so we can declare, I want to protect these things in the kernel, and hand that responsibility over to the hypervisor — while there are separate pieces for doing this in other forks and other stuff, there isn't a common way to get this communicated to the hypervisor, and I would like to see that. No one has really stepped forward to work on it, and it's not been something that's been driven up my to-do list necessarily, but I'm all for it. Questions?

Q: Hi, thank you, Kees. A question with my Debian hat on. We are trying in Debian to enable the GCC plugins, but so far we haven't really managed to, because of out-of-tree modules — we don't really have the infrastructure to make the plugins available for later use. Is there something that could help us with that?
Because, as far as I understand, GCC plugins are quite sensitive to the GCC version they were built with, so it's not always practical to use the GCC plugin later.

A: Okay, so it sounds like you're attempting to ship the output of the build for later use.

Q: Well, yes — for out-of-tree modules in Debian, basically, when people try to install a module, it's built on the user's machine and not on the build network. So they have to have the GCC — they can't rebuild the GCC plugin locally; they need to use the binary build.

A: I see. Okay, nothing immediately jumps to mind, but that sounds like something we definitely want to get solved. I approached the Ubuntu kernel team about turning on general GCC plugin support in the kernel because I suspected there were probably some problems we hadn't yet uncovered — and it sounds like you found them. But yeah, the intention was that it should be possible to do those builds. We always knew it would be compiler-sensitive, but the hope was that you've got the source and you've got whatever compiler you're about to build with, and things would be okay. It sounds like there's more work to be done, but yeah, I'd like to hear more about it, because we should definitely solve it.

Q: Okay, thanks.

No more questions? If not, let's thank him. Thank you.