It's 11:30 now by my watch, so I guess I should start. Hi, I'm Mark Rutland. I work at ARM doing kernel work, and I'm here to talk to you about hardening features in mainline. And evidently, I did choose a sufficiently strongly worded title to lure you all in. So we have a problem with mainline today, and that problem is that, as with many other projects, we have bugs. In the 4.8 merge window, we fixed over 500, a large number of which were in 4.7 and earlier. And that's not something that's going to change, because bugs are practically unavoidable. We do lots of things today, like reviewing. Some people use static analysis. Some people go as far as testing. And very, very few people actually use formal methods and prove the algorithms they use. But for various reasons, these are not going to rule out all bugs in practice. They're all certainly very valuable, and they're not something we should stop doing. But some of the bugs that get through, which we believe are unavoidable, have security implications. This is a screenshot from cve.mitre.org a couple of weeks ago showing Linux kernel CVEs. There are about 1,500 of them, and that's probably a low estimate, because not every bug gets a CVE. And by security implications, I mean these are things that allow an attacker to get the kernel to do something it should not, and allow them to do something that we don't want them to. So I might install a game on my phone; it might get root privileges, and from that it might start dialing premium-rate numbers. That's not good. And even though we're fixing these bugs, that's not sufficient, because people are finding them and exploiting them before we even realize that they exist. So in this case, a rather colorfully named individual or group claimed to have been exploiting a bug for over two years before it was fixed upstream. And that's not because people were lazy and sitting on a bug.
It took two years for people to realize that that bug existed. And that might seem like a rare case, but unfortunately it's not. This is a screenshot from Kees Cook's talk at the Linux Security Summit this year, showing the lifetimes of various critical and high-severity bugs, I believe from the Ubuntu bug tracker for the kernel. The red ones are the most severe; the orange ones are very severe. I've cut off the rest of the graph. The y-axis here is kernel releases, so the longer the line, the longer the bug lasted before it was fixed, from introduction to fix. And the average for this set is over five years, which should be slightly terrifying. Because if we consider that, in general, we have bugs, and we're going to have more bugs, many of those are going to be exploited in the wild, and by the time we fix them, the products those kernels were in are already end-of-life. So, show of hands: who here has an Android phone that's five years old or more? Good. There are probably vulnerabilities in those phones that we do not know about yet. And that's not a problem with Android; that's a more general problem. Which brings us to hardening. What is hardening? Well, it's about making bugs more difficult to exploit. We don't know which particular bugs exist in a given kernel yet, and we probably won't for five or more years. But we do see recurring classes of bugs: things like dereferencing a NULL pointer, or accessing memory controlled by user space when we didn't mean to. And these happen again and again and again, so often that we can more or less assume that some of the bugs we don't know about are going to fall into these buckets. So rather than trying to fix these individual bugs and play whack-a-mole, what we're instead going to do is put in place protections against the general failure case. And the good thing about this is that it will protect us against all instances of these bugs, even the ones we don't yet know about.
This definitely doesn't replace bug hunting; we still need to fix bugs. This style of protection is never 100% effective, and while it makes the kernel less exploitable, it doesn't really fix the underlying issue. Just because you've detected that you've dereferenced a NULL pointer doesn't actually help you, because you're still going to crash; we still need to fix the bug that led to that. But at least your phone won't be dialing a premium-rate number, so it's making things less bad. There's been a lot of talk about this happening in mainline recently. Security work in this area has been going on for a number of years in various other projects, but for various reasons, many protections haven't made it into mainline. There's a lot of work going on at the moment, and there's slow but steady progress, but it's going to take a number of years before we actually have a reasonable set of protections. By reasonable, I mean approaching the kind of thing that's available elsewhere. And most phones aren't running a kernel from five years in the future yet, so all the protections being worked on at the moment aren't very useful for everyone here. But we do have some features today that have existed in the kernel for years or months, which you can simply turn on, which are maintained, and which are improving. So if there's a bug in them, we'll fix them. And since they're easy to turn on, you can just turn them on and get protection effectively for free. Unfortunately, not many people seem to be turning these on, judging by the bug reports that I get. So let's talk about them. I'm going to cover a few simple cases that you can turn on without having to worry too much. There are some more advanced things in the kernel, but they're out of scope because I don't have enough time to talk about them. So, one class of protection is what I call strict kernel memory permissions. Historically, the kernel has mapped all memory as readable, writable, and executable.
And it's had to do this for various reasons: architecture-specific constraints on the way you use the MMU or the way you boot, and so on. And it's just simpler to do that. But it does lead to unfortunate situations like being able to modify kernel code, modify const data, or execute data, all of which are things that we never really expect to do and are highly indicative of bugs, but are very useful primitives if you're an attacker. So we want to do better than that, and what we want to do is get the MMU to enforce these permissions for us. We should be able to map kernel code as read-only. We should be able to map constant data as read-only and also non-executable. And we should be able to map data as non-executable. And if it's done in the MMU, that's effectively free, because the hardware is handling it for us. This kind of thing, because of those architecture-specific constraints and so on, requires some architecture-specific code, but we don't have to care about that. There's a feature in the kernel called DEBUG_RODATA, which is hilariously misnamed: it's not just a debug feature, and it's not just about making things read-only, although that was its original purpose. There are a few things it has to do, like padding sections out to page boundaries as required by the architecture, but otherwise there are basically no changes required. If you turn this feature on, you don't have to go and modify your code, library code, or core kernel code, because it should just work. Since it's being handled by the hardware, there's practically no runtime overhead. There is theoretically overhead in things like TLB pressure and a few additional misses, but unless you're writing micro-benchmarks, it's very unlikely that you will notice. It's available on several architectures today.
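If you want to try this, a minimal configuration sketch might look like the following. The page-table dump option shown is the x86 one; exact option names vary by architecture and kernel version, so treat this as an illustration rather than a recipe:

```shell
# .config fragment: map kernel text read-only and const data non-executable.
# DEBUG_RODATA is the (misnamed) option discussed above.
CONFIG_DEBUG_RODATA=y

# Optionally, dump the kernel page tables to verify the resulting
# permissions, via /sys/kernel/debug/kernel_page_tables.
# (x86 shown; arm64 has a similar ARM64_PTDUMP option.)
CONFIG_X86_PTDUMP=y
```
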
In 4.9, DEBUG_RODATA will be mandatory on both arm64 and x86, because we consider it a fundamental security feature. Given that, I would advise that you also consider it a fundamental security feature and turn it on on earlier kernels. There's a related feature called DEBUG_SET_MODULE_RONX, which is also hilariously misnamed and is practically identical, apart from the fact that it works on modules rather than on the usual kernel mapping. For some reason, it's available on one less architecture than DEBUG_RODATA, which I noticed this morning: I think DEBUG_RODATA is available on PA-RISC and DEBUG_SET_MODULE_RONX is not. I have no idea why, but I'll poke the mailing list shortly on that front. So I would advise you to turn on these features if at all possible, and if you have an issue, please report a bug, because these are in mainline; they should just work, and we will fix bugs. Completely unrelated, but also useful, is stack smashing protection. There's a class of attack known as stack smashing, which works on the principle that your stack frame consists of a return address, maybe some other data required by the calling convention for the architecture, and then all your local variables. On most architectures, the stack grows downwards, while these buffers are filled upwards. So if you copy some data to a buffer on the stack and that data is too large to fit in the buffer, you'll end up overwriting subsequent data on the stack, which happens to include the return address. This means that if an attacker knows what your stack frame layout looks like, they can control where you will return to. And if they can do that, they can branch to any code of their choosing, and they can use that to launch further, more advanced attacks, which is less than optimal. So there is a very imaginatively named protection for this, known as stack smashing protection, whereby we insert a secret value known as a canary between the data and this control-flow information.
And we have the compiler do this. At function entry, it writes the secret value to this location, and immediately before returning, it checks that that value is still there. Were you to overflow this buffer, you'd have had to clobber this value, so it's very likely that we'll be able to detect when that happens. There's a small amount of arch-specific bootstrap code required for this, but otherwise no changes are required to your code, because it's all done by the compiler. There are some obvious constraints on this. It works very well for detecting these linear overflow cases, but if you're writing through arbitrary pointers, you can still write to an arbitrary location on the stack, and the attacker can still overwrite a return address and gain control of execution. If the attacker knows the canary, because we've leaked it by some other means, they can spoof it. And if you have other data after the buffer but before the canary, that can still be corrupted, and we won't detect it; that might be the basis of another attack. So it's worth bearing in mind that this is not a 100% effective solution. There are two options for this. One is stack-protector regular, which protects any function with a sufficiently large character buffer on the stack, but nothing more. It increases the kernel size by a trivial degree and requires a GCC so old that you're almost certain to have a new enough GCC to use it. It's available on several architectures. There's also stack-protector strong, which protects a more compelling set of cases. In any case where you, say, take a pointer to a value on the stack and then pass it into another function, GCC will decide to use a canary in the function that did that. Similarly, if you have any array on the stack, it will insert the canary, and so forth, which happens to affect about 20% of kernel functions.
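As a rough model of the prologue/epilogue the compiler inserts, here's a hand-rolled sketch. The layout and names (`demo_guard`, `copy_with_canary`) are invented for illustration and don't match GCC's actual code generation, which plants the canary between the locals and the saved return address:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the secret per-boot canary value (illustrative only). */
static const uintptr_t demo_guard = 0x5ec2e7c0;

/* Returns 0 on success, -1 if the canary was clobbered, i.e. a linear
 * overflow of buf ran into the control-flow data placed above it. */
static int copy_with_canary(const char *src, size_t len)
{
    struct {
        char buf[16];
        uintptr_t canary;  /* sits between the buffer and the return address */
    } frame;
    char *p = (char *)&frame;
    size_t i;

    frame.canary = demo_guard;       /* "prologue": plant the canary */

    /* The (buggy) unchecked copy: writes past buf spill into the canary. */
    for (i = 0; i < len && i < sizeof(frame); i++)
        p[i] = src[i];

    /* "epilogue": check the canary survived before "returning". */
    return frame.canary == demo_guard ? 0 : -1;
}
```

A 20-byte copy into the 16-byte buffer overwrites the canary bytes, so the epilogue check fails, which is exactly how the compiler-generated check catches a linear overflow before the corrupted return address is used.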
Stack-protector strong increases the kernel size by about 2%, which typically won't break the bank, requires a relatively recent GCC, and is available on all the same architectures. So there's also this thing I call user/kernel memory segregation. There's a flame war about what to call this, but that's the phrasing I'm going to use. Typically, the kernel shares an address space with user space in hardware. A pointer can encode an address in user space or in kernel space, and there's no fundamental difference between the two; the same load and store instructions can be used in either case. And for efficiency's sake, the kernel always has user space mapped in user threads, which means that if you accidentally dereference an address that happens to be controlled by user space, the hardware won't notice, and it will happily give you the value. So if by other means an attacker can convince you to dereference such an address, you won't notice, and that can be used as the basis of a number of attacks. Similarly, if you branch into this space owned by user space, the hardware won't detect it and will let you carry on happily. So if an attacker puts a buffer of code at a user space address and then uses a stack smashing exploit to branch to it, they can now do whatever they want with kernel privileges very trivially. So this is a really powerful primitive for kernel exploits and, again, less than optimal. But obviously these two portions of the address space are logically distinct: we have the kernel address space and the user address space. User space today can't access kernel memory if you have an MMU, and typically the kernel doesn't actually need to access user memory, aside from cases like copy_to_user, get_user, and so on. So what we can do is get the MMU to help us, because all the architectures where we have an MMU today allow us to change the page tables dynamically, and that's what we use when we context-switch between processes.
Some architectures have more fine-grained control, where they allow you to apply permissions to certain pages or sets of pages. So what we can do is explicitly unmap user space, or prevent access to it, when we're in the kernel: say, disabling access upon kernel entry, enabling access upon return to user space, and, for the user access primitives like get_user, enabling access temporarily and disabling it afterwards. And that means we'll catch most of these unintentional user memory accesses or user memory branches. Obviously that doesn't come for free, because we have to run some instructions to change the address space, to change the page tables, and so on. So it adds a small amount of latency to kernel entry and exit, and also to the user access primitives. More recently, some hardware has got to the point where it can do this for us automatically. On ARM we have a feature called Privileged Execute Never (PXN), which is a bit in the page tables you set to say, "I never want this page to be executed with kernel privileges." x86 has a similar thing called SMEP, Supervisor Mode Execution Prevention, I think. We make use of these automatically whenever they're present; there's no config option for them. So those prevent arbitrary code execution from a user space buffer, but an attacker can still branch to another piece of kernel code of their choosing. So they don't prevent arbitrary execution; they just limit one case. The attacker can still reuse existing code, or perhaps branch to other code that happens to be mapped by the kernel, if it was executable for some reason. More recently, MMUs have gained the ability to do this for data accesses as well: Privileged Access Never (PAN) on arm64, Supervisor Mode Access Prevention (SMAP) on x86. They require some architecture-specific code in all the user access primitives, but otherwise nothing else; your code will continue to work as usual. They prevent these accidental uses of user space data, but obviously they don't do anything else.
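To make the enable/disable pattern concrete, here's a user-space toy model of it. The boolean flag stands in for the hardware state (PSTATE.PAN on arm64, the flag toggled by stac/clac on x86); all the function names here are invented for the sketch, and a real kernel does this with privileged instructions, not a variable:

```c
#include <stdbool.h>

/* Models the hardware state: when false, kernel code touching user
 * memory would fault, as with PAN/SMAP. */
static bool user_access_enabled;

/* A raw dereference of a "user" pointer: -1 models the fault taken
 * when the access window is closed. */
static int deref_user(const int *uptr, int *val)
{
    if (!user_access_enabled)
        return -1;          /* the MMU would raise a fault here */
    *val = *uptr;
    return 0;
}

/* A get_user-style primitive: opens the window only around the access. */
static int get_user_demo(const int *uptr, int *val)
{
    int ret;

    user_access_enabled = true;   /* e.g. stac / clear PSTATE.PAN */
    ret = deref_user(uptr, val);
    user_access_enabled = false;  /* e.g. clac / set PSTATE.PAN */
    return ret;
}

/* An accidental, direct dereference outside the primitive is caught. */
static int demo_accidental(void)
{
    int user_value = 42, v = 0;

    return deref_user(&user_value, &v);   /* blocked: returns -1 */
}

/* A deliberate access through the primitive still works. */
static int demo_deliberate(void)
{
    int user_value = 42, v = 0;

    if (get_user_demo(&user_value, &v) != 0)
        return -1;
    return v;
}
```

The point of the model is that only the narrow, audited primitives ever open the window, so stray dereferences of user-controlled pointers elsewhere in the kernel trip the fault instead of silently succeeding.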
Of course, you can't now trust the data you got out of get_user and friends; you still have to do all the usual checks on those values. There are a few options for these, and they're all architecture-specific, because they work at a low level and there's no common infrastructure for this. For 32-bit ARM, there's CONFIG_CPU_SW_DOMAIN_PAN. Aside from very old hardware, it protects all of user space memory; on some very old hardware without the Vector Base Address Register, the low one megabyte can't be protected. It requires using domains, which happens to require using short-descriptor rather than long-descriptor page tables on 32-bit, but that appears to be what most people are using today. On arm64, we have CONFIG_ARM64_PAN. We patch in the necessary instructions when we detect that the feature is present in hardware, so you can turn this feature on without needing to care whether you actually have the feature; your kernel will still run, at the cost of a few NOPs in a few cases. But obviously it's only actually effective when the hardware has the feature. The x86 SMAP feature is very similar, in that the instructions are patched in as necessary: you can turn the option on, it will work on any piece of hardware, but it will only be effective when the hardware has the feature. And in all cases, you don't need to go and change your code to use these, so you may as well turn them on. And this is going a lot faster than it did when I practiced it, so I'll give you my bonus slides. There are a couple of options you can use that aren't hardening features as such, but are very useful for testing your code and making sure that it works well. There's a feature known as KASAN, the Kernel Address Sanitizer. It's not a hardening feature, but I would recommend that you use it when testing your code prior to release, because it will find bugs that are very difficult to discover otherwise.
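KASAN's byte-granular tracking can be caricatured with a tiny "shadow" array that records which bytes are currently valid. This toy model is far simpler than KASAN's real shadow encoding, which packs eight bytes of memory per shadow byte and is filled in by compiler instrumentation; all the names here are invented:

```c
#include <stddef.h>

#define POOL_SIZE 64

static unsigned char pool[POOL_SIZE];    /* a toy heap */
static unsigned char shadow[POOL_SIZE];  /* 1 = allocated, 0 = invalid */

static void toy_alloc(size_t off, size_t len)
{
    size_t i;
    for (i = off; i < off + len && i < POOL_SIZE; i++)
        shadow[i] = 1;
}

static void toy_free(size_t off, size_t len)
{
    size_t i;
    for (i = off; i < off + len && i < POOL_SIZE; i++)
        shadow[i] = 0;   /* freed bytes become invalid again */
}

/* Every access consults the shadow; -1 models a KASAN report. */
static int toy_load(size_t off, unsigned char *out)
{
    if (off >= POOL_SIZE || !shadow[off])
        return -1;
    *out = pool[off];
    return 0;
}

/* One past the end of a 16-byte allocation: out-of-bounds, reported. */
static int demo_oob(void)
{
    unsigned char v;
    toy_alloc(0, 16);
    return toy_load(16, &v);
}

/* Access after free: reported. */
static int demo_uaf(void)
{
    unsigned char v;
    toy_alloc(32, 8);
    toy_free(32, 8);
    return toy_load(32, &v);
}

/* An in-bounds access to live memory is fine. */
static int demo_ok(void)
{
    unsigned char v;
    toy_alloc(48, 8);
    return toy_load(48, &v);
}
```
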
KASAN has the compiler instrument the code to provide byte-granular use-after-free and out-of-bounds detection. So if you free a buffer and then access it, or if you go one past the end of an array, this will detect it, because the check is added by the compiler, in software. It requires a recent-ish GCC; you probably want GCC 5 or later to actually get protection of stack-local variables and of global variables, otherwise it only applies to dynamic allocations. There are two modes, outline and inline, and the difference between the two is that outline simply branches to an out-of-line function to perform each check, whereas the inline version does the check inline, where it is optimized further by the compiler. So outline keeps the kernel small, but inline is faster; that's pretty much the only trade-off. It's available on arm64 and x86_64. It has to allocate a fairly large region of VA space to perform these checks, so it requires pre-allocation of a fair amount of memory, and it also slows things down a bit. For those two reasons, it's not currently available on 32-bit architectures, because I don't believe there's sufficient VA space. And similarly, it's not something you'd want to turn on in production. When it detects failures, it will print a nice warning telling you where the access was, where the memory allocation came from, and so on. And if you ask it to, it will also panic the kernel. There's also UBSAN, the Undefined Behavior Sanitizer, which detects undefined behavior at runtime. This is surprisingly cheap, and undefined behavior covers many more things than you expect, even when you think you know all the things it covers. So there's a common complaint that UBSAN comes up with false positives, and whether or not something is a false positive is arguable. You require a recent GCC for this, 4.9 or later, so not all distributions have a recent enough GCC, but it's fairly easy to get hold of one.
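As an illustration of the sort of shift checks UBSAN compiles in, here's a hand-written sketch; this is not UBSAN's actual instrumentation, and `checked_shl` is an invented name:

```c
#include <limits.h>

/* Left-shift value by shift, returning -1 in the cases where the C
 * standard leaves the result undefined (the cases UBSAN's runtime
 * checks would report). Valid results here are non-negative, so -1
 * is unambiguous. */
static long checked_shl(int value, int shift)
{
    int bits = (int)(sizeof(int) * CHAR_BIT);

    if (shift < 0 || shift >= bits)
        return -1;  /* shifting by the type's width or more: undefined */
    if (value < 0)
        return -1;  /* left-shifting a negative value: undefined in C99 */
    if (shift > 0 && value > (INT_MAX >> shift))
        return -1;  /* result would overflow int: undefined */

    return (long)(value << shift);  /* well-defined after the checks */
}
```

UBSAN effectively inserts this kind of test before each shift and reports at runtime when one fires, which is how it catches the cross-compilation-unit cases that static analysis struggles to prove.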
UBSAN is available on a few architectures today and could fairly easily be extended to others that GCC has UBSAN support for. And that's the bulk of my talk. So, questions? I have another microphone here, so I'll just pass this down. On the last slide, I didn't get the point: what is defined behavior and what is undefined behavior? What does it cover? So these are undefined behaviors per the C standard: signed integer overflow and that kind of thing, where the C standard permits a variety of things to happen, more than you'd expect. For instance, with signed integer overflow, you might expect that you have a two's complement number, but the C standard, being written a very long time ago, had to support other hardware and didn't have that requirement, and so permits a number of optimizations which are counterintuitive in that case and could lead to unexpected, and therefore exploitable, behavior. So does this relate to kernel-side code, or also userland code? All these features are inside the kernel; they are only inside the kernel, yeah. You can use UBSAN for user space, there is a user space version of it, and there is also a user space version of the address sanitizer. Thanks. Thanks for your talk. I would just add that KASAN, the kernel address sanitizer, is not a security feature. There was a very remarkable thread on the oss-security list at openwall.com called "address sanitizer local root". Yes, it's not a hardening feature; it's definitely just a testing feature. Sorry, I should have called that out more explicitly. I think we should pay attention to that, because it introduces some additional ways in for attackers. Yeah, so there are several issues there. One of the larger complaints about the address sanitizer in user space is that the library behind it uses loads of environment variables which it doesn't check, and then does things based off those.
So if you had a program that was running as root that was compiled with ASAN, you could use it for privilege escalation. That specific case does not happen in the kernel, because it's all self-contained within the kernel, but there are potentially other issues of that kind, because it has to go and access what's known as the shadow region to track memory allocations, and if it gets its address calculation wrong and accesses something it shouldn't, that would be quite bad. So there are other potential problems there, yes. And I would like to add that KASAN is wonderful with fuzzing. Yes, I use KASAN with Trinity and Vince Weaver's perf_fuzzer to find bugs. They find them amazingly quickly, and then you can spend several weeks trying to track down what the underlying issue was. But they have definitely found exploitable bugs; I've definitely found exploitable bugs using KASAN and so on before I sent my patches to the mailing list. So I would very strongly advise everyone to test with it. Thanks again for your talk. Any other questions? So, I still didn't get the point about checking undefined behavior at runtime versus static analysis: how can we actually check undefined behavior at runtime? So there are certain cases where it is non-trivial to prove at compile time whether a piece of code has undefined behavior. You might have a function in one compilation unit, and it has a parameter saying how much to left-shift a variable by. Now, a left shift by more than the width of the type is, I think, undefined, at least for some types. Regardless, that's definitely an undefined case.
The compiler can see that that is possible to happen, but it doesn't know the set of callers of that function, so it can't tell you whether it is actually the case. And if it were to warn in each and every one of those cases, we would have an incredibly large number of warnings throughout the kernel, to the extent that people would just ignore them and turn them off. The nice thing about checking at runtime is that you now have a definite case where it happened, which you can show to someone to get it fixed. Does that help? Yeah, yeah, makes sense. Thanks. Could you pass the microphone forward? What are the most promising new features that will be added to the kernel soon on this topic, and how much resistance is there to actually getting them mainlined? That is a very difficult question. I think some of the most promising things are actually some of the most boring. Just turning these features on by default and making them mandatory would go a really long way. For instance, DEBUG_RODATA becoming mandatory in v4.9 is something I'm very excited about, but it's very boring, because it's been there for a number of years. There are features coming, such as moving thread_info off the stack, which has, I think, just gone in for v4.9 on x86; I've been working on that for arm64, a bit slowly. That prevents a number of attacks where a stack overflow can be used to gain privileges or take over the system. There's some work that Catalin Marinas is doing on effectively implementing a software version of PAN for arm64, for hardware without PAN, very similar to what Russell King did with software domain PAN on 32-bit, but that's not quite ready yet. Yeah, I think those are the most exciting things. I think resistance is slowly going away for many of these things.
Given the number of people actually working on these features now, lots of the complaints, about these not looking like kernel code or doing things the wrong way, are being solved quite quickly, and opinion in mainline is slowly shifting towards "yes, we need to do something here". We understand, though, that obviously doesn't apply to everyone. Any other questions? Then I'll have to say thanks.