I'm Jeff. Here's how you can contact me on Twitter. This is a personal Twitter feed, so it will contain both security things and cat videos. I've been with Android security for about two years now. On Android, I focus mostly on system hardening. Sometimes that involves the kernel, other times it doesn't. And I'm a software engineer. So we're going to be discussing mostly kernel bugs that we've been seeing over the last couple of years.

But before we get too much into kernel bugs, we have some good news. The first is that most of the kernel bugs that we're going to be discussing today were not directly reachable by untrusted code, due to Android's security model. So if you wanted to reach one of these bugs, you'd have to chain exploits and gain privilege in a more privileged process. And we've substantially cut down on even that attack surface in Android Nougat. The other piece of really good news, which ties into some of the work coming from the Kernel Self Protection Project, is that some of the new defenses being added address our biggest category of kernel bugs. The next piece of good news is that the kernel represents only one of many lines of defense. A lot of the other efforts on Android are focused on preventing malicious code from ever reaching your Android device.

OK, let's get into some kernel bugs. For our agenda, we're going to look at kernel bugs. We're going to discuss the cause of kernel bugs; in other words, we're going to categorize them by type. We're going to look at the reachability of kernel bugs. Then we're going to discuss mitigations, both by cause and by reachability, and then gaps. And then we'll also look at some future work.

First, a note. Other people have been hammering on this, but I feel I should do the same: kernel bugs have a long life, and simply finding and fixing kernel bugs is not adequate. Some devices may never get patched.
And when a bug is reported to the upstream kernel, it's likely not the first time that bug has been discovered. Who knows who else knows about that bug, how long they've known about it, and how they've been using it. So we need to look beyond just finding and fixing bugs, toward an ecosystem that's resilient against security vulnerabilities. We have a lot of people looking at bugs, and we want to make better use of their time rather than just playing whack-a-mole. We also want to use this data to prioritize mitigation development and adoption. That's what this talk is really going to be about: let's look at bugs, and let's use that data to prioritize the work that's going on.

A little bit about our data set. It includes bugs from all of 2014, all of 2015, and up through April of this year. It includes low, moderate, high, and critical severity vulnerabilities. As a side note, you could reconstruct most of this data from the Nexus security bulletins, which contain moderate through critical vulnerabilities. I've included the low severity bugs here for a couple of reasons. The first is that our severity ratings may change over time, so things that were considered low previously could, for example, be considered moderate now. The other reason, and this is the more important one, is that the reachability of bugs is considered in our severity ratings. Bugs that are only reachable by a privileged process will receive a lower severity rating than bugs that are reachable by, for example, third-party apps. But the location and cause of these bugs, even though they're not reachable, are still useful data points for us to consider. So I've included them here.

And then the kernel is of interest because it provides many of the security features that Android relies on.
These security features include the basic application sandboxing that's an essential part of Android's security model, and also the other security features here. In fact, the security features provided to user space by the kernel have been so effective that they're increasingly making the kernel itself really the only viable target. Our data actually reflects that. If you look by year, you can see that the ratio of kernel bugs to user space bugs is increasing.

There are a few reasons for that. I think the primary reason is that we've been locking down user space so effectively. Gaining code execution in a root process is significantly less useful than it used to be, and that's primarily due to SELinux enforcement and the greater availability of Linux capabilities. Paul mentioned this earlier, but this year we reached a critical point where a majority of Android devices are now in SELinux global enforcing mode. Because of that, attackers are increasingly having to go straight for the kernel in order to disable SELinux and circumvent other kernel-provided security mechanisms. The other reason is that, partially because of the value of kernel bugs, we pay more for kernel bugs. We're rewarding security researchers accordingly, so it's likely that people are looking for kernel bugs because we pay more for them.

This is an interesting data point. Android does, in fact, inherit bugs from the upstream kernel. But our data shows that most of Android's kernel security vulnerabilities live in device drivers. Some of those device drivers do, in fact, come from the upstream kernel, a Wi-Fi driver for example, but many of them are introduced by SoC vendors as well as other manufacturers. This graph is not intended as a name-and-shame for vendors, particularly because there doesn't seem to be one vendor who's doing really well and another who's doing really poorly. They're really all doing poorly.
I don't have that, which would be a really useful data point. Does anyone have that? I can tell you that it is, in fact, much less than 85%. But yeah, that would be a useful data point.

Some other interesting things come out of this data. The first, and we've kind of experienced this, is that maintainers say, oh, bugs that you didn't inherit from upstream are not upstream's problem. But the reality is that this is what most Linux systems look like, and it's not limited to Android devices. Most Linux systems are running at least some non-upstream code, and that includes your typical desktop distributions. If you want to build the kernel for your Raspberry Pi, for example, you can't download an upstream kernel, build it, and run it on your Raspberry Pi, because it requires some device customization. So the point is that this is the reality, and this is what we are trying to fix. And what I think is important is that kernel defenses will protect both code that comes from upstream and out-of-tree code. That's a really important point, because what we saw is that there are, in fact, bugs in the core kernel, and we need to protect against those bugs as well.

So let's take a look at the types of bugs that we're seeing. By far, our biggest problem is bounds checking, and that's primarily what I'm going to focus on when we're looking at bugs by cause. We'll also take a brief look at the next two categories: null pointer dereference is up there, as well as information leak.

Another interesting way of looking at this data is breaking it down between bugs that we've received from vendor drivers and bugs that came from upstream. There are a couple of really interesting points that I want to make about this. The first is that, well, they look very different.
Missing bounds checks, which are a pretty simple bug to fix and also to exploit, are not as prevalent in the core kernel. But I'd also like to point out that the same categories exist in both, so we do actually have a problem with missing bounds checks in the core kernel. I think the only one that's missing is integer overflows: it looks like we didn't have any from the core kernel. That's not because there aren't any integer overflows in the core kernel. It's because they didn't impact Android devices.

The other interesting point, I thought, was that race conditions were the largest cause of bugs in the core kernel. What's interesting is that the types of bugs in the core kernel tend to be harder to exploit than the types that we're seeing in vendor drivers. But bugs from upstream are far more useful because they become universal exploits. Does that make sense? A lot of the vendor driver bugs, while silly bugs, may only work on a single device, whereas a lot of the bugs that we're seeing from upstream are the ones that are desirable to attackers because they'll work on any device.

If it's not clear, missing bounds checking when you're copying information from user space is really sloppy programming. But it's happening. So let's look at some of the mitigations here. This was one of my pieces of good news: mitigations for bounds checking have either landed or are in the process of landing upstream. So activity that's happening in the Kernel Self Protection Project, and in the upstream kernel in general, is addressing our largest class of bugs. Hardened usercopy protects against incorrect bounds checking. But I'd like to point out that hardened usercopy is incomplete because it's easily circumvented: the kernel can just read user space addresses directly.
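
To make the missing-bounds-check pattern a bit more concrete, here is a minimal Python model of the validation a driver handler should do before copying user-supplied data into a fixed-size buffer. This is a sketch of the logic only, not kernel code: `BUF_SIZE`, `BoundsError`, and `handle_write` are hypothetical names invented for illustration, and a real driver would do the equivalent check in C before calling its copy function.

```python
BUF_SIZE = 64  # hypothetical size of a fixed driver buffer


class BoundsError(Exception):
    """Raised when a user-supplied offset/length does not fit the buffer."""


def handle_write(buf: bytearray, user_data: bytes, offset: int, length: int) -> int:
    """Copy `length` bytes of user data into `buf` at `offset`, checking bounds first."""
    # Never trust sizes that come from user space: reject negative values.
    if offset < 0 or length < 0:
        raise BoundsError("negative offset or length")
    # The caller may claim more data than it actually supplied.
    if length > len(user_data):
        raise BoundsError("length exceeds the data actually supplied")
    # Written as a subtraction: in C, this form avoids an integer overflow
    # that `offset + length > size` could hit with large values.
    if offset > len(buf) or length > len(buf) - offset:
        raise BoundsError("write would run past the end of the buffer")
    # Both slices have the same length, so the buffer never changes size.
    buf[offset:offset + length] = user_data[:length]
    return length
```

A call such as `handle_write(bytearray(BUF_SIZE), b"abcd", 0, 4)` succeeds, while a request whose offset plus length exceeds `BUF_SIZE` is rejected before any copy happens. The bugs discussed in the talk correspond to handlers where one of these checks is simply absent.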
And so we need an additional protection to prevent the kernel from directly accessing user space without going through the proper copy_to_user and copy_from_user functions. PAN emulation, where PAN is Privileged Access Never, specifically addresses that issue. That's currently up on the mailing list. I don't think it's landed, right? OK. So Kees said it landed for ARM32, and it's probably going to land for ARM64 in the next release. What's interesting about PAN is that it was actually included in hardware in ARMv8.1. The problem is that no one is using ARMv8.1 yet. We're all still on ARMv8.0. So emulation really bridges that gap and allows us to have this feature before we even get the new architecture update.

Other mitigations that protect against incorrect bounds checking: stack protector protects against stack buffer overflows. And I've included some other features which don't directly address bounds checking, but which do make exploitation of, for example, heap overflows more difficult. The other note was about KASLR, which obviously randomizes the location of the kernel. But suddenly that makes information leaks, such as kernel pointer leaks, more important. So in order to increase the value of KASLR, this community needs to be looking at how to make pointer leaks and other information leaks from the kernel more difficult.

For null pointer dereference, we already have a protection against that. The LSM minimum address for mmap makes null pointer dereferences unexploitable. It just turns them into crashes. And PAN emulation will also make null pointer dereferences non-exploitable.

I've kind of struggled with how to frame this, and I finally decided on calling it code review. The point is that the upstream code review process actually catches a lot of these bugs, and it's missing or needs to be improved for out-of-tree code.
A lot of obvious security bugs could have been caught with a better code review process.

Yeah, I'll talk about that a little bit. I ultimately decided not to include that here, for a couple of reasons. I think the main reason that we see on Android is that a lot of these drivers are for functionality that you can't just go out and buy. For example, it might be for a Wi-Fi chip or a sensor module or something like that, which is not something you can just go buy and use. So the value of having it in the upstream kernel, where no one can actually use it, seems somewhat limited to me. I would also note that, as I said, some of these drivers are actually upstream, and being upstream did not necessarily improve their code quality, if that makes sense.

So the question was about getting vendors to test their code, to fuzz their code. And the answer to that is yes, it's happening. It's happening both within Google and at vendors. It's safe to say that vendors are aware of this problem, so it's not like they don't know that it's a problem, and security practices are improving. I haven't, but other members of the team have. For example, members of my team are taking advantage of some of the fuzzing infrastructure that they're putting in, and working on improving the quality of these drivers.

Anyway, what I wanted to look at was technical enforcement of better code, as opposed to getting people to do a better job, though both are important. As for technical changes that could be put in place, I would like to see compiler changes that actually catch bugs. Another note is that PAN emulation comes up again: it forces developers to make proper use of the copy_to_user and copy_from_user functions. And then KASAN is kind of what we were just talking about, using testing to improve the quality of code.
KASAN makes finding and fixing bugs more efficient. So we're going to move on from bug cause. Any more questions with regard to bug cause? OK.

We're going to look at attack surface reduction now. It's actually not a very popular topic, and yet it's an incredibly effective protection. It has multiple benefits. Making bugs unreachable obviously protects users, and that's the primary focus here: actually protecting users. The other benefit of reducing the attack surface relates to the people who are looking for bugs using, for example, fuzzing infrastructure. With attack surface reduction, we can focus more of our resources on fewer entry points to the kernel, which allows us to be more thorough in our bug hunting process. Reducing the attack surface also makes incident response more efficient, for those who care about incident response. For example, if someone reports a vulnerable ioctl, it takes a lot of time to figure out the reachability of that ioctl: which processes can reach it, which code paths can reach it. With policy languages like seccomp or SELinux, determining the reachability of bugs becomes a lot easier. Also, on Android in general we have a really good separation between developer tools and the regular use case. So from the kernel's perspective, we want to move developer tools that are provided by the kernel into developer settings as well, so that they're not reachable by default.

So let's look at bugs categorized by driver. Something that's interesting is that they all seem to be a problem. My note here is important: many of these bugs are only reachable by a privileged process. So let's break this out by bugs that are reachable by apps.
Wi-Fi and GPU are still our top problems, but you'll see now that the perf subsystem is also making a major contribution. Based on the discussion of perf on the mailing list, I would like to point out that the perf bugs were all introduced in vendor code. The perf maintainers, I'm sure, would like me to mention that they legitimately have good security practices and are actively fuzzing their infrastructure. From a bug fixing versus mitigation standpoint, I'd like to point out that we do fix all bugs. Whenever a bug is reported, we fix it. But we also still want to add mitigations for that type of bug, to make future bugs either less severe or unreachable.

As I was digging through the Wi-Fi bugs, an interesting trend emerged. I think Wi-Fi is really interesting because a lot of these Wi-Fi drivers are in the upstream kernel; they're not just in certain devices. The trend is that all of the bugs that were reachable by untrusted applications should have been protected by a capability check. When I went through and categorized the cause of bugs, I had things like missing bounds checks or missing permission checks. Well, a lot of these bugs actually had multiple causes: it took a missing permission check and then an integer overflow or a missing bounds check in order to exploit. In cases where a bug had multiple causes, I just had to choose one to categorize by. Anyway, this brings up a really important security concept, which is that relying on in-code checks, particularly ones scattered throughout a code base, is bad security design. We want security checks to be done up front, and we want them to be auditable.

The other fun discovery that we made when it came to Wi-Fi driver bugs is that they were reachable through local UNIX sockets. I have no idea why you should be able to reach the Wi-Fi driver through a UNIX socket, but you could.
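
One way to picture "checks done up front and auditable" is a single table that records which capability each command needs, with one check at the entry point instead of checks scattered through every handler. The sketch below is a Python model under that assumption. The command numbers, handler functions, and `dispatch` are hypothetical and do not come from any real driver; `CAP_NET_ADMIN` is a real Linux capability name, used here only as a label.

```python
from typing import Callable, Dict, FrozenSet, NamedTuple, Optional


class Command(NamedTuple):
    handler: Callable[[bytes], str]
    required_cap: Optional[str]  # None means no capability is needed


def get_status(arg: bytes) -> str:
    # Hypothetical unprivileged command.
    return "status: ok"


def set_channel(arg: bytes) -> str:
    # Hypothetical privileged command; note it contains no permission check.
    return f"channel set to {arg[0]}"


# One auditable table: reading it shows exactly which commands are privileged.
COMMANDS: Dict[int, Command] = {
    0x01: Command(get_status, None),
    0x02: Command(set_channel, "CAP_NET_ADMIN"),
}


def dispatch(cmd: int, arg: bytes, caller_caps: FrozenSet[str]) -> str:
    """Single entry point: look up the command and enforce its capability."""
    entry = COMMANDS.get(cmd)
    if entry is None:
        raise ValueError(f"unknown command {cmd:#x}")
    if entry.required_cap is not None and entry.required_cap not in caller_caps:
        raise PermissionError(f"command {cmd:#x} requires {entry.required_cap}")
    return entry.handler(arg)
```

With this shape, a handler that forgets its own check is still protected, and a reviewer can audit the policy by reading `COMMANDS` rather than every handler body.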
So when we get to mitigations, we're going to look at how we added much stronger policy around all socket types.

Looking at kernel entry points, I don't think anyone's too surprised to see this. Maybe the scope of it is surprising: ioctls were by far the largest problem. What I'd like to point out is that the syscalls that are commonly modified by vendors were the largest problem.

Let's see here, yes. I'm actually going to address that: some of the mitigations that went into not just Android N, but also Android M, were heavy restrictions on debugfs. Because, as I said, these bugs go all the way back to 2014, many of them have been addressed by the mitigations that we're discussing. Does that make sense? Yeah, okay.

For mitigations, because the ioctl syscall was such a major problem, it's the big reason why we added ioctl command whitelisting, so that we can cut back on which ioctls are accessible and we can audit which ioctls are accessible. Probably the biggest place we added it was Wi-Fi. Originally hundreds of ioctl commands were accessible, and we brought that down to 29 whitelisted network socket ioctls. For UNIX sockets we did the same thing, and it went down from hundreds to, I think, eight that are whitelisted. And then we also disallowed ioctls on all other socket types, including generic sockets and netlink sockets.

GPU is really interesting, because I didn't expect that we'd be able to use ioctl whitelisting on GPUs. Why would you have ioctl commands that aren't being used? But it turns out that, at least when I looked at the KGSL ioctls, which are from Qualcomm, generally only about half of the commands were needed. That seemed strange to me, and when I looked into it, it appears that the reason has to do with compatibility.
You have a version of your driver and a version of your library, and they can't necessarily assume which version of the library you're using. So they have to keep all of these ioctls exposed for compatibility reasons. What's nice is that on Android, that's not the case. The library is part of the system image. Apps and other processes have to use the version of the library provided by the system. So we could actually do whitelisting on ioctl commands. And it turns out that when we looked at the bugs that were being reported and the bugs that were reachable, they were roughly proportional to the number of ioctl commands that were exposed. In other words, if we cut off 50% of the ioctl commands exposed, we would generally cut off about 50% of the bugs that were reachable.

Other mitigations: we restricted access to perf. Access to perf_event_open is disabled by default, and developers who need access to perf can re-enable it through developer settings. We removed all access to debugfs, which as someone pointed out is a reasonable response. We also removed most app access to sysfs, so any files that apps need in the sysfs file system have to be whitelisted. And then we backported seccomp. I believe we backported it all the way to the 3.4 kernel. Seccomp is a requirement for all devices; there are no exemptions. And we used it on the platform as well, to constrain some of the media server processes. A shout-out to minijail for that: if you've never used seccomp before but are considering using it, I mean, I probably constrained a couple of processes in an afternoon. It was really simple.

On the impact of the mitigations: because most of the bugs are driver specific or device specific, we have to look per device. The device I chose was Shamu. For Shamu, 100% of the Wi-Fi bugs were blocked, 50% of the GPU bugs were blocked, and 100% for debugfs and perf.
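
The ioctl command whitelisting described above can be pictured as a default-deny set of permitted commands per socket class, consulted before the driver is ever reached. The following is a toy Python model of that idea, not Android's actual SELinux policy. The socket class names, the contents of `WHITELIST`, and the helper functions are assumptions made up for illustration; the three `SIOC*` constants are real Linux ioctl numbers, but which ones are permitted here is arbitrary.

```python
from typing import Dict, FrozenSet, Iterable

# Real Linux ioctl numbers; the choice of which to permit is illustrative only.
SIOCGIFNAME = 0x8910
SIOCGIFFLAGS = 0x8913
SIOCSIFFLAGS = 0x8914

# Hypothetical policy: socket class -> ioctl commands an app may issue.
# A class with an empty set, or with no entry at all, gets no ioctl access.
WHITELIST: Dict[str, FrozenSet[int]] = {
    "network_socket": frozenset({SIOCGIFNAME, SIOCGIFFLAGS}),
    "netlink_socket": frozenset(),
}


def ioctl_allowed(sock_class: str, cmd: int) -> bool:
    """Default-deny check: a command is reachable only if explicitly listed."""
    return cmd in WHITELIST.get(sock_class, frozenset())


def reachable_fraction(sock_class: str, buggy_cmds: Iterable[int]) -> float:
    """Fraction of known-buggy commands that the policy still lets through."""
    cmds = list(buggy_cmds)
    if not cmds:
        return 0.0
    return sum(ioctl_allowed(sock_class, c) for c in cmds) / len(cmds)
```

For instance, if bugs were known in both `SIOCGIFFLAGS` and `SIOCSIFFLAGS` handlers, `reachable_fraction("network_socket", [SIOCGIFFLAGS, SIOCSIFFLAGS])` would report that half of them remain reachable under this made-up policy, which mirrors the observation that removing commands removes a roughly proportional share of reachable bugs.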
So again, that shows the effectiveness of attack surface reduction. These were the largest areas of bugs that we had, and we were able to just remove access to them without impacting what users see.

Now, gaps. There's a lot of functionality in the kernel that not everyone needs access to, and what we really need is the capability to turn access on or off as we need it. There's been a lot of pushback on adding these types of controls. No one wants their feature turned off, even if there are use cases where people don't need that feature.

The other thing that I'd like to see is argument inspection for seccomp. The way seccomp works right now, you can filter on the first few arguments that are passed into a syscall (the number is architecture dependent). Sometimes those are pointers, and argument filtering on pointers is not very useful. We need to know what they're actually pointing at. Kees talked about argument inspection, so that's something that we would like to see.

For future work, this kind of hearkens back to last year's keynote speech, which compared computer safety with the car industry years ago. We need more and better safety features, and bearing in mind that these may sometimes cause inconvenience for developers, we still need them. We can balance what developers have access to with what goes onto systems like Android or Tizen or other operating systems.

All right, on all my graphs we always had a large "other" section. So, looking at other potential for attack surface reduction: Android's libc is called Bionic, and what's interesting about Bionic is that it exposes only a subset of the syscalls provided by the kernel. Any syscall that's not exposed by Bionic is a candidate for removing access to.

And I'll go ahead and touch this third rail. Most of our bugs, not all, are caused by memory safety issues.
So we could look at a way of either making C more memory safe, and there actually has been work on this, or of using something that is memory safe in the kernel. For example, if we could restrict vendors to only using a memory safe language, then that would fix a lot of our bugs. Or rather, it wouldn't even fix them; they just wouldn't be able to occur.

All right, so where do we go from here? A lot of really, really good work is happening in the Kernel Self Protection Project, and it's being done by people in this room. The other people I'd like to point out are people at ARM and Linaro, who have been really helpful and have been getting some really great security features in. Android is going to benefit greatly from those people's hard work. From an open source perspective, Google and Android really want these features to go into the upstream kernel, because in the upstream kernel everyone gets to benefit from security features, as opposed to Android going its own way and customizing the kernel itself. That's really what responsible open source development is: participating with the community.

Other areas that I think have some more low-hanging fruit: attack surface reduction still has a lot. So please make contributions to AOSP. And then, as always, continue to find and fix bugs. It's going to be really exciting, as those bounds-checking mitigations get submitted into Android, to see that category just fall off the map. But then we're obviously going to have new categories which become our largest source of bugs. So continuing to find bugs in those places and submit fixes will both help us fix individual bugs and also give us data on what areas we need to work on.

And with that, questions? Yeah, Casey? So a couple of slides. Uh-huh. Can you name some of them?
So debugfs was, well, the ones that I did name are the ones that stick out right away. Debugfs is something that's useful on debug builds, so that's where it should be accessible. Perf was a really good example of something that is important and that we do want access to, but not all the time. So we added a toggle in Android, and there are discussions on how to do something similar in the upstream kernel. Looking at that list of system calls that were available through Bionic, a good example was System V IPC. That's something that we completely removed access to in Android, and that's a restriction that we now place across the ecosystem: no System V IPC. So yeah, anyone have any others? Anyway, this is an area that has some low-hanging fruit, so we're looking at it.

The question was about getting some help from compilers. The GCC plugin infrastructure just got accepted into the upstream kernel, and that's going to make creating those sorts of plugins a lot easier. So that's some good news. Are there any specific ones that have gone in? Any other questions?

All right. This is kind of a fun talk, because I feel like we're looking at the dark past and also at the bright future, especially as we see a lot of these really, really good kernel defenses making their way in that actually address some of our largest issues. It's exciting. So thank you.