Okay. So before we get started, I have some acknowledgements. We're going to discuss a whole bunch of vulnerabilities that apply to the Linux kernel in Android specifically, so part of the acknowledgements are the people who reported these bugs and the people that triaged them. A few other people also helped with both the content of these slides and some of the technical work that we'll be discussing later.

Another note is that all of this data is public. Android produces a monthly security bulletin, so you can actually go and look at all the bugs that we're discussing.

I did want to discuss a caveat here, which is that the data we're going to be discussing is comprised of vulnerabilities reported to Google from a bunch of different sources: from Google, from Android vendors, some from the upstream kernel, and from external researchers. We're going to try to infer some patterns and problems based on this data, but it's important to point out that the data almost certainly has some bias. Vulnerability research tends to be somewhat fashionable. For example, over the last year people have been looking at shared resource attacks like speculative execution, so Spectre and Meltdown type vulnerabilities. We see similar things in the subsystems that people are looking into and the types of bugs that they're finding. So I just want to have that caveat: while we're going to try to infer some things from this data, the data itself probably has some bias in it. And then of course Android is an open-source project, so you can submit patches and bug fixes.

Okay, so why are we discussing the kernel? Currently the kernel accounts for about a third of the security bugs that we have on Android. The kernel is part of Android's trusted computing base.
So it's an important area to look at, especially now. When people want to compromise Android, it's often easiest to go straight to the kernel, as opposed to in the past, where people would try to exploit privileged system processes.

So what's working well in terms of protecting the kernel? Attack surface reduction is working really well. I've managed to get a quote out of Project Zero where they basically say that some of the attack surface reduction that we've been doing is what they view as most effective in terms of mitigating these types of vulnerabilities, and I'm going to try to show some data which I think backs up their statement. The nice part about attack surface reduction is that access controls, which is mostly what I'm going to be discussing, are hard mitigations. We can apply this type of mitigation without actually knowing what types of vulnerabilities there are, and without knowledge of specific exploitation techniques.

Okay, so when we look at attack surface reduction, I want to look at all of the kernel vulnerabilities and then at which of those are made inaccessible to untrusted processes. This data is looking very specifically at kernel vulnerabilities that are reachable from user space but unreachable to unprivileged processes on Android. An unprivileged process would be something like a third-party app, and we also include some of the media processes as being unprivileged. The other type of vulnerability that's excluded here is one that requires you to have already completely compromised the security of the system. We call these "you have to root the device in order to root the device" vulnerabilities. We're excluding those because they are uninteresting.

So, looking at the different attack surface reduction mechanisms that we use, what I'm going to do is break this graph into
the components that add up to it. I've been told that this is somewhat confusing. What it is: if you imagine that 90% of our vulnerabilities are mitigated through access control mechanisms, then in this graph we show you the different access control mechanisms and what percentage of vulnerabilities each one mitigates against. Obviously SELinux is the largest one, but the only way this makes sense is that there's overlap between them. A great example is that when we look at kernel vulnerabilities specifically, everything that was mitigated through Unix permissions was also mitigated by SELinux. That's not true of all vulnerabilities on Android, but it is true for kernel vulnerabilities.

I wrote down an example for each one of these that you can look up if you want. My example for SELinux was a debugfs node which you could exploit with a buffer overwrite. For Unix permissions it was a /dev/snd node, so an audio driver bug, for which you needed the correct Unix permissions and the correct SELinux permissions. And then finally for capabilities: on Android, untrusted processes can create Unix sockets. Or, not Unix sockets, raw sockets. In this case, in order to reach this bug you had to be able to create a raw socket, which an unprivileged process may do, and you needed CAP_NET_RAW in order to reach it, so this bug would be unreachable to an unprivileged process due to capabilities.

Another note on this data is that it is somewhat conservative. In the previous slide, where I've got about 90% mitigated, if we couldn't actually determine reachability, which sometimes we can't because we get bug reports with no proof of concept along with them, we just have to be conservative and say we think this is reachable from a third-party app, because we get lots of bugs.

Also, starting with Android Oreo, which is the data that we're using
here, we also had a seccomp filter which was applied to all apps, and that also blocked access to a few bugs. The particular CVE that I mention here was a vulnerability in the move_pages syscall. We don't use that syscall on Android, so just blocking it for everyone prevents access to this particular vulnerability.

So I guess the summary from this section is that the kernel provides us with some pretty good tools to protect the kernel from user space, attack surface reduction works really well, and on Android we've got some pretty good data showing that. It's also not like we're done applying attack surface reduction, or the principle of least privilege, from user space to the kernel, so hopefully these numbers will continue to get better with time.

Looking at the unprivileged-reachable bugs, the biggest problem is the GPU, which is one of the few hardware drivers that are actually accessible to apps on Android. But it's also important to note that a bunch of the reachable bugs are bugs that we get from the upstream kernel. In other words, we get a lot of bugs from poorly written vendor drivers, and in the case of the GPU on Android that is a problem, but in general we're able to block those by sandboxing. For example, only the audio HAL should be able to access audio drivers, and therefore if there's a vulnerability in the audio driver, you can't reach it as a third-party app.

There are also a couple of other really nice user-space-to-kernel mitigations that have been introduced recently. This graph just shows the root cause of user-space-reachable bugs on Android, and of course the biggest problem that we have is that people either don't check bounds or they check them incorrectly, allowing you to read out of bounds or write out of bounds. That's why we're excited about hardened usercopy, which was introduced in the upstream kernel in the last year or so and then
backported to all Android kernels, because it provides a mitigation in this area, specifically by hardening the copy_to_user and copy_from_user functions.

We also have something called PAN, Privileged Access Never. Anyway, whatever it's called, what it's used for is that it prevents the kernel from directly accessing a user space process's memory. There are a couple of reasons why that's really useful. The first is that it forces all communication to and from the kernel to actually go through those newly hardened copy_to_user and copy_from_user functions. The other reason is that the kernel directly accessing a user space process's memory is really, really racy. We want to prevent that, because if the kernel is trying to access a process's memory and the process is changing it at the same time, then obviously that could cause kernel bugs. When we were rolling out PAN, we hit multiple instances of this issue, and we know that this is a problem in partner devices and in other Android devices other than the ones we directly work with.

I did have one interesting story: we created a test in our Compatibility Test Suite, which all Android devices have to pass, that said that this was a requirement, and we had an OEM that shall remain nameless.
They said, "But I can't turn this on, it causes my kernel to crash." So we let them know that that was working as intended and they needed to fix that.

I keep caveating everything with "user-space-accessible kernel vulnerabilities", and unfortunately not all kernel vulns are reached from user space. We've been discussing the roughly two-thirds of kernel bugs that are reachable from user space, but about a third are not. So let's get into these a little bit. I actually broke them down by where they are reachable from as well as by the root cause of the vulnerability.

Well, yeah, someone said it's Wi-Fi. Part of the reason why I wanted to caveat earlier that bug finding tends to be somewhat trendy is that that could be the case here, right? Maybe if we ran the same analysis in a month, it would be Bluetooth, or maybe it would be USB. But the point I want to make here is mostly that all of those lovely mitigations that we were talking about, access controls, hardened usercopy, PAN, things like that, are just completely irrelevant here. We have kernel bugs, and we have absolutely no mitigations to prevent those kernel bugs. And yes, the Wi-Fi driver is a bit of a dumpster fire, but I fully expect people would look at USB and find the same thing if the same amount of resources were put into that. The other thing is, there's going to be a nice talk on syzkaller.
That's happening later. Things like automated fuzzing through syzkaller are also not looking here. I guess one of the positive things is that kernel bugs that are reachable through the Wi-Fi firmware would first require that you have code running in the Wi-Fi firmware to reach them, right? So in some ways there is some access control going on here.

And then of course the other issue, and the reason why I wanted to discuss the lack of hardened usercopy, is that clearly a missing or incorrect bounds check is an even larger problem in this subset of bugs. We really need something doing bounds checking on the heap.

And again, looking at the trendiness of bug finding, a large subset of these bugs are the KRACK vulnerabilities, which are just weaknesses in the WPA protocol. One thing that I will say from an Android perspective is that the only safe assumption on a network is that the network is untrusted. So if you are relying on WPA, or on encryption on one hop, you are already in bad shape. I could probably create an access point, call it Starbucks, and half the phones in the room would connect to it, right? WPA is not keeping you safe. I think it's good that we patch it, but the only safe assumption is that the network is untrusted.

Okay, so a summary from this section is that for user space to kernel, we've got a lot of good tools there.
Those are provided by the upstream kernel, and they actually are fairly effective. However, about a third of the kernel bugs are reached by other vectors, and it would be nice if we had both good access control mechanisms and good bounds checking there, for example.

And then finally I wanted to talk briefly about memory unsafety. This is again about all kernel bugs, not just user space or other-vector-reachable bugs. Clearly, when we actually look at bounds checking, the major problem is the heap. We have some protections for the stack, right? We've got a stack protector and some other things that are going to be discussed later. But what we really need are protections for the heap, so that if you do something like overwrite a function pointer, you haven't just immediately taken over control of the kernel. So with that, Sami is going to discuss some of the work that we're doing there.

Right, so I'm going to say a few words about CFI, control flow integrity, which is the latest mitigation we added to Android kernels in Android 9. CFI helps protect against code reuse attacks. It tries to accomplish this by adding runtime checks to ensure that the program's control flow stays within a precomputed graph. In practice, LLVM's CFI implementation, which we use, focuses only on protecting the forward edge for C programs.
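To make the forward-edge idea concrete, here is a minimal Python sketch of such a runtime check. It is a toy model under stated assumptions, not LLVM's implementation: the names `VALID_TARGETS`, `cfi_check`, and `indirect_call` are hypothetical, and the "precomputed graph" is assumed to be a simple table from an expected function type to the set of functions of that type.

```python
# Toy model of a forward-edge CFI check. This is NOT how LLVM implements CFI;
# every name below is hypothetical and exists only for illustration.


class CFIViolation(Exception):
    """Raised when an indirect call would leave the precomputed graph."""


def read_op(buf: bytes) -> str:
    return f"read {len(buf)} bytes"


def write_op(buf: bytes) -> str:
    return f"wrote {len(buf)} bytes"


def reboot_op() -> str:
    return "rebooting"


# Assumption: the "precomputed graph" is a table built ahead of time that maps
# an expected function type to the set of functions with exactly that type.
VALID_TARGETS = {
    "str(bytes)": {read_op, write_op},
    "str()": {reboot_op},
}


def cfi_check(expected_type: str, target) -> None:
    """Refuse any target that is not a known function of the expected type."""
    if target not in VALID_TARGETS.get(expected_type, set()):
        name = getattr(target, "__name__", repr(target))
        raise CFIViolation(f"{name} is not a valid {expected_type} target")


def indirect_call(expected_type: str, target, *args):
    """Model of an indirect branch: check the target first, then jump to it."""
    cfi_check(expected_type, target)
    return target(*args)


if __name__ == "__main__":
    print(indirect_call("str(bytes)", read_op, b"abcd"))
    try:
        # A corrupted function pointer that now points at a different type.
        indirect_call("str(bytes)", reboot_op)
    except CFIViolation as err:
        print("blocked:", err)
```

Note that the model still lets a call site reach any function of the matching type (`write_op` instead of `read_op`, for instance), so a check like this narrows the set of targets rather than eliminating misuse entirely.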
Protecting the forward edge basically means checking indirect branches. Luckily the kernel has plenty of those, so it's a decent starting place. With CFI, LLVM adds a check before each indirect branch to ensure that the target address points to the beginning of a valid function with the correct type. This limits the number of potential targets where the kernel can jump.

Before we continue into more details, let's take a look at how effective LLVM CFI actually is in the kernel. First of all, CFI is a soft mitigation. It's not alone going to prevent an attacker from exploiting a sufficiently bad kernel bug. However, together with other current and future mitigations, it will make exploiting bugs more difficult. This is a graph generated from an actual Android device kernel, which shows the number of potential call targets CFI allows for each indirect call. Without CFI, an attacker who is able to modify a function pointer can jump anywhere, but with CFI, more than half of all indirect calls can now branch only into a handful of functions, and 80% have at most 20 possible targets. Of course, due to the limitations of the function-signature-based approach, for the two most common function types in the kernel we still have more than 1,000 possible functions where the kernel can jump. But this applies to less than 1% of all indirect calls in the kernel, and it still limits the attacker's options and prevents them from jumping to an arbitrary gadget, for example.

In order for the compiler to determine valid call targets, it needs to see the entire program, or at least all the relevant parts of the program. In the kernel's case the compiler won't see standalone assembly code, for example. LLVM solves the visibility problem by requiring link-time optimization, or LTO, where each compilation unit is first compiled into LLVM-specific bitcode, which is combined and inspected all at once at link time. Unfortunately LTO somewhat complicates matters when it comes to the kernel. Not only do we need to switch to an LTO-aware linker, but we also
need to use LLVM's integrated assembler for all inline assembly. Because of this, most of the issues we ran into when adding LTO support to Android kernels were actually toolchain compatibility issues. Some changes to kernel build scripts were needed, but those were greatly simplified by the upstream thin archives work, which had already removed all the intermediate linking steps. We did have to use a few LLVM tools, for generating symbol tables for bitcode files for example, but there were not many changes.

Here we have a simplified view of how LTO works with Clang. In the kernel we have some code that's translated directly into object files, but the vast majority is compiled into bitcode. Everything is added into a thin archive, which is then passed to the linker. The linker looks at the archive, combines all the bitcode, optimizes it, compiles it into native code, and everything is linked together in the end.

We asked for feedback about LTO from kernel maintainers last year, and many of them expressed concerns about possibly unsafe optimizations that might break the kernel's memory model, for example. During the past several months we have tested LTO kernels extensively on actual devices, and we have not run into any issues that could be attributed to LTO. In fact, we are confident enough in LTO that the first Android devices running an LTO kernel will ship later this year.

Once the problems with LTO were sorted out, adding CFI support was relatively simple. The biggest challenge was fixing all the benign CFI failures in the existing code. C compilers don't currently enforce restrictions similar to CFI's runtime checks, so there was a fair amount of code in the kernel that tripped CFI, which we first had to fix.

Another complication was kernel modules. The compiler obviously doesn't see all the modules, especially if they're compiled out of tree. We adapted LLVM's cross-DSO CFI support to handle kernel modules. Each module now has its own CFI check function, which determines the valid call targets
for that specific module, and the kernel looks up the correct check function to call based on the target address. Obviously this needs to have as little overhead as possible, which brings me to the point that everyone has in mind when it comes to security mitigations, which is performance. In our tests, a kernel compiled with LTO and CFI actually performed slightly better than the base kernel, obviously due to LTO's more aggressive optimizations.

And here we have an example of a CFI failure that we ran into. Here's a single function pointer that's used to call a large number of functions, all with different argument types. The compiler is naturally perfectly fine with this, but the CFI runtime check fails. We fixed this upstream a while ago with a cleaner solution that doesn't use mismatching function pointers.

This is what a compiler-injected runtime check looks like on ARM64. Before an indirect call, the compiler adds a call to a CFI check function, which validates the target address. It's passed a hash of the expected type information, and if the check fails, it simply never returns. And this is why it doesn't return: when the CFI check fails, we first print out the target address to help us pinpoint the issue, and when CFI is enabled in normal mode, it panics the kernel immediately. We also added a permissive mode, which changes the panic into a warning instead. That makes it easier to debug these failures, especially if they occur in early boot, for example. But it should be noted that the permissive mode has absolutely no security benefits, and it should only be used during testing or device bring-up.

Our CFI implementation is available right now in Android kernels 4.9 and 4.14, which you can find in AOSP. It's only for ARM64 at the moment. You also need a recent enough Clang and binutils to compile the kernel, and for anyone interested in testing this, these are the config options you need to enable.

And finally, a few notes about future work. Since LLVM CFI only protects
the forward edge, we are looking into other solutions for also protecting return addresses better. We previously looked into LLVM's SafeStack, which works, but due to memory overhead concerns we are now focusing on the newer shadow call stack mitigation instead. And because of the numerous problems we ran into with the gold linker, we're also looking into replacing it with LLVM's LLD, which hopefully reduces the compatibility issues we run into a little bit. And that's all I have. Thank you.

So I had a question about a previous topic, attack surface reduction. I noticed that one of the things that caused bugs in the kernel was binder, the IPC mechanism, right? So how does the attack surface reduction in binder work? Is it just adding explicit checks in the code that connects on the other side of the IPC, or is it something more like with SELinux?

Do you mean vulnerabilities from binder to the kernel, or from binder to other processes?

No, because the talk was about the kernel, so from binder to kernel.

So there's basically no attack surface reduction from processes to binder, and I think we had one binder vulnerability in the last year. Since all apps have to be able to use binder, that particular vulnerability was accessible, so it was included in the group of unprivileged-reachable vulnerabilities. Any other questions?

Did you guys look at grsecurity's RAP design or implementation at all?

Yes, we did, but since we are using Clang for our kernels, we decided to go with LLVM's mitigations instead.

Hi there. I really appreciate the data; it's nice to see it presented like that. You mentioned twenty-five percent of the unprivileged-reachable bugs were from GPU drivers. What kind of help, support, instructions, smack on the head do you give to said vendors
so that they can improve the security of their GPU drivers?

Yeah, so a couple of things that we're doing. First of all, I like having the slide up there because I can use it to go tell people that, hey, this is a problem. One nice thing about attack surface reduction is that by reducing the available attack surface, we can actually focus resources on the remaining surface better, right? So we can put, for example, fuzzing resources there, or code review resources there, whereas if we had the entire kernel, that would be a bit overwhelming. Another topic that I am hoping to be able to use this data for is GPU access: maybe the GPU should not be directly accessible from unprivileged apps, right? We've moved all of these other things out of our unprivileged sandboxes, and that's one of the few that's hanging around, so maybe we need to start looking at that as well.

So the Android team has done a lot of work to make it much easier to update kernels, which is fantastic for the ecosystem. Have you noticed that having any impact on the number of bugs, in terms of, you know, good, bad, ugly?

What do you mean by update kernels?

You know, being able to get updated kernels faster, as opposed to being stuck on an older version just because OEMs don't update them, and requiring newer ones with things like Project Treble and everything like that.

Yeah, so there are kernel requirements now, which did not used to exist. For example,
I think if you launch a device with Android P, you have to have a 4.9 kernel or newer on it. What that doesn't mean is that they will then ever update the kernel version for that device. So you're right in that devices will no longer be launched with these ancient kernels the way they sometimes were in the past, but there's no requirement that they actually update the kernel version on a device once it's been launched. But yeah, it helps. And from our perspective, we're getting all these nice mitigations in upstream kernels, so it's nice to know that we have to do less work and that partners have to do less work, because those mitigations are just built right into the kernel.

One of the reasons why I like to present data like I've been presenting, particularly at this conference, is that I know people are able to take that data and actually use it to justify work. I know that, for example, people from Arm have been able to say, "Oh, we can see that the heap is a real problem and that buffer overruns in the heap are a major issue, so let's use that data and justify putting resources towards that."

Any other questions? Alrighty. Thank you.
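As a footnote to the access-control breakdown earlier in the talk, the overlap between mechanisms can be sketched with sets. This is only an illustration: the bug IDs and the assignment of bugs to mechanisms are invented here, and the helper names (`per_mechanism_share`, `total_mitigated_share`) are hypothetical. The point it shows is that per-mechanism percentages can add up to more than the overall mitigated share, because a single bug may be blocked by several mechanisms at once.

```python
# Toy illustration of overlapping access-control mitigations.
# The bug IDs and the mechanism-to-bug assignments are invented for this
# example; they are not taken from the Android security bulletins.

ALL_BUGS = {f"bug{i}" for i in range(1, 11)}  # ten hypothetical kernel bugs

BLOCKED_BY = {
    "selinux": {"bug1", "bug2", "bug3", "bug4", "bug5", "bug6", "bug9"},
    "unix_permissions": {"bug2", "bug3"},  # fully covered by selinux as well
    "capabilities": {"bug5", "bug7"},
    "seccomp": {"bug8"},
}


def share(bugs, total):
    """Fraction of all bugs that fall in the given set."""
    return len(bugs) / len(total)


def per_mechanism_share(blocked_by, all_bugs):
    """Share of bugs blocked by each mechanism, counted independently."""
    return {name: share(bugs, all_bugs) for name, bugs in blocked_by.items()}


def total_mitigated_share(blocked_by, all_bugs):
    """Share of bugs blocked by at least one mechanism (set union)."""
    mitigated = set().union(*blocked_by.values())
    return share(mitigated, all_bugs)


if __name__ == "__main__":
    for name, value in per_mechanism_share(BLOCKED_BY, ALL_BUGS).items():
        print(f"{name}: {value:.0%}")
    print(f"any mechanism: {total_mitigated_share(BLOCKED_BY, ALL_BUGS):.0%}")
```

Using a union rather than a sum is the design choice that matters here: summing the per-mechanism shares would double-count every bug that more than one mechanism blocks.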