Okay everyone, so now please join me in welcoming Erik, who's a PhD student at the VU in Amsterdam, and he will talk about ASLR. Please give him a warm round of applause.

Hello. Like the Herald said, I'm Erik, a PhD student at the VU in Amsterdam, in the VUSec group, and today I'll be presenting work that we have done in the group. Most of the work I'm presenting has been done by Ben and Kaveh, and by Stefan, who showed that the attack I'm presenting is applicable to all 22 CPU microarchitectures that he tested, which is quite something. I try to sneak this slide into all my talks, but this time it's especially apt, because this talk is about finding them.

So this talk is about attacking ASLR, which is short for address space layout randomization. It's an exploit mitigation technique which, as far as deployment is concerned, is one of the success stories: since it was introduced it's been widely adopted, and it makes exploitation somewhat more difficult. The way ASLR makes exploitation more difficult is that it changes the location of code and data, usually every time the program runs, so that an attacker cannot rely on certain addresses being the same every time.

On modern 64-bit architectures the address space is usually 48 bits, which means you can address about 256 terabytes of memory. Of course, you cannot read or write everywhere in that space, because your computer probably doesn't have that much memory. In reality, only a very small portion of the address space is allocated to a process, so it's quite easy to change the location of this memory. This makes life for the exploit writer a tiny bit more difficult, because it's very useful to know the location of data. For example, if you want to overwrite the return address on the stack, it's nice to know where you can jump to. If you don't know, you might jump into nowhere, and then the program crashes. However, not much is needed to defeat this mitigation.
You just need to leak the location of the memory. So, I really like this background image. You can try to reuse the same bug twice, first to leak information and then to exploit, or, if that's not possible, you'll have to find another bug which allows you to leak this location. Or maybe you don't have to: this presentation is about an attack which uses a side channel, from JavaScript, on the hardware itself to discover information about the locations of data and code in memory.

The modern CPU architecture is a wondrous abstraction layer. Even if you as a programmer write machine code, there's lots of stuff you don't have to worry about, especially stuff that makes your programs fast. Memory accesses are very slow compared to the CPU on modern computers, and that's why there's a cache mechanism built in. Other things are abstracted away too. For example, if your program does a memory access, the data is written to the cache, but where is it written? Your program gives a virtual address to the CPU, which the CPU then needs to translate into a physical address; that is done by a component called the memory management unit, the MMU. The MMU has a small cache of mappings from virtual memory to physical memory, the TLB, but if an address is not in that cache, it has to do a page table walk. The page table walk is what we are going to attack: we'll measure the effect the page table walk has on the L3 cache, the last and biggest cache in the CPU, to find out what's happening during the walk.

So we're talking about doing a timing attack from JavaScript, to measure whether memory gets accessed, which means that we need a pretty good timer. Luckily for us, the browser standards committees have come up with an API to do just that.
So you can take a timestamp, do an operation, then take another timestamp, and you get a very crisp time measurement. That worked until someone published a paper which showed, basically, that you can do a last-level cache attack on the CPU this way and discover some things. So the browser makers made the measurements much more coarse-grained: every microsecond or so you get a little bump, and for the rest of that microsecond nothing changes.

But all is not lost for the attacker, because you can turn the coarse-grained timer back into a fine-grained one. What you can do, for example, is wait for this bump to happen, then quickly do the operation, and then start a counter. The longer the operation takes, the smaller the count will be when the next jump happens. In Chrome they chose to add jitter to the length of this interval, but you can still do multiple measurements, take an average, and get a good measurement.

However, we can do better. The browser makers decided to make this a bit more difficult, but where the browser standards committee takes, it also gives: they decided to implement an object called the SharedArrayBuffer, which allows multiple threads, which are called web workers in JavaScript, to work on a single piece of memory. And they decided to enable this by default, which actually happened after we published the attack. So they have basically given up on preventing nanosecond-scale time measurements in JavaScript. The SharedArrayBuffer can be used for other things as well, but I'll not talk about those today.

So how can we measure time using shared memory? It's quite simple. One thread is used for doing the time measurement. And the other thread does the operation.
The timer thread waits until the thread which does the operation is ready. Then that thread sets a variable in the shared buffer and starts the operation. Meanwhile, the timer thread sees that the shared buffer has changed and starts counting. When the operation is done, the second thread changes the buffer again, and the counter thread stops. This gives a very crisp measurement. So now we have a nanosecond-scale timer, and we can do side-channel attacks from JavaScript.

We'll be doing a timing attack on the last-level cache. When the CPU accesses memory, everything happens at the granularity of a cache line, which is 64 bytes. Within, for example, the level-three cache, a certain physical address maps onto a certain cache set, and such a cache set can, on a four-core desktop Intel machine, contain 16 different cache lines. I'll talk about a modern Intel machine, but the concept translates to other microarchitectures as well. The last-level cache is divided into slices, roughly one per core, and each slice contains 2048 cache sets; so on a four-core machine you have four slices. These slices are shared among all the cores; it's just the way Intel has organized their cache.

To get the cache set ID, that is, which cache set within the slice is used, you take the physical address, discard the lowest six bits, which just tell you which byte within the cache line you use, and take the next 11 bits: that is the cache set. So it's basically a round-robin mapping of physical memory onto cache sets. The cache slice is selected by some complicated hash function, but for this attack we don't need it. So we're lucky.
The important thing to remember is that if two cache lines in physical memory map to the same cache set, they have the same physical address if you only regard the lowest bits: they have the same address modulo 128 kilobytes. From which it follows that they also have the same address modulo four kilobytes, which happens to be the size of a memory page, the base unit of memory management on almost all architectures in use, I guess.

Now that we know this, we can do a cache side-channel attack. There are multiple attacks possible, and we'll use the most simple one, called Evict+Time. The code to do the Evict+Time attack is also quite simple. You have an evict function which uses a buffer and just accesses cache lines which map into a certain cache set; we can do that by accessing the cache line at a certain offset within each page, as we've just seen. At this point, all the cache lines in the set should be filled with our data. Then we proceed to time an operation: we take a timestamp, do the operation, and measure how long it takes. If this operation needs to touch a cache line in physical memory which maps into this cache set, it will take longer, because it will have to do a memory access, and memory accesses are really slow compared to cache hits on modern computers. This way we can see whether the operation depends on a physical memory location which maps into this cache set.

So how does this apply to the page table walk which we'll attack? Page tables are a mechanism that lets processes address a really large address space while only having a relatively small amount of physical memory. It's basically a tree structure, with tables at every level which divide the address space into equal parts.
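Before going on, the Evict+Time loop just described can be sketched like this. This is my own illustration, not the paper's code; it assumes a 16-way set and the 128 KB same-set stride derived above, and it glosses over the fact that JavaScript only sees virtual addresses (the real attack relies on the low address bits surviving the virtual-to-physical translation, and would use the shared-memory counter rather than performance.now()).

```javascript
// Evict+Time sketch: fill one cache set with our own lines, then time an
// operation; a slow run suggests the operation touched that set.
const WAYS = 16;            // lines per set on the assumed machine
const STRIDE = 128 * 1024;  // same-set lines repeat every 2^17 bytes
const evictionBuffer = new Uint8Array(WAYS * STRIDE);

// Touch one line per 128 KB chunk, all at the same page offset, so every
// access lands in the same cache set and evicts whatever was there.
function evict(pageOffset) {
  let sink = 0;
  for (let way = 0; way < WAYS; way++) {
    sink += evictionBuffer[way * STRIDE + pageOffset];
  }
  return sink; // returned so the accesses aren't optimized away
}

// Evict the set chosen by pageOffset, then time op().
function evictAndTime(pageOffset, op) {
  evict(pageOffset);
  const start = performance.now();
  op();
  return performance.now() - start;
}
```

Running evictAndTime over every cache-line offset of a page, and watching which offsets make the timed operation slower, is the basic probe the attack repeats.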
The first level on Intel is called the fourth level, because there are four levels; it divides the address space up into chunks of 512 gigabytes. Each table can hold 512 entries, but they don't all have to be present. The next levels divide the address space into one-gigabyte chunks, then into two-megabyte chunks, and lastly into four-kilobyte chunks, which is the granularity of a page. Each entry at that last level points to a physical memory page.

What the page table walk does is take the address; we'll use a binary representation of the address, because that makes it easier to show the process, where the black bits are ones and the white bits are zeros. Say there's a TLB miss, so we have to do the page table walk. There's a special register in the CPU which points to the first page table. The hardware looks at the nine most significant bits of the 48-bit address and uses them as an index into that table. Then it does the same with the next nine bits at the next level, and the next, and the next, up until the level that knows the actual page; the 12 bits at the end are then used to point into that page.

So this is a 4K page, and we can use this page to do a side-channel attack. But the observation is that the page tables themselves are also pages, so we can do a side-channel attack on those as well. Let's take a look at what we can discover this way. Say we find out that a certain cache line of a page table page gets a hit. There are eight page table entries within that cache line which could have caused the hit, so that alone doesn't tell us much. But if we look at those eight entries, we can see that six bits of their index are the same. So we now know that there is a sequence of six bits in the address that has the value we discovered by doing the cache timing side channel.
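To make the arithmetic concrete, here is a small sketch of my own (not from the paper) that splits a 48-bit virtual address into the four 9-bit table indices and the 12-bit page offset. Since page-table entries are 8 bytes, `idx >> 3` is the 6-bit cache-line number inside the page-table page, which is exactly what the side channel reveals, while the low 3 bits of each index stay unknown.

```javascript
// Split a 48-bit virtual address into its page-table-walk components.
// Each of the four levels indexes a 512-entry table with 9 address bits;
// a 64-byte cache line holds 8 of the 8-byte entries, so idx >> 3 gives
// the cache line within the page-table page (the 6 bits the cache leaks).
function walkComponents(vaddr) {
  const v = BigInt(vaddr);
  const levels = [39n, 30n, 21n, 12n].map((shift) => {
    const idx = Number((v >> shift) & 0x1FFn); // 9-bit table index
    return { idx, line: idx >> 3 };            // known 6 bits / unknown 3 bits
  });
  return { levels, pageOffset: Number(v & 0xFFFn) };
}
```

For example, walkComponents(0x7fffffffe123n) yields the indices 255, 511, 511, 510 and page offset 0x123.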
Now, there are four levels of page tables, plus the final location inside the page, so there are five cache lines involved. We assume we know the last one, the location inside the page itself, because in practice that's pretty easy to reverse engineer, and even if you can't, there are other side channels to find it out. In fact, we use this final location to do the side-channel attack on all the other locations; or rather, we try to keep it out of the way.

So now we have found four cache lines which may be used by the page table walk. That still leaves four chunks of three bits that we know nothing about, and we also don't know which cache line is used for which level of the page table. That would leave about 16.6 bits of entropy (four unknown 3-bit chunks, plus log2(4!) ≈ 4.6 bits for the unknown ordering), so that's not a lot that we have gained; we still need to find those roughly 16 bits.

However, there is a trick: a technique called sliding, where we allocate a large enough buffer and then probe pages one after another. For the last-level page table we just try the next page, and the next, and the next, and when the observed cache line switches, we know this was a cache line boundary, so for the next page we know that the lowest three bits of its entry index are all zero, because it just went over the boundary. We can do the same for the second-level page tables; this time we step by two megabytes at a time, which is still not problematic to do from JavaScript, and then we get the whole second-level index.

So how much entropy is left? Well, we still have two chunks of three bits, and we don't know which cache line belongs to which of the remaining page table levels, so we're left with about seven bits. I think there's not a lot to be done about this if we want to have good timers in JavaScript. But we can actually do more, because in
practice, things aren't optimal. Nowadays we're starting to be able to do one-gigabyte allocations in JavaScript, which is probably because people want to run a real-time environment in there or something. But for the last level that's still too little: you'd need 512-gigabyte allocations, and you might have to do this up to eight times, so maybe not.

However, on Firefox on Linux, for example, if you allocate a certain type of object called an ArrayBuffer, Firefox doesn't initialize the memory: it just asks the kernel for the memory and leaves it there. And the Linux kernel doesn't initialize it either, so it doesn't have to map in pages in the page table structure, and it doesn't use up any memory as long as you don't touch it. And we don't have to touch it: we just have to touch one page at the very end. So it turns out that on Linux you can allocate huge chunks of virtual memory, and within seconds to minutes you can do the sliding attack and flip the cache line on the highest-level page table.

Chrome does initialize memory, which is a bit unfortunate for us. But what Chrome does is divide memory up into heaps, and when a heap is full, or it decides that maybe for security reasons it needs to create a new one, it tries to leave a huge gap between the previous heap and the new heap, precisely to increase ASLR. That means we can move forward quickly in the address space. Using this method of creating new heaps we can recover the third-level address bits, which would leave us with three bits of entropy left, and on Windows one or zero, I'm not sure. But doing the attack on the fourth level would take a lot of time, because
Chrome needs to initialize and free lots of memory, which just takes time, and your laptop gets hot, so maybe the user will click away, and then no ASLR recovery for you.

This attack was implemented on a Skylake machine, but it has been verified to work on the 22 machines by testing the side channel with a native C program. So, time for a demo video. Here we have, obviously, the browser. These bits are the raw measurements, and what you see here are signals detected by a solver, which tries to find the most likely values for the attack. It looks really pretty, Matrix style. The solvers try to gain confidence in what they're measuring up to a certain point, and then they decide, okay, it's clear. Then, attaching GDB to verify the address, we can see that the address is a location of memory that we know, because we left a marker there.

So, in conclusion, it's possible to recover quite a lot of address information from JavaScript using a hardware side channel on the memory management unit alone. And apparently browser vendors have given up on this and other side-channel attacks, because you can't have multiprocessing with shared memory without this, and apparently that's the direction we're going. So, yeah, any questions?

As always, please line up at the microphones. We'll start with number one, please.

Yes, have you looked at actual browser bugs, and at how many are exploitable with just a single leaked pointer? I mean, usually you need at least two, a code pointer and somewhere to write to, to actually gain control of the execution flow.

So in our attack we also have a way to leak a code pointer. We first leak a data pointer, and then we create lots of jitted
JavaScript code. Leaking the most significant bits is really hard to do for code, but the last level, the page tables which point to the actual pages, and I think also the level one above that, is actually pretty easy, well, doable, to leak using this technique.

Could we get mic number four, please? So you just criticized the browser makers for, well, not trying to mitigate all these security issues. What would you, as an ASLR attacker, recommend the browsers do? What are the measures they could take that they have not yet taken?

So there's also a discussion of whether ASLR is worth it at all. One of the things that makes this attack quite easy is that browser makers have tried to protect against another problem, use-after-free vulnerabilities, by allocating memory in a different region every time. Because if you can free memory, then allocate something else in its place, and a bug in the browser uses it as if it were the old object, you can usually do quite bad stuff. So you can think of mitigations against this technique, but they might work against the mitigations for use-after-free. So the question becomes: is it still worth it? Also because there are only seven bits left that are not inherent to the architecture. So yeah, the question is, in an environment where you can run JavaScript, is ASLR worth it? It might help a bit, and sure, Ben and Kaveh spent lots of hours on implementing this, so for an expert exploit writer it will still be extra effort, if there's no easier way. But yeah, I'm not sure.

Could we get a question from the internet, please? Given that ASLR is only meant to help protect against remote attacks, how
useful is your approach when an attacker cannot exercise the MMU?

So the attacker would always exercise the MMU, but I guess choosing the location is harder, and the timing is of course way more difficult over the network. But the thing is that ASLR is used against local attackers, and this shows that there are problems with it inherent in the architecture, and that it's apparently not that useful.

Can we get mic number one, please? So, recently I saw a vulnerability about the security procedure Loki hard, which was for iOS, and in your presentation I saw the Apple Cortex, sorry, the Samsung Cortex-A7, which is the iPhone 5 processor. What I'm wondering is: if I create an array in JavaScript containing my shellcode, am I able, with this attack, to get the address of that specific array, or is that impossible?

So, usually I guess the shellcode itself won't be executable in an array, but you will get the location of memory that you control completely, if that's what you're after. That's actually what I meant, yeah. Cool. And is there a PoC available, or is this kept closed source? We usually don't release the attack code, but we do describe it in the paper, so it should be reproducible using the paper. Right, thank you.

And we'll just stay at mic one. I didn't get the graphic you showed us, in JavaScript: what were the colors, and what was on the y and x axes, in your video for example? Oh, so I'm not completely sure about the first one, it might be raw measurements; the second one shows the four, I think, well, the page table lookups that we try to find.

And more from mic one, please. Hello, I was just wondering if you actually tested this technique on
public clouds, or any multi-tenant architecture?

So this attack is in the browser; we tested it as a browser client. I would like to continue, actually, if that's okay. We have a lot of time. All right, so I actually read a couple of things about this, and my question was not really specifically about ASLR, but more about what the mitigation techniques could be for a public cloud, where each tenant has access to the cache. Do you think it's possible to disclose the cache line there? So yeah, there have been quite a few attacks using this technique natively, like you would do in a VPS environment, for example.

Oh, sorry, you're on. So there is a browser plugin from some researchers at TU Graz called JavaScript Zero, which aims to provide mitigation against side-channel attacks from JavaScript. Have you heard of it, and if so, have you tried whether it provides protection against the attacks you showed?

So I haven't tried it, but from what's described I would say it would provide protection, because it allows you to disable stuff you don't want, among which the SharedArrayBuffer. In principle the SharedArrayBuffer wasn't here before this year anyhow, so I doubt lots of public code makes use of it, and I think it's easy to disable stuff you don't want. They're basically adding stuff to JavaScript all the time, to make it suitable for gaming and all kinds of sensors that you might want, and most code doesn't make use of it. I, for one, only use JavaScript in my browser when the page doesn't load without it. So I'm all for disabling lots of stuff that we don't need, but the browsers seem to be going in the other direction, so I guess we'll see how it will turn out in
the end. Okay, do we have any more questions? No? Then please thank our speaker again.