We are here with a motto, and the motto this year is "works for me". How many of you are programmers? Raise your hands or shout. Whoa, that's a lot. So I think many of you work on x86, and I think you assume that it works and that everything works as intended, and, I mean, what could go wrong? Our next talk, the first one today, will be by Clémentine Maurice, who was previously here with Rowhammer.js, something I would call scary, and Moritz Lipp, who has worked on the ARMageddon exploit. So I would like to hear a really warm applause for our speakers, for the talk "What could possibly go wrong with <insert x86 instruction here>?". Thank you.

Thank you all for being here this morning. Yes, this is our talk: "What could possibly go wrong with <insert x86 instruction here>?". Just a few words about ourselves. I'm Clémentine Maurice, I got my PhD last year in computer science, and I'm now working as a postdoc at Graz University of Technology in Austria. You can reach me on Twitter or by email, and there's also lots of time before the Congress is over.

Hi, my name is Moritz Lipp. I'm a PhD student at Graz University of Technology, and you can also reach me on Twitter, or just after our talk and in the next days.

So, about this talk. The title says this is a talk about x86 instructions, but this is not a talk about software, so don't leave yet. I'm actually even assuming safe software, and the point that we want to make is that safe software does not mean safe execution: we have information leakage because of the underlying hardware, and this is what we're going to talk about today. So we'll be talking about cache attacks. What are they?
What can we do with them? And also a special kind of cache attack that we found this year: cache attacks without memory accesses, and how to use that even to bypass kernel ASLR. Again, the title says it's a talk about x86 instructions, but this is even more global than that: we can also mount these cache attacks on ARM, not only on x86, so some of the examples that you will see also apply to ARM.

We do have a bit of background today, but actually most of the background will come along the way, because this covers a really huge chunk of our research. We'll see mainly three instructions: mov, and how we can perform these cache attacks and what they are; the instruction clflush, where we'll be doing cache attacks without any memory accesses; then prefetch, and how we can bypass kernel ASLR and lots of translation levels. And then there's even a bonus track. This will not be our work, but even more instructions and even more attacks.

Okay, so let's start with a bit of an introduction. We will be mainly focusing on Intel CPUs, and this is roughly how they look today in terms of cores and caches. We have different cores, here four cores, and different levels of caches. Usually we have three levels of caches: level one and level two are private to each core, which means that core 0 can only access its own level one and level two, and not the level one and level two of, for example, core 3. And we have the last-level cache. This one is divided into slices: we have as many slices as cores, so here four slices, but all the slices are shared across cores.
So core 0 can access the whole last-level cache: slices 0, 1, 2, and 3. We also have a nice property on Intel CPUs: this level of cache is inclusive. What it means is that everything that is contained in level one and level two will also be contained in the last-level cache, and this will prove to be quite useful for cache attacks.

Today we mostly have set-associative caches. What it means is that data is loaded into a specific set, and that depends only on its address: some bits of the address give us the index that says into which cache set the line is going to be loaded. Then we have several ways per set, here four ways, and the cache line is going to be loaded into a specific way, which depends only on the replacement policy and not on the address itself. When you load a line into the cache, usually the cache is already full, and you have to make room for the new line. This is what the replacement policy does: it says, okay, I'm going to remove this line to make room for the next one.

Today we're going to see only the three instructions that I've been telling you about. The mov instruction does a lot of things, but the aspect we're interested in is that it can access data in main memory. We're going to see clflush: what it does is remove a cache line from the whole cache hierarchy. And we're going to see prefetch, which prefetches a cache line for future use. We're going to see what they do, the kind of side effects that they have, and all the attacks that we can do with them. And that's basically all the assembly for today, so even if you're not an expert on x86, don't worry, it's not just slides full of assembly and stuff.
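As a sketch, the set-indexing just described can be written out like this; the geometry below (64-byte lines, 2048 sets) is illustrative, as the real numbers vary per CPU model:

```python
# Sketch: how a set-associative cache picks the set for an address.
# Assumed, illustrative geometry: 64-byte lines, 2048 sets.
LINE_BITS = 6   # 64-byte cache lines -> bits 0-5 are the offset in the line
SET_BITS = 11   # 2048 sets -> bits 6-16 select the set

def cache_set(addr: int) -> int:
    """Return the set index the line containing `addr` maps to."""
    return (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)

# Addresses that differ only in the line offset share a set...
assert cache_set(0x1000) == cache_set(0x1000 + 63)
# ...while the next 64-byte line lands in the next set.
assert cache_set(0x1000) != cache_set(0x1000 + 64)
```

Which way within the set the line occupies is, as said above, decided by the replacement policy, not by the address.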
So, we will first start with the mov instruction, and actually the first slide is full of code. As you can see, the mov instruction is used to move data from registers to registers, from main memory, and back to main memory. There are many movs you can use, but basically it's just to move data, and that's all we need to know. In addition, a lot of exceptions can occur, so we could assume that these restrictions are so tight that nothing can go wrong when you just move data, because moving data is simple.

However, while there are a lot of exceptions, the data that is accessed is always loaded into the cache, and this is transparent to the program that is running. But there are side effects when you run these instructions, and we will see what they look like with the mov instruction. You probably all know that data can be in CPU registers, in the different levels of the cache that Clémentine showed you earlier, in main memory, or on disk, and depending on where the data is located, it takes a longer time to be loaded back into the CPU.

This is what we can see in this plot. We tried to measure the access time of an address over and over again, assuming that when we access it more often, it is already stored in the cache. Most of the time it takes around 70 cycles, so when we load an address and it takes 70 cycles, we can assume it's loaded from the cache. However, when the data is loaded from main memory, we can clearly see that it needs a much longer time, a bit more than 200 cycles. So depending on the time it takes to load the address, we can say the data has been loaded from the cache, or the data is still located in main memory, and this property is what we can exploit using cache attacks.
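The exploitable distinction boils down to a simple threshold test. The 130-cycle value below is illustrative; in a real attack you calibrate it per machine from the two distributions (roughly 70 versus 200+ cycles here):

```python
# Sketch: classifying a measured access time as cache hit or miss.
# THRESHOLD is illustrative and would be calibrated per machine.
THRESHOLD = 130

def is_cache_hit(cycles: int) -> bool:
    """True if the timed access was fast enough to have been served
    from the cache rather than from DRAM."""
    return cycles < THRESHOLD

assert is_cache_hit(70)        # typical cached access
assert not is_cache_hit(210)   # typical DRAM access
```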
So we measure the timing differences on memory accesses. An attacker monitors cache lines, but he has no way to know the actual content of a cache line; we can only observe that a cache line has been accessed, not what is stored in it. What you can do with this is, first, implement covert channels: you can allow two processes to communicate with each other, evading the permission system, which we'll see later on. In addition, you can do side-channel attacks: you can spy with a malicious attacking application on benign processes, and you can use this to steal cryptographic keys or to spy on keystrokes.

Basically, we have different types of cache attacks, and I want to explain the most popular one first: the Flush+Reload attack. On the left you have the address space of the victim, and on the right the address space of the attacker, who maps a shared library or an executable that the victim uses into his own address space, like the red rectangle. This means that when this data is stored in the cache, it's cached for both processes. Now the attacker can use the clflush instruction to remove the data from the cache, so it's not in the cache anymore, and thus also not cached for the victim. Now the attacker lets the victim be scheduled, and if the victim decides:
"Yeah, I need this data," it will be loaded back into the cache. Now the attacker can reload the data, measure how long it took, and decide: okay, the victim has accessed the data in the meantime, or the victim has not accessed the data in the meantime. By that, you can spy on whether this address has been used.

The second type of attack is called Prime+Probe, and it does not rely on shared memory like the Flush+Reload attack. It works as follows. Instead of mapping anything into its own address space, the attacker loads a lot of data into one cache set and fills the cache. Now he again lets the victim be scheduled, and the victim can access data that maps to the same cache set, so the cache set is used by the attacker and the victim at the same time. Then the attacker starts measuring the access times to the addresses he loaded into the cache before. When he accesses an address that is still in the cache, it's faster, so he measures a lower time; if it's not in the cache anymore, it has to be reloaded, so it takes longer. He can sum this up and detect whether the victim has loaded data into that cache set as well.

The first thing we want to show you that you can do with cache attacks is implementing a covert channel, and this could happen in the following scenario. You install an app on your phone to watch your favorite images and to apply some filters, and you don't know that it's malicious, because the only permission it requires is access to your images, which makes sense, so you can easily install it without any fear. In addition, you want to know what the weather is outside, so you install a nice little weather widget, and the only permission it has is to access the internet, because it has to load the information from somewhere. So what happens if you're able to implement a covert channel between these two applications, without any permissions and privileges, so they can communicate
with each other without using any mechanisms provided by the operating system, so it's hidden? It can happen that the gallery app sends the image to the internet, where it will be uploaded and exposed for everyone. So maybe you don't want to see that cat picture everywhere.

While we could do this with both Prime+Probe and Flush+Reload, we will discuss a covert channel using Prime+Probe. So how can we transmit this data? We need to transmit ones and zeros, so at some point the sender and the receiver agree on one cache set that they both use, and the receiver probes this set all the time. When the sender wants to transmit a zero, he just does nothing, so the lines of the receiver stay in the cache all the time, and the receiver knows: okay, he's sending nothing, so it's a zero. On the other hand, if the sender wants to transmit a one, he starts accessing addresses that map to the same cache set, so it will take a longer time for the receiver to access its addresses again, and he knows: okay, the sender just sent me a one. Clémentine will show you what you can do with this covert channel.

So, the really nice thing about Prime+Probe is that it has really low requirements: it doesn't need any kind of shared memory. For example, if you have two virtual machines, you could have some shared memory via memory deduplication; the thing is that this is highly insecure.
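The bit encoding just described can be sketched as follows; the probe-time threshold and the timings are illustrative:

```python
# Sketch of the Prime+Probe receiver logic: the receiver times its own
# probe accesses to the agreed cache set. Slow probes mean the sender
# evicted its lines (bit 1); fast probes mean the sender stayed idle (bit 0).
PROBE_THRESHOLD = 600  # illustrative: total cycles to probe one set

def decode_bit(probe_times: list[int]) -> int:
    """Decode one transmitted bit from the cycle counts of one probe round."""
    return 1 if sum(probe_times) > PROBE_THRESHOLD else 0

# Fast probes: every way of the set is still cached -> the sender sent a 0.
assert decode_bit([60, 62, 58, 61]) == 0
# Slow probes: the sender's accesses evicted our lines -> a 1.
assert decode_bit([210, 230, 205, 220]) == 1
```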
Memory deduplication is so insecure that cloud providers like Amazon EC2 have disabled it, but we can still use Prime+Probe, because it doesn't need this shared memory.

Another problem with cache covert channels is that they are quite noisy. You have other applications that are also running on the system; they are all competing for the cache, and they might evict some cache lines, especially applications that are very memory-intensive. You also have noise due to the fact that the sender and the receiver might not be scheduled at the same time: if the sender sends all its data while the receiver is not scheduled, some part of the transmission can get lost.

So what we did is try to build an error-free covert channel. We took care of all these noise issues by using error detection to resynchronize the sender and the receiver, and then error correction to correct the remaining errors. So we managed to have a completely error-free covert channel, even with a lot of noise. Let's say another virtual machine on the same physical machine is serving files through a web server and also doing lots of memory-intensive tasks at the same time: the covert channel stays completely error-free at around 45 kilobytes per second, which is still quite a lot, and all of this between virtual machines on Amazon EC2. And the really neat thing is that we wanted to do something with that, and basically we managed to create an SSH connection, really over the cache.
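A minimal sketch of the error-detection side of such a protocol; the packet layout and checksum here are hypothetical (the real channel also uses error-correcting codes, which this does not show):

```python
# Sketch: each packet carries a sequence number and a checksum, so the
# receiver can detect corrupted or lost packets and resynchronize with
# the sender by requesting retransmission. Layout is hypothetical.
def make_packet(seq: int, payload: bytes) -> bytes:
    body = bytes([seq & 0xFF]) + payload
    checksum = sum(body) & 0xFF
    return body + bytes([checksum])

def check_packet(packet: bytes):
    """Return (seq, payload) if the packet is intact, None if corrupted."""
    body, checksum = packet[:-1], packet[-1]
    if sum(body) & 0xFF != checksum:
        return None   # noise flipped bits -> request retransmission
    return body[0], body[1:]

pkt = make_packet(7, b"SSH!")
assert check_packet(pkt) == (7, b"SSH!")
corrupted = bytes([pkt[0] ^ 0x01]) + pkt[1:]
assert check_packet(corrupted) is None
```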
So they don't have any network between them, but just by sending zeros and ones, we have an SSH connection between them. You could say that cache covert channels are nothing, but I think this shows it's a real threat. And yeah, if you want more details about this work in particular, it will be published soon at NDSS.

The second application we wanted to show you is that we can attack crypto with cache attacks. In particular, we're going to show you an attack on AES, on a special implementation of AES that uses T-tables. That's a fast software implementation, because it uses some precomputed lookup tables. It's been known to be vulnerable to side-channel attacks since 2006, by Osvik et al., and it's a one-round known-plaintext attack. You have p, your plaintext, and k, your secret key. The AES algorithm computes an intermediate state at each round r, and in the first round, the accessed table indices are just p XOR k. Now, it's a known-plaintext attack, which means that if you can recover the accessed table indices, you have also recovered the key, because it's just an XOR. So that would be bad, right, if we could recover these table indices? Well, we can, with cache attacks.

So we did that with Flush+Reload and with Prime+Probe. On the x-axis you have the plaintext byte values, and on the y-axis you have the addresses, which are essentially the T-table entries. A black cell means that we monitored the cache line and have seen a lot of cache hits, so the black areas show that the T-table entry has been accessed. Here it's a toy example.
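The first-round relation can be sketched directly. One caveat: since a cache line holds 16 T-table entries, a real cache attack only learns the upper four bits of each index, and so of each key byte; for clarity the sketch recovers the whole byte:

```python
# Sketch of the one-round known-plaintext idea: for AES T-table
# implementations, the first-round lookup index is simply p ^ k, so an
# attacker who observes which index was accessed recovers the key byte.
def first_round_index(p: int, k: int) -> int:
    """T-table index accessed in round 1 for plaintext byte p, key byte k."""
    return p ^ k

def recover_key_byte(p: int, observed_index: int) -> int:
    """Invert the relation: XOR is its own inverse."""
    return p ^ observed_index

k = 0x2B  # hypothetical key byte
for p in (0x00, 0x41, 0xFF):
    idx = first_round_index(p, k)
    assert recover_key_byte(p, idx) == k
```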
The key is all zeros here, but you would have just a different pattern if the key was not all zeros, and as long as you can see this nice diagonal pattern, you have recovered the key.

So it's an old attack, from 2006; it's been ten years, everything should be fixed by now. And you see where I'm going: it's not. On Android, the Bouncy Castle implementation uses the T-tables by default, so that's bad. Also, many implementations that you can find online use precomputed values, so be wary of this kind of attack.

The last application we wanted to show you is how we can spy on keystrokes. For that, we will use Flush+Reload, because it's a really fine-grained attack: we can see very precisely which cache line has been accessed, and a cache line is only 64 bytes, so it's really not a lot. We're going to use that to spy on keystrokes, and we even have a small demo for you.

What you can see on the screen is not on Intel x86; it's on a smartphone, a Galaxy S6, but you can also apply these cache attacks there. That's what we want to emphasize. On the left you see the screen, and on the right we have connected a shell with no privileges and permissions, so it can basically be an app that you install from the app store. On the right we are going to start our spy tool, and on the left we just opened the messenger app. Whenever the user hits any key on the keyboard, our spy tool registers it. Also, if you press the space bar, we can measure that. And if the user decides, okay, I want to delete the word because he changed his mind, we can also register that the user pressed the backspace button. So in the end, we can see exactly how long the words were that the user typed into his phone, without any permissions and privileges. Which is bad.

So, enough about the mov instruction.
Let's head to clflush. What the clflush instruction does is invalidate, from every cache level, the cache line that contains the address that you pass to it. In itself, it's kind of bad, because it enables the Flush+Reload attacks that we showed earlier: the flush part of Flush+Reload is done with clflush. But there's actually more to it. How wonderful.

So there's a first timing leakage with it. We're going to see that the clflush instruction has a different timing depending on whether the data you pass to it is cached or not. Imagine you have a cache line in level one. By the inclusion property, it also has to be in the last-level cache. Now, this is quite convenient, and this is also why we have this inclusion property on Intel CPUs, for performance reasons: if you want to see whether a line is present at all in the cache, you just have to look in the last-level cache. This is basically what the clflush instruction does: it goes to the last-level cache, sees, okay, there's a line, I'm going to flush this one; and then, since the line is also present somewhere else, it flushes the line in level one and level two as well. So that's slow.

Now, if you perform clflush on some data that is not cached, it basically does the same: it goes to the last-level cache, sees that there's no line, and the data can't be anywhere else in the cache, because it would be in the last-level cache if it were anywhere. So it does nothing and stops there. That's fast.

So how fast and slow exactly am I talking about? It's actually only a very few cycles. We did these experiments on different microarchitectures, Sandy Bridge, Ivy Bridge, and Haswell, and the different colors correspond to the different microarchitectures. So the first thing, which is already
So the first thing that is already Cannophan is that you can see that you can distinguish the micro architecture quite nicely with this But the real point is that you have really a difference so the solid the solid line is when we perform the measurement on Seal flush with the line that was already in the cache and the dashed line is when the line was not in the cache and In all micro architectures, you can see that we can see a difference. It's only a few cycles. It's a bit noisy So what could go wrong? Okay, so Exploiting these few cycles. We still managed to perform a new cache attacks that we call flush and flush So I'm going to explain that to you So basically everything that we could do with flush and reload we can also do is flush and flush We can perform covered channels and side channel attacks Is stealthier than previous cache attacks? I'm going to go back on this one and it's also faster than previous cache attacks So how does it work exactly so the? Principle is a bit similar to flush and reload So we have the attacker and the victim that have some kind of shun memory. Let's say a shared library It will be shared in the cache The attackers start by flushing the cache line Then let's the victim perform whatever it does. Let's say encryption the victim will load some data into the cache automatically and Now the attacker wants to know again if the victim access this precise cache line and instead of reloading it is going to Flush it again and since we have this timing difference Depending on whether the data is in the cache or not it gives us the same information as if we reloaded it Except it's way faster So I talked about stealthiness. 
The thing is that these cache attacks, and that also applies to Rowhammer, are already stealthy in themselves, because there's no antivirus today that can detect them. But some people thought that we could detect them with performance counters, because of the many cache misses and cache references that happen when data is flushed and when you re-access memory. Now, what we thought is: yeah, but cache attacks are not the only programs that cause lots of cache misses and cache references, so we would like to have a better metric.

These cache attacks have very heavy activity on the cache, but they are also very particular, because they are very short loops of code. If you take Flush+Reload, it's just: flush one line, reload the line, and then again flush and reload. That's a very short loop, and that creates very low pressure on the instruction TLB, which is quite characteristic of cache attacks. So what we decided to do is normalize the cache events, the cache misses and cache references, by events that have to do with the instruction TLB, and with that we managed to detect cache attacks and Rowhammer without having false positives. So this is the stealthiness metric that I'm going to use from now on.

We started by creating a covert channel, and first we wanted to have it as fast as possible. We created a protocol to evaluate all the kinds of cache attacks that we had, so Flush+Flush, Flush+Reload, and Prime+Probe, and we started with a packet size of 28, which doesn't really matter here. We measured the capacity of our covert channel: Flush+Flush reaches around 500 kilobytes per second, whereas Flush+Reload was only 300 kilobytes per second.
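The normalization idea from a moment ago can be sketched as a one-liner; the threshold is illustrative, not the one from our evaluation:

```python
# Sketch of the stealthiness metric: raw cache-miss counts also flag many
# benign, memory-heavy programs, but cache attacks combine heavy cache
# activity with a tiny code loop, i.e. almost no instruction-TLB pressure.
# Normalizing cache events by ITLB events separates the two cases.
def looks_like_cache_attack(cache_events: int, itlb_events: int,
                            threshold: float = 50.0) -> bool:
    """True if cache activity per ITLB event exceeds the (illustrative) threshold."""
    return cache_events / itlb_events > threshold

# A streaming workload: many misses, but also many ITLB events -> benign.
assert not looks_like_cache_attack(cache_events=1_000_000, itlb_events=100_000)
# A Flush+Reload loop: many misses from a handful of code pages -> flagged.
assert looks_like_cache_attack(cache_events=1_000_000, itlb_events=1_000)
```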
So Flush+Flush is already quite an improvement in speed. Then we measured the stealthiness, and at this speed, only Flush+Flush was stealthy. Now, Flush+Flush and Flush+Reload, as you've seen, share some similarities; for the covert channel they also share the same sender, only the receiver is different, and here the sender was not stealthy for either of them. Anyway, if you want a fast covert channel, just use Flush+Flush; that works.

Now let's try to make it completely stealthy, because if the sender is not stealthy, maybe that will give away the whole attack. So we said, okay, maybe if we just slow down all the attacks, there will be fewer cache hits and cache misses, and then maybe all the attacks are actually stealthy. Why not? So we tried that and slowed down everything. Flush+Reload and Flush+Flush are around 50 kilobytes per second now; Prime+Probe is a bit slower, basically because it takes more time to prime and to probe. But still, even with this slowdown, only Flush+Flush has a stealthy receiver, and we also managed to have the sender stealthy now. So basically, whether you want a fast covert channel or a stealthy covert channel, Flush+Flush is really great.

Next, we wanted to evaluate whether it wasn't too noisy to perform side-channel attacks. So we ran the side-channel attack on the AES T-table implementation, the attack that I've shown you earlier. We computed the number of encryptions needed to determine the upper four bits of a key byte, so here, the lower, the better the attack. Flush+Reload is a bit better, as we need only 250 encryptions to recover these bits, but Flush+Flush comes quite close with 350, and Prime+Probe is actually the noisiest of them all and needs close to 5,000 encryptions. So we have around the same performance for Flush+Flush and Flush+Reload.

And now let's evaluate the stealthiness again.
What we did here is perform 256 million encryptions in a synchronous attack, so we really had the spy and the victim scheduled together, and we evaluated the stealthiness of all the attacks. Again, only Flush+Flush is stealthy. And while you can always slow down a covert channel, you can't really slow down a side channel, because in a real-life scenario you're not going to say: hey victim, wait for me a bit, I'm trying to do an attack here. That won't work.

So there's even more to it, but I will need a bit of background again before continuing. I've shown you the different levels of caches, and here I'm going to focus on the last-level cache. We have our four slices; this is the last-level cache. Some bits of the address correspond to the set, but more importantly, we need to know in which slice an address is going to be, and that is given by some bits of the set index and the tag of the address, which are passed into a hash function that says in which slice the line is going to be.

Now, the thing is that this hash function is undocumented by Intel; it wouldn't be fun otherwise. So we have as many slices as cores, and an undocumented hash function that maps a physical address to a slice. And while it's actually a bit of a pain for attacks, it was not designed for security originally, but for performance, because you want all the accesses to be evenly distributed across the different slices. The hash function basically takes some bits of the physical address and outputs k bits of slice index: just one bit if you have a 2-core machine, two bits if you have a 4-core machine, and so on.

Now let's go back to clflush and see the relation with this. The thing that we noticed is that clflush is actually faster to reach a line on the local slice. So if you are always flushing one line, and you run your program on core 0, core 1, core 2, and core 3, you will observe that, on one core in particular,
when you run the program on that core, the clflush is faster. Here, this is on core 1, and you can see that on cores 0, 2, and 3 it's a bit slower. So we ran the program on core 1, always flushing the same line, and we can deduce that the line belongs to slice 1. What we can do with that is map physical addresses to slices, and that's one way to reverse engineer this addressing function that was not documented. Funnily enough, that's not the only way: what I did before that was using the performance counters to reverse engineer this function, but that's actually a whole other story, and if you want more details on that, there's also a paper on it.

So the next instruction we want to talk about is the prefetch instruction. The prefetch instruction is used to tell the CPU: okay, please load the data I need later on into the cache, if you have some time. There are actually six different prefetch instructions, prefetcht0 to prefetcht2 and so on, which mean: CPU, please load the data into the first-level cache, or into the last-level cache, whatever you want to use. But we'll spare you the details, because they are not so interesting in the end.

However, what's more interesting is when we take a look at the Intel manual and what it says there. Using the prefetch instruction is recommended only if data does not fit in the cache, so you can tell the CPU: please load the data
I want to stream into the cache, so it's more performant. "Use of software prefetch should be limited to memory addresses that are managed or owned within the application context." So one might wonder: what happens if this address is not managed by myself? Sounds interesting. "Prefetching to addresses that are not mapped to physical pages can experience non-deterministic performance penalty. For example, specifying a NULL pointer as an address for prefetch can cause long delays." So we don't want to do that, because our program would be slow. But let's take a look at what they mean by "non-deterministic performance penalty", because we want to write correct software, right?

Before that, we have to look at a bit more background information to understand the attacks. On modern operating systems, every application has its own virtual address space, so at some point the CPU needs to translate these addresses to the physical addresses actually used in the DRAM. For that, we have this very complex-looking data structure. We have a 48-bit virtual address, and some of those bits index into a table, the PML4 table, with 512 entries. Depending on those bits, the CPU knows at which entry it has to look, and if there is an entry there, because the address is mapped, it can proceed and look at the page directory pointer table, and so on further down.
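As a sketch, splitting a 48-bit virtual address into the four 9-bit table indices and the 12-bit page offset used by this walk looks like this:

```python
# Sketch: decomposing a 48-bit virtual address for the 4-level page-table
# walk: four 9-bit indices (512 entries per table) plus a 12-bit offset
# into the final 4 KB page.
def split_va(va: int) -> dict:
    return {
        "pml4":   (va >> 39) & 0x1FF,  # page map level 4 index
        "pdpt":   (va >> 30) & 0x1FF,  # page directory pointer table index
        "pd":     (va >> 21) & 0x1FF,  # page directory index
        "pt":     (va >> 12) & 0x1FF,  # page table index
        "offset": va & 0xFFF,          # offset inside the 4 KB page
    }

# Setting exactly one level's low bit shows which bit range feeds which table.
assert split_va(1 << 39)["pml4"] == 1
assert split_va(1 << 30)["pdpt"] == 1
assert split_va(1 << 21)["pd"] == 1
assert split_va(1 << 12)["pt"] == 1
```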
Everything is the same for each level, until you come to the page table, where you have 4-kilobyte pages. So in the end it's not that complicated, but it's a bit confusing, because you want to know a physical address, and to translate your virtual address you have to look it up somewhere in main memory, with physical addresses. And if you have to go through all those levels, it takes a long time.

We can do better than that, and that's why Intel introduced additional caches, also for all of those levels. So when the CPU wants to translate an address, it takes a look at the ITLB for instructions and the DTLB for data. If the translation is there, it can stop; otherwise it goes down all those levels, and if it's not in any cache, it has to look it up in the DRAM.

In addition, the address space you have is shared: on the one hand you have the user memory, and on the other hand the kernel is also mapped into the address space, for convenience and performance. If your user program wants some kernel functionality, like reading a file, it will switch to the kernel memory, there's a privilege escalation, and then it can read the file and so on.

However, you have drivers in the kernel, and if you know the addresses of those drivers, you can do code-reuse attacks. As a countermeasure, address space layout randomization was introduced, also for the kernel, and this means that when you have your program running, the kernel is mapped at one address, and if you reboot the machine, it's not at the same address anymore, but somewhere else. So if there is a way to find out at which address the kernel is loaded, you have circumvented this countermeasure and defeated kernel address space layout randomization. This would be nice for some attacks.

In addition, there's also the kernel direct-physical map. What does this mean?
It's implemented on many operating systems, like OS X and Linux, also on the Xen hypervisor and BSD, but not on Windows. What it means is that the complete physical memory is additionally mapped into the kernel memory at a fixed offset. So for every page that is mapped in user space, there is something like a twin page in kernel memory, which you can't access, because it's in the kernel memory. However, we will need that later.

Because now we go back to prefetch and see what we can do with it. Prefetch is not a usual instruction, because it just tells the CPU: I might need that data later on; if you have time, load it for me, and if not, the CPU can ignore it, because it's busy with other stuff. So there is no guarantee that this instruction is really executed, but most of the time it is. And an interesting thing is that it generates no faults: whatever you pass to this instruction, your program won't crash. It does not check any privileges, so I can also pass a kernel address to it, and it won't say: no, stop, you accessed an address that you are not allowed to access, so I crash. It just continues. Which is nice.

The second interesting thing is that the operand is a virtual address, so every time you execute this instruction, the CPU has to go and check: okay, what physical address does this virtual address correspond to? So it has to do the lookup through all those tables we've seen earlier, and, as you have probably guessed already, the execution time varies for the prefetch instruction as well. We will see later what we can do with that.

So let's get back to the direct-physical map, because we can create an oracle for address translation: we can find out what physical address belongs to a virtual address. Nowadays you don't want the user to know that, because you can craft nice Rowhammer attacks with that information, and more advanced cache attacks, so this information is restricted from the user. But let's check if we can find a way to still get it.
As I've told you earlier, if you have a page mapped in user space, you have the twin page in kernel space, and if it's cached, it's cached for both of them. The attack now works as follows. As the attacker, you flush your user-space page, so it's not in the cache, not for the kernel address either. Then you call prefetch on the kernel address; as I told you, you can still do that, because it doesn't create any faults. So you tell the CPU: please load this data into the cache, even though I normally don't have access to it. If we now measure the access to our user-space page again and see a cache hit, because it has been loaded into the cache by the CPU, we know exactly which position this address corresponds to, since we passed the address to the instruction. And because the mapping is at a fixed offset, we can do a simple subtraction and recover the physical address. So we have a nice way to find the physical addresses behind virtual addresses. In practice this looks like the following plot: it's pretty simple, because we just do this for every address, and at some point we measure a cache hit, there's a huge timing difference, and exactly at this point we know that this physical address corresponds to our virtual address.

The second thing is that we can exploit the timing differences of the prefetch instruction itself. As I told you, when you go down the translation levels, at some point the CPU sees that the translation is there or not, so it can abort early. With that, we can know exactly where the prefetch instruction aborted, and thus how the pages are mapped into the address space: the timing depends on where the translation stops.

Using those two properties, we can do the following. On the one hand, we can build variants of cache attacks.
Instead of Flush+Reload, for instance, we can do Flush+Prefetch. We can also use prefetch to mount rowhammer attacks on privileged addresses, because it doesn't fault when we pass those addresses, and it works just as well. In addition, we can use it to recover the translation levels of a process, which you could do earlier with the pagemap file; but as I told you, that is now privileged, so you don't have access to it. By doing this, you can bypass address space layout randomization. Also, as I said, you can translate virtual addresses to physical addresses, which is now likewise privileged via the pagemap file, and that re-enables ret2dir exploits, which were demonstrated last year.

On top of that, we can also use this to locate kernel drivers. As I told you, it would be nice if we could circumvent KASLR as well, and I will show you now how this is possible. With the first oracle we find out all the pages that are mapped, and for each of those pages we evict the translation caches. We can do that either by calling sleep, which schedules another program, or by accessing a large memory buffer. Then we perform a syscall to the driver, so code of the driver is executed and loaded into the cache, and then we measure the time prefetch takes on this address. In the end, the page with the fastest average prefetch time is the driver page. We can mount this attack on Windows 10 in less than 12 seconds; so we can defeat KASLR in less than 12 seconds, which is very nice. In practice, the measurements look like the following: we have a lot of long measurements, and at some point there is a short one, and then you know exactly that this is the driver region and at which address the driver is located, and you can mount those ret2dir attacks again.

However, that's not everything, because there are more instructions on Intel CPUs. The following is not our work, but we thought it would be interesting, because it's basically: more instructions, more attacks, more fun.
There's the RDSEED instruction, which requests a random seed from the hardware random number generator. The thing is that there is a fixed number of pre-computed random bits, and it takes time to regenerate them. And, as with everything that takes time, you can build a covert channel with that.

There are also FADD and FMUL, floating-point operations whose running time depends on the operands. Some people managed to bypass Firefox's same-origin policy with an SVG filter timing attack based on that.

Then there are the jump instructions: in modern CPUs you have branch prediction and branch-target prediction, and this has actually been studied a lot. You can create covert channels, you can do side-channel attacks on crypto, and you can also bypass kernel ASLR. Finally, there are the TSX instructions, an extension for hardware transactional memory support, which has also been used to bypass kernel ASLR. So in case kernel ASLR isn't dead enough for you yet, you have lots of different things to read.

Okay, so on to the conclusion. As you have seen, this is really more a problem of CPU design than of the instruction-set architecture itself. The thing is that all these issues are hard to patch: they are all linked to performance optimizations, and we are not getting rid of performance optimizations. It's basically a trade-off between performance and security, and performance seems to always win. There have been some proposals against cache attacks, such as removing the clflush instruction. The problem is that such quick fixes won't work, because we always find new ways to do the same thing without these particular instructions, and we also keep finding new instructions that leak information. So it's really quite a big topic that we have to fix.

Thank you very much for your attention.
If you have any questions, we'd be happy to answer them.

Okay, thank you very much again for your talk. We will now have a Q&A, and we have about 15 minutes, so you can start lining up behind the microphones; they are in the gangways in the middle. Except, I think, that one. Oh no, it's a backup, so it will work. And while we wait, I think we will take questions from our signal angel, if there are any. Okay, there aren't any, so microphone questions. I think you, at the front.

Hi, can you hear me? Try again. Okay, can you hear me now? Okay. I'd like to know what exactly your stealthiness metric was. Was it that you can't distinguish it from a normal process, or so?

Wait a second, we still have the Q&A; could you quiet down a bit? That would be nice. So, the question was about the stealthiness metric. We used a metric based on cache misses and cache references, normalized by instruction TLB events, and we found a threshold below which pretty much every benign application stayed, while rowhammer and cache attacks were above it. So we basically fixed the threshold that way.

That microphone. Hello, thanks for the talk, it was great. First question: did you inform Intel before doing this talk? No. Okay. Second question: what are your future plans? Sorry, what are your future plans? Future plans? Well, what would be interesting is that we keep finding these more or less by accident, or manually; so getting a good idea of what the attack surface is here would be a good thing, and doing that automatically would be even better. Great, thanks.

Okay, the microphone at the back over there, the guy in white. Hi, one question: if you have a daemon that randomly invalidates some cache lines, would that be a better countermeasure than disabling the cache?

Whether invalidating cache lines would be better than disabling the whole cache?
So, yeah, if you know which cache lines have been accessed by a process, you can invalidate those cache lines before you swap processes. But it's also a trade-off with performance: alternatively, you could flush the whole cache when you switch processes, and then it's empty and you don't see any activity anymore, but again there's a performance trade-off with this.

Okay, maybe a second question: there are some ARM architectures that have random cache-line invalidations. Did you try those, to see if you can still establish a channel there?

If they're truly random, well, probably you just have to take more and more measurements, and then you can average out the noise, and then you can do these attacks again. It's like with Prime+Probe, where you need more measurements because it's much more noisy; in the end you just need many more measurements. On ARM, the replacement is supposed to be pseudo-random, at least that's what the manual says, but we actually found nice ways to evict exactly the cache lines that we wanted to evict, so it's not all that pseudo-random. Even if something were truly random, it might be nice, but then it's also quite complicated to implement; I mean, you probably don't want a true random number generator just for the cache. Okay, thanks.

Okay, and then the three guys here at the microphone in the front. My question is about a detail of the keylogger: you could distinguish between space, backspace and alphabet keys, which is quite interesting, but could you also figure out the specific keys that were pressed? If so, how?

Yeah, that depends on the implementation of the keyboard. What we did: we used the stock Android keyboard which is shipped with the Samsung phone, so it's pre-installed. And if there is a table somewhere in the code which says, okay, if you press this exact location, or this image,
it's an 'a', or this is a 'b', then you can do a more sophisticated attack. So if you find any functions or data in the code which directly tell you, okay, this is this character, you can also spy on the actual key characters on the keyboard. Thank you.

Hi, thank you for your talk. My first question is: what can we actually do now to mitigate this kind of attack, for example by switching off TSX or using ECC RAM?

So, I think the most important thing to protect is crypto, and the good news is that today we know how to build crypto that resists such attacks. So the first step would be to stop deploying implementations that have been known to be vulnerable for ten years. Things like keystrokes are way harder to protect; let's say crypto is manageable, but the whole system is clearly another problem. You can have different kinds of countermeasures on the hardware side, but that would require that Intel and ARM actually want to fix it, and that they know how to fix it; I don't even know how to fix this in hardware. Then, on the system side, if you prevent certain kinds of memory sharing, you don't have Flush+Reload anymore, and Prime+Probe is much more noisy. So that would be an improvement. Thank you.

Do we have signal angel questions? No? Okay, then more microphone questions.

Hi, thank you. I wanted to ask about the way you establish the side channel between the two processes, because it would obviously have to be timed in a way to transmit information from one process to the other, almost like the seven layers or something like that. Is there anywhere that you documented the whole thing? It would be really interesting to know how it works.

You can find this information in the paper, because there are several papers on covert channels using this.
The NDSS paper is published in February, I guess, but the ARMageddon paper also includes the covert channel, and you can find more information about what the packets look like and how the synchronization works in the paper. Thank you.

One last question. Yeah, hi. You mentioned that you used Osvik's attack for the AES side-channel attack. Did you solve the AES round detection, and did you do it with some kind of scheduler manipulation?

On this one, I think we only did a synchronous attack, so we already knew when the victim was going to be scheduled, and we didn't have to do anything with schedulers.

All right, thank you. Are there any more questions? No, I don't see anyone; then thank you very much again to our speakers.