So first I'll start with our motivation: what motivated us to do this work, and what was our end goal? Then we will go into co-location detection, which was the first problem we had to tackle. After that we will get into the details of our key recovery attack, and finally conclude with some discussion.

So, cloud computing. Cloud computing is outsourcing storage and computation needs to remote servers rather than local machines. On these cloud servers, users rent virtual machines, also called VMs, and share the physical system. Sharing the physical system sounds extremely efficient, right? But it also sounds dangerous. So what about the security between these systems? Isolation. In the cloud, different operating systems, named here as guest OSs, run in a sandboxed environment inside a VM, and the isolation between these VMs is maintained by the virtual machine manager, also called the hypervisor. However, since low-level resources are shared between these VMs, this can turn the scenario into victim and spy, where one of the VMs can spy on the other. So is this isolation secure? Well, according to the recent literature, it is not. I would like to briefly mention three of these works.

The first one is "Hey, You, Get Off of My Cloud," a seminal work by Ristenpart et al. from 2009. They showed for the first time that it is in fact possible to co-locate with a specific target in a public cloud, in this case Amazon EC2. The second work is "Cross-VM Side Channels and Their Use to Extract Private Keys" by Zhang et al. It shows the feasibility of a cross-VM attack, breaking the isolation barriers between virtual machines, but it assumes a shared L1 cache, which is a strong assumption. And finally, Prime+Probe on the LLC by Liu, Yarom, and other researchers. It targets the last-level cache, so the attack works across cores and across VMs.
It attacks square-and-multiply exponentiation as well as sliding-window exponentiation. So in short, we have three motivations. First, we would like to collect fine-grained information on a commercial cloud, and in this case we chose Amazon EC2, since it is the largest public cloud. In fact, it is so large that it has 10 times more computing power than the next 14 cloud services combined. Second, we would like to recover RSA keys, and we selected Libgcrypt 1.6.2 as our target. It is a recently patched library, and it is still widely used in the wild. How do we know that it is still widely used? By performing a quick scan in one of the Amazon EC2 regions, we discovered that 55% of the TLS hosts use outdated libraries that were not updated in the last two years. And finally, we want to do this in bulk: we do not rely on faulty random number generators or any misconfigurations.

Before going into the details of co-location detection, I would like to make the distinction between the targeted attack and bulk key recovery. In the targeted attack, the attacker goes after one specific target and therefore has to make sure that he is in fact co-located with that target. In the bulk key recovery scenario, the attacker simply spins up multiple instances on the cloud and does not care who he is co-located with.

As I mentioned earlier, Ristenpart et al. also performed co-location experiments in 2009, and when they did, Amazon EC2 was much, much smaller. They launched multiple instances on the cloud and checked whether any of these instances were in fact co-located with each other. How did they perform the check? They used ping times between instances to see if they pointed to possible co-location, they used the IP addresses of instances and hypervisors, and they used disk drive performance to create a bottleneck and send a signal between VMs. Let's see how things are in 2016. In 2016, Amazon EC2 is much larger.
Here, the pings are constant-time, slow HDDs are replaced with high-speed SSDs provisioned in a way that there is no bottleneck even when multiple users are heavily using their storage, and the hypervisor IP is hidden. So we can say that Amazon did their homework and closed all of the non-side-channel co-location detection methods. We needed new methods, and therefore we developed two co-location detection techniques. The first one is a last-level cache covert channel. Remember that the last-level cache is shared between cores, so using this covert channel we can communicate between VMs running on different cores. The second one is software profiling on the LLC. And since these are architectural side channels, they are difficult to prevent.

So let's start with the last-level cache covert channel. In most modern Intel CPUs, each core has its own private L1 and L2 caches, and the cores share an L3 cache. By using Prime+Probe on predetermined cache sets, different VMs co-located on the same machine can in fact create a covert channel. The figure here shows the histogram of last-level cache access times and memory access times. As you can see, there is a clear distinction between the two, and we can use this difference in access times to communicate between VMs. Now, the drawback of this covert channel is that it is noise-prone. Remember that we are doing this in a public cloud, so we have many neighbors. In fact, on the instance type we used, we had a ten-core machine with 20 threads running in parallel, which meant we had up to 20 co-located neighbors. Also, the variety of the load makes the noise harder to filter. By variety, I mean that one of our neighbors could be running an Apache web server while another is doing media streaming. We observed during our experiments that 40% of the last-level cache sets are highly noisy and not suitable for easy communication. Our second detection method is software profiling.
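The covert channel boils down to a timing threshold on Prime+Probe reload times: a probe that misses in the LLC (because the other VM evicted our set) takes a memory-access time, a probe that hits stays fast. A minimal sketch of that decision step; the cycle counts and threshold here are illustrative assumptions, not our measured values:

```python
# Sketch of the timing-threshold step of a Prime+Probe covert channel.
# The numbers are illustrative: an L3 hit might take ~50 cycles and a
# DRAM access ~250 cycles, so a single threshold separates the two modes
# visible in the access-time histogram.

LLC_HIT_THRESHOLD = 150  # cycles; assumed cut-off between L3 hit and DRAM access

def classify_probe_times(times):
    """Turn raw probe timings into bits: a slow probe means the sender
    evicted our cache set (bit 1), a fast probe means it did not (bit 0)."""
    return [1 if t > LLC_HIT_THRESHOLD else 0 for t in times]

# Example: three fast probes, one eviction by the sender, one fast probe.
print(classify_probe_times([48, 52, 60, 260, 55]))  # -> [0, 0, 0, 1, 0]
```

In a real channel this classification runs per time slot, and the noise from up to 20 neighbors is what makes 40% of the sets unusable.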
We again use Prime+Probe to profile a portion of the last-level cache. We first create a baseline profile when the targeted code is not running, then we create another profile when the code is in fact running, and we compare these profiles. As I mentioned, we only profile a portion of the LLC. Keep in mind that in modern Intel CPUs the last-level cache ranges between 3 MB and 30 MB, so scanning the whole last-level cache is not practical. But luckily for us, with regular 4 KB memory pages we get 12 bits of the address that are not translated by the memory management unit. Since 6 of these 12 bits are the byte offset and the remaining 6 are the low bits of the set index, and the set index is 11 bits in total, 5 bits remain unknown, leaving 32 candidate sets per slice that the targeted data can map to. And since we have a ten-core machine, we have 10 cache slices, which means we have 320 set candidates where the targeted code can reside.

We tested this method with the RSA implementation of Libgcrypt and the AES implementation of OpenSSL. This is the result we obtained when we targeted the multiplication function of RSA. The two graphs here show the difference of access times to our candidate cache sets: the x-axis represents the set number, starting from 1 up to 320, and the y-axis shows the difference in clock cycles from the baseline profile. As you can see in both figures, which are taken from different virtual machines running on the same physical system, two of the cache sets were in fact used by the multiplication function, and this use is visible in both of them.

So, to recap co-location: for the targeted scenario, we can use the last-level cache covert channel or software profiling. For the bulk key recovery scenario, we do not need co-location detection at all; the attacker simply spins up multiple instances and collects traces from all of them.
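The candidate-set arithmetic above can be written down directly. This is a sketch with the constants of the ten-core machine described in the talk; the virtual address is arbitrary:

```python
# Candidate LLC sets reachable from the 12 untranslated bits of a 4 KB page.
# Bits 0-5 are the byte offset within a 64-byte line; bits 6-11 give the low
# 6 bits of the 11-bit set index, so 5 index bits are unknown -> 32 candidates
# per slice, times 10 slices -> 320 candidates in total.

PAGE_OFFSET_BITS = 12   # known (untranslated) bits with 4 KB pages
LINE_OFFSET_BITS = 6    # 64-byte cache lines
SET_INDEX_BITS = 11     # 2048 sets per slice
NUM_SLICES = 10         # ten-core machine, one slice per core

def candidate_sets(vaddr):
    """Return all per-slice set indices consistent with the known address bits."""
    known_bits = PAGE_OFFSET_BITS - LINE_OFFSET_BITS          # 6 known index bits
    known = (vaddr >> LINE_OFFSET_BITS) & ((1 << known_bits) - 1)
    unknown_bits = SET_INDEX_BITS - known_bits                # 5 unknown bits
    return [(hi << known_bits) | known for hi in range(1 << unknown_bits)]

cands = candidate_sets(0x1234)
print(len(cands), len(cands) * NUM_SLICES)  # -> 32 320
```

Profiling then only has to scan these 320 sets instead of all 20,480.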
However, we can still use software profiling to detect vulnerable software in cases where we want to attack a specific version of a specific library. Now that we have fixed the co-location detection problem, let's look into the details of the RSA key recovery attack. We use Prime+Probe on the last-level cache. And how do we gain control over the last-level cache? We use huge pages. Huge pages are a CPU feature available in almost any Intel CPU and enabled in all the public clouds we have worked on so far. They are 2 MB memory pages, so they reveal 21 bits of the address, and since these bits are the page offset, they are not translated by the memory management unit. That means when this address goes from the CPU to the memory, those 21 bits stay constant. Using this, we can create eviction sets.

However, there is still one more problem: what about cache slice selection? In modern Intel CPUs, the last-level cache is divided into slices. If you have a 10-core machine, you have 10 slices in the last-level cache. While any slice can be accessed by any core, it still adds a layer of abstraction for us: without knowing the slice selection algorithm, we don't know which set-slice pair our data will reside in. Even if we get the set number, we still don't know which slice it sits in. This can slow down the attack significantly, because instead of making 20 accesses to prime a set in a 20-way associative cache, we would have to make 200 accesses to make sure the data is in fact evicted. Our solution was to reverse engineer the cache slice selection algorithm for the Intel Xeon E5-2670 v2, a 10-core CPU. Using the slice selection algorithm, we can create eviction sets easily, and we only need 20 accesses, which makes our attack fast and reliable.
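To give a feel for what such a slice selection function looks like: Intel slice hashes typically compute each slice-id bit as the parity (XOR) of a masked subset of physical address bits. The masks below are placeholders I made up for illustration, not the hash we recovered; the real 10-slice function is more involved precisely because 10 is not a power of two, which is why it had to be reverse engineered:

```python
# Illustrative skeleton of an Intel-style LLC slice hash. Each output bit of
# the slice id is the XOR (parity) of the physical address bits selected by a
# mask. MASKS here are hypothetical placeholders giving a 4-slice hash; the
# actual 10-slice hash for the Xeon E5-2670 v2 is given in the paper.

ILLUSTRATIVE_MASKS = [0x1F2A5040, 0x2E14A880]  # hypothetical, NOT the real masks

def parity(x):
    """Parity of the set bits of x: 1 if the popcount is odd, else 0."""
    return bin(x).count("1") & 1

def slice_of(phys_addr, masks=ILLUSTRATIVE_MASKS):
    """Fold the masked address bits into a slice id, one parity per mask."""
    s = 0
    for i, mask in enumerate(masks):
        s |= parity(phys_addr & mask) << i
    return s
```

With a function of this shape in hand, an eviction set only needs lines that agree on both the set index and the slice id, which is what brings the prime step down from 200 accesses to 20.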
And this is the slice selection algorithm that we recovered for this specific CPU. So, our target cryptosystem: it is Libgcrypt's RSA implementation, with a 2048-bit modulus, a 5-bit sliding window, and message blinding. Is this state of the art? Well, it was patched in February 2015 after a cache leakage was found, and the table accesses now satisfy constant execution flow. But as I mentioned before, it is still widely used.

Our attack steps are as follows. First, we find the cache traces of the sliding-window multiplicand accesses. Then we observe several exponentiations, so that we can combine the traces and reduce the noise, and we align and process these observations. Alignment is an important problem here, because we do not assume synchronicity with the victim. Finally, we run our error correction to fix any remaining errors.

So, identifying the correct cache line for the multiplicand tables. Each slice has 2048 cache sets, and since we have 10 slices in the last-level cache, we have 20,480 cache sets in total. We assume the victim is using regular 4 KB pages, which reveal 12 bits of the address; that means 32 set candidates per slice, and 320 in total. And we know that sets can be noisy, since this is a public cloud. In fact, they are so noisy that in some sets all we can see is this. Here, the y-axis represents the reload time difference from the baseline profile, and the x-axis is the time slot. Each vertical line represents an access made by one of the up to 20 co-located VMs to this specific cache set. And this is another example: here we have a much less noisy cache set. Even though there are some vertical lines, meaning some other VMs accessed this set, it is much better than the previous one. So in our experiments, we have observed this.
Before the decryption starts, as you can see, there are two peaks: co-located VMs made two accesses to this cache set. However, since the multiplication table is not yet in use by RSA, there is not much else. But as soon as the first secret exponent is calculated, we see heavy use of the set, and the same goes for the second secret exponent. This confirms that the targeted cache set in fact holds the multiplicand values.

These are raw traces that we obtained on Amazon EC2. We have 11 different traces here; they all belong to operations with the same secret key. Each vertical line represents an access, meaning a reload time higher than the baseline profile. As you can see, they are completely unaligned. This is because we do not assume synchronicity with the target. After alignment, we observe this. As you can see, it is still too noisy and needs further processing, filtering, and noise reduction. After that, we observe this, where red is the expected correct trace and blue is the recovered trace. There is still a little remaining noise: here, for example, you can see that both lines have four accesses, but here we have three; we have only one access here, but two there. So it is still a bit noisy.

So how do we perform the final key recovery? We combine traces, and we know which cache set holds which table value by measuring the distance from the table initialization, so we know which cache sets hold x^5, x^7, x^9, and so on. Then we recover d from the noisy dp and dq. To do that, we use an algorithm that takes the public key of the target as well as the noisy dp and dq, and recovers d efficiently.

In conclusion, we show that co-location can still be achieved in public clouds, in the largest public cloud, in 2016. Caches provide a powerful side channel; so powerful, in fact, that we can perform key recovery even in the cloud.
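The last step, going from the CRT exponents back to the private key, rests on a textbook identity: since e*dp = 1 (mod p-1), we have e*dp = 1 + k*(p-1) for some k < e, so a factor of N can be found by trying each k. This is a toy sketch for an error-free dp with tiny numbers; the algorithm used in this work additionally tolerates the remaining noise in dp and dq, which this sketch does not:

```python
# Toy sketch of recovering the prime factor p of N from a correct CRT
# exponent dp = d mod (p-1), using only the public key (N, e).
# Identity: e*dp = 1 + k*(p-1) for some k in [1, e),
# hence p = (e*dp - 1)/k + 1 for the right k.

def recover_p(N, e, dp):
    """Return a prime factor of N consistent with dp, or None."""
    for k in range(1, e):
        if (e * dp - 1) % k == 0:
            p = (e * dp - 1) // k + 1
            if p > 1 and N % p == 0:
                return p
    return None

# Toy RSA instance: p = 11, q = 13, N = 143, e = 7, d = 103,
# so dp = 103 mod (11 - 1) = 3.
print(recover_p(143, 7, 3))  # -> 11
```

With p in hand, q = N/p follows, and d can be recomputed from e and (p-1)(q-1); the real attack runs an error-tolerant version of this search over the noisy dp and dq candidates.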
As for countermeasures, there are many proposed hardware countermeasures. The most obvious is reduced usage of the cache, which is not feasible performance-wise. As for software countermeasures, recent patches of well-maintained libraries actually do protect against these types of attacks. But as I mentioned before, the software updating process also has a cost, and users are not keen on updating their software. For crypto library authors, constant execution flow is the most important thing: there should be no secret-dependent branches or memory accesses that leak any information.

Regarding the press reaction, I would also like to point out that we were in contact with the Amazon Web Services security team before making this work public. They were very open and cooperative, and they acknowledged the underlying physical leakage. Shortly after this work, they encouraged their customers to use well-maintained and recently updated libraries to protect against these attacks. But, as you would expect, 55% are still not using up-to-date libraries. Thank you very much.