cache attacks from the network. And the speaker, Michael Kurth, is the person who discovered the attack. And it's the first attack of its type. So he's the first author of the paper. And this talk is going to be amazing. We've also been promised a lot of bad cat puns. So I'm going to hold you to that. A round of applause for Michael Kurth. Hey, everyone. And thank you so much for making it to my talk tonight. My name is Michael. And I want to share with you the research that I was able to conduct at the amazing VUSec group during my master's thesis. Briefly about myself: I pursued my master's degree in computer science at ETH Zurich and could do my master's thesis in Amsterdam. Nowadays, I work as a security analyst at InfoGuard. So what you see here are the people that actually made this research possible. These are my supervisors and research colleagues, who supported me all the way along and put so much time and effort into the research. So these are the true rock stars behind this research. But let's start with cache attacks. So cache attacks were previously known to be local code execution attacks. For example, in the cloud setting here on the left-hand side, we have two VMs that basically share the hardware. So they are time-sharing the CPU and the cache. And therefore, an attacker that controls VM2 can actually attack VM1 via a cache attack. Similarly with JavaScript: a malicious JavaScript gets served to your browser, which then executes it. And because you share the resources on your computer, it can also attack other processes. Well, this JavaScript thing gives you a feeling of remoteness, right? But still, it requires the JavaScript to be executed on your machine to actually be effective. So we wanted to really push this further and have a true network cache attack. So we have this basic setting where a client does SSH to a server. And we have a third machine that is controlled by the attacker.
And as I will show you today, we can break the confidentiality of this SSH session from the third machine without any malicious software running on either the client or the server. Furthermore, the CPU on the server is not even involved in any of these cache attacks. So it's just there, not even noticing that we are actually leaking secrets. So let's look a bit more closely. We have this nice cat doing an SSH session to the server. And every time the cat presses a key, one packet gets sent to the server. This is always true for interactive SSH sessions, because, as the name says, it gives you this feeling of interactiveness. When we look a bit more under the hood at what's happening on the server, we see that these packets actually activate the last level cache. More on that later in the talk. Now the attacker, at the same time, launches a remote cache attack on the last level cache by just sending network packets. And with this, we can actually leak the arrival times of individual SSH packets. Now you might ask yourself, well, how would arrival times of SSH packets break the confidentiality of my SSH session? Well, humans have distinct typing patterns. And here we see an example of a user typing the word because. And you see that typing E right after B is faster than, for example, typing C after E. And this can be generalized. And we can use this to launch a statistical analysis. So here, with the orange dots: if we are able to reconstruct these arrival times correctly, and correctly means we can reconstruct the exact times when the user was typing, we can then launch this statistical analysis on the inter-arrival timings. And therefore, we can leak what you were typing in your private SSH session. Sounds very scary and futuristic, but I will demystify this during my talk. All right, there is something I want to bring up right here at the beginning. As per tradition and for ease of writing, you give a name to your paper.
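To make the idea concrete, here is a tiny Python sketch of the feature this statistical analysis works on: the gaps between consecutive packet arrival times, one per key pair. The timestamps are made-up illustrative values, not data from the study.

```python
# Sketch: turning per-keystroke packet arrival times into the
# inter-arrival timings the statistical analysis works on.
# The timestamps are invented illustrative values (milliseconds),
# not measurements from the actual research.

def inter_arrival_times(timestamps):
    """Gaps between consecutive arrival times, one per key pair."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

# One SSH packet per key press while typing "because":
arrivals = [0, 90, 310, 490, 640, 860, 1010]  # ms, illustrative
gaps = inter_arrival_times(arrivals)
print(gaps)  # -> [90, 220, 180, 150, 220, 150]
```

In this invented example, the b-to-e gap (90 ms) is much shorter than the e-to-c gap (220 ms), which is exactly the kind of per-digraph difference the analysis exploits.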
And if you're following InfoSec Twitter closely, you probably already know what I'm talking about, because in our case, we named our paper NetCAT. Well, of course, it was a pun. In our case, NetCAT stands for network cache attack. And as it is with humor, it can backfire sometimes. And in our case, it backfired massively. And with that, we caused a small Twitter drama this September. One of the most liked tweets about this research was the one from Jake. And yeah, these talks are great because you can put a face to such tweets. And yes, I'm this idiot. So let's fix this. Intel acknowledged us with a bounty and also a CVE number. So from now on, we can just refer to it by the CVE number. Or if that is inconvenient to you: during that Twitter drama, somebody sent us a nice alternative name, including a logo, which I actually quite like. It's called Neocat. Anyway, lessons learned on that whole naming thing. So let's move on. Let's get back to the actually interesting bits and pieces of our research. So, a quick outline. I'm firstly going to talk about the background, so general cache attacks, then DDIO and RDMA, which are the key technologies that we were abusing for our remote cache attack. Then about the attack itself: how we reverse engineered DDIO, the end-to-end attack, and of course, a small demo. So cache attacks are all about observing a micro-architectural state which should be hidden from software. And we do this by leveraging shared resources to leak information. An analogy here is safe-breaking with a stethoscope, where the shared resource is actually the air, which just transmits the sounds the lock makes on the different inputs that you're giving it. And it actually works quite similarly in computers. But here, the shared resource is the cache. So caches solve the problem that the latency of loads from memory is really bad, and loads make up roughly a quarter of all instructions. With caches, we can reuse specific data and also exploit spatial locality in programs.
Modern CPUs usually have this three-layer cache hierarchy: L1, which is split between data and instruction cache, L2, and then L3, which is shared amongst the cores. If data that you access is already in the cache, that results in a cache hit. And if it has to be fetched from main memory, that's considered a cache miss. So how do we actually know if an access hits or misses? Because we cannot actually read data directly from the caches. We can do this, for example, with Prime and Probe. It's a well-known technique that we also use in the network setting. So I want to quickly go through what's actually happening. The first step of Prime and Probe is that the attacker brings the cache to a known state, basically priming the cache. So he fills it with his own data. And then the attacker waits until the victim accesses it. The last step is then probing, which is basically doing the priming again, but this time timing the access times. So fast accesses, cache hits, mean that the cache was not touched in between. And cache misses mean that the victim actually accessed one of the cache lines in the time between Prime and Probe. So what can we do with these cache hits and misses? Well, we can analyze them. And this timing information tells us a lot about the behavior of programs and users. Based on cache hits and misses alone, researchers were able to leak crypto keys, infer visited websites, or leak memory content; that's Spectre and Meltdown. So let's see how we can actually launch such an attack over the network. One of the key technologies is DDIO. But first, I want to talk about DMA, because it's like the predecessor to it. So DMA is basically a technology that allows your PCIe device, for example the network card, to interact directly with main memory without involving the CPU. So for example, if a packet is received, the PCIe device can just put it in main memory.
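To illustrate the three steps, here is a toy Prime and Probe simulation in Python. The latencies and the threshold are invented illustrative numbers, not real cache measurements; the point is only the prime, wait, probe logic.

```python
# Toy simulation of Prime+Probe. Latencies are invented numbers, not
# real measurements; the point is the three-step logic.

THRESHOLD = 100  # cycles separating a (fast) hit from a (slow) miss

def probe(access_times):
    """Step 3: time each access; slow lines were evicted by the victim."""
    return [i for i, t in enumerate(access_times) if t > THRESHOLD]

# Step 1 (prime): the attacker fills all lines of a cache set, so a
#                 probe right away would show only fast hits.
# Step 2 (wait):  the victim touches line 3, evicting attacker data.
# Step 3 (probe): re-access everything; line 3 now misses (slow).
simulated_times = [40, 42, 38, 250, 41, 39, 43, 40]
print(probe(simulated_times))  # -> [3]: the victim accessed line 3
```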
And then when the program or the application wants to work on that data, it can fetch it from main memory. Now with DDIO, this is a bit different. With DDIO, the PCIe device can directly put data into the last level cache. And that's great, because now the application, when working on the data, doesn't have to go through the costly walk to main memory and can just fetch it directly from the last level cache. So DDIO stands for Data Direct I/O Technology. And it's enabled on all Intel server-grade processors since 2012. It's enabled by default and transparent to drivers and operating systems. So I guess most people didn't even notice that something changed under the hood. And it changed something quite drastically. But why is DDIO actually needed? Well, it's for performance reasons. Here we have a nice study from Intel, which shows on the bottom different numbers of NICs. So we have a setting with two NICs, four NICs, six and eight NICs. And you have the throughput for each. And as you can see with the dark blue, without DDIO it basically stops scaling after four NICs. With the light blue, you then see that it still scales up when you add more network cards. So DDIO is specifically built to scale network applications. The other technology that we were abusing is RDMA. It stands for Remote Direct Memory Access. And it basically offloads transport layer tasks to silicon. It's basically a kernel bypass. And there is also no CPU involvement. So an application can access remote memory without consuming any CPU time on the remote server. So I brought here a little illustration to showcase RDMA. On the left, we have the initiator. And on the right, we have the target server. A memory region gets allocated on startup of the server. And from then on, applications can perform data transfers without the involvement of the network software stack. So you omit the TCP/IP stack completely.
With one-sided RDMA operations, the initiator is even allowed to read and write to arbitrary offsets within that allocated space on the target. I quote here a statement from the market leader for these high-performance NICs: moreover, the caches of the remote CPU will not be filled with the accessed memory content. Well, that's not true anymore with DDIO. And that's exactly what we attacked. So you might ask yourself, where is this RDMA used? And I can tell you that RDMA is one of these technologies that you don't hear about often, but that is actually extensively used in the back ends of the big data centers and cloud infrastructures. So you can get your own RDMA-enabled infrastructure from public clouds like Azure, Oracle Cloud, Huawei, or Alibaba. Also, file protocols like SMB and NFS can support RDMA. And other applications are high-performance computing, big data, machine learning, data centers, clouds, and so on. But let's get a bit into the details of the research and how we abused the two technologies. So we know now that we have a shared resource exposed to the network via DDIO. And RDMA gives us the necessary read and write primitives to launch such a cache attack over the network. But first, we needed to clarify some things. Of course, we did many experiments and extensively tested the DDIO part to understand its inner workings. But here, I brought with me the two major questions we had to answer. The first one is, of course: can we distinguish a cache hit from a miss over the network? We still have network latency and packet queuing and so on. So would it be possible to actually get the timings right? Which is an absolute must for launching a side channel. The second question is: can we actually access the full last level cache? This corresponds to the attack surface that we actually have for the attack. So the first question we can answer with this very simple experiment. We have on the left a very small code snippet.
We have a timed RDMA read to a certain offset. Then we write to that offset, and we read again from the offset. So what you can see is that when doing this like 50,000 times over many different offsets, you can clearly distinguish the two distributions. The blue one corresponds to data that was fetched from main memory, and the orange one to data that was fetched from the last level cache over the network. You can also see the effects of the network. For example, you can see the long tails, which correspond to some packets that were slowed down in the network or were queued. As a side note here for all the side channel experts: we really need that write, because with DDIO, reads do not allocate anything in the last level cache. So basically, this is the building block to launch a Prime and Probe attack over the network. However, we still need to have a target that we can actually profile. So let's see what kind of an attack surface we have. Which brings us to the question: can we access the full last level cache? And unfortunately, this is not the case. DDIO has an allocation limitation of two ways. Here in the example, that's out of 20 ways, so roughly 10%. They are not dedicated ways, so the CPU still uses them. But we would only have access to 10% of the CPU's cache activity in the last level cache. So that did not look so promising for an attack at first. But the good news is that other PCIe devices, let's say a second network card, will also use the same two cache ways. And with that, we have 100% visibility into what other PCIe devices are doing in the cache. So let's look at the end-to-end attack. As I told you before, we have this basic setup of a client and a server. And we have the machine that is controlled by us, the attackers. The client just sends its packets over a normal Ethernet NIC. And there is a second NIC attached to the server, which allows the attacker to launch RDMA operations.
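As a sketch of how one could separate the two distributions in practice, here is a small Python example that picks a threshold between the two latency clusters and classifies a timed read as hit or miss. All numbers are invented for illustration; real values depend on the NIC, the network, and the machine.

```python
# Sketch of classifying timed reads into hit/miss with a threshold
# learned from the two distributions. All latencies are illustrative.

import statistics

def learn_threshold(miss_samples, hit_samples):
    """Midpoint between the two medians; the long network tails make
    the median more robust than the mean here."""
    return (statistics.median(miss_samples)
            + statistics.median(hit_samples)) / 2

def is_cache_hit(latency_ns, threshold):
    return latency_ns < threshold

# First read: data comes from main memory (miss). The write allocates
# the line in the last level cache via DDIO. Second read: served from
# the cache (hit), so it is measurably faster, even over the network.
miss_reads = [1500, 1520, 1480, 1700]   # ns, invented
hit_reads = [1200, 1190, 1210, 1230]    # ns, invented
t = learn_threshold(miss_reads, hit_reads)
print(is_cache_hit(1195, t), is_cache_hit(1510, t))  # -> True False
```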
So we also know now that all the keystrokes that the user is typing are sent in individual packets, which activate the last level cache through DDIO. But how can we actually get the arrival times of these packets? Because that's what we're interested in. Now we have to look a bit more closely at how the arrival of network packets actually works. So the IP stack has a ring buffer, which is basically there to allow asynchronous operation between the hardware, so the NIC, and the CPU. If a packet arrives, it will be allocated in the first ring buffer position. On the right-hand side, you see the view of the attacker, who can just profile the cache activity. And we see that the cache line at position 1 lights up. So we see activity there. It could also be cache line 2; we don't know on which cache line this will actually pop up. But what is important is what happens with the second packet. Because the second packet will also light up a cache line, but this time a different one. And it's actually the next cache line after the one from the previous packet. And if we do this for three and four packets, we can see that we suddenly have this nice staircase pattern. So now we have a predictable pattern that we can exploit to get information about when packets were received. And this is just because the ring buffer is allocated in a way that it doesn't evict itself. If packet 2 arrives, it doesn't evict the cache content of packet 1, which is great for us as attackers because we can profile it well. Well, let's look at a real-life example. This is the cache activity when the server receives constant pings. You can see this nice staircase pattern. And you can also see that the ring buffer reuses locations, as it is a circular buffer. Here it's important to know that the ring buffer doesn't hold the data content, just the descriptors pointing to the data. So this is reused.
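The staircase behavior can be captured in a toy model: because the ring buffer is circular and each arriving packet's descriptor goes into the next slot, consecutive packets activate consecutive cache lines until the buffer wraps. A small Python sketch, with an arbitrary illustrative ring size:

```python
# Toy model of the staircase pattern: consecutive packets activate
# consecutive cache lines, wrapping around the circular ring buffer.
# RING_SIZE is an arbitrary illustrative value, not the real one.

RING_SIZE = 8

def active_line(packet_index, start_line=0):
    """Cache line activated by the n-th received packet."""
    return (start_line + packet_index) % RING_SIZE

staircase = [active_line(n) for n in range(10)]
print(staircase)  # -> [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```

The wrap-around at the end of the list is the location reuse visible in the constant-ping measurement.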
Unfortunately, when a user types over SSH, the pattern is not as nice as this one here, because then we would already have a done deal and could just work with this. When a user types, you will have more delay between packets. Generally, you also don't know when the user is typing. So you have to profile all the time to get the timings right. Therefore, we needed to build a bit more sophisticated pipeline. It is basically a two-stage pipeline which consists of, first, an online tracker that is just looking at a bunch of cache lines which it observes all the time. And when it sees that certain cache lines were activated, it moves that window forward to the next position where it believes an activation will happen. The reason is that we need a speed advantage: we have to profile much faster than the network packets of the SSH session are arriving. And what you can see here on the left-hand side is a visual output of what the online tracker does. It just profiles this window, which you can see in red. And if you look very closely, you can also see more lit up in the middle, which corresponds to arrived network packets. You can also see that there is plenty of noise involved. Therefore, we are not able to just directly get the packet arrival times from it. That's why we need a second stage, the offline extraction. The offline extractor is in charge of computing the most likely occurrences of client SSH network packets. It uses the information from the online tracker and the predictable pattern of the ring buffer to do so. And then it outputs the inter-packet arrival times for different words, as shown here on the right. Great. So now we're again at the point where we have just packet arrival times, but no words, which we need for breaking the confidentiality of your private SSH session. As I told you before, users, or generally humans, have distinctive typing patterns. And with that, we were able to launch a statistical attack.
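The window logic of the online tracker can be sketched roughly like this in Python. The ring and window sizes are assumed illustrative values, not the parameters of the actual pipeline, and the sliding rule is a simplification of what the real tracker does.

```python
# Rough sketch of the online-tracker idea: profile a small window of
# cache lines, and when a line fires (a packet arrived), slide the
# window to cover the positions where the next activations are
# expected. RING_SIZE and WINDOW are invented illustrative values.

RING_SIZE = 128
WINDOW = 8

def next_window(fired_line):
    """Window start after an activation: just past the line that
    fired, wrapping around the circular ring buffer."""
    return (fired_line + 1) % RING_SIZE

def window_lines(start):
    """The cache lines the tracker profiles in this round."""
    return [(start + i) % RING_SIZE for i in range(WINDOW)]

start = 0
for fired in [2, 3, 4]:        # simulated staircase activations
    start = next_window(fired)
print(window_lines(start))     # -> [5, 6, 7, 8, 9, 10, 11, 12]
```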
More precisely, we just do machine learning to map between user typing behavior and actual words, so that in the end, we can output the words that you were typing in your SSH session. We used 20 subjects that were typing free and transcribed text, which resulted in a total of 4,500 unique words, each represented as a point in a multidimensional space. And we used really simple machine learning techniques, like the k-nearest-neighbor algorithm, which basically categorizes the measurements by their Euclidean distance to other words. The reason why we just used a very basic machine learning algorithm is that we just wanted to prove that the signal we were extracting from the remote cache is actually strong enough to launch such an attack. So we didn't want to improve this kind of mapping between users and their typing behavior in general. So let's look at how this worked out. Firstly, on the left-hand side, we used our classifier on raw keyboard data. That means we just used the signal that was emitted while the subjects were typing on their local keyboard, which gives us perfect and precise timing data. And we can see that this is already quite challenging to mount, so we have an accuracy of roughly 35%. But look at the top 10 accuracy, where the attacker can basically guess 10 words: if the correct word was amongst these 10 words, then that's considered to be accurate. And with the top 10 guesses, we have an accuracy of 58%. That's just on the raw keyboard data. And then we used the same data and also the same classifier on the remote signal. And of course, this is less precise because we have noise factors. And we can even pick up extra keystrokes or miss some. The accuracy is roughly 11% less. And the top 10 accuracy is roughly 60%.
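As an illustration of this classification step, here is a minimal k-nearest-neighbor word guesser over inter-keystroke timing vectors in Python. The tiny training set is invented for illustration; the real study used 20 subjects and about 4,500 unique words.

```python
# Minimal k-nearest-neighbor word guesser over inter-keystroke timing
# vectors, in the spirit described in the talk. The training data
# below is invented for illustration only.

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, sample, k=3):
    """train: list of (inter_arrival_vector, word) pairs; returns the
    majority word among the k nearest training points."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], sample))[:k]
    return Counter(word for _, word in nearest).most_common(1)[0][0]

train = [
    ([0.09, 0.22, 0.18], "because"),
    ([0.10, 0.21, 0.19], "because"),
    ([0.30, 0.08, 0.25], "network"),
    ([0.29, 0.09, 0.26], "network"),
]
print(knn_predict(train, [0.11, 0.20, 0.18]))  # -> because
```

A top-10 guess, as in the evaluation, would simply return the ten most common words among a larger neighborhood instead of only the single best one.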
So as we used a very basic machine learning algorithm, many subjects, and a relatively large word corpus, we believe we can showcase that the signal is strong enough to launch such attacks. So of course, now we want to see this whole thing working. As I'm a bit nervous here on stage, I'm not going to do a live demo, because it would involve me doing some typing, which would probably confuse myself and, of course, also the machine learning model. Therefore, I brought a video with me. Here on the right-hand side, you see the victim, who will shortly begin an SSH session. And on the left-hand side, you see the attacker. On the bottom, you see the online tracker. And on top, you see the extractor and, hopefully, the predicted words. So now the victim starts an SSH session to the server called father. And the attacker, who is on the machine called son, now launches the attack. So you saw we profiled the ring buffer location, and now the victim starts to type. And as this pipeline takes a bit to process these words and to predict the right thing, you will shortly see the words slowly popping up in, hopefully, the correct order. And as you can see, we can correctly guess the right words over the network, by just sending network packets to the same server and, with that, extracting the crucial information of when these SSH packets arrived. So now you might ask yourself, how do you mitigate against these things? Well, luckily, it's just server-grade processors, so no clients and so on. But from our viewpoint, the only true mitigation at the moment is to either disable DDIO or not use RDMA. Both come with quite a performance impact. With DDIO, you're talking about roughly 10% to 18% less performance, depending, of course, on your application. And if you decide not to use RDMA, you probably have to rewrite your whole application. Intel's publication on disclosure day, therefore, sounded a bit different.
But read it for yourself. I mean, the meaning of "untrusted network" can, I guess, be quite debatable. And yeah, it is what it is. So I'm very proud that we got accepted at IEEE Security and Privacy 2020. Also, Intel acknowledged our findings. Public disclosure was in September. And we also got the bug bounty payment. The push for increased performance has forced Intel to place the last level cache on the fast I/O path in its processors. And by this, it exposed even more shared micro-architectural components, which we know by now have a direct security impact. Our research is the first DDIO side channel vulnerability. But we still believe that we just scratched the surface with it. Remember, there are more PCIe devices attached to these machines, for example storage devices. So you could profile the cache activity of storage devices and so on. There is even such a thing as GPUDirect, which gives you access to the GPU's cache. But that's a whole other story. So yeah, I think there is much more to discover on that side. Stay tuned for that. All that is left to say is a massive thank you to you and, of course, to all the volunteers here at the conference. Thank you. Thank you, Michael. We have time for questions. So you can line up behind the microphones. And I can see someone at microphone 7. So thank you for your talk. I had a question: when I'm working on a remote machine using SSH, I'm usually not typing nice words like you've shown. Usually it's weird bash things like dollar signs and dashes and I don't know. Have you looked into that as well? Well, I think, I mean, of course, what we wanted to showcase is that we can leak passwords, right, if you would do sudo or whatsoever. The thing with passwords is that they have their own dynamics. So you type passwords differently than you type normal words.
And then it gets a bit difficult, because when you want to do a large study of how users type passwords, you either ask them for their real password, which is not so ethical anymore, or you train them on different passwords. And that's also difficult, because they might adopt a different style of typing these passwords than they would with their real password. And of course, the same goes for the command line in general. And we just didn't have the word corpus to launch such an attack on it. Thank you. Microphone 1. Hi. Thanks for your talk. I would like to ask about the original SSH timing paper. It's from 2001 or something like that? Yeah, exactly, exactly. And do you have some idea why there are no countermeasures on the side of SSH clients to add some padding or some random delays or something like that? Do you have some idea why there is nothing happening there? Is it some technical reason, or what's the deal? Well, so we also were afraid that between 2001 and nowadays they had added some kind of delay or batching or whatsoever. I'm not sure if it's just a trade-off with the interactiveness of your SSH session or if there is a deeper reason behind it. But what I do know is that it's oftentimes quite difficult to add these artificial packets in between, because if they are not random at all, you could even filter out the additional packets that just get inserted by SSH. But other than that, I'm not familiar with any reason why they didn't adapt or why this wasn't on the radar. Thank you. Microphone 4. How much do you rely on the skill of the typist? So I'm thinking of a user that has to search for each letter on the keyboard, or someone that is distracted while typing. So not having a real pattern behind their typing. Are you actually absolutely relying on the pattern being reproducible?
As I said, we're just using this very simple machine learning algorithm that just looks at the Euclidean distance between previous words that you were typing and the new word, or the new arrival times that we were observing. And so if that is completely different, then the accuracy would drop. Thank you. Microphone 8. As a follow-up to what was said before, wouldn't this make it a targeted attack, since you would need to train the machine learning algorithm exactly for the person that you want to extract the data from? So yeah, our goal in the research was not to do next-level, let's say, machine learning recognition of your typing behavior. So we actually used the information about which user was typing, to profile that correctly. But still, I think you could maybe generalize. There is other research showing that you can categorize users into different types of typists. And if I remember correctly, they found that you can categorize each person into, let's say, seven different typing categories. And I also know that some online trackers are using your typing behavior to re-identify you, just to serve you personalized ads and so on. But still, I mean, we didn't want to go into that depth of improving the state of this whole thing. Thank you. And we'll take a question from the internet next. Did you ever try this with a high-latency network, like the internet? So of course, we rely on, let's say, a constant latency, because otherwise it would basically screw up our timing attack. And as we are talking about RDMA, which is usually found in data centers, we also tested it in data-center-like topologies. A high-latency network would make it, I guess, quite hard, which means that you would have to do a lot of repetition, which is actually bad, because you cannot tell the users: please retype what you just did, because I have to profile it again, right? So yeah, the answer is no. Thank you. Microphone 1, please.
If the victim pasted something into the SSH session, would you be able to carry out the attack successfully? No. If you paste stuff, it is just sent out as a batch when you press enter. OK, thanks. Thank you. The angels tell me there is a person behind microphone 6 whom I am completely unable to see because of all the lights. So as far as I understood, the attacker can only see that some packets arrived on their NIC. So if there is a second SSH session running simultaneously on the machine under attack, would it already interfere with this attack? Yeah, absolutely. So even distinguishing SSH packets from normal network packets is challenging. We use kind of a heuristic here, because the thing with SSH is that it always sends two packets right after each other. So not only one, but two. I omitted this part for the simplicity of this talk. But we also rely on these kinds of heuristics to even filter out SSH packets. And if you had a second SSH session, I can imagine that this would completely mix things up, so we could not distinguish which SSH session is which. Thank you. Microphone 7 again. You said you were using two connectors, or, how do you call them, NICs? Yes, exactly. Does it have to be two different ones, or can it be the same, or how does that work? So in our setting, we used one NIC that has the capability of doing RDMA. In our case, this was over a fabric, so InfiniBand. And the other was just a normal Ethernet connection. But could it be the same, or could both be over InfiniBand, for example? Yes, I mean, the thing with InfiniBand is that it doesn't use the ring buffer. So we would have to come up with a different kind of tracking ability to get this, which could even get a bit more complicated, because it does this kernel bypass. But if there is a predictable pattern, we could potentially also do this. OK. Thank you. Thank you. Microphone 1. Yeah, hello again.
I would like to ask, I know it was not the main focus of your study, but do you have some estimation of how practical this timing attack can be? Like, if you do a real-world simulation, not the prepared one, how big a problem can it really be? What do you think, what's the state of the art in this field, or how do you see the risk? You're just referring to the typing attack, right? Timing attacks on SSH in general, but not necessarily the cache version. So the original research that was conducted has been out there since 2001. And since then, many researchers have shown that it's possible to launch such typing attacks in different scenarios. For example, JavaScript is another one. And it's always a bit difficult to judge, because most of the researchers are using different data sets, so it's difficult to compare. But I think in general, I mean, we have used quite a large word corpus, and it still worked, not super precisely, but it still worked. So yeah, I do believe it's possible, but to even turn it into a real-world attack where an attacker wants high accuracy, he probably would need a lot of data and even more sophisticated techniques, which exist. So there are a couple of other machine learning techniques that you could use, which have their pros and cons. Thank you. Ladies and gentlemen, the man who named an attack NetCAT: Michael Kurth, give him a round of applause, please. Thank you so much.