Hello, everyone. I'm Vitaly. I work for Red Hat, and I'm going to talk about CPU vulnerabilities and public clouds today. I usually work on the Linux kernel, enabling Linux on third-party hypervisors and public clouds, and in my spare time I'm also one of the KVM reviewers.

As you know, there were a number of vulnerabilities discovered in the last two years, and you've probably heard about all of them. They have something in common: they are speculative execution side-channel attacks, which allow an attacker to gain information by analyzing the micro-architectural, lower-level state of the CPU, and this is something new. If you're running your own infrastructure, you should be aware of them; you should go read about them and see how you need to mitigate. There is no way around that. But what if you are running your workload on a public cloud? When there is a new vulnerability, you probably open up some website on day one and read: oh, there is a new scary vulnerability. But then there is a link to my cloud provider's blog post, which tells me: we took care of everything, we knew about it for the last three months, we updated our fleet, it's all good. So is that so, or is there something you still need to be aware of? The answer is: it depends.

Let's take a look at the possible attack scenarios. When we are talking about public clouds, they are running VMs; they call them instances or some other names. There are four major types of attacks: one VM can attack another VM, a VM can attack the hypervisor, and then there are two types of attacks which can be mounted inside the VM: a user space application can attack the kernel, or a user space application can attack another user space application. Your cloud provider will most likely take care of the first two, because they're not crazy, right? They can get sued for that. But what about the other two? The answer is, again: it depends. The cloud provider thinks it's a little bit outside of their domain of responsibility; they don't guarantee you anything, because there are multiple types of attacks which can be mounted inside the VM. But they still sometimes need to provide you some tools, so you can actually mitigate some of these vulnerabilities in your VMs.

So the question is: when do you need these tools? It very much depends on what's happening inside your VM, and you have to answer a few questions. Is it a single-tenant or a multi-tenant VM? Is it a single-task VM which basically runs one application, or are there multiple applications running there? Are you relying on language-based security, or is there a JITted environment there? And is hyper-threading actually important for you?

I will now go through the vulnerabilities. I won't go into detail, because there are too many of them and I don't have that much time, but I will give you a brief overview of each, and of whether the hardware features they need have to be passed through to your VM or not.

Let's start with Spectre V1, because it's V1. The attack is basically a buffer overflow without a buffer overflow: if there is a user-controlled offset, the attack can be mounted when switching between different contexts. If you want to learn more, you can go, for example, to the Intel website, where there is a deep dive about it. The way we mitigate it is purely in software: the cloud provider will likely fix the hypervisor, and there is no microcode or any other update required. So there is nothing which needs to be passed through to your VM, but you still need the mitigation in your guest kernel, because the fixes happen on a case-by-case basis: whenever there is a gadget in your kernel, it needs to get fixed. You basically just need to keep your kernel updated, and you're going to be fine with this particular attack.
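To make the "buffer overflow without a buffer overflow" concrete, here is a minimal C sketch of the kind of gadget this is about, modeled on the bounds-check-bypass example from the Spectre paper; the names and sizes are purely illustrative:

```c
#include <stddef.h>
#include <stdint.h>

uint8_t array1[16];
size_t  array1_size = 16;
uint8_t array2[256 * 512];
volatile uint8_t temp;

/*
 * If the branch predictor has been trained with in-bounds values of x,
 * the CPU may speculatively execute the body even for an out-of-bounds,
 * attacker-controlled x. The load from array2 then leaves a cache
 * footprint that depends on the byte at array1[x], which a timing side
 * channel can later recover.
 */
void victim_function(size_t x)
{
	if (x < array1_size)
		temp &= array2[array1[x] * 512];
}
```

The kernel-side fixes are exactly about finding such gadgets and masking the index (the kernel has array_index_nospec() for this), which is why "keep your kernel updated" is the whole story here.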
There is a variant of Spectre V1 called SWAPGS. It's not the same attack, but it's the same in the sense that we get outside of the allowed range when we access data. Again, it's mitigated in software for the existing CPUs which are vulnerable; there are going to be new CPUs which don't have the vulnerability, and you will see a flag for that in your VM which will make things a little bit faster, but it's all mitigated in software. Just keep your kernel updated.

Spectre V2 is an interesting one, actually. It's a vulnerability where the branch predictor is trained in one context and then mis-predicts targets in another context. To mitigate this one, you actually need both software and hardware features. I won't go through all of the hardware features, but basically they either stop the speculation between different contexts, such as different privilege levels, or they provide a barrier. In software, we use a technique called retpoline, invented by Google; it's a clever way to prevent the speculation purely in software.

Thinking about attacks between VMs and from a VM to the hypervisor: attacks between VMs are not possible if you are not sharing cores, and for almost all bigger instance types on cloud providers there is no core sharing; you get the cores for your exclusive usage. You can still attack the hypervisor, but your cloud provider will likely take care of that by utilizing one of these techniques, either rebuilding its hypervisor with retpolines or using the hardware features.

But what about the in-VM attacks? I'm also showing how you can check what's going on inside your VM: there is an interface in /sys where the kernel will tell you the mitigation state for all of these vulnerabilities.

Enhanced IBRS is something which will come with, I think, Cascade Lake, and then this will be mitigated in hardware to a certain extent. "To a certain extent" means that there is going to be no speculation between different privilege levels. On the same privilege level, for example one user space task attacking another one, you still need a barrier (IBPB), and for that your cloud provider needs to expose the hardware feature to you. If it's not provided, there is not much you can do about it. I mean, if there is user space which you can rebuild with retpolines or something, then you can mitigate; if you have some legacy workload, there is basically nothing you can do there. Also, the branch predictor can be shared across the hyper-threads of a core, so it may happen that something running on one thread attacks something running on another thread, and for that there is no good protection. You may actually want to go as far as disabling SMT, as that can actually be faster than some of the Intel CPU features which are being designed to protect you against these attacks.
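The /sys interface I mentioned lives in /sys/devices/system/cpu/vulnerabilities; every file there is one vulnerability and its current mitigation state. A small C sketch to dump it, equivalent to running grep over that directory:

```c
#include <dirent.h>
#include <stdio.h>

int main(void)
{
	const char *dir = "/sys/devices/system/cpu/vulnerabilities";
	char path[512], line[256];
	struct dirent *e;
	DIR *d = opendir(dir);

	if (!d) {
		perror(dir);
		return 1;
	}
	while ((e = readdir(d)) != NULL) {
		FILE *f;

		if (e->d_name[0] == '.')	/* skip "." and ".." */
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(line, sizeof(line), f))
			printf("%-24s %s", e->d_name, line);	/* line keeps its '\n' */
		fclose(f);
	}
	closedir(d);
	return 0;
}
```

From a shell, `grep . /sys/devices/system/cpu/vulnerabilities/*` gives you the same picture; this is what the instance screenshots later in the talk are showing.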
Going forward: Meltdown. Meltdown is an Intel-specific vulnerability which is about page tables: when page tables are shared between user space and kernel, we can speculatively read memory which doesn't belong to user space. Again, there is no hardware support required for the mitigation. To mitigate between VMs, and between a VM and the hypervisor, nothing is needed, because usually they don't share page tables. It was only Xen PV, which was still used on very, very old AWS instances, where some tricks had to be played: they actually put Xen PV in an HVM container to mitigate this initially. Inside the VM you also need an updated kernel, but this is already about two years old, so everything after 2018 should be protected. The technique is called page table isolation: you have different page tables for user space and for the kernel. Eventually, when this is fixed in hardware, you won't need it, and things will run slightly faster.

Speculative store bypass is an interesting one. Basically, to mount the attack, you are writing to memory and reading from the same address, and in some cases your read can actually happen before your write finishes. Usually it doesn't really matter, because you have control over this memory, but in some contexts it actually does: you are reading the stale value. To actually mitigate this vulnerability, you need a hardware feature called Speculative Store Bypass Disable (SSBD); on AMD there is a different one. Again, it doesn't seem that this vulnerability can be used to attack other VMs or the hypervisor. The environment which is really at risk is a JITted environment, because it provides the attacker an easy way to write exactly the speculative gadget they need. And by default, if you have this CPU feature in your VM, or on hardware after a microcode update, it's only enabled for prctl and seccomp processes, because, again, it slows you down. Mostly you need it if you are running untrusted JITted code in your VM, like JavaScript.
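Because of that prctl/seccomp default, a task that hosts untrusted JITted code has to opt in itself. Here is a minimal sketch of that opt-in; the constants match linux/prctl.h and are defined as a fallback for older userspace headers:

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values are those from linux/prctl.h. */
#ifndef PR_SET_SPECULATION_CTRL
#define PR_SET_SPECULATION_CTRL	53
#define PR_SPEC_STORE_BYPASS	0
#define PR_SPEC_DISABLE		(1UL << 2)
#endif

int main(void)
{
	/*
	 * Ask the kernel to enable SSBD for this task and its children.
	 * In a VM where the cloud provider did not expose SSBD, this
	 * fails (ENXIO) and there is nothing more the guest can do.
	 */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
		  PR_SPEC_DISABLE, 0, 0)) {
		perror("PR_SET_SPECULATION_CTRL");
		return 1;
	}
	printf("speculative store bypass disabled for this task\n");
	return 0;
}
```

The failure path is exactly the situation on the AWS instance shown later, where the SSBD feature was not passed through.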
L1 terminal fault is again related to page tables. When a page table entry is, for example, not present or reserved, you normally don't have access there, but speculatively your CPU can actually access a memory location which is under the attacker's control. There is a hardware feature, which again comes with a microcode update, but basically you only need it if you are running your own hypervisor on bare metal. Even if you are running a nested hypervisor on a public cloud, and not many public clouds allow you to do that, but Azure, for example, does, you don't really need it: the provider will most likely take care of using this feature when switching between VM and hypervisor on their side. There is an interesting implication here: the L1 cache is shared, it's one L1 cache for the whole core, so all hyper-threads are using it. The question is what happens when, for example, one of your hyper-threads exits to the hypervisor and the other one still has a user space task running. There were some techniques suggested, like kicking the sibling out of execution into the hypervisor and blocking the vCPU. We don't know for sure to which extent these techniques are being used by cloud providers' hypervisors; hopefully they do something, we cannot really check. But to mitigate the attack inside the VM, we use a software-based technique called PTE inversion: we write addresses which cannot normally be used to access any data into PTEs which are not present or reserved. There is also an l1tf kernel parameter with which you can switch the mitigation, but you don't really need it unless you're running your own hypervisor on bare metal; PTE inversion, the software-based technique which comes to you with an updated kernel, is good enough.

MDS is an interesting one. There are some small structures in the CPU which you cannot access with any instruction, but after the CPU executes something, there can be leftovers in those structures, and with certain techniques, and there are many of them, with new types of attacks still being discovered in this space, you can actually extract some bits from there. It's not that you're getting access to full user pages or anything, because these structures are really, really small, but you can still get some bits, and if some interesting, security-related computation is being done on another hyper-thread, it's kind of risky. Again, new CPUs are supposed to be fixed. You cannot attack another core, but you can attack another VM on the same core, or the hypervisor when an exit is being done. Your cloud provider will most likely take care of this by updating the microcode and clearing all these buffers when switching to your VM and back. And it's the same story as with L1 terminal fault: when one hyper-thread exits to the hypervisor, what happens on the other one, since they share these structures if hyper-threads are not real cores? Again, core scheduling was suggested, so that only VMs belonging to one tenant are scheduled on a core at the same time, and if one thread exits, for example on an interrupt, we may want to block the other one. Cloud providers most likely do something about this.

So the question is: what do you do in software in your VM? I'm grouping TSX Asynchronous Abort (TAA) in here, because it's basically another way to mount the attack, but you're attacking the same structures. It was mitigated in a somewhat weird way by Intel: an existing instruction, VERW, was repurposed to clear the buffers. There is a feature flag called MD_CLEAR; you can see whether it's present in your CPU flags or not. When it's present, you know that the buffers are being cleared when the instruction is executed, and your kernel actually does that. But if it's not present, you cannot really tell for sure, because it may be the case that your cloud provider actually updated the microcode but just didn't expose the feature to you. You can still use the mitigation by issuing the instruction on the CPU, and that's actually what Linux does: it tells you, well, the state is unknown, but I will still try to issue the instruction just in case. This protects the kernel against user space attacks, but if two user space tasks of different tenants are running on the same core, on different hyper-threads, at the same time, you're still vulnerable, because there is no good place to put the flush: you cannot flush after every instruction, you can only flush, for example, when you enter and leave the kernel. If the tasks run simultaneously, one can still obtain some data from the other thread.
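You can check for the MD_CLEAR flag yourself: it shows up as md_clear in the flags line of /proc/cpuinfo. A small sketch; it only looks at the first flags line, which is fine since the flag is uniform across CPUs:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[4096];	/* the "flags" line is long on modern CPUs */
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f) {
		perror("/proc/cpuinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "flags", 5))
			continue;
		fclose(f);
		if (strstr(line, " md_clear")) {
			printf("md_clear present: VERW clears CPU buffers\n");
			return 0;
		}
		printf("md_clear absent: kernel issues VERW anyway, state unknown\n");
		return 2;
	}
	fclose(f);
	return 1;
}
```

The "absent" case is the "state unknown" situation I just described: the kernel still issues the instruction, it just cannot promise it does anything.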
In that case, if you are really worried about such types of attacks, you can either isolate your cores and manually pin your tasks to different physical cores, assuming that you trust your cloud provider to give you a genuine SMT topology, which they actually do, because it's in their interest to expose a trustworthy topology to you, or you can just disable SMT completely. It depends on how much you care about such attacks.

Just for the sake of completeness: there was another vulnerability discovered in the fall of 2019, just several months ago, called iTLB multi-hit, and it's not a speculative execution attack. It's actually a way to mount a DoS attack on your physical CPU: when you create page table entries with two different page sizes for the same memory and don't flush the TLB in between, your CPU may actually hit an error and just stop, the physical CPU. But for sure your cloud provider took care of this one. You can actually try, right? Execute the exploit, and either your cloud provider took care of it, or they won't ever let you do anything again. Up to you. It's been mitigated in software only. If you're interested how it's been done on the hypervisor side: we either forbid the different page sizes, or we map pages as non-executable, and whenever an execution happens, we split them. That's what we do in KVM, for example.

Now I just wanted to show you some examples of existing instance types on different cloud providers. Let's start with AWS. Here is an r5n.large instance; if you go to this /sys interface and look at what's happening, you will see something like this. How do you read it? The first entry is kind of irrelevant, because there is no VMX exposed to you, so you don't really care about iTLB multi-hit and the different page sizes; the hypervisor takes care of it, and it's just irrelevant to you. L1TF, as I told you, has been mitigated in software, we do PTE inversion, and you can see it here. MDS: the feature wasn't exposed to us, we don't see MD_CLEAR, so we still try to do the buffer flushing, but we're not sure; most likely it is mitigated. Meltdown: again a software-based mitigation, page table isolation, we're likely secure. Speculative store bypass: we didn't get the feature, so there's nothing we can do; in case we are running untrusted JITted code, we may be in trouble. Again, you can try to mitigate by putting tasks on different cores, because it's an attack against the level-one cache. Spectre V1: again a software-based mitigation in the kernel, we are good. Spectre V2: not really. We didn't get the STIBP and IBPB features. So if we care about one user space task attacking another: you cannot attack the kernel, because it uses retpolines, but from one user space you can attack another user space which doesn't use retpolines. Then again, what you can do is pin them to different cores, or just avoid running such multi-tenant, mutually untrusting workloads in one instance. There are no hardware features provided to you.
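Pinning mutually untrusting tasks to different physical cores is plain sched_setaffinity(2); the only cloud-specific part is trusting the exposed topology. A minimal sketch, with CPU 0 chosen arbitrarily:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t set;

	/*
	 * Pin the current process to CPU 0. In practice, pick the CPUs so
	 * that no two tenants share hyper-thread siblings; the siblings of
	 * a core are listed in
	 * /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.
	 */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}
	printf("pinned to CPU 0\n");
	return 0;
}
```

And if you decide SMT just isn't worth it, the kernel lets you turn it off at runtime with `echo off > /sys/devices/system/cpu/smt/control`, or at boot with the nosmt command line parameter.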
Azure next, an F8s_v2; I was just picking some random instances. These two are, I think, Skylakes, so no particular thought was put into picking one instance type over another, and I cannot show you all of them, there are way too many. It's very, very close to what I just showed you with AWS. The only difference is that they actually expose VMX capabilities to you, so you can run a nested hypervisor there. You don't really need all these mitigations then, though you can still use them; we have no smart logic in the kernel, for example, to not use them when running nested. And again, the same story: the things which are mitigated in software are mitigated, because I'm running a recent enough kernel; it was a 5.5-rc-something kernel as of a couple of weeks ago. For MDS and TAA we don't know: the flag wasn't exposed to us, so we still try with the instruction. For Spectre V2 it's the same story: no hardware features, so we cannot put different user space tasks which don't trust each other on the same core.

The last one I want to show you is from Google Compute Engine, and it's a Cascade Lake, a newer CPU, and there are some noticeable differences here. For example, the SSBD and Spectre V2 hardware features were exposed to us, so our kernel is actually using them. And this is actually the reason why cloud providers may not want to expose these features: as soon as they do, your kernel will automatically start using them: oh, I have features for hardware mitigation, I need to use them. And this kills performance. There is a performance hit for legacy workloads, and it's not like a couple of percent; it can be like 50% or 60%, depending on what you do. Take STIBP, for example, which is shown here as disabled. The way these features are presented here is not great; I was actually sending an RFC to print STIBP as "not available" when there is no support in hardware, because here it just says "disabled", and you don't know whether it's disabled by the kernel, by you, or because there is no hardware support for it. And here it's a newer CPU, so no retpolines are needed; you can see that enhanced IBRS is being used instead. They actually passed through all these features. They probably don't care that much about the performance of legacy workloads, but they do care more about giving you all the features to do the mitigations.

And that's it for me. We don't really have time for questions; as I predicted, I already answered some before my talk. If you have any questions, just catch me somewhere in the corridor and I'll be happy to chat. Thank you.