Hello everybody, I'm Daniele Buono, and this is a seminar on the work we did to improve the security of QEMU by using control flow integrity. I was the main developer of this project, but I also got a lot of help and support from my team at IBM, so I would like to thank them, and also thank the QEMU community for the big help in upstreaming this work, especially Paolo Bonzini. I will start with a bit of introduction on why we want control flow integrity in QEMU, then I will explain how it can be implemented in general and what we needed to do in QEMU. Finally, we'll discuss the status of the patches we submitted, what we accomplished with this feature, and talk a bit about future work in this area. The main problem we have is that QEMU, like any other program, is subject to bugs. This is a table that shows the vulnerability trend over the years. As you can see, we keep finding bugs in QEMU, although at a slower pace than in the past. As cloud providers, we are mostly interested in integrity attacks, because with those an attacker can gain access to the provider infrastructure or, even worse, to the data of other clients. The two main ways an attacker can mount an integrity attack are ROP gadgets and indirect function call hijacking. With ROP gadgets, the idea is to use a buffer overflow on the stack to build a chain of return pointers. By chaining small pieces of code, each ending in a return, an attacker can force QEMU to do practically anything they want. The second type of attack can also be mounted with a heap buffer overflow: the idea here is to find a data structure containing a function pointer and change that pointer to whatever the attacker wants. This attack is very effective in C++ because of virtual functions, but also in C code like QEMU that relies heavily on callbacks.
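To make the second attack concrete, here is a hypothetical sketch of the heap-overflow pattern just described: a buffer sitting next to a function pointer in the same object. All names (Device, on_timer, attacker_gadget) are invented for illustration; this is not QEMU code, and the overflow is simulated with a bounded memcpy over the whole struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical heap object: an overflowable buffer followed by a
 * function pointer, the exact layout an attacker looks for. */
typedef struct {
    char buf[16];               /* attacker-overflowable buffer   */
    void (*handler)(void);      /* the function pointer to hijack */
} Device;

void on_timer(void)        { /* the legitimate callback */ }
void attacker_gadget(void) { /* stands in for attacker-chosen code */ }

/* Simulate a heap overflow with an attacker-controlled payload:
 * 16 filler bytes overrun buf, and the following bytes land exactly
 * on handler.  A real bug would be an unchecked memcpy into d->buf. */
void simulate_overflow(Device *d)
{
    unsigned char payload[sizeof(Device)];
    void (*target)(void) = attacker_gadget;

    memset(payload, 'A', 16);                    /* fills buf        */
    memcpy(payload + 16, &target, sizeof target);/* overwrites handler */
    memcpy(d, payload, sizeof payload);
}
```

When QEMU later invokes d->handler(), control transfers to attacker-chosen code; forward-edge CFI is exactly the check that catches this indirect call.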
What an attacker generally does in this case is call a function in libc that translates into a system call: mmap to allocate new memory, mprotect to change permissions on memory pages, or even system() to just execute another program. Okay, so we know that QEMU has vulnerabilities, but how effective are they in causing an integrity attack? Usually we rate a vulnerability using the CVSS score. If you look at QEMU data from 2017, 41% of vulnerabilities are rated medium or higher, but only 4% are rated high or critical. So one may think that only a small number of vulnerabilities can do a lot of damage. However, the CVSS score can be misleading. Take for example the CVE over here: it's rated at 6.5 and is supposed to cause only a partial integrity impact. Well, it turns out there is an attack demonstration showing that, by hijacking the QEMU timer system, this vulnerability could be used to execute any program on the host. And a similar attack has been demonstrated with another CVE that only has a score of 4.6. In some cases, multiple vulnerabilities can be stacked to increase the damage. Here, for example, we have a vulnerability with a score of 5 that can only be used to disclose information. However, when paired with another CVE with a score of 4.6, we can create an integrity attack: for example, an attack could be constructed where we use mprotect to mark a page of the guest as executable on the host, and then run our own code on the host. There is an example in Phrack magazine that shows how to actually execute a shell on the host that is controlled by the guest. So if you ask us, we would say that, just to be safe, any buffer overflow, no matter what the CVSS score is, should be considered a possible integrity attack that can compromise the host.
Now, stack buffer overflows can also be used to create a ROP attack, but statistically only 5% of the CVEs in QEMU are stack-based overflows. On the other hand, up to 31% of them are buffer overflows in general, and therefore we think that CFI should be able to stop integrity attacks for about one third of the total vulnerabilities in QEMU. So it's clear that QEMU, like any other program, is subject to bugs, and some of these could be used to cause an integrity attack. How can we make sure that these bugs do not affect our infrastructure? The most obvious option is to remove bugs once they're found. This, however, has the problem that we can only fix known bugs, and updating QEMU generally requires stopping the running VM, which hurts availability for a cloud provider. Another, longer-term solution that is currently advocated by some is to avoid bugs entirely by using safe languages like Rust. This, however, would only address some types of bugs, and it is impractical for a project like QEMU that has more than 2.5 million lines of code. The only way would be a total rewrite, which is being pursued by other projects like cloud-hypervisor. A third option, which is what we are proposing here with control flow integrity, consists in reducing the effectiveness, or the damage, of bugs. This has actually been done in the past for a long time with techniques like seccomp, SELinux, cgroups, and so on. The idea is to encapsulate a process and only allow operations that the process would perform in normal conditions. The problem, however, is that QEMU has many different behaviors, so these filters end up being too loose in most cases. We think that control flow integrity, or CFI, has better chances to stop unwanted behaviors, and it acts earlier than the other techniques.
In our opinion, Google sets the gold standard for CFI and other security features when we talk about C/C++ code, and this is because of their large production projects, Chromium and Android. The approach they decided to follow is to automate security features by adding them to Clang and LLVM. So now, thanks to Google, Clang has backward-edge and forward-edge CFI, hardened memory allocators, and even strong undefined-behavior checks. And since these are in the compiler, they can be used for other projects like QEMU with limited work. But before we go into the implementation details, let's see what types of protection we could use. Let's start with the stack. The common way to protect the stack is with a shadow stack. The idea is to have a second stack where we store only copies of the return pointers. Then, before executing a ret instruction, we make sure that both stacks have the same return address. This can be implemented both in software and in hardware. However, software versions, especially on x86, are not very safe, because there is a race condition that would allow a second thread to invalidate the return check. On top of that, there is performance degradation because of the added operations that need to be performed on every function call. A hardware implementation of the shadow stack solves most of these issues. For example, Intel CET avoids the race condition by introducing special call and ret instructions that work atomically on the shadow stack. And given that only special instructions access the shadow stack, it can also be made read-only for normal load and store operations. An interesting alternative to the shadow stack is SafeStack. The basic idea of a ROP attack is to use a buffer overflow to overwrite a return pointer. SafeStack protects return pointers by storing them on a safe stack that cannot be reached by buffer overflows.
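A toy software shadow stack makes the mechanism, and its weaknesses, easy to see. This is only an illustration of the idea described above; real implementations are compiler-generated, not hand-written like this:

```c
#include <assert.h>

/* Toy software shadow stack: every function entry pushes a copy of
 * the return address, every exit compares it against the on-stack
 * copy before returning. */
enum { SHADOW_DEPTH = 1024 };
static void *shadow[SHADOW_DEPTH];
static int shadow_top;

void shadow_push(void *ret_addr)
{
    shadow[shadow_top++] = ret_addr;   /* instrumented function entry */
}

int shadow_check(void *ret_addr)
{
    /* Instrumented function exit: a mismatch means the on-stack
     * return address was corrupted (e.g. by a ROP chain), and the
     * real implementation would abort instead of returning. */
    return shadow[--shadow_top] == ret_addr;
}
```

The extra loads and stores on every call are exactly the overhead mentioned above, and because `shadow` here is ordinary writable memory, another thread can race with the check or overwrite the copies, which is the weakness Intel CET's atomic call/ret and protected shadow-stack pages remove.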
This is done by moving all the structures that can actually cause a buffer overflow to a second stack, which we call unsafe, and keeping return pointers and fixed-length variables on the normal stack, which we can now call safe. In this case, even if there is a buffer overflow, the return pointers are not reachable because they are in a different memory area. I would say that SafeStack is safer than software-based shadow stacks because it doesn't have the race-condition issue. It also has a very limited performance impact, because it basically doesn't add instructions; we only need an additional register to store the pointer to the unsafe stack. It does, however, change the position of variables on the stack, so if you have a highly optimized stack layout it could cause some performance degradation. Overall, I would say that SafeStack is more comparable to a hardware-based shadow stack, both in terms of safety and performance. It should be said that a hardware-based shadow stack can protect the shadow stack from normal store operations, while SafeStack can't. But a change to the safe stack would only be possible with arbitrary writes, which already open the door to other types of attacks, so ROP attacks are not really the main concern there. In terms of performance, a hardware shadow stack still has more complex instructions in place of call and ret, so we think that in some cases SafeStack could actually outperform the hardware-based implementation of shadow stacks. Okay, so let's talk about adding SafeStack to QEMU. In theory, we should just add a compiler flag and be done. Unfortunately, QEMU makes heavy use of coroutines, and since every coroutine must have its own independent stack, we have a problem here: we can't just create a new safe stack. We also need to create a second, unsafe stack for each coroutine, and we have to make sure that we update both when we do a coroutine switch.
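The coroutine problem can be sketched in a few lines. This is a conceptual sketch, not QEMU's actual implementation: under SafeStack each coroutine needs two stacks, so creation and destruction must manage both, and a switch must retarget the compiler's unsafe-stack pointer as well as the CPU stack pointer.

```c
#include <stdlib.h>

/* Conceptual per-coroutine state under SafeStack: two stacks instead
 * of one.  Names and layout are illustrative only. */
typedef struct {
    void  *safe_stack;     /* return addresses, fixed-size locals */
    void  *unsafe_stack;   /* buffers that could overflow         */
    size_t stack_size;
} Coroutine;

Coroutine *coroutine_new(size_t stack_size)
{
    Coroutine *co = calloc(1, sizeof *co);
    if (!co) {
        return NULL;
    }
    co->stack_size   = stack_size;
    co->safe_stack   = malloc(stack_size);
    co->unsafe_stack = malloc(stack_size);  /* the extra allocation */
    if (!co->safe_stack || !co->unsafe_stack) {
        free(co->safe_stack);
        free(co->unsafe_stack);
        free(co);
        return NULL;
    }
    /* On a coroutine switch, both the CPU stack pointer and the
     * unsafe-stack pointer must be redirected to this coroutine. */
    return co;
}

void coroutine_free(Coroutine *co)
{
    if (co) {
        free(co->safe_stack);
        free(co->unsafe_stack);
        free(co);
    }
}
```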
Now, coroutines in QEMU use either ucontext or sigaltstack to create the new stacks, and neither of them is supported by SafeStack. So we had to update the implementation in QEMU to manually allocate the second, unsafe stack and update the pointer to it when a new coroutine is created. After that, QEMU uses longjmp, which is already supported by SafeStack, to switch between coroutines. So this is all for stack protection; let's focus now on how to protect function pointers. Here the common approach is forward-edge control flow integrity. The idea, simplistically, is to check that the function pointed to is a correct, or allowed, function. The main problem is how to define the allowed functions. In theory, we could use an exact call graph; however, this is not feasible, so we have to use approximations. The two most common approximations are to either consider every function in the binary as allowed, which is what Intel CET or Microsoft CFG do, or to consider as allowed only the functions with the correct signature, which is what Clang uses in its icall protection and what Microsoft will be using with XFG in the future. Now, how can you do that? For signature-based checks, Clang uses an ordered jump table. The idea is to create a jump table entry for every function in the binary and order the table by function type. A function pointer is then allowed only if it points to an address in the correct interval of the table. Another option, used in some cases by Clang and by XFG, is to compute a hash of the signature and store it at the beginning of each function. Before performing the jump, we read the signature hash of the target function and make sure it's the correct one. And if you only want to make sure that you're pointing to a function, any function, you can add a special instruction or some data at the beginning of each function, and this is what Intel CET does.
Now, in terms of protection, signature checking is obviously much more powerful, so let me explain in more detail how Clang's icall protection works. If you look at the example on the right, we have the real functions f1 to f6 in the bottom half of the memory, while the upper half stores the ordered jump table. As you can see, the order is different, because it's based on the signature: addresses 0 to 3 store functions that take an integer and a character pointer as arguments, while addresses 4 and 5 store functions with a different signature. The key here is that the compiler will replace any function pointer with a pointer to the corresponding entry in the jump table. Then, before executing an indirect call, it checks that the address in the pointer falls in the correct address window of the jump table. If it does, the jump is considered valid; otherwise, the program is terminated with an exception. Now, before we try to apply this to QEMU, we have to ask whether QEMU respects function signatures. The general answer is that yes, QEMU does respect function signatures, but sometimes it is a bit permissive with the pointer types that are used. Specifically, there are cases where the callback type declares a different pointer type than the callback function itself: for example, the callback function may take a char* argument while the callback type is declared with a void*. Luckily, Clang has an option to do pointer generalization, and this way all pointer types are considered equal. There are still a few sensitive points where a function won't have a corresponding entry in the jump table. This happens mostly in two cases: when the function has been generated with just-in-time compilation, because it did not exist at compile time, or when the compiler does not see a function's address being taken and therefore does not add it to the jump tables. In particular, for QEMU we have issues in the following places.
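The permissive-pointer pattern just described can be shown in a few lines. The names below are invented for illustration: the callback type uses void*, but the implementation takes char*.

```c
#include <assert.h>

/* Generic callback type, as a QEMU-style API might declare it. */
typedef void (*handler_fn)(void *opaque);

static char last_seen;

/* Implementation with a more specific pointer type: char*, not void*. */
static void on_event(char *msg)
{
    last_seen = msg[0];
}

void dispatch(handler_fn fn, void *opaque)
{
    /* With -fsanitize=cfi-icall this indirect call would abort:
     * the target's type, void(char*), is not void(void*), so the
     * pointer falls outside the matching jump-table window.  With
     * -fsanitize-cfi-icall-generalize-pointers, all pointer
     * arguments are treated as equivalent and the call passes. */
    fn(opaque);
}
```

This is why QEMU is built with pointer generalization enabled when CFI is on: the code is type-correct in spirit, but not to the letter of the signature.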
These are: TCG, when we call a translation block; TCI, when we interpret instructions; all the callbacks for plugins; modules in general; and the signal handling in QEMU. Now, for all these cases except modules, we can disable CFI checks on the specific indirect calls. We cannot do this for modules, because there the changes, and the areas left unchecked, would be too extensive and pervasive in QEMU. That would actually become a security issue in itself, because we would be disabling checks on a lot of function calls. So for now we simply decided not to support modules together with CFI. We also got an interesting bonus side effect from this work, because to combine everything, Clang requires the use of link-time optimization. This required us to test, and then support, LTO in QEMU. To be fair, most of the work was already done with the move to Meson, but still, we now have this as another feature of QEMU. Now, let's talk for a minute about the status of the patches. Both SafeStack and CFI icall are supported upstream, and there are test cases for the GitLab CI. Unfortunately, some of these have to be executed manually, because of the significant burden of LTO in terms of compilation time and memory: the shared GitLab runners are not powerful enough to allow wide testing of LTO in the daily CI/CD. Still, both features can be used today. There are only two main caveats. First, we need to use Clang to compile QEMU, and this is a bit of a problem because most distributions are still using GCC for it. The other big problem is that we cannot use modules and CFI at the same time. This is actually becoming a pain point for the future, because some distributions, like RHEL, are moving towards a modular build. So users would have to decide between CFI and modules, at least in the short term, if those distributions go to modular builds only. So, now that the patches are included upstream, we can ask ourselves: what did we accomplish?
Well, if you look at the CVEs, a total of 35% of them are buffer overflows. So we can say that by using control flow integrity we are mitigating about one third of the vulnerabilities, making integrity attacks much rarer. On top of that, QEMU now also supports LTO. For the future, the main thing left to do is to include support for shared libraries, and this is mostly important for modules. The problem, as I said before, is that the jump tables are computed on the binary, so they have no information on external libraries, even if the libraries themselves were compiled with CFI. Now, Clang does have a solution for this: if the local CFI check fails, instead of just terminating the program like we do now, we check whether the address is in a shared library and, if it is, we perform additional checks. In this case, if the external library was instrumented with CFI, there is a mechanism to check the signature against the jump table of the external library. However, if the external library was not instrumented, the call would not be protected at all. To make this protection effective, we would need to instrument the system-wide libc, and we don't want to do that yet. So we will probably propose a change to LLVM so that the CFI check can fail when the external library is not instrumented. Another big issue is that cross-DSO CFI does not support pointer generalization. So if we want to make it work with QEMU, we would have to change QEMU in all the places where pointer generalization is currently required. Finally, cross-DSO CFI is still experimental and does not work very well with dlopen, and the only way to effectively fix this is to change libc or the dynamic loader to properly support cross-DSO CFI. So there is a lot of work required here, but we still think it's possible to do. This is the end of the presentation. Thank you for following.