So welcome, everybody. I'm Mustafa Saeed, a security engineer at the Intel Internet of Things Group, and I'm joined today by Stefan Schultz, a security researcher at Intel Labs. Today we're presenting our topic, a journey into fuzzing and hardening edge hypervisors. To give a bit of perspective on the intention of this talk: the ACRN security team focuses on security validation, and offensive security validation in particular, to make sure the product comes out in the best possible shape. The idea today is to walk through some of our experiments, efforts, and activities related to negative security testing, also known as fuzzing, in order to conduct a comprehensive fuzzing campaign for the different security components in the ACRN software stack.

So the agenda for today is as follows. First we introduce ACRN and edge hypervisors, and highlight some of the use cases related to edge and IoT. Then we go into fuzzing for product assurance and how fuzzing fits into product security validation. After that we start our fuzzing road trip and walk through some of the tools and techniques that we followed in order to fuzz ACRN and the components around it. And finally we summarize some of the lessons learned and the future steps for this kind of activity.

So to start: hypervisors for the edge. First of all, what is the edge? We can see the edge now in almost every single industry. It can be in transportation, it can be in smart factories, things like Industry 4.0, it can be in smart cities. It even touches industries we're not used to thinking of here, like retail, financial services, hospitality, and so on. The role of virtualization in such industries is to provide a solution for different use cases, to achieve operational efficiency and a way to manage different workloads, what we call heterogeneous workloads, and consolidate them in a safe way.

So the question is: why don't we just use the cloud? What's the problem with that? On the edge, we face a different set of problems, mostly related to bottlenecks in transport bandwidth, storage cost, and the limited amount of data that we can move from the devices on the edge back to the cloud. Even the nature of the applications running on the edge is a bit different. We have real-time applications, closed-loop applications, that require high responsiveness, low latency, and sometimes local autonomy and time-critical decisions. And for that, we have ACRN. ACRN is a flexible, open-source, lightweight hypervisor intended to support many of the use cases related to the edge and IoT. From an architecture perspective, ACRN is a type-1 hypervisor that runs directly on top of the hardware resources; it's a bare-metal hypervisor. On top of that, we have different types of VMs, mainly three: a service VM, user VMs, and what are called pre-launched VMs. The service VM is kind of like the control VM; it has the native drivers that communicate with the hardware resources. In addition, it hosts the device model, a kind of emulation back end that assists the user VMs, as we can see here, like Windows or real-time VMs, in order to do device sharing or to configure some of the devices directly for those VMs.
Regarding the pre-launched VM, it's as the name describes: a pre-launched VM is not controlled by the service VM but boots on a separate boot path, and in it we can run things like safety-critical applications, for example Zephyr. The goal of ACRN is really to address the gap between data-center hypervisors and hard-partitioned hypervisors, and it does that in two ways. First, it tries to tackle the key challenges that we face on the edge: workloads with mixed criticality, real-time versus non-real-time workloads, safety and non-safety applications, and so on. Even among the real-time applications, some have hard real-time requirements and some have soft real-time requirements, so these applications have a very diverse nature. In addition, ACRN also focuses on functional safety and on certifying the hypervisor components, for example for automotive or industrial use.

The second part is being very flexible and highly configurable. As we can see, ACRN can work for a different set of use cases. Take for example the automotive sector. For in-vehicle infotainment, we would require things like hardware resource sharing. You can think of running your instrument cluster in the service VM and using other VMs to run real-time entertainment applications, things like virtual offices, and so on. Here the bottleneck would be that the VMs are trying to access and share different hardware resources like audio, video, storage, and so on and so forth. Moving to another segment, the industrial domain, we will have different types of VMs that need to run. For example, we will have an HMI, some kind of OS that presents dashboards for the user to monitor what is happening. There can also be real-time VMs; we can dedicate some hardware resources to those VMs, and they would be responsible for controlling robotic arms or even be connected to PLC devices and so on. In more advanced cases, we have safety-critical applications that can run as a pre-launched VM and have the task of monitoring the system health for any application that has high-availability requirements. Finally, if we do not really care about resource sharing and we just need to run a couple of VMs directly on top of ACRN, then the logical partition scenario allows us to do that. In this case, we won't have a device model and we don't have a service VM in the traditional sense, just a safety VM and a user VM that are partitioned and segregated so they run seamlessly on top of ACRN.

So this was an introduction to ACRN and its different use cases. Now we jump into fuzzing, understanding what fuzzing is and how it relates to ACRN. So, Stefan. Thank you, Mustafa. In the next part, let's talk about fuzzing for security validation; what does it actually mean? As you just heard, there are a lot of requirements for ACRN to meet: functional safety, security, and various use cases. For security-critical products, Intel has a so-called security development lifecycle that makes sure all of our products adhere to the current best practices and are systematically designed, evaluated, and implemented. In particular, you may of course be aware that you should do security architecture: you understand the usages, the assets, the goals.
And then you would select the various technologies: you design the actual interfaces, you implement mechanisms like secure boot and exit control, or you decide on particular mechanisms. Then there's a big implementation phase, but then there is of course a question: has the implementation been correct? Has it been according to the detailed design? Has it been according to the architecture? Do the mechanisms that I have specified actually meet the security goals? So there's always a so-called security validation phase, where a separate team performs code review, static analysis, and vulnerability scanning, and all the dependencies of the software, such as libraries, crypto libraries, and so on, are analyzed. And we have teams, for example so-called red teams for penetration testing, that basically take a fresh look at the system and check whether everything so far has been designed according to best practices and standards. And of course, at some point when the product is ready for shipment, the cycle may repeat as new threats and new vulnerabilities turn up, new use cases appear, there might be a second generation of the product, so this whole thing is a circle and will iterate over time. On the other hand, for each of these steps, we're always trying to meet best practice and improve our state of the art. One item in particular here is fuzzing. Fuzzing has been getting a lot of traction in academia as well as industry; it's becoming state of the art for software validation, in particular security validation of software. And as you will see, it can be quite difficult to apply fuzzing when it comes to low-level software, firmware, or even things like microcode. That is what this talk is basically about: looking at the different tools, trying to understand how they fit ACRN, and what we are doing in this space to improve on the state of the art.

So what actually is fuzzing and how does it work? Fundamentally, when you are testing a particular product or a subsystem of a product, you call it a test target, on the right-hand side; that might be software or firmware, or just a library and interface that is part of a bigger product. In general, this piece of software will execute in some way on an execution platform. That might be a Linux platform; it might also be a system on chip where firmware is executed, or an emulator. And then we have test drivers, which have the task of actually executing, or exercising, the target: the particular API calls to launch it, based on some input that is provided externally. Now, in the case of fuzzing, this input tends to be randomized, and it's usually based on a corpus that we provide in the beginning. For example, if you're testing a JPEG library, your seed corpus of inputs would basically be a set of JPEG files, and you would try to have all kinds of different JPEG types: maybe black and white, different sizes, different encodings, and so on. Then you might implement certain randomization mechanisms in the fuzzer to damage these pictures, to randomize them, to recombine them, and feed them to your test drivers that then exercise the target. At the end of this, if you have a bug in your target, you will get crashes, and these are of course the things we want out of the testing. We want to analyze them, fix them, and achieve a bug-free product in the end.
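To make the test-driver idea concrete, here is a minimal sketch of what such a driver could look like when written as a libFuzzer entry point. The decode_jpeg function is a hypothetical stand-in for whatever library is under test, not a real API; the fuzzer repeatedly calls the entry point with inputs derived from the seed corpus, and a crash or sanitizer report signals a bug.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical target API, standing in for the library under test */
int decode_jpeg(const uint8_t *data, size_t len);

/* libFuzzer calls this once per (mutated) input */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* exercise the target; crashes and sanitizer findings are reported
     * by the fuzzer together with the input that triggered them */
    decode_jpeg(data, size);
    return 0;  /* the harness should return 0; other values are reserved */
}
```

A harness like this would typically be built with clang and -fsanitize=fuzzer,address and then pointed at a directory of seed files.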
Without any further feedback loop, just doing it like this, this is called blind fuzzing. It should be intuitive that with blind fuzzing, you do not have a good chance of reaching inputs that are far away from your original seed corpus, unless you define a very good test driver that implements some of the formatting that your target expects, some of the input formats, so-called structure fuzzing or grammar-based fuzzing, with mutators in the fuzzer itself. Otherwise the random binary mutation of the seeds will usually be discarded, lead to invalid inputs early on, and not reach very deep code in your target. So there's been an extension which made fuzzing much more popular and effective recently: the so-called feedback fuzzing or coverage-guided fuzzing. In this case, the platform actually observes what is going on, or the software target has been instrumented to record what it's doing, to record the coverage, for example, that is achieved by a particular input. That coverage is returned to the fuzzer as a kind of feedback, and the fuzzer may use it to say: oh, this input has actually triggered some new interesting branch in my target, so I'm not going to throw it away and take a new seed input and randomize it; I'm going to keep this input, because none of the seeds that currently exist actually reach this particular branch. Doing this many millions of times, the fuzzer will iteratively build up a corpus of inputs, randomize them, splice them against each other, and find more interesting inputs that exercise a lot of the typical branches, conditions, and corner cases in our typical software products. Doing this, we get much better coverage out of the overall fuzz testing. That in turn leads to more interesting corner cases being covered and therefore also to potential crashes turning up. And that's of course what we want, right? We want to find these crashes and bugs before the software is ready for shipping and deployed in products.

Finally, for those of you who have already heard about fuzzing, there are of course a lot of other aspects which are not so much in the focus of this talk. Once you have this system in place, there are additional aspects such as the overall automation and integration of your fuzzing campaign and the triage of the crashes that you find: you need to sort them out and diagnose what is going on. If you imagine getting thousands of crashes out of such a campaign, you need to start automating this process and sorting out which crashes are relevant, which are duplicates of the same crash, and who should fix them. So there's a lot of work in all of these areas, and many of them are also the subject of extensive research in academia right now.

But let's get back to fuzzing for software validation. If we are looking at product assurance, we have slightly different goals than you may have heard about from fuzzing used elsewhere in the community. Our goal is not to run a fuzzer once, find a bug, and report it or collect the bug bounty for it. Our goal is to increase the software validation, to increase the assurance that you achieve when deploying ACRN as a security product in IoT. So what counts in this case is much more the overall usability of the tool and the long-term return on investment. Given that you have this setup, how easy is it to automate it?
How easy is it to integrate with existing validation backends? How good is the overall performance? Is it easy to debug, to automate? And in particular, for the developers or the software validators who are in charge of the product, is it easy or manageable for them to understand the solution and extend it, for example by writing further test drivers that exercise more of the code? These are all very important aspects for product assurance which are not necessarily the highest priority when you do fuzzing for bug bounties or penetration testing. And with that, I give back to Mustafa, who has looked at various tools to see which are the most effective, state-of-the-art tools for fuzzing a hypervisor.

Thanks, Stefan. As Stefan mentioned, we have different goals when we decide to integrate fuzzing into our ACRN offensive security validation cycles to make the product more secure and robust. The first thing we asked ourselves, before even starting to pick tools and choose where to start: we took a deeper look into the security architecture and tried to identify the most critical software components to look into. Basically, we have three main components as part of our TCB, the Trusted Computing Base: the hypervisor, which runs directly on the hardware resources; a kernel module called the VHM, the virtio and hypervisor service module; and finally the device model in user space. These three components communicate with each other. The device model is something similar to a QEMU-like application with two main tasks. The first is to create and configure new guest VMs and start them. The second is to act as an emulation back end and provide emulation services for those guest VMs so they can share resources. The VHM is a middle layer between the device model and the hypervisor: it supports the device model in communicating with the hypervisor through downcalls, and it also supports the hypervisor in notifying the device model of requests through upcalls.

Of course, there are other components which are not part of our TCB but are very critical to the overall system operation: things like the virtio front-end drivers, which are how the user VMs participate in paravirtualization, how they actually achieve sharing of hardware resources and communicate with the device model back end. There are also the emulated device drivers: some devices are directly emulated using trap-and-emulate, so their memory regions are tracked by the hypervisor in order to see whether any requests or operations happen on them. Finally, we have the pass-through drivers: the ACRN hypervisor has the possibility to dedicate some PCI devices to the VMs. You can think of a real-time VM where we want to assign a device directly to that VM so it is directly under its control. In this case, we assign the device to the VM, and the pass-through driver lets that VM communicate with the device directly.
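To illustrate the trap-and-emulate path just described, here is a rough, simplified sketch of the kind of request record a hypervisor might hand to a user-space back end after trapping a guest MMIO or port I/O access. The structure and field names are illustrative only, not the actual ACRN definitions.

```c
#include <stdint.h>

enum io_req_type { IO_REQ_PIO, IO_REQ_MMIO };
enum io_req_dir  { IO_REQ_READ, IO_REQ_WRITE };

/* illustrative record describing one trapped guest access */
struct io_request {
    enum io_req_type type;   /* trapped port I/O or MMIO access          */
    enum io_req_dir  dir;    /* read or write from the guest's view      */
    uint64_t address;        /* guest port number or guest-physical addr */
    uint64_t value;          /* data written, or place for the read data */
    uint32_t size;           /* access width in bytes: 1, 2, 4 or 8      */
};

/* Device-model side: dispatch one trapped access to the matching
 * emulated-device back end (uart, virtio-net, ...). */
static void dm_handle_io_request(struct io_request *req)
{
    switch (req->type) {
    case IO_REQ_PIO:
        /* look up the registered port range and call its handler */
        break;
    case IO_REQ_MMIO:
        /* look up the registered MMIO range and call its handler */
        break;
    }
}
```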
Going one step deeper and looking at it from each component's perspective: the hypervisor runs in the more privileged VMX root mode, and any software running on top of it, either in the service VM or in the guest VMs, is considered untrusted. From the VHM's perspective, the hypervisor is the only trusted component, and any other software running in the service VM user space or in the user VMs is considered untrusted. Finally, the device model runs in the service VM user space; it considers the VHM and the hypervisor as trusted components, but anything outside those components is untrusted, so any data or code coming from the user VMs and so on. So that is the security architecture of ACRN.

Looking at this security architecture from a different perspective, focusing on the interfaces and how these components communicate with each other from a security point of view, we find different sets of interfaces. First, there is the interface between the VHM kernel module in the service VM and the hypervisor; they communicate through hypercalls. The second interface is how user-space applications, in particular the device model, communicate with the VHM, which is the traditional OS way of communicating between user space and kernel: IOCTLs and system calls in general. The third interface is the device model's command line: as we mentioned, one task of the device model is to configure new VMs and new guests, and for that there is a command line interface that allows us to pass configuration and parameters to the device model to create those VMs, so a standard STDIO interface where the device model accepts input. From the user VM's perspective, we have the virtio front-end drivers that live in the user VMs and communicate with the virtio back-end drivers in the device model, and that happens through virtio devices. In paravirtualization, those virtio devices are nothing but PCI devices that expose PIO and MMIO regions used to operate them. Some regions are used for fully emulated devices: we also use MMIO and DMA to access devices which are not paravirtualized but really emulated. And we have pass-through, which also uses PIO to communicate with the actual hardware resources. The last three interfaces, four, five, and six, focus more on the memory regions that are trapped and emulated, or handled by the hypervisor, in order to be serviced either in the hypervisor or by the service VM. So those are all the security interfaces that we have.

The question now is: which tools fit those components best, or which tools would work best to target those interfaces? The first tool that we picked was syzkaller. Syzkaller is a coverage-guided, structure-aware kernel fuzzer. It is very well established, with tremendous Linux support. We used syzkaller to target two interfaces. One is the traditional usage model for syzkaller, a kernel module, which here is the VHM. The other approach that we tried is extending syzkaller so that we can also target the hypercalls, the interface between the VHM and the hypervisor.
So let's take a deeper look into the VHM and the hypervisor and how we actually targeted the code base in both components. The VHM, as we mentioned, is a middle layer with a set of services, modules, or handlers that allow the user-space device model to talk to the hypervisor. Those handlers perform diverse tasks: VM management, to create and start VMs; handling interrupts; and even handling the IO requests and dispatching them from the hypervisor to the device model and so on. Take VM management: we have an API like the create-VM hypercall. As you see in this API, two parameters or payloads are passed down to the hypervisor. Those two parameters are potential candidates for fuzzing: what if we inject malformed pieces of data or malformed data structures into those parameters, how would the hypervisor behave? Other services happen at runtime: while a guest VM is running, it has to send requests to the service VM, and these usually involve things like notifying the VHM that those IO requests have been handled and completed. As you see, these also carry payloads or parameters that are potential candidates for fuzzing. So the goal here is first to fuzz the IOCTLs going from the device model down to the VHM module, and then later to propagate some of the fuzz input from the VHM down to the hypervisor.

So how did we do that? How does syzkaller actually work? Syzkaller can operate in what's called remote or isolated mode, where some of the syzkaller components run on a separate machine and fuzz a target machine that is completely remote or isolated. Syzkaller consists of different components. Some of them run on the host, like syz-manager, which is responsible for starting and managing the VMs that contain the software we want to test. It is also responsible for copying over the other components, like syz-fuzzer. Syz-fuzzer is a process that runs on the unstable VM, the target VM, and it is responsible for input generation, mutation, minimization, and so on. The inputs generated by syz-fuzzer are passed to what's called syz-executor, which takes a single input and creates the actual calls, the fuzzing programs, that are issued against the target. In our case, we implemented a test proxy that lives inside the device model, takes those inputs, and propagates them first to the VHM and later to the hypervisor. The idea, as we've seen in the last slide, is that APIs like creating a VM or signaling the completion of an IO request can be fuzzed first at the VHM layer, so we would see whether any issues, crashes, or missing validation show up in the VHM. And the good thing is that syzkaller instruments the kernel, or gives us the possibility to instrument it, and with this we get coverage information from the VHM module, so we really know which execution paths have been exercised, which allows us to reproduce any issues that happen. That coverage information is later propagated back to syz-manager, which keeps a small database that manages the corpus of inputs, including those that created crashes in the VHM.
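As a rough illustration of the test-proxy idea, the sketch below shows how a small hook in the device model might take a buffer produced by syz-executor and push it into the kernel module through an ioctl, which the VHM would then turn into a hypercall. The device node path, the ioctl request code, and the payload layout are placeholders, not the real ACRN definitions.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define VHM_DEV            "/dev/acrn_vhm"   /* assumed device node      */
#define FUZZ_CREATE_VM_REQ 0x1234            /* placeholder ioctl number */

struct create_vm_payload {                   /* illustrative layout only */
    uint64_t vm_flags;
    uint8_t  uuid[16];
};

/* forward one fuzz-generated buffer to the kernel module */
int proxy_fuzz_create_vm(const uint8_t *data, size_t size)
{
    struct create_vm_payload payload;

    if (size < sizeof(payload))
        return -1;
    memcpy(&payload, data, sizeof(payload)); /* malformed on purpose */

    int fd = open(VHM_DEV, O_RDWR);
    if (fd < 0)
        return -1;

    /* The VHM validates (or fails to validate) the payload and then
     * passes it on to the hypervisor as a hypercall argument. */
    int ret = ioctl(fd, FUZZ_CREATE_VM_REQ, &payload);
    close(fd);
    return ret;
}
```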
The second part, the extension that we made, is to take those fuzz inputs and propagate them one layer further down, to the hypervisor. We take the fuzz input from syz-executor and try to fuzz the VHM; if something happens there, we will know, because we get coverage information. If not, we can propagate the input to the hypervisor. This allows us to fuzz all the hypercalls that happen between the VHM and the hypervisor. In general, syzkaller is a very stable tool with a lot of advantages: it is well supported by the community and easy to use. It provides a declarative description language called syzlang for the structures that we want to fuzz. It allows us to do fault injection, sanitization, and even automation using components like syzbot, and it gives the possibility to reproduce and minimize the fuzz inputs using things like syz-repro. The downside of using syzkaller to fuzz the hypervisor is that we end up doing blind fuzzing of the hypervisor, in the sense that we do not get coverage information from it. If we need that, then, since the hypervisor is not a user-space application and cannot simply use the compiler's coverage libraries, we would have to build coverage support into the hypervisor source code itself, things like gcov, in order to get raw coverage information out of it. So this was syzkaller.

The next component is the device model command line. Here we want to fuzz this user-space application, and libFuzzer is a very good candidate for such a task. It's a coverage-guided, in-process fuzzer, and we used libFuzzer to target the command line parameters: can we fuzz those command line configurations and see whether that impacts the device model? Before talking about how we did that: the device model, from the inside, has two main parts. First, it parses the parameters used to configure the VMs, and second, it provides the emulation back ends for legacy devices and for virtio devices. Starting with the VM configuration: the way we implemented libFuzzer is that you first instrument the device model binary. The device model is just a command line utility; you can pass arguments to configure, for example, the graphics, things like fence registers, graphics hidden memory, and the memory aperture. It allows you to assign devices, as you see with the -s option, so we can enable some of the virtio devices for a user VM, and it also allows you to configure the memory size for the user VM. That is how the device model is used. We took those configurations and injected them into the libFuzzer entry function, which is called LLVMFuzzerTestOneInput. In our implementation, this function does nothing but take this VM configuration, transform it into a data structure, and inject one fuzzing input into one entry of that configuration. Then we call the main function inside the device model, the original entry point of the device model, and the fuzzed input is propagated to the parser, the part of the device model responsible for taking those inputs, splitting them into smaller parts, and passing them on to the main logic.
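Put together, the harness just described looks roughly like the sketch below: LLVMFuzzerTestOneInput rebuilds an argument vector from a known-good configuration, replaces one entry with the fuzzed input, and calls the device model's entry point. dm_main is a placeholder for the renamed main() of the instrumented acrn-dm, and the baseline arguments are only an example configuration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int dm_main(int argc, char **argv);   /* assumed renamed device-model entry point */

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* baseline arguments taken from a known-good VM configuration */
    char *argv[] = {
        "acrn-dm", "-m", "2048M",
        "-s", "3,virtio-blk,/home/acrn/uos.img",
        NULL,                       /* slot for the fuzzed argument */
        NULL
    };

    /* turn the raw fuzz input into a NUL-terminated argument and splice
     * it into the argument vector in place of one configuration entry */
    char *fuzzed = malloc(size + 1);
    if (fuzzed == NULL)
        return 0;
    memcpy(fuzzed, data, size);
    fuzzed[size] = '\0';
    argv[5] = fuzzed;

    /* must return here on both success and failure; exit() calls inside
     * the device model have to be refactored into error returns first */
    dm_main(6, argv);

    free(fuzzed);
    return 0;
}
```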
In case we have a crash, this function would return back to LLVMFuzzerTestOneInput. If not, it continues to propagate those inputs to the other main functions, things like creating the VM, loading the software, and doing all the core work of the device model needed to configure a VM. Whether this operation works or fails in the end, we will know, because we get feedback from the device model back into this function, and then we are able to mutate the input and reach new parts of the device model code. So the first part of the function, as you see, initializes the configuration based on the fuzzing input, and the second part executes the main function based on that input. LibFuzzer has a lot of advantages: it is very fast, it's a standard, state-of-the-art tool, it's very easy to set up, and it also gives us the possibility to enable different sanitizers, things like ASan, MSan, UBSan, and so on. One of the disadvantages we faced is that we could not really fuzz the back-end emulation part of the device model using libFuzzer, because a lot of validation happens in other software layers, the VHM or the hypervisor, before input reaches the device model. Because of that, configuring libFuzzer to target the emulation back end directly would result in false positives, which is not the best approach. Also, with libFuzzer, we may have to do a lot of code refactoring to remove any exit statements: we need safe returns to the single entry function, LLVMFuzzerTestOneInput. So this was libFuzzer and our experience with it.

The last component targets the three remaining interfaces in the guest, the MMIO and port IO kinds of interfaces, which are considered very critical because a guest VM is not trusted in the ACRN software stack. For that, we used HyperCube, a research OS that does fuzzing in a different way than syzkaller, libFuzzer, and similar tools. HyperCube OS is a multi-dimensional fuzzer targeting low-level software. It consists of two main components: one component enumerates the different interfaces, so it enumerates the assigned PCI devices, the local APIC, the I/O APIC, and all this low-level machinery; and it also contains a fuzzing engine, so that after we enumerate those interfaces, we are able to inject malicious or malformed data into them. From an integration perspective, it's very easy: HyperCube lives as a normal guest, so it runs in any of the ACRN scenarios, either industrial or IVI. It is nothing but a normal VM launched by the device model, and it lives on top of ACRN and exercises those different interfaces, as we can see here. Operationally, the HyperCube OS is started by the device model like any normal guest. As we start it, we have to define what kinds of devices we want to test, and here we define some of the virtio devices that we think, if we assign them to this guest and then run HyperCube, it will be able to fuzz, injecting bad input into those devices or the memory-mapped regions related to them. After HyperCube boots, it does some initialization, things like setting up the global descriptor table, the interrupt table, and so on, and then it starts to register the different regions, starting with the MMIO and port IO regions; a rough sketch of what this guest-side probing boils down to is shown below.
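The sketch below shows the idea in a few lines of freestanding C: once an interesting port I/O or MMIO range has been enumerated, the guest writes pseudo-random values into it and lets the hypervisor and device model cope with the result. The xorshift generator, the region sizes, and the addresses are placeholders, not HyperCube's actual implementation.

```c
#include <stdint.h>

static uint32_t prng_state = 0x12345678;   /* seeded once at boot */

/* xorshift32, as an example of a simple deterministic PRNG */
static uint32_t prng_next(void)
{
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 17;
    prng_state ^= prng_state << 5;
    return prng_state;
}

/* write a 32-bit value to a legacy I/O port */
static inline void pio_write32(uint16_t port, uint32_t val)
{
    __asm__ volatile ("outl %0, %1" :: "a"(val), "Nd"(port));
}

/* hammer one trapped MMIO range and one emulated port I/O range */
static void fuzz_region(volatile uint32_t *mmio_base, uint16_t pio_base,
                        unsigned int iterations)
{
    for (unsigned int i = 0; i < iterations; i++) {
        mmio_base[prng_next() % 0x400] = prng_next();
        pio_write32(pio_base + (uint16_t)(prng_next() % 0x20), prng_next());
    }
}
```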
HyperCube then goes on to the PCI devices, and as we see here, those are the PCI devices that were assigned initially by the device model. It then enumerates and registers some of the other regions, like the I/O APIC and the local APIC. It creates a fuzzer, uses an integrated pseudo-random number generator to produce a seed, and then uses that seed to start fuzzing. This was our experience so far with the different fuzzing tools. So Stefan, maybe a bit about our future work and what's ahead.

Okay, thank you, Mustafa. As you saw, HyperCube is quite easy to get started with, and it's actually a really nice tool to test the low-level VM interfaces of ACRN. But the problem is that it is just a blind fuzzer, and we want to have coverage feedback and coverage-guided fuzzing with something like HyperCube. The way forward that we are looking into for this is kernel fuzzing using kAFL. kAFL is another research prototype, also from Ruhr-Universität Bochum, from the same people who developed HyperCube. The idea of kAFL is to launch the target inside a virtual machine using modified versions of QEMU and KVM. Again, you have a test driver that stimulates the target operating system inside the VM. It receives its input via shared memory from QEMU and the actual fuzzing frontend, and then, using Intel Processor Trace, we get feedback for each execution, which is decoded by QEMU and returned to the kAFL fuzzer. The advantage of this is that it is nicely scalable: it uses a well-known interface, the virtual machine interface, for executing and scaling VMs in the backend. And in the latest version, which is called Nyx, we also have advanced features such as very fast snapshotting, fuzzing of multiple interfaces, and a notion of grammar-based or structured fuzzing of VMM targets. So in this case, ACRN would run as a nested hypervisor inside KVM, and we expect to achieve coverage feedback this way for the HyperCube fuzzing case as well.

To summarize, we have seen that there is a range of targets that we need to pursue for ACRN, and they are quite different in nature. There's the device model in user space, the hypervisor itself, and there's a kernel module, the VHM, and they all interact in interesting ways and produce a number of security boundaries that need good security testing. Syzkaller has been very promising on the kernel side and on the hypercall validation side, while HyperCube has proven to be very effective and scalable on the guest VM side, validating from the perspective of the guest VM. Moving forward, we are very interested in running this experiment with kAFL and Nyx, using HyperCube as a guest in a nested ACRN setup. LibFuzzer is very interesting for fuzzing the command line interface and the configurations that are possible with ACRN, but in general it has been a little difficult to get it working for the device-emulation part of the device model, which is in user space on the service OS. So overall, I think we've seen that there's a lot to be done to improve fuzzing, even for low-level software. There's no single tool that covers all our bases, but of course we're working on it. We would also like to collaborate with academia and other projects in this space and extend them. And since this is an open-source project, we also plan to upstream, integrate, and automate all of these tools.
So with that, I thank you all for listening and we're interested to hear your questions and comments. Thank you. Thanks.