Okay, hi everyone. I am Michalis Pappas, a senior engineer in virtualization at OpenSynergy. Today I'll present our advancements in virtualization security with the use of unikernels, and specifically show how the COQOS Hypervisor can leverage the Unikraft project to create highly isolated unikernel-based services. I'll start with a brief introduction to the COQOS Hypervisor and proceed to describe the security challenges caused by aggregating functionality on top of large monolithic guests. We'll see how this aggregation is bad for security and highlight the need for component isolation. I'll move on to show how unikernels provide a suitable platform for building isolated components and introduce the Unikraft project, which allows us to build highly specialized operating system images. We will focus on Unikraft's architecture and look at some data showing that the unikernels produced by Unikraft are characterized by small size, small memory footprint and high performance. We will then take a closer look at how component isolation is an integral part of the COQOS Hypervisor's architecture and see some additional features that complement this approach. Finally, we will see how all of this fits together with a case study on how we implement virtualization of trusted execution environments with the assistance of unikernels.

The COQOS Hypervisor, or COQOS HV, is a type 1 hypervisor that makes use of the hardware support for virtualization provided by the Armv8 architecture. Its main focus is secure and safety-critical automotive systems, and it is certified ASIL B under the ISO 26262 standard. The main purpose of the COQOS Hypervisor is to allow the consolidation of components of different criticality on the same SoC, which in practice means that you can run guests of different ASIL levels on the same hardware, and the hypervisor will provide guarantees of strong isolation and freedom from interference.
So, strong isolation is critical for both security and safety, but in reality guests are rarely completely isolated from each other, because inter-VM communication is required in nearly every practical use case for at least some of the guests. I'll demonstrate the effects of inter-VM communication with an example. Consider a typical automotive setup with a Linux-based instrument cluster (IC) and an Android-based in-vehicle infotainment (IVI) system, along with an AUTOSAR guest providing some other ECU functionality. In an ideal world these guests would run in complete isolation from each other, and in that case the TCB of each guest would consist of just the hypervisor. In reality, however, safety-critical hypervisors are small, and additional functionality is normally implemented by one or more privileged guests. In our example we can think of one privileged VM (PVM) that provides some sort of virtualization service to the IC and IVI guests, say device virtualization. This normally introduces a requirement for inter-VM communication between the PVM and its clients, which is usually implemented over primitives provided by the hypervisor. Because of inter-VM communication, the security of the PVM becomes critical: a compromise of the PVM can impact all the guests that the PVM provides services to. So now the PVM becomes part of the TCB of its clients, or in other words the IC and IVI are part of the same trust domain. You can see in the bottom diagram how the IC and IVI now have to trust the PVM as well as the hypervisor. Notice, however, that the AUTOSAR guest is completely unaffected by all this, thanks to the strong isolation guarantees provided by the hypervisor: even if the other guests are fully compromised, the AUTOSAR guest remains secure. So, as the PVM becomes part of the TCB for some guests, it must demonstrate a small attack surface.
This, however, is not the case with many deployments today, where we see privileged guests implemented on top of large monolithic operating systems such as Linux that come with a big attack surface. Even worse, these guests aggregate multiple services, and this is clearly bad for security, as it only takes one vulnerable service to gain access to the entire system; we have seen multiple CVEs describing VM escape attacks that exploit vulnerabilities in device drivers. A system compromise now affects all services running on the same guest, and consequently the superset of client VMs is at risk. This calls for a direction towards service disaggregation, and of course the problem is widely recognized; for example, there is a paper published by the Xen project on improving Xen security through disaggregation. Our answer on how to achieve disaggregation is to leverage the strong isolation provided by the hypervisor and compartmentalize each service into an isolated VM. In the figure you can see how we can break down the large PVM into a set of specialized guests, each of which provides a single piece of functionality. This gives us an improved architecture with respect to security, as we preserve a small TCB by minimizing the attack surface and limit the impact of a compromise to a single service. Now the question is whether this is feasible. Traditionally the addition of guests is considered expensive, so it has been avoided, and I would even go as far as arguing that the main force behind aggregation has been performance. Now, however, the landscape has changed, with automotive ECUs being deployed on powerful SoCs; for example, the Renesas Salvator-XS features eight Arm cores and 4 gigabytes of memory, and newer platforms exhibit even more powerful specifications. Nevertheless, we still need a suitable operating system, so I put down some requirements. We have seen that for our security requirements we need an operating system that provides small, minimal images.
For the performance reasons stated above, we also need these images to be lightweight. We would like some degree of modularity, so that we can tailor applications to our requirements, whether they are minimal or feature-rich. And finally, we would like to be able to port existing components with minimal effort. It turns out that unikernels provide a highly suitable tool for these criteria. So let's see what unikernels are. Unikernels are highly specialized operating system images, running in a single address space, constructed using library operating systems. The idea is that the user is provided with a set of libraries and selects the smallest set of them needed to implement the operating system constructs required by their application. By specialized we mean that our unikernels execute a single application, which is the opposite of the concept of a traditional general-purpose operating system. By single address space we mean that, as there is only one application, there is no need for a distinction between user space and kernel space: our application runs entirely in kernel space, which on Arm is EL1.

I will now introduce Unikraft, which is the unikernel project we use with the COQOS Hypervisor. Unikraft is a POSIX-compliant library OS with a high degree of modularity, provided under the BSD 3-clause license. When it comes to architecture, Unikraft implements functional units as micro-libraries and organizes micro-libraries into library pools. A very important feature of Unikraft is that everything is a library, and that includes the very core operating system primitives. On the diagram on the right, going bottom to top, you can see that one can choose between different architectures, and then between different platforms, such as KVM, Xen or, in our case, the COQOS Hypervisor.
Moving up, you can see the main library pool, which provides things like schedulers, memory allocators, network stacks, different flavors of a standard library, or even runtimes for high-level languages. The workflow for building an application is: configure Unikraft using Kconfig to select a set of libraries and set various parameters, develop or port an application that uses these libraries, and then link everything together into a single binary, which is the resulting unikernel image. So, as we can see, Unikraft allows us to build highly specialized and minimal operating system images, and with everything being a library it gives us a lot of control. In our domain we are mainly interested in very minimal images, but there have been occasions where using a wider set of libraries has been required to port third-party applications, so having the flexibility to do that is important. Being POSIX-compliant means that it is possible to port applications from other operating systems with reasonably small effort.

So let's have a look at Unikraft's performance. The graphs here come originally from the Unikraft project. The first graph shows comparisons of image sizes of common applications built as unikernels with Unikraft, and we can see that the resulting images are very small, with total image sizes ranging from a few hundred kilobytes up to less than two megabytes; with additional optimizations we can achieve even smaller images. In terms of idle memory usage, we can see a comparison between individual Unikraft applications and various Linux images, and looking at the numbers, the difference can be as large as an order of magnitude. Of course, memory usage depends on the application workload, so these numbers can change depending on the application, but you can still get an idea of how big the difference can be.
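To make the build-time composition concrete, here is a small sketch (not from the talk, and not Unikraft's real API) of how Kconfig-style configuration selects exactly one implementation of each OS primitive to be linked into the image; the `CONFIG_*` symbol and the `uk_*` names are hypothetical stand-ins:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of Unikraft-style composition: Kconfig sets
 * CONFIG_* symbols, and exactly one implementation of each primitive
 * is compiled into the final single-binary image. */

#define CONFIG_UKALLOC_STDLIB 1   /* imagine this chosen via menuconfig */

#if CONFIG_UKALLOC_STDLIB
/* The "micro-library" selected for memory allocation: here just a thin
 * wrapper over the host libc so the sketch runs anywhere. */
static void *uk_alloc(size_t size) { return malloc(size); }
static void uk_free(void *p)       { free(p); }
#endif

/* The single application of the unikernel. There is no syscall
 * boundary: everything is a direct function call in one address
 * space, which is part of why the images stay small and fast. */
int app_main(void)
{
    char *buf = uk_alloc(64);
    if (!buf)
        return 1;
    uk_free(buf);
    return 0;
}
```

The point of the sketch is the linkage model: swapping the selected allocator library changes which `uk_alloc` is compiled in, without touching the application.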
Regarding boot times, again we compare common applications running on Unikraft against Linux images, and the difference here is significant, as most Unikraft applications boot in less than 100 milliseconds. The last graph, on the bottom right, shows a comparison of LightVM, which is based on work preceding Unikraft, against Docker, booting on a 64-core server. We won't be looking at Docker so much, as it's not relevant to our case, but rather at how it is possible to boot a large number of unikernels on a given core while maintaining high performance. The conclusion is that Unikraft provides us with an environment for building highly performant operating system images, which makes the addition of guests no longer expensive.

At this point I'll switch back to the COQOS Hypervisor and discuss an important part of our architecture, which is COQOS Hypervisor extensions. COQOS Hypervisor extensions allow us to delegate functionality from the hypervisor to extension VMs. That is because, in order to fulfill its security and safety requirements, the hypervisor core implements only the smallest set of features required to provide isolated virtualization. Non-essential functionality is implemented in extension VMs, which execute in a separate address space from the hypervisor core. This separation of functionality avoids cascading failures, as attacks on or malfunctions of extension VMs don't interfere with the execution of other guests or the hypervisor itself. So hypervisor extensions are an important concept, as they allow the hypervisor to preserve its small TCB, and they additionally provide a high degree of flexibility and modularity to the whole architecture, as users can develop extensions to tailor the hypervisor's functionality to their use case. Another feature that complements hypervisor extensions is synchronous exception forwarding.
Synchronous exception forwarding allows trapping synchronous exceptions triggered by a VM into the hypervisor and dispatching requests to extension VMs for handling. The COQOS Hypervisor can be configured to trap exceptions of this kind for a number of guests. When an exception occurs, execution of the guest that triggered the exception is blocked. The hypervisor traps the exception and dispatches a request to a preconfigured extension VM. The extension VM processes the request and notifies the hypervisor with a response, and the hypervisor updates the aborting guest's context and resumes its execution.

Okay, so with all the background covered, I am now ready to provide an example that shows how all of this works together. This example shows how we can use a unikernel-based extension VM to provide TEE virtualization services. I will first provide some background on TEEs. TEE is an acronym for trusted execution environment, and trusted execution environments provide platforms with a secure environment that can be used to protect secrets and perform operations on them. The idea is that if the system is compromised, the secrets remain protected by the TEE. Common examples of functionality implemented by TEEs are key management, anti-rollback, secure storage, DRM, and secure credential processing. In this example we will be focusing on OP-TEE, which is a TEE developed by Linaro. OP-TEE's execution model is governed by the GlobalPlatform standards, which specify client applications executing in a rich OS environment that communicate with trusted applications running in the trusted execution environment. In Arm-based architectures there is hardware support for this, namely TrustZone, which provides two execution contexts: the normal world and the secure world. In the normal world you have your usual operating system, or multiple VMs under a hypervisor, and in the secure world you have the secure OS, which in this case is OP-TEE.
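The exception-forwarding sequence described above (block the guest, dispatch to the extension VM, patch the guest context, resume) can be sketched as a minimal simulation; all names and types here are illustrative, not the COQOS HV API:

```c
/* Sketch of synchronous exception forwarding. The hypervisor traps an
 * exception, blocks the triggering guest, hands the request to an
 * extension VM, writes the response back into the guest's saved
 * context, and resumes it. */

enum vm_state { VM_RUNNING, VM_BLOCKED };

struct vcpu {
    enum vm_state state;
    unsigned long regs[8]; /* saved guest context the hypervisor can patch */
};

/* Stand-in for the extension VM's handler: in reality this runs in a
 * separate VM and replies via a hypervisor communication primitive. */
static unsigned long extension_handle(unsigned long request)
{
    return request + 1; /* placeholder for real processing */
}

static void forward_sync_exception(struct vcpu *guest, unsigned long req)
{
    guest->state = VM_BLOCKED;                   /* guest stops at the trap */
    unsigned long resp = extension_handle(req);  /* dispatch to extension VM */
    guest->regs[0] = resp;                       /* update aborting context  */
    guest->state = VM_RUNNING;                   /* resume guest execution   */
}
```

The essential property is that the guest never observes the detour: from its point of view the exception simply completes with an updated register state.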
At any given time, the processor can execute either in the normal world or in the secure world. Communication is usually initiated by the normal world issuing a secure monitor call, or SMC, which causes the processor to switch to the secure world and process the request. Nowadays, TEEs are becoming ubiquitous in various types of devices, and so is the requirement for virtualization support, so major TEEs are now starting to provide this feature. Virtualization support normally requires some assistance from the hypervisor; in OP-TEE, that involves tagging each request with a VM ID and performing some memory translations, security checks, context maintenance, and so on. OP-TEE refers to this virtualization layer as the OP-TEE mediator, and there is a reference implementation in the Xen hypervisor. The architecture can be seen in the figure on the right: here we have two guests issuing requests to the secure OS, and all requests are relayed through the OP-TEE mediator, which is implemented as part of the hypervisor.

Now, the problem with implementing the TEE virtualization layer in the hypervisor is that it doesn't scale very well in safety-critical systems, because in the Arm ecosystem there are several trusted execution environments. Although all of them implement virtualization support in a similar way, that is, with the help of a virtualization layer, there is currently no standard governing TEE virtualization that is implemented by vendors. As a result, the specifics of the virtualization layer depend very much on the TEE implementation, and the complexity of this layer also varies greatly between different trusted execution environments. This means that a hypervisor needs to provide a different implementation for every TEE it needs to support, and for safety-critical hypervisors that need to be certified, this can be a problem.
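As a rough illustration of the mediator's job, here is a sketch (not OP-TEE's actual code) of tagging a guest's SMC with its VM ID before handing it to the secure monitor. It assumes the SMC Calling Convention's register layout, with the function ID in x0 and a client ID in x7; the buffer translation and security checks are only indicated by a comment:

```c
#include <stdint.h>

/* A trapped guest SMC, represented by its general-purpose registers:
 * x0 = function ID, x1..x6 = arguments, x7 = client ID (per SMCCC). */
struct smc_regs {
    uint64_t x[8];
};

/* Hypothetical mediator step: before forwarding the SMC to the secure
 * monitor, stamp the low 16 bits of the client-ID register with the
 * ID of the VM that issued the call, so the secure OS can keep
 * per-VM state. */
static void mediator_forward(struct smc_regs *r, uint16_t vmid)
{
    /* Security checks and guest-physical to physical translation of
     * any buffer arguments would happen here. */
    r->x[7] = (r->x[7] & ~(uint64_t)0xffff) | vmid;
    /* ...then issue the real SMC to the secure monitor. */
}
```

The exact encoding of the VM ID is TEE-specific, which is precisely the variability that makes baking this layer into a certified hypervisor unattractive.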
So our approach is to remove this complexity from the hypervisor and outsource it to an extension VM, by leveraging the synchronous exception forwarding feature, which allows us to trap SMCs issued by the guests and forward them to an extension VM for handling. The architecture is shown in the diagram. A guest issues an SMC through its kernel driver; the hypervisor traps the SMC and issues a request to the TEE mediator, which performs all the tasks required for virtualization and then forwards the SMC to the secure monitor, which in turn dispatches it to the secure OS.

When it comes to the extension VM, we implement this guest as a Unikraft-based unikernel, which results in a nearly bare-metal image, which is exactly what we need for this type of application. The image consists of just a few libraries. Going from bottom to top, we have the library that implements the architecture part, in our case AArch64; then the COQOS HV platform library, which implements the core part of our guest; then nolibc, which implements a subset of the standard library for applications that don't require a full implementation; and the COQOS Hypervisor extension library, which implements the communication protocol with the hypervisor core. On top we have the OP-TEE mediator application, which implements the main logic.

One last thing I would like to point out here is scheduling. As there may be several guests running on the system, whether regular guests or hypervisor extensions, we'd like to ensure that our extension VM consumes as little scheduling time as possible. We achieve that by adding the extension VM to the hypervisor scheduler when there is a synchronous exception to process, and removing it from the hypervisor scheduler as soon as handling is complete. Now, remember from the introduction of synchronous exception forwarding that when a guest causes an exception, execution is stopped for that guest until the exception is handled.
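The on-demand scheduling just described can be sketched with a toy run queue; the names and the queue structure are illustrative, not the COQOS HV scheduler:

```c
/* Sketch: the extension VM is schedulable only while a forwarded
 * exception is in flight, and the aborting guest is blocked for the
 * same interval, so the number of runnable guests stays constant. */

#define MAX_VMS 8

struct sched {
    int runnable[MAX_VMS]; /* IDs of currently schedulable VMs */
    int n;
};

static void sched_add(struct sched *s, int vm)
{
    s->runnable[s->n++] = vm;
}

static void sched_remove(struct sched *s, int vm)
{
    for (int i = 0; i < s->n; i++)
        if (s->runnable[i] == vm) {
            s->runnable[i] = s->runnable[--s->n];
            return;
        }
}

/* The extension VM effectively borrows the scheduling slot of the
 * guest whose exception it is handling. */
static void handle_trap(struct sched *s, int guest, int ext_vm)
{
    sched_remove(s, guest);  /* guest blocks on the trapped exception */
    sched_add(s, ext_vm);    /* extension VM becomes schedulable      */
    /* ... extension VM processes the request ... */
    sched_remove(s, ext_vm); /* handling complete                     */
    sched_add(s, guest);     /* aborting guest resumes                */
}
```

Because each add is paired with a remove, the run-queue length before and after handling is identical, which is what keeps the scheduling overhead of the extension VM near zero.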
These two things combined mean that the extension VM effectively steals the scheduling slot of the VM that caused the exception for the time it requires to handle it, and returns the slot to the hypervisor when it's time for the aborting guest's execution to be resumed. This allows us to maintain high performance, as we keep the number of scheduled guests constant.

Coming to conclusions: we saw how component aggregation imposes security risks, we highlighted the requirement for compartmentalization and isolation, and we saw that unikernels are a suitable platform for that purpose. We introduced Unikraft and saw how it can help us produce small, specialized guests with high performance and low resource usage, and we had a closer look at how the COQOS Hypervisor achieves isolation with hypervisor extensions and synchronous exception forwarding. Looking ahead, there is an obvious question of whether it is possible to isolate every single service or component, and the truth is that there are several challenges ahead. For instance, there are cases of very complex device drivers that also depend on equally complex subsystems, which would require a lot of work to port to a unikernel, and in these cases it's probably a better idea to stick to Linux. When it comes to scalability, what we have learned from microkernels, which feature a similar architecture, is that scaling up involves its own challenges as well. Based on the above, it should all come down to achieving the right balance. So this is it. Thank you for attending this talk, and I hope you found it interesting.