Hello everyone, my name is Agilopez and I work as a software engineer in the virtualization team at Red Hat. Here I'm going to talk about libkrun, a VMM in dynamic library form.

So the first question we need to answer is: what is libkrun? If I had to define it in a single quote, it would be that libkrun is a dynamic library that enables other programs to easily gain KVM-based isolation capabilities with the minimum possible footprint. Among libkrun's goals we can enumerate that we want it to be easy to use; to integrate all the features needed for its purpose, with a minimal set of dependencies; to be as small as possible in code size, which means implementing the minimum set of required features, but also having the simplest implementation of those features; to have the minimum possible footprint; and to provide a friendly environment for microservice and container workloads. What we don't intend it to be is something that supports conventional virtualization workloads. In other words, libkrun is not a replacement for QEMU, nor VirtualBox, nor any other VMM out there.

libkrun integrates a number of components distributed among two libraries. The main one, libkrun, integrates the C bindings to interact with the library itself, a simple virtual machine monitor based on the Firecracker and rust-vmm crates, some arch-dependent devices, an integrated virtio-fs server, and a minimal set of virtio devices: virtio-console, virtio-fs, virtio-balloon (but just the free-page reporting feature) and virtio-vsock. The second one, libkrunfw, the libkrun firmware, provides a minimal interface to access the guest payload and bundles a minimalist Linux kernel as the default payload.

Now some of you may be wondering why implement a VMM as a dynamic library. So let me explain this with an example. Let's imagine you have a runtime and you want that runtime to execute a VM, in this case using an external VMM.
What the runtime will need to do is locate the VMM binary through the file system, and possibly this VMM binary will need to locate other components through the file system as well, such as shared libraries, a kernel image, maybe some firmware. This is not a problem in itself, but it may become one if the runtime intends to switch between different namespaces. So let's imagine the runtime has switched to a different mount namespace. In this case, the runtime won't be able to locate the VMM binary, and even if it could, the VMM binary won't be able to locate its own dependencies. This means the runtime would need to somehow carry the VMM and all its dependencies between mount namespaces, which is complicated to do and not efficient in any way.

So what happens with libkrun? With libkrun, the runtime is linked against the dynamic libraries, so the moment the runtime is executed, the dynamic loader brings libkrun and libkrunfw, with all the components inside them, into the process memory map of the runtime. This means the runtime can safely switch between different namespaces, including mount namespaces, knowing it will carry with it all the dependencies it needs to run a microVM.

Okay, switching to another topic. Some of you have probably noticed that when I listed the virtio devices integrated in libkrun, there was no support for virtio-blk or virtio-scsi. So how are we doing storage without those devices? We are using virtio-fs to use any directory on the host as the guest root file system. What's happening behind the scenes is that every time the guest operating system issues a file system request, this request is relayed to the integrated virtio-fs server, which is running in the context of the runtime, and this virtio-fs server acts on behalf of the guest by accessing a directory on the host file system.
The advantage of this mechanism is that it requires zero storage management: you don't need to create images, shrink or grow them, and you don't need to partition them or layer a file system on them. It allows you to easily share files between the host and the guest out of the box, because that's basically how virtio-fs works by default. And it's very friendly to microservice and container workloads, for the previous reasons. One problem we do have is that the performance is not as good as when using block-based devices, basically because you can't rely as much on the page cache of the guest, which means you need to go more frequently all the way to the host to request data. On the other hand, this is good for our memory footprint, because we avoid polluting the guest memory with the guest's file system cache. The other problem is that the attack surface is larger than using virtio-blk, because it requires more code and more syscalls. That said, the SEV-enabled version of libkrun replaces virtio-fs with virtio-blk, mainly because it's better suited for running confidential workloads: it's smaller, requires fewer syscalls, and allows us to rely on LUKS2 for integrity and encryption, which is great. We will be talking more about this in the "Don't peek into my container" talk which follows this one.

Similar to what happened with block devices, among the virtio devices supported in libkrun there is no support for virtio-net. So how are we doing networking without network interfaces? Well, in this case we are using another technique, which is called Transparent Socket Impersonation, or TSI. What happens with it is that when the user space application running on the guest requests an AF_INET socket from the kernel, the custom guest kernel provides an AF_TSI socket instead, which has compatible semantics. This AF_TSI socket integrates both a vsock and an inet personality within it.
This all happens in a completely transparent way for the user space application, which doesn't require any kind of modification to specifically support TSI.

Okay, so now we have a user space client that has received this TSI socket instead of an inet socket. Let's say this user space client wants to connect to a local endpoint, to a server that is running within the context of the guest. What will happen when this client calls connect() is that the TSI socket will attempt to fulfill the request using its inet personality. And since there is a user space server listening on a port in the local context inside the guest, the request will be fulfilled immediately and the connection will be successful. Both the user space client and the user space server will communicate between them the usual way, without any knowledge that they are going through a TSI socket.

Okay, now let's make things a little bit more complicated, and let's imagine this user space client attempts to connect to a server that is running outside the guest, at an external endpoint. What happens is the same thing as before: the user space client will call connect() on the TSI socket, and the TSI socket will attempt to use its inet personality first, but won't find a local endpoint there. So, after failing to connect to a local endpoint, it will attempt to fulfill the request using its vsock personality. This vsock personality will communicate with an integrated vsock server which is running in the context of the runtime, and this vsock server will attempt to fulfill the request by connecting to an endpoint which is outside the context of the guest. If it manages to connect to a server that is running in that context, it will establish a connection to it and reply to the user space client running inside the guest to let it know that the connection has been established.
From that point on, both the user space client running inside the guest and the user space server running outside the guest will be able to communicate between them in a completely transparent way, without any knowledge that they are going through a TSI or a vsock socket.

And what happens if instead of a user space client we have a user space server using a TSI socket? Well, in this case, once the user space server calls listen(), the TSI socket will start listening on both the inet personality and the vsock personality. So we will have a listening port in the context of the guest, and we will have a listening port outside the context of the guest, in the context of the runtime, managed by the vsock server that is integrated in libkrun. If we receive a connection from a user space client running inside the guest operating system, this connection will be fulfilled through the inet personality, and if we receive a connection from a user space client running outside the guest, the connection will be fulfilled through the vsock personality and through the vsock server, which is acting as a proxy.

What are the advantages of this mechanism? Well, for instance, we just need a minimal network configuration, basically just the DNS. It allows libkrun to act on behalf of the user space application running on the guest without the need of implementing a TCP stack in the library. From the host perspective, all connections come and go to the libkrun-enabled runtime and are visible in the network namespace of the runtime context. There is no need for network bridges, no iptables rules, and as a result of the above, the environment is very friendly to container workloads, to the point that things such as Istio sidecars work out of the box without any kind of special support for TSI. The disadvantages of this mechanism are that it requires specific support for each socket family, and that there is no support for raw sockets.
Now that we have learned what libkrun is, let's talk a bit about how you can use it. The first step will be to obtain libkrun. We already have binaries shipped by openSUSE Tumbleweed, there is a COPR repository for Fedora, there is a Homebrew repository for macOS on M1 (which uses the Hypervisor framework instead of KVM), and you can of course build it from sources. The project is hosted in the containers organization on GitHub.

Once you have obtained libkrun, you get a header which contains the documentation for each function. You also get a couple of libraries, but you only need to worry about libkrun itself, because it will bring libkrunfw into the mix, and linking can be as simple as the example we have here with GCC.

This is a minimal example of a program using libkrun to execute a VM. It's obviously ignoring any kind of error checking, but it illustrates how simple it can be to create a VM with libkrun. Basically what's happening here is that the program first creates a context for establishing the configuration of the VM. Then it configures the VM to use a single vCPU and 512 MiB of RAM. Then it configures the "rootfs" directory to be used as the root file system for the guest, which will be relative to wherever this example binary is executed. Then it sets /bin/sh as the first program, the entry point that will be executed within the guest. And it simply starts the VM, passing the configuration context ID created before.
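The steps just described can be sketched roughly like this, using the function names documented in the libkrun.h header (krun_create_ctx, krun_set_vm_config, krun_set_root, krun_set_exec, krun_start_enter). This is a sketch, not the exact slide: running it requires libkrun and libkrunfw installed on a KVM-capable host, and, as in the talk, all error checking is omitted.

```c
/* Minimal libkrun sketch: boot a microVM with 1 vCPU and 512 MiB of
 * RAM, using the "rootfs" directory as the guest root file system and
 * /bin/sh as the guest entry point. Error checking omitted. */
#include <libkrun.h>

int main(void) {
    /* Create a configuration context for the new microVM. */
    int ctx = krun_create_ctx();

    /* 1 vCPU and 512 MiB of RAM. */
    krun_set_vm_config(ctx, 1, 512);

    /* Use the "rootfs" directory, relative to wherever this binary is
     * executed, as the guest root file system (served via virtio-fs). */
    krun_set_root(ctx, "rootfs");

    /* /bin/sh is the first program executed within the guest. The
     * argv/envp arrays here are deliberately left empty. */
    const char *const argv[] = { NULL };
    const char *const envp[] = { NULL };
    krun_set_exec(ctx, "/bin/sh", argv, envp);

    /* Boot the bundled kernel with the configuration created above.
     * On success, this call does not return. */
    krun_start_enter(ctx);
    return 0;
}
```

Linking really is as simple as the talk suggests: something like `gcc -o example example.c -lkrun` is enough, since libkrun pulls in libkrunfw by itself.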
In fact, you can even copy-paste this example into a file, compile it, create a directory to be used as the root file system for the guest, extract an OCI image into this directory, and then, if you execute the binary, you'll get right away a freshly started VM running a different kernel than the host. And despite the fact that you don't have any external network interfaces, you can start a local service in the guest and connect to it from the outside, from the host.

Lastly, I would like to talk about some examples of use cases for libkrun. We already have some projects using it. We have krunvm, which uses libkrun and Buildah to create and manage microVMs from OCI images. We also have crun, the OCI runtime used by Podman, which uses libkrun to run containers with virtualization-based isolation. There are other ideas we are already working on, such as the ability to run fully encrypted workloads using AMD SEV. And we would also like to explore other ideas in the future, such as giving conventional services the ability to self-isolate: for instance, we know there are HTTP servers that are able to use chroots or namespaces to isolate themselves, and it would be nice to give them the ability to isolate themselves in a full VM without any kind of maintenance or configuration required from the administrator. Another idea that I think would be nice to explore would be a FaaS platform that deploys functions inside isolated virtual contexts using libkrun.

This is all I had to share. I hope you enjoyed this session, and thank you for listening. Bye bye!