Hello, my name is Ricardo Koller, I'm a researcher at IBM's T.J. Watson lab, and today I'll be talking about virtualization, VMs and VM monitors, and why these are getting smaller and more lightweight. Then I'll talk about what I believe is the logical next step, which is taking this to the extreme: making VMs even smaller and monitors so tiny that there is no monitor at all. I'll talk about a prototype of that idea called Nabla Linux, what it looks like, some performance numbers, and some isolation analysis.

This whole idea of making VMs smaller comes from containers. Containers are great for development and have really good lightweight performance characteristics: they boot very fast, they have very low memory overhead, and there's a lot of tooling and orchestration built around them, like Kubernetes and Istio. Containers are also really good at things like sharing files, and even sharing TCP/IP stacks between containers, which are hard to do with VMs.

But containers are not so great in terms of isolation, and that makes them not ideal as cloud units of execution. The reason can be seen in the figure on the right. In the cloud you have machines running containers from different tenants, and a malicious tenant can attack the host kernel, gain control of the whole machine because the host kernel is privileged, and then access some secret from a co-located container, for example. The reason that's easier to do from a container is that a container accesses the host kernel through a very large attack surface: the host kernel provides container processes with a very large set of system calls, pretty much all the system calls available, which is almost 300. That's because a container relies on a very high level of abstraction. A VM, by comparison, relies on a lower level of abstraction, the virtual machine abstraction, which needs a narrower set of APIs from the host kernel. In a way, that's a smaller attack surface to the kernel, and it makes it harder for malicious VMs to attack the host kernel and gain control of the machine.

Because of that, VMs are being used as the unit of isolation in clouds, and projects like Kata Containers use them by running a container, or a pod of containers, per VM. The issue, though, is that VMs and VMMs are kind of bloated. VMMs in particular, which are the pieces of the stack that emulate the machine and all the devices the machine needs, have traditionally emulated way too much and done way too much, more than is needed in the cloud. They emulate a lot of devices that not everybody needs, and they even emulate multiple architectures or instruction-set features that are more complicated than needed. For example, a full-featured VMM like QEMU can emulate something like a floppy drive and can even run MS-DOS, which is definitely too much for the cloud. So there has been a movement towards smaller, more lightweight VMs, which means slimming down both the guests and the monitors. On the guest side, there has been an effort to make user spaces smaller, for example with Alpine Linux.
On the guest kernel side, there are slimmed-down kernel configurations, for example the microVM one from the Firecracker project, or even extreme cases like unikernels, which are VMs built to run only one specific application and include nothing more than what's needed. On the monitor side, there are projects like Firecracker, or unikernel monitors like solo5-hvt, which specialize for some specific cases and only include what's needed, nothing more, making them as small as possible. By doing this, you get these micro-VMs, and you get lots of benefits: a smaller attack surface to the host, good performance, lower memory overhead, the ability to boot much faster than a traditional VM, and, because everything is simpler overall, less code, so it's cheaper to audit the whole thing for security.

As an example, we have Firecracker, which is the state of the art. It's a monitor for the KVM stack, written in Rust, a safe language, and specialized for serverless workloads and containers. By specializing for those particular workloads it can make some assumptions, and it has a very reduced device model: it only emulates a very small set of devices, and it does things like not having PCI and not having a BIOS. By doing that it's able to boot really fast, in 125 milliseconds, and it has very low memory overhead, five megabytes in the case of Firecracker, which is great.

Why is that not enough? Actually, it is great, and it's enough for most cases, but I've been thinking: can we do better? Do we really need to run everything as a traditional VM, meaning KVM plus a monitor? For some types of workloads, couldn't we just not have a monitor, something like the figure? At the very least, the design on the right would be much simpler, because you're not using a VMM, so that's one less component in the stack, and because you're not using virtualization in the traditional sense, you wouldn't be using KVM either. In cases where you don't have hardware support for virtualization, where you don't have support for extended page tables, that KVM piece gets really, really complex, so not having to use it makes things simpler overall. Even where you do have hardware virtualization support, the hardware features needed to support it should be counted as complexity as well. So at the very least, the thing on the right should be simpler, and maybe even faster, as we'll see later; for example, in terms of latency, system calls that do I/O should have lower latency because they go through fewer steps in the stack.

So now let's see how you do that. The first thing to keep in mind is that the VMM was doing something, a lot of work actually, so now that you don't have one, you need some other component, for example the guest kernel, to do its job. Let's say we want the thing on the right to run as a VM, as a single process; even if you have multiple guest processes, we want it all to run somehow as a single process on top of a low level of abstraction, meaning on a very small subset of all the system calls available. Ideally, we want a VM running as one process on a small set of system calls, and it needs to be able to do whatever the VMM was doing. So let's take a look at what the VMM was doing. Basically, the VMM emulates the machine.
It emulates virtual devices, virtual CPUs, a virtual MMU, and, very importantly, it protects the host. That last part matters because protection is the reason we're interested in VMs for the container case in the first place.

Virtual devices: the VMM emulates, for example, disks, actual disks consumed over a bus, and those disks are emulated using some underlying resource on the host, for example a file, or it could emulate a network card on top of a raw socket on the host. That's one of the jobs of the VMM.

Another job of the VMM is to emulate virtual CPUs, meaning one or more CPUs with the complete instruction set of the host machine. That could mean, for example, emulating two virtual CPUs when the physical machine only has one physical CPU; the VMM is able to do that.

Another job of the VMM is to emulate an MMU. An MMU allows the guest to implement virtual memory. It makes it possible to have multiple processes with different address spaces inside the VM, protection between address spaces, and even different privilege levels. It makes it possible for the guest to have a kernel with pages only accessible by the kernel, pages only accessible by one process and not the others, and also to map pages between processes, mapping the actual same physical page into two or more processes. All of that is controlled by the guest via page tables and register accesses: the VMs running on top of the VMM are expected to drive this virtual memory configuration by writing page-table entries into memory and writing values into registers. That's another job of the VMM.

Finally, the last job I'm listing here for a typical VMM is to protect the host. The way it does that is, first, by limiting what the instructions running in the VM can do, which limits what kind of access the VM has to the whole physical machine. Second, the VM does I/O by indirectly talking to the VMM, so in a way it has a certain amount of control over the VMM, and it's important to also limit what the VMM itself can do. The typical way VMMs do that, for example QEMU or Firecracker, is by implementing a sandbox: an allowlist of system calls, a list that only includes the system calls the VMM process is allowed to make at runtime. That means, for example, that the VMM shouldn't be able to open any random file in the host filesystem. In particular, once the VMM is already running and using some files for the disk and some tap device, it has enough file descriptors and resources to do its job; it shouldn't need to open anything else, so the sandbox simply doesn't allow the VMM process to make that open call. Another reason is that a malicious VM could escape the instruction-level sandbox and gain control of the VMM, so you want to limit what an attacker running as the VMM could do as well. Another important point is that this sandbox, this list of system calls the VMM is allowed to make, is not only small but also bounded, meaning that whatever application, whatever workload you run in the VM, the VMM will never need to make more than the particular system calls in the sandbox.
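Just to make that concrete, here is a minimal sketch of what such an allowlist could look like, assuming libseccomp. The particular set of allowed calls is made up for illustration; real VMMs like Firecracker and QEMU ship their own carefully chosen filters.

```c
/* Minimal sketch of a seccomp allowlist, in the spirit of what a VMM
 * like QEMU or Firecracker installs before entering its main loop.
 * The exact set of allowed calls is illustrative, not a real filter.
 * Build with: gcc sandbox.c -lseccomp */
#include <seccomp.h>
#include <stdlib.h>

static void install_sandbox(void)
{
    /* Default action: kill the process on any syscall not listed below. */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
    if (!ctx)
        exit(1);

    /* Already-open file descriptors (disk image, tap device, console)
     * are enough to do the job, so only I/O-style calls are allowed;
     * note there is no open/openat in the list. */
    int allowed[] = {
        SCMP_SYS(read),  SCMP_SYS(write),
        SCMP_SYS(readv), SCMP_SYS(writev),
        SCMP_SYS(ppoll), SCMP_SYS(ioctl),
        SCMP_SYS(mmap),  SCMP_SYS(munmap),
        SCMP_SYS(exit_group),
    };

    for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++)
        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, allowed[i], 0) < 0)
            exit(1);

    if (seccomp_load(ctx) < 0)   /* from here on, the list is enforced */
        exit(1);
    seccomp_release(ctx);
}
```

The important property is the one I just mentioned: the list is fixed up front and does not grow with whatever runs inside the guest.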
That's different from the case of a regular process or a container, where it's very hard to define ahead of time which system calls some application will need. As an extreme example, say the application is Python: it can just import the os module and invoke whatever system call it wants. It could literally make any of the roughly 300 system calls available in the Linux kernel. But if you have that same application running as a VM, as on the left, it can make whatever combination of system calls it wants inside the guest, and the VMM will still never, ever make more than the system calls allowed in the sandbox. That's a great reason, I think the main reason, why we use VMs as an isolation boundary rather than a process.

So we just discussed the role of the VMM, what it does in the virtualization stack, so that we can remove it and have some other component do the same job. Specifically, we talked about these four tasks: virtualizing devices, CPUs, and MMUs, and protecting the host, and we want to do all of that without a VMM; we basically want a kernel running as a process to do all of them. Doing the jobs marked with an OK there, virtualizing devices and CPUs and protecting the host, is completely doable. On the other hand, virtualizing an MMU is hard. Actually, I don't know how you could possibly have a process provide a virtual memory abstraction to itself; the only way would be full emulation. You could use QEMU with full emulation and emulate pretty much whatever you want, including an MMU. But I've been thinking about it, and I think one way to go about it is to simply not have an MMU. Although that sounds crazy, because not having an MMU seems to mean not having processes, which seems like too much to give up, as you'll see it's actually enough to run a lot of stuff, even processes, in some way.

The way to go about it is by using two existing features of the Linux kernel. The first one is UML, User Mode Linux, which makes it possible to run a Linux kernel in user space as a process. The other one is the no-MMU config option of the Linux kernel, which makes it possible to run Linux on devices without an MMU.

Let's first talk about no-MMU. I don't know if you remember the iPod; there was this thing called the iPod that I guess nobody uses anymore. At the time it didn't run Linux, but it turns out it actually can run Linux, even though it's a very simple device that doesn't have an MMU, or at least has a very rudimentary one. It's possible to run Linux on one of these things, which is pretty amazing, and the way it's done is by using the kernel config option CONFIG_MMU: you just set it to no, and that's it. From the user's point of view it's very simple, but what happens behind it is very clever and very neat. The figure on the right gives an idea of what's happening. You have one single address space, and remember, there is no virtual memory, so there is just one linear address space, which is the same for every single process and the kernel. And there is no protection: every process can see everything and touch everything, and every address means the same thing for everybody on this machine.
But even so, it's possible to run multiple processes, and the way it's done is by building processes as position-independent binaries. Typically, binaries are built with hardcoded addresses, and their ELF files ask to be loaded at specific locations and offsets in a process's virtual address space. But for security reasons, Linux developers introduced this other type of binary called PIE, position independent executables. The idea is that the code for different processes can be loaded anywhere in memory. Again, that was done for security purposes, but it's very convenient for a device with no MMU, because now you can load, for example, the green binary or the orange binary anywhere you want in the address space, which again is just a single one, and just run it. Running it means you give it a stack, you give it some memory for a heap, and the thing should just run. And it actually does run.

So that was the first feature. The second feature is UML, User Mode Linux. It's really not a feature; from the Linux kernel's point of view it's an architecture. You can build a kernel with ARCH=um, and it will produce an executable, typically called vmlinux, which is a regular Linux executable that you can run as a process. It's regular in every sense: it has a main, and you can run it like any other process. The difference is that it's a special process, because it's actually the Linux kernel running, and it does some interesting things. For example, whenever you start a new process in UML, the UML kernel process will fork, literally fork from the point of view of the host, a new process and control it via ptrace. What happens is that whenever this user-level process, user-level from the point of view of the UML VM, makes a system call, the call is trapped and the UML kernel gets control, so it can implement the system call and then return to the guest process, again from the point of view of UML.

Now you might be thinking that this alone should be enough, not even considering no-MMU; just UML should give us what we were looking for, a virtual machine running in user space without a monitor. In a way this is what we want, but not really, because for security reasons we said we want a VM that sits on a low level of abstraction, with a very narrow attack surface to the host. I don't consider a fork to be low-level enough, so this doesn't really feel like a virtual machine in that sense; a fork is not a machine operation, a machine operation is something like write a packet to a network card or write a block to a block device, but admittedly that's debatable. Another reason is that the cost of doing ptrace is just too high. And just in case, there are other projects that do something very similar; gVisor, for example, has a similar architecture, also ptracing processes, the only difference being that instead of using the UML kernel it uses its own safer kernel written in Go, but it has the same issue with ptrace. There is also another version of gVisor that uses the virtual machine abstraction, a virtual machine boundary, as the mechanism for trapping system calls and having its kernel implement them, but I won't go into that in this presentation.
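To give a feel for the mechanism UML and ptrace-mode gVisor rely on, here is a rough, simplified sketch of intercepting a child's system calls with ptrace. It uses the classic PTRACE_SYSCALL loop; UML itself uses PTRACE_SYSEMU where available, so treat this as an illustration of the idea, not UML's actual code.

```c
/* Rough sketch of ptrace-based system call interception on x86_64 Linux.
 * A "kernel" process traces a child and gets control around every syscall,
 * which is roughly how UML (and ptrace-mode gVisor) supervises guest
 * processes. Error handling omitted for brevity. */
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent trace us */
        execlp("echo", "echo", "hello from the guest", (char *)NULL);
        _exit(1);
    }

    waitpid(child, NULL, 0);                     /* child stopped at exec */
    for (;;) {
        /* Resume until the next syscall boundary; the child stops twice
         * per call, once at entry and once at exit. */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        int status;
        waitpid(child, &status, 0);
        if (WIFEXITED(status))
            break;

        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        /* orig_rax holds the syscall number; a UML-like kernel would
         * implement the call here instead of letting the host handle it. */
        fprintf(stderr, "guest syscall %lld\n", (long long)regs.orig_rax);
    }
    return 0;
}
```

Every one of those stops is a round trip through the tracer, which is where the ptrace cost I mentioned comes from.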
So again, what we want is a mix of UML and no-MMU: a single process sitting on a low level of abstraction, making very few system calls to the host, and yet a VM capable of doing useful stuff, for example running multiple processes. The idea is that we use UML to implement the drivers and the device operations, and we use no-MMU as the way of having multiple processes running in a single address space, as shown in the figure.

If you do this, most of the implementation is already there. UML takes care of running the kernel as a process: all that code in the Linux kernel already defines a main, already knows how to start the whole thing, how to boot the kernel, and it has code to handle kernel threads. Just in case, these are green threads, pure green threads, which means that although you have a single UML process, you have multiple kernel threads running inside that single process, basically code long-jumping between pieces of code. That's different from the kind of threading you would get with ptrace, where each guest thread maps to a kernel thread in the host; UML is not doing that, it's a single process, a single thread of execution, that just happens to implement threading at user level. All of that is already implemented in UML. UML also implements devices: for example, there is a UML disk that can be backed by a file on the host, so you can use a qcow2, just as with a regular virtual machine, as a disk for UML, or a tap device, or a raw socket. All of that is done on the UML side.

From the no-MMU side we get memory management, the mm for no-MMU, and that's important for things like mmap: we need a way to do mmaps without an MMU. Just in case, not every kind of mmap is possible; for example, the no-MMU version can't do mmaps at fixed addresses, because you only have one address space and there is no way to ensure that a different process won't try to mmap the same location, so the way it's handled is to simply not allow it, to not even implement it.

But there is some stuff that isn't covered by combining UML and no-MMU into this crazy thing I'm describing. For example, all the process creation and management in UML uses fork and ptrace; we don't want that part. It also assumes that the transition from user to kernel space, typically a system call, happens via ptrace, and we can't do that anymore because we don't even have ptrace. Another thing that needs work is ELF loading, which typically uses memory mappings: the same physical page backing an executable file is mapped into multiple processes. Without an MMU we can't map anything, so the ELF loading had to be adapted to the no-MMU case. And on the user-space side, although we wanted to keep changes minimal, you do need some; at the very least you need to rebuild some things. For example, libc needs to be adapted to use a different mechanism than just issuing syscall instructions, in the case of x86. Things like BusyBox also needed changes, and we'll see those in more detail in a bit.

So, the first point. Again, UML handled processes by forking on the host and then ptracing those processes.
We don't want that, because again, I consider it too high-level; it's not VM-like. We want something like the figure on the right instead. On the right, everything runs in the same process; there is no notion of kernel and user privilege levels, so everything is a function call. The way user space calls the kernel is a function call. So the question is: where does it get the address for that function call, and how does it call it? It turns out that problem is already solved. The kernel implements, well, used to implement, this thing called vsyscall and, more recently, the vDSO, which lets the kernel define the mechanism by which user space, most likely libc, should call into the kernel. That's used for some system calls, like gettimeofday, which need to be really fast. It's a way for the kernel to tell user space: don't do a regular system call instruction for gettimeofday, use this other mechanism, which in the case of the vDSO really means having user space run kernel-provided code in a kind of safe way.

We can use that same mechanism in this modified UML to tell user space, that orange program there, to call us via some specific mechanism, which in this case is just a regular function call. Again, the mechanism already exists, but it isn't used by default for every system call in libc; libc only checks "is there a vDSO version of this call?" for a few system calls, such as gettimeofday. So the adaptation for this proof of concept was to make that mechanism apply to every system call: for every call, libc asks the kernel "is there a different way to do this?", and the kernel answers "yes, make this function call". I'll show a rough sketch of this in a moment.

So those libc changes are what I just described, plus some extra things. For example, libc knows how to malloc memory, and malloc can be implemented with mmap; in this case it's the same mmap call, but not all the flags are available in a libc built for no-MMU. Nicely enough, at least this first part is already implemented in musl: you can build musl libc for no-MMU and it just works. It only needed to be changed a bit to make this the default path for every system call.

But changing libc is not enough; you also need to change applications. For example, if you want a shell and command-line applications, the easiest way is to change BusyBox, or rather to build it for no-MMU, and amazingly that also already exists: you can build BusyBox with the no-MMU config and it just works. It does two important things: it removes the dependency of some operations on mmaps at fixed locations, and, most importantly, it replaces fork-and-exec with vfork-and-exec. I'll go over that next, because it's quite interesting.

So, fork and exec. Fork-and-exec is the traditional way in which something like a shell spawns a new process running a new executable. What happens is that a parent process calls fork, and fork creates an extra process, a child. The rule is that the child and the parent have exactly the same memory, but the child gets a copy. Instead of actually copying, modern operating systems implement fork by marking memory copy-on-write.
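Going back to the function-call system call path from a moment ago, here is a conceptual sketch of the idea. This is not the actual musl or UML code, and the table and names here are hypothetical; the prototype reuses the existing vDSO-style plumbing, but the effect is the same: every "system call" becomes an ordinary call into the in-process kernel.

```c
/* Conceptual sketch of a no-MMU libc syscall path: check whether the
 * in-process kernel exported a function-call entry point for the call
 * and, if so, call it directly instead of executing a syscall
 * instruction. All names are made up for illustration. */
#include <errno.h>
#include <stdio.h>

#define NR_SYSCALLS 512

typedef long (*syscall_fn)(long, long, long, long, long, long);

/* Table the in-process kernel fills in at boot, in the spirit of the
 * vDSO: one entry point per system call number. */
static syscall_fn kernel_entry[NR_SYSCALLS];

/* What libc's generic syscall wrapper would boil down to. */
static long do_syscall(long nr, long a, long b, long c,
                       long d, long e, long f)
{
    if (nr >= 0 && nr < NR_SYSCALLS && kernel_entry[nr])
        /* Same address space, no privilege switch: an ordinary call. */
        return kernel_entry[nr](a, b, c, d, e, f);

    /* A normal libc would fall back to a real syscall instruction here;
     * in this no-monitor setup every call is expected to be in the table. */
    return -ENOSYS;
}

/* Toy "kernel side" implementation, just to show the registration. */
static long fake_getpid(long a, long b, long c, long d, long e, long f)
{
    (void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
    return 42;                  /* whatever the guest kernel decides */
}

int main(void)
{
    kernel_entry[39] = fake_getpid;   /* 39 == __NR_getpid on x86_64 */
    printf("getpid -> %ld\n", do_syscall(39, 0, 0, 0, 0, 0, 0));
    return 0;
}
```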
Back to fork and copy-on-write: that means it's a lazy copy. I'm sure you all know this already, but you'll see why it's relevant in a minute. With fork-and-exec, immediately after the fork the child calls exec, so it loads another executable and just runs it, and at that point the copy-on-write mappings aren't used anymore: the child, now in green, uses its own memory for everything, its own heap, its own everything, and nothing is marked copy-on-write by default. Fork alone, on the other hand, is used for things like worker processes that all start from the same memory.

As you'll see, this is problematic when you don't have an MMU, because copy-on-write can only be done with virtual memory; copy-on-write really means mapping pages to the same physical page, marking them read-only, and so on. Without an MMU there is no concept of virtual memory or mapping or any of that, so you simply cannot implement it with no MMU. The way it's handled, I wouldn't say fixed, in BusyBox in no-MMU mode is that whenever it would do fork-and-exec, this configuration makes it use vfork instead. And vfork is not a made-up system call, just in case; it has a particular historical use, but it now happens to be very convenient for no-MMU devices. Think of it as a fork with slightly different rules: when a process vforks a child, the parent blocks, it does nothing, it just sits there blocked on the vfork, until the child execs or exits. That's super convenient, because it means that if, for example, the shell in BusyBox starts another program, you never have the problem of needing to mark things copy-on-write; there is no moment in time where the parent and the child need to access the same pages copy-on-write, so it's safe. I'll show a tiny sketch of this pattern in a moment.

A bare fork, on the other hand, is not implementable. One option is to replace it with vfork; another is to keep fork but as a special kind of fork where pages are not copy-on-write but literally the same pages, meaning that if the child writes a page, the parent sees the change, which is not what you'd expect from fork. Anyway, that's what BusyBox with no-MMU does, and again, that's not something I did; it's already there, most likely for devices that don't have an MMU.

Now let's see an example of how this proof of concept is used. Just in case, I don't have a clear idea yet of how this thing should be consumed, but this is how I've been using it and how I ran the experiments I'm going to show you later. You take an Alpine Docker image. That image could have things installed, for example Python, Node, a bunch of other stuff, but it comes by default with a musl libc, which is just one file, and a BusyBox, which is also, nicely, just one file. So I've been transforming that Docker image into something this UML VM can consume, because really it's just a modified UML, and like any VM it doesn't know how to consume a Docker image; it knows how to consume a disk. So you have to make a disk for it.
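Actually, before the disk image details, here is a tiny sketch of the vfork-plus-exec pattern I just described, which is essentially what a no-MMU BusyBox shell does when it spawns a command.

```c
/* Tiny sketch of spawning a command with vfork + exec, the pattern a
 * no-MMU BusyBox uses instead of fork + exec. Parent and child share
 * the address space, and the parent stays blocked until the child
 * execs or exits, so copy-on-write is never needed. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = vfork();
    if (pid == 0) {
        /* Child: only exec or _exit is safe here; anything it wrote
         * to memory would be visible to the parent. */
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);                    /* only reached if exec failed */
    } else if (pid > 0) {
        /* Parent: resumes once the child has exec'd or exited,
         * then reaps it as usual. */
        int status;
        waitpid(pid, &status, 0);
        printf("child exited with %d\n", WEXITSTATUS(status));
    } else {
        perror("vfork");
    }
    return 0;
}
```

The parent is blocked from the vfork until the exec, so there is never a window where both sides need their own copy of the same pages.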
Back to the image: the process I've been using is to replace libc and BusyBox in the image with our own versions of musl libc and BusyBox and create a disk from it, in this case an ext4 filesystem inside a qcow2. Do that, and now you can run it as a VM.

Why Alpine? The reason is that, again, we need position-independent executables, PIE executables, and one of Alpine's stated goals is to be secure. There it is again: PIE executables are nice because they make things more secure, or really, they make the attacker's job harder. The point is that one of Alpine's goals is to have everything built as PIE, and that's mostly true: not everything is PIE, but most things are. The other reason is that Alpine uses musl and BusyBox. I could have used glibc, but BusyBox is perfect here; it makes things very simple because you only need to replace one file, and BusyBox already has an option to be built for no-MMU, which is really convenient.

If you build one of these things, this is how you would run it; you'll actually be able to run it yourself, there's a link at the end. In this example I'm on the host, and just in case, I'm printing the kernel version, which is 4.15. Then you run this VM, which again can be treated like a regular VM; this is not much different from how you'd run a QEMU VM, and if you know UML, it's exactly the invocation you'd use for UML. You run the thing and you immediately get a shell; in this case I'm using a particular kind of init that basically just starts a shell, /bin/sh. From there, if you print the kernel version, you'll see it's a different kernel version; it really is a VM in that sense. If you run top, you'll see that it's running, in this case, just two processes, the shell that started top, plus a bunch of kernel threads. And the best part is that all of this runs as one host process; from the point of view of the host, this is a single process. Just to be clear: there is no KVM, there is no hardware virtualization support of any kind, there is no monitor. It's just a modified UML process making only the system calls shown on the right: read and write to the console and the block device, nanosleep and timer_settime so it can implement timers, and so on. By the way, I don't have much time to go into this, but timer interrupts in UML are implemented as timer signals: actual timer signals are mapped to timer interrupts, which are forwarded to the regular Linux scheduler to preempt running processes as usual, so this is a preemptible kernel, just in case. And again, I didn't implement any of that; it's all already there in UML.

Oh, and I forgot to mention: this list of system calls, this strace result, was gathered after running a bunch of stuff: I ran Python, gnuplot, I formatted a bunch of disks, I ran Node, nginx, lots of things. Whatever you do, it will never be more than these system calls, which is really nice; it's the same property we saw for a regular VMM. Now, just to compare, if you were to run a hello world in Python as a regular process, these are all the system calls that process would make.
And just in case, if you change the Python version from three to two, it's a different set of system calls. That's just to show how high-level the interface is and how many system calls the kernel needs to provide for something as simple as a process running a hello world.

Just in case, there are lots and lots of limitations. Although you can run Python, which for me is awesome, there is a lot of stuff not implemented in this VM. First of all, you don't have virtual memory, you don't have memory protection, so everything running in this VM sees everybody else's memory, which makes it much easier to crash the whole VM. For example, a null pointer dereference will crash the whole VM, whereas on a regular Linux box it just crashes that one process. Also, you don't have fork. Just to be clear, that means that if you go into Python and call os.fork, it fails with an error, in this case an invalid argument error. And again, it can only run PIE executables and you need to use a modified libc. It turns out the PIE requirement is kind of a pain: I've been trying to run TensorFlow and PyTorch, and because they're not PIE I was unable to. Also, just the fact that I'm basing this on musl makes some things harder; for example, I wasn't able to run PyTorch even on plain Alpine, so that's not even a limitation of this new type of VM, it's just what you get for running on Alpine.

OK, so, super quickly, some performance numbers. One thing this is good at is booting fast. On the left we have Firecracker with the microVM kernel configuration, and although the Firecracker paper from NSDI says it can boot in 125 milliseconds, I was able to boot it in less than 60 milliseconds, which is great. But Nabla Linux boots even faster: less than 10 milliseconds, eight milliseconds to be precise, which is great.

This next slide shows system call latency, and it's there to show that just because you remove one layer, it doesn't necessarily mean latency gets better. It's better only for the system calls in the middle; some things, like select on a bunch of file descriptors, or forking lots of times, are not necessarily faster. So this is not necessarily better in every case, at least not for something like system call latency. And finally, nginx throughput; this is application throughput that I wanted to measure with I/O, compared against Firecracker, and interestingly it's exactly the same as Firecracker. That's expected, because although Firecracker is using a VM and has virtio in the middle, so maybe latency is higher, virtio is batching a bunch of packets, and in the end it makes no difference.

So, to finish, what's next? There are a couple of things I wanted to get out of this presentation. First, is this an interesting trade-off for anybody? This weird thing I'm proposing gets you some stuff: simplicity, because you drop components like the VMM and KVM; fast boot; and it could help in some specific cases, like nesting, where you'd be able to nest as many levels as you want. But all of that is in exchange for generality: you cannot run just anything, you cannot run any arbitrary binary or any container.
So again, is this a trade-off people might be interested in? And if the answer is yes, what's the best way of consuming this? Because, again, I've been doing this crazy hack of just replacing a file and crossing my fingers that things will still run afterwards, which doesn't seem ideal. So that's something I was hoping to discuss as well. That's all I have. Thank you very much, and I'll be taking some questions. And just in case, here's the link to the repo with all of this. Thank you very much.