Thanks very much for being here, and thanks to the organizers for inviting us. I'm Anastasios Nanos, and I'm going to talk about our work on unikernels: how we integrated them into Kubernetes and how we applied them in a use case like Knative and serverless. A bit of information about us: we're a young company, a young SME, and we do research. We're involved in research and commercial projects, and we focus mainly on systems software, the low-level parts of the stack. This is the team involved in this specific project: Babis, myself, Jorgos and Yannis, all Greeks.

Let's first go through the broader topic of the presentation. We were interested in serverless platforms, where users write a function in a high-level language, they pick the event that triggers this function, and the underlying framework handles everything else: instance selection, deployment, scaling, monitoring, logging, all of that. Now, there's a group at ETH Zurich doing work on systems in general, and they focus on serverless platforms and the requirements of these platforms; they are building their own system. They identify issues regarding execution latency, throughput, and energy efficiency when executing a function on the hardware, as well as the security and isolation of these functions. In this talk, we focus on the first and the last: low latency, really low response times for the function, and isolating the actual function from the rest of the platform. They also have concerns about the systems software stack: they mention that it retrofits legacy infrastructure and presents high overhead when managing short-lived tasks. However, Kubernetes is still the dominant orchestration framework, and Knative is a Kubernetes-native serverless framework. Our take was: it's not a random project, it's something deployed and used by many, many people.
So we focused on optimizing the parts of the stack that we care about and seeing what's going on there. Let's have a look at the architecture of Knative. Knative has a couple of components: the activator, the autoscaler, and the function pods. These components are triggered by external requests; the activator talks to the autoscaler, the activator launches function pods, and these function pods do the actual work.

We examined isolation issues in this setup using sandboxing mechanisms, and we looked at how the response latency is affected by this sandboxing. We looked into Kata Containers — we're involved in the community, and it's one of the most mature frameworks for sandboxing containers in VMs. The Kata Containers runtime is CRI-compatible, which means you can spawn Kubernetes pods with it. Essentially, the way it works is that it spawns a micro-VM in AWS Firecracker, QEMU, Cloud Hypervisor, or their own custom hypervisor, Dragonball, and it spawns all the containers of a pod inside this micro-VM. Other sandboxing mechanisms, like gVisor, use the same principle.

If we apply this sandboxed container runtime to Knative — we do that using the runtime class option — we end up with a figure like this. The Knative function pod is inside a micro-VM, so the rest of the stack is protected from the user-submitted code. However, there is a sidecar container, the queue-proxy container in Knative, that is still in the same pod, the same namespace, the same security sandbox, let's say, as the user-submitted code. Additionally, using these kinds of runtimes increases the cold boot time, because you have to spawn the micro-VM, pass through the container rootfs, and then spawn the container.
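As a rough sketch, wiring a sandboxed runtime into Knative through the runtime class option looks something like the fragment below. The names (`kata`) and the image URL are illustrative, and note that Knative Serving gates `runtimeClassName` in the pod spec behind a feature flag (`kubernetes.podspec-runtimeclassname`):

```yaml
# Illustrative RuntimeClass pointing at the Kata Containers handler
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
# Knative Service whose function pods are sandboxed in a micro-VM
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      runtimeClassName: kata
      containers:
        - image: docker.io/example/hello-http:latest
```

With this in place, every pod the autoscaler spawns for the service — user container and queue-proxy alike — lands inside the same micro-VM sandbox.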
And we asked ourselves: what if we had a way to isolate the user container from the rest of the stack, even from the queue-proxy container, and at the same time reduce cold boot times? Of course, our minds went to unikernels.

A bit of information about unikernels. A unikernel is a specialized, single-address-space machine image. It's built using a library operating system and, essentially, it's tailored to a single application. There is no kernel/user space separation, so we don't have mode switches — this is important for I/O and for booting — and it contains the absolute minimum software components for the application to run. However, unikernels are seen as a research approach; there's the famous quote that unikernels are "unfit for production." Lately, though, there has been considerable work to make them more mature: we've got wider library support and many tools to facilitate adoption. There are many frameworks out there, and each framework is tailored to a specific use case.

The issues we have with unikernels in the cloud-native ecosystem are that they are not containers, so we cannot use all the container tooling that exists and that we like, and they are not typical VMs, so we cannot reuse the sandboxed container runtimes that exist and work with Kubernetes. So we thought — and not only us; many people saw this years ago — that unikernels should look like OCI images, because OCI is used extensively in the cloud-native ecosystem, and container runtimes should know how to parse these OCI images and boot a unikernel instead of a container. And we thought: let's build a unikernel-compatible container runtime. How hard could it be? So we built urunc. It's CRI-compatible and written in Go — no surprises there. It treats unikernels as processes, so essentially the container runtime manages the application, not the actual system the application runs on.
Unikernel images are OCI artifacts, and we have a modular approach where you can plug in hypervisors and unikernel frameworks and end up with a generic runtime, let's say. To build the unikernel into an OCI image, we use a build tool we call bima. It uses a Containerfile-like syntax, so essentially it's like a Dockerfile — you can see the format in the figure. We copy the binary, we copy any other extra files needed (a configuration, extra libraries, whatever), and we annotate the container image using labels to facilitate execution by the container runtime. We build it the same way as with Docker. And of course, because this is an OCI image, you can use any of the tools available for these images — skopeo, umoci, dive — or you can push it to Harbor or to Docker Hub.

A tricky part about the integration with Kubernetes is that there are sidecar containers: there's the pause container, and there can be other sidecars, like the one in Knative that we will see later. What we did is check the annotations on the OCI image: is this a unikernel image? Then we use urunc's standard code flow. If it's not a unikernel image, then we follow runc's code flow instead. So essentially we've got this separation even within the same pod. If we apply this logic to the Knative function pod, we build the user function as a unikernel image, we package it using bima, so we've got an OCI image, and if we create the Knative service with urunc's runtime class, the user container runs in a unikernel — so it's sandboxed — while the queue-proxy container, which is not a unikernel image, is a plain container booted with runc. So there is actual hardware-virtualization separation between the user container, the user-submitted code, and the rest of the platform stack.
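To give an idea of the bima build format described above, here is a minimal sketch of such a Containerfile. The label keys and paths are illustrative, not necessarily the exact keys bima uses:

```dockerfile
FROM scratch

# Copy the unikernel binary and any extra files it needs
COPY app.unikernel /unikernel/app
COPY config.json   /unikernel/config.json

# Annotate the image so the container runtime knows how to boot it
LABEL "com.urunc.unikernel.type"="unikraft"
LABEL "com.urunc.unikernel.hypervisor"="qemu"
LABEL "com.urunc.unikernel.binary"="/unikernel/app"
```

Because the result is a regular OCI image, the usual tooling (build, sign, push to a registry) applies unchanged.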
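The per-container dispatch described above — inspect the image annotations, then pick the unikernel flow or the generic flow — can be sketched in Go. The annotation key and the return values here are our own illustration, not urunc's actual internals:

```go
package main

import "fmt"

// isUnikernel reports whether the OCI image annotations mark this
// container as a unikernel (illustrative annotation key).
func isUnikernel(annotations map[string]string) bool {
	_, ok := annotations["com.urunc.unikernel.binary"]
	return ok
}

// pickFlow returns which code flow the runtime should follow for a
// given container in the pod: the unikernel path or the generic one.
func pickFlow(annotations map[string]string) string {
	if isUnikernel(annotations) {
		return "urunc" // boot the unikernel under a hypervisor
	}
	return "runc" // fall back to the generic container flow
}

func main() {
	user := map[string]string{"com.urunc.unikernel.binary": "/unikernel/app"}
	sidecar := map[string]string{} // e.g. queue-proxy: no unikernel labels
	fmt.Println(pickFlow(user), pickFlow(sidecar))
}
```

This is how a single pod can mix an isolated unikernel (the user function) with a plain container (the queue-proxy sidecar).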
So we get security through isolation from the user-submitted code, and faster spawn times, because we don't put the whole thing in a micro-VM with an agent that talks to the runtime and everything else that happens with sandboxed containers — it's just a unikernel.

To test our hypothesis that this is faster than sandboxed container runtimes, we got a server, set it up with Knative, tweaked kperf to make sure that we measure what we want to measure, and used a simple HTTP-reply function. We used Go for the generic containers (the example from Knative) and C for the unikernel — it was a bit easier. So we do a curl on the hostname, and what we expect to see is the headers of the request plus the headers of the response.

To dig into what we measure exactly, we built a sequence diagram. kperf issues the request, which goes through the ingress controller and then the activator, which creates a deployment; the deployment essentially creates the Knative function pod, with the user container and the queue-proxy container. The request goes into the queue-proxy container, the queue-proxy forwards it to the user container, and then we've got the way back, the response. We measure the whole thing, not just the cold boot time; we call that the service response latency.

In the numbers we took, the x-axis shows all the container runtimes we measured, and the y-axis shows the service response latency as reported by kperf. The y-axis is in seconds, so lower is better. We can see that the sandboxed container runtimes are almost two times slower, while the generic runtime, runc, and urunc are almost identical. And this is the median value; if we look at the 99th percentile, the slowest responses, again all the sandboxed container runtimes are twice as slow, and the generic runtime and urunc are almost identical.
And we did that for many instances. This is where we tweaked kperf so that we could have a one-to-one mapping: we issue a request and expect it to be served by a specific function, and we need to measure that. So we spawned 300 instances, 300 functions, and measured the behavior of the various runtimes. What we saw is that, again, up to 125 instances the sandboxed container runtimes are almost two times slower; then the latency goes up because we saturate the cores, essentially. And the generic runtime and urunc are almost identical — they scale the same way.

What we take from these early measurements is that we are able to isolate the user code using hardware virtualization mechanisms and still get the same response latency, the same execution experience, let's say, for the user as with a generic runtime, where there is minimal isolation and there are security concerns — the Knative threat model says you cannot spawn untrusted code; you have to have a dedicated cluster to be able to run user code.

Now, perfect, we have time, so I'm going to show you a demo: the build workflow, how we build the unikernel, how we build the OCI images for the unikernel, and how we push them to a registry. Then we're going to boot these unikernels and look at the memory consumption on an edge node, an NVIDIA Jetson Orin, when we spawn tens of functions on these devices. Let me see if I can do that live — I also have a video, but perfect. Let me just clear this.

So, there is this tool in the Jetson software stack that is equivalent to htop or top, but they call it jtop because it has more information about the GPU and so on. We don't care about that; we just care about the memory consumption. Here we've got jtop, and let's have a look at the services. We have deployed a simple Knative service.
We call it hello-container. This is the image — I didn't show how it's made; I triggered this just before my talk. It's essentially a GitHub workflow with steps that build the unikernel, and another step that creates the image, signs it, and pushes it to a container registry. So there's this HTTP-reply example, and this is the step where we build the unikernel. It's a Unikraft unikernel: essentially we clone the repos and use the default config. This is the QEMU target, and the same goes for Firecracker, and we get the artifacts — the unikernel image, essentially.

The next step is to actually prepare the image. We've got the Containerfile, just as I showed earlier in the slide: we copy the unikernel binary and annotate the image with information about what kind of unikernel it is and what hypervisor we want to use. We sign it, we do that for ARMv8 and for x86, and we create a manifest so that we can glue the two images together. The same thing happens for the generic application, which is a Dockerfile for the HTTP-reply Go application in a generic container.

For the Knative service, we create the service — hello-container — with the container image that was pushed from the GitHub action, and we also created a domain mapping so that it's easier to talk to it. We have deployed this service; let's watch the pods. Everything is empty — I'm not sure if you can see; ah, maybe you can see the memory consumption. And we use this tool, hey — you probably already know about it. We issue 20 concurrent requests and we do that for 15 seconds. If we do that, we should see containers being created — this is the cold boot — and we should see the memory consumption.
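The service-plus-domain-mapping setup from the demo can be sketched as the fragment below. Names, the registry URL, and the domain are illustrative, and Knative's `runtimeClassName` support is gated behind the `kubernetes.podspec-runtimeclassname` feature flag:

```yaml
# Knative Service whose user container boots as a unikernel via urunc
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-container
spec:
  template:
    spec:
      runtimeClassName: urunc
      containers:
        - image: harbor.example.com/demo/http-reply-unikernel:latest
---
# DomainMapping so the service is reachable under a friendly hostname
apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: hello.example.com
spec:
  ref:
    name: hello-container
    kind: Service
    apiVersion: serving.knative.dev/v1
```

The load in the demo is then just `hey -c 20 -z 15s http://hello.example.com` — 20 concurrent workers for 15 seconds.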
Now, one of the issues here — and this is one of the reasons we tweaked kperf — is that it didn't spawn 20 containers, it didn't spawn 20 functions, because the first function served the first request and then started serving the other requests. So it's not a one-to-one mapping with the graphs I showed earlier, but you get the point: the memory consumption increased only a little, because it's a generic container.

Now let's try the Kata/Firecracker case. The service is hello-fc, the runtime class is kata-fc, the container image is the same, and the domain name is hello-fc. Let's wait for the previous run to terminate, to see the same memory consumption as before — I think we're good, 2.8. If we do the same thing, with a bit of latency, what happens is that it spawns micro-VMs — the sandboxed containers — and inside them the hello container, the HTTP-reply Go application, and you can see that the memory consumption is increasing, and it's a lot different from before: 5.4, 5.5, still increasing. So what we get from this example is that if we want to isolate the user code, we have to spend more memory — we pay, let's say, the overhead of the micro-VM sandboxing the user code.

If we wait a bit for it to terminate — it should be ready in a couple of seconds; yeah, we're back at 2.8 — and do the same with urunc and Firecracker, we can see it's not as low as the plain container, but it's a lot lower than the sandboxed container, and we get better latency than before: better first-response latency and an isolated environment. So it's something that's worth investigating, at least. Let me go back to the presentation. None of this would have been possible without the funding we get to do this kind of research, through EU projects.
And to conclude: containers are really great — we like them, and all orchestration platforms depend on containers. However, they have loose security and isolation. By sandboxing these containers we get the isolation that we want, but we also get overhead. So if we can use unikernels for this specific use case, the serverless use case — to reduce the attack surface and to improve spawn times — that might as well be okay. In order to do that, we have to make unikernels cloud-native, and we think this is a first step towards cloud-native unikernels.

All of our code is open source. From the links here you can go and play with bima, the tool we use to build the OCI images; we also have the workflow for building the Unikraft image and the generic application and pushing them to our registry, so you can play with that. And we have a blog post about these numbers, with details on how we took them and what configuration we used for Knative. That's it — thanks very much for your attention. I'll be happy to take any questions, online or offline, whether everything was clear or everything was unclear.

Hey, thank you for sharing what you have done here. This is certainly some interesting work, and it fills some of the gaps I've noticed when trying to mush together unikernels with Kubernetes and container workflows. Is this generic across any unikernel? Is there adaptation work that needs to be done to make this work? What does that workflow look like?

I'm sorry, could you repeat? Is it generic — I lost the other part.

What work needs to be done to make a unikernel usable here? Is it generic enough that you can just throw in a unikernel that's x86-compatible, or does adaptation work need to be done to make it run using urunc?
No, no, no — you just have to create the necessary hooks, let's say, so that you can boot the unikernel with urunc. Essentially, urunc creates the command-line invocation that the hypervisor uses to spawn it. So, for instance, with Unikraft you have a specific command line, or a specific configuration file for Firecracker, let's say, to boot it. What we do in urunc is create the hook that generates this config file, or the command line for QEMU, so that it can boot. Is that what you're asking?

Yeah, thank you. Perfect.

Hello, I've got a question for you. One of the massive benefits of unikernels is spin-up time. You showed how fast they were responding, but not how fast they started up. One reason I ask is that you can almost get to a unikernel per request — a request comes in, you spawn a unikernel. I'm just curious if you've done any research in that space.

Unfortunately, not yet, but we are in the process of doing that. Okay, great. So we wanted to do that: in this figure, in this sequence diagram, we want to break the time down, but in order to do that we need to instrument the other container runtimes. We have done that for urunc, but we need to do the same for Kata, for gVisor, and for runc, and we didn't have the time yet to do that. It's work in progress.

Hi, thanks for the talk. I'm just testing my understanding here. Am I right in thinking that if the user has a pre-built OCI image, you can't just run that image as a unikernel — you have to run bima to statically link the user's code first?

So yes, you have to glue the OCI image together with the container runtime metadata. But the reverse also holds: you could just exec. You could build the OCI image with an annotation that says "don't do any virtualization; this is not a unikernel, this is an application" — and that's what we did for debugging.
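The "hook" mentioned above essentially boils down to turning the image annotations into a hypervisor invocation. A rough Go sketch — the struct fields and QEMU flags are our own illustration, not urunc's actual code:

```go
package main

import "fmt"

// Unikernel describes what the OCI annotations tell us about the image
// (illustrative subset of fields).
type Unikernel struct {
	Binary  string // path to the unikernel binary inside the rootfs
	Cmdline string // application command line passed via -append
	Memory  int    // memory for the micro-VM, in MiB
}

// qemuArgs builds the QEMU command line that boots the unikernel.
// With a unikernel there is no guest kernel plus userspace: the
// unikernel binary itself is handed to QEMU as the kernel image.
func qemuArgs(u Unikernel) []string {
	return []string{
		"qemu-system-x86_64",
		"-m", fmt.Sprintf("%dM", u.Memory),
		"-kernel", u.Binary,
		"-append", u.Cmdline,
		"-nographic",
	}
}

func main() {
	args := qemuArgs(Unikernel{Binary: "/unikernel/app", Cmdline: "hello", Memory: 128})
	fmt.Println(args)
}
```

For Firecracker the same hook would instead emit a JSON machine configuration, which is why the runtime keeps these backends pluggable.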
And you can tell the container runtime that this is an application, not a unikernel, so it just execs the thing. So it's modular enough, and in that mode it's naive — it's like a generic container runtime. The complicated part of urunc was all the ping-pong back and forth it does to re-exec, to fork, and to communicate between the original process and the forked process using IPC mechanisms, which is a bit of a mess. But yeah, okay, I think that's it.