Hi all, thanks for being here. I'm Anastasios Nanos and, along with my colleague Jorgos, we're going to talk about a unikernel container runtime that we built, which we call urunc. We are going to share a few details about us, we're going to talk about containers, sandboxed containers and unikernels, we're going to present urunc, the tool that we've built that can boot unikernels in a cloud-native way, and we're going to talk about the integration we have with Kubernetes and with a serverless computing framework. So, we are a really small company; we do research, we focus on operating systems, virtualization, container runtimes, and on exposing hardware acceleration functionality to workloads running in a sandbox. We are a team of almost ten; we are researchers, engineers and software developers, and we have a mixed academic and industry background. We are based in the UK, in Greece and in Germany.

Now, the concept of our talk is around how users deploy their applications in the cloud. So in the figure you can see a VM stack: we've got a host kernel and the hypervisor, all together or split, we've got a VM, we've got a guest kernel, some libraries, a runtime, stuff like that, we've got the application and some files to configure the execution of the application. Back in the early days of the cloud, users used to provision a VM, log into the VM, copy their files, their application, their configuration, configure the VM, maybe do some installation, and then run their application. Then containers came, so users were able to develop and package their application on their own infrastructure, on their own laptop, create a container image, push it to a container registry and then provision this container as-is on a cloud infrastructure. However, cloud vendors saw that there is an issue there, an issue with isolation: containers do not provide the same degree of isolation as VMs. So what they did is sandbox the container inside a VM. So we've got the same stack as before, with the deployment ease of use that containers offer, but we've got a really complicated and bloated stack.

Containers are great: they are lightweight, they offer fast boot times, they can run anywhere there is a container engine, they are scalable, you can run many containers on a single hardware node, and they offer some kind of isolation, at least in terms of software dependencies. You can run a container with Ubuntu 20.04 and a container with Debian or something else on the same node. That's the reason they have dominated cloud deployment; they are the de facto way of packaging and deploying applications in the cloud and at the edge. Containers feature a mature ecosystem with many tools; you can do whatever you want. So they are really, really great. However, they have a major drawback, the thing that we mentioned earlier: they do not provide strong isolation. They share the same kernel, they rely on software components for the isolation, and if you take a look at the CVE lists with the container tag, you can check for yourself that there are a lot of bugs that allow privilege escalation. So a container can easily escape its sandbox. So what cloud vendors started to do is deploy tenant containers in an isolated sandbox, either software-based or hardware-assisted. We've got software solutions like seccomp or AppArmor or even gVisor, and there are VMs: we've got microVMs like Firecracker or Cloud Hypervisor or other stuff based on rust-vmm.
We've got QEMU, like the traditional VMs, let's say. So we're back again to VMs. In terms of deployment, cloud vendors combine both worlds, containers and VMs: they keep the benefits of containers, so it's easy to deploy an application in a cloud environment, and they have the same isolation as before, so they can host multiple containers on the same node. However, there are side effects: there is higher overhead, so we have to provision CPU and memory beforehand for a container to run in a VM on a hardware node, and there is a complex system stack, so we've got to put more effort into developing the container runtime that will handle this sandboxed container.

Let's take a step back though, because what users want is to have their application running somewhere. So let's look at what the application needs. Let's assume this is the current state of an application running at a cloud vendor: we've got a sandboxed container, we've got the container runtime, the container application, some runtime dependencies, some libraries, and the whole system stack, which is the guest kernel of the VM. We've got a hypervisor, we've got the host kernel and the hardware underneath. We try to visualize that the application does not need the entire stack; it needs some parts, which we have identified as the yellow boxes. So the application needs some parts of the runtime, some parts of the libraries, some parts of the guest kernel to interact with the hypervisor. And it would end up something like that: a stripped-down version of the same thing, but only with the things that the application needs to run. It needs the parts of the runtime that interact with the application, the parts of the libraries, and some parts of the operating system. We call that a LibOS; not us, the community. And essentially this kind of stripping down exists already, and it is called unikernels.

So a unikernel is a specialized, single-address-space kernel that is built using a library operating system. In other words, it's tailored for one application: there is no separation between kernel and user space, and it contains exactly what is needed for the application to run, the actual application binary, the configuration, and all the glue code it needs to interact with the hardware, or with the virtual hardware through the hypervisor, which is almost the same. When comparing unikernels to containers, they are more lightweight, they provide even faster spawn times, they can run wherever there is a compatible hypervisor, so they can run almost anywhere, as containers can, they are scalable because of their small footprint, and they are truly isolated because they are based on hardware-assisted virtualization extensions; essentially, each one is a VM. However, there are challenges in deploying unikernels in a cloud-native way, and these challenges are mainly around the packaging. Unikernels are not containers, so the whole tooling around containers, which is really great, is not there. And unikernels are not typical VMs either, because they only run one application: it's a single-application thing, a specialized VM. So all the tooling that exists for containers and VMs can be reused, but not as-is; you have to tailor the system software stack to be able to deploy unikernels as containers or as VMs. So we have identified two main issues. The first one is the packaging.
So it seems that if we package a unikernel as an OCI image, because OCI is a well-defined and widely used format, maybe we can bridge the gap between containers and unikernels. And if we do that with the image, then maybe we can do something about the execution as well, executing it as a normal container. Container runtimes don't know how to handle unikernels; they know how to handle containers, and maybe VMs in the case of the sandboxed container runtimes, but not unikernels. So I'm going to give the floor to Jorgos to talk about what we've built.

Hello, everyone. So we came up with urunc, which is a container runtime built specifically for unikernels. It's CRI-compatible and it's written in Go. It handles unikernel VMs like processes, so it directly manages the application. It uses OCI images for the unikernels, and it makes use of underlying hypervisors to spawn the unikernel VMs, and it's easy to add more hypervisors, making it extensible. So let's see how we imagine the OCI image of a unikernel. The image includes the unikernel binary, any configuration files required, or HTML files, or whatever the application needs, and a urunc.json file containing any urunc-specific metadata. The produced images can be managed and distributed using any standard tooling we use for container images, like skopeo, dive, et cetera, and can be distributed using container image registries. To take a closer look, we have built a specialized image builder tool, which we call bima. Essentially it just copies the binary and some extra files, and provides some specific labels for urunc to run. So this is how a sample invocation looks, and the sample Containerfile.

So now let's take a closer look at the execution flow of urunc. Like with typical containers, containerd first pulls the image from a container registry, unpacks the image, and then prepares the storage backend for the container to run. In our case, after that, it invokes urunc, passing the storage backend and the bundle, and urunc then spawns the unikernel. Now, if we want to take an even closer look, we can see how this whole thing works. The containerd shim first invokes urunc create. urunc create then forks itself, essentially creating a new process, a reexec process, which is spawned in a new network namespace, and this process, once it starts, notifies the parent process that it has started. Then the parent process saves the state and executes any createRuntime hooks, sends an ACK message to the reexec process, which executes any createContainer hooks, and the parent exits gracefully. So the containerd shim gets notified and then invokes urunc start. What this essentially does is just notify the reexec process that it's okay to start, and execute any poststart hooks. And this is, I think, the most interesting part: the reexec process sets up the networking and the storage components, executes any startContainer hooks, and essentially replaces itself with the unikernel VMM process. For the networking, we have decided to use the veth endpoint that the CNI plugin provides us. In order to provide networking for our unikernel, we create a new tap device inside the container network namespace, and using traffic control (tc) redirection we redirect all incoming traffic to the tap interface and all outgoing traffic to the veth endpoint. We can then pass the tap device to the VMM.
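For reference, the redirection described above can be sketched with plain ip and tc commands. This is only a rough illustration of the technique; the interface names (eth0, tap0) are placeholders, and urunc performs the equivalent steps programmatically inside the container's network namespace rather than shelling out:

    # inside the container's network namespace; names are illustrative
    ip tuntap add dev tap0 mode tap
    ip link set dev tap0 up

    # mirror everything arriving on the veth endpoint (eth0) to the tap device
    tc qdisc add dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 \
        action mirred egress redirect dev tap0

    # and everything the unikernel emits on tap0 back out through the veth endpoint
    tc qdisc add dev tap0 handle ffff: ingress
    tc filter add dev tap0 parent ffff: protocol all u32 match u32 0 0 \
        action mirred egress redirect dev eth0

    # the tap device is then handed to the VMM, e.g. qemu ... -netdev tap,ifname=tap0,script=no

This is the same tap-to-veth mirroring idea used by other sandboxed runtimes, so the unikernel can use the pod's IP without a bridge in between.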
The current state of storage handling is that we first extract the unikernel binary from the container image rootfs, and then we attach the storage backend to the unikernel. So, as you can see, these are the rootfs layers: we have the unikernel binary, then some configuration files, the HTML files, et cetera, our urunc.json file of course, and all the configuration files are inside the devmapper block device.

So, now that we have seen how urunc runs, what can we do with it? We can deploy, for example, a Solo5 rumprun nginx unikernel using standard container tooling; a rough sketch of the whole flow is shown right after this part. To do this, we first need to create our Containerfile. As you can see, we just copy the unikernel binary, the configuration files and the HTML files, and provide the required labels. The configuration is a simple nginx configuration, and the HTML is also a really simple HTML file; nothing special here. Then, using bima, we define the tag, we define which file we want to use, which Dockerfile, Containerfile, and the build context. So now we just build the image, and we can push it to a registry, just like any other normal container image. Okay. Now we can run our unikernel using just standard nerdctl. We have to define the container runtime we want to use, so we will define that, as you can see, and we also need to define the snapshotter; in our case, we use devmapper. And of course the image we want to run. So, as you can see, it boots. Sorry, I think I pressed it... yeah, sorry, I skipped forward. So it ran normally; nginx is now running. And if we inspect the processes, we can find the solo5-hvt process running. We can also curl our nginx, and hopefully, yeah, it responds. So it's that simple to take a unikernel binary, build it into an OCI image, and run it using nerdctl.

However, we think there are also some other use cases, more complex perhaps, that would really benefit from unikernels. For example, serverless functions, software as a service, and of course edge deployments, where the devices are usually resource-constrained. So let's see how we can integrate urunc with Kubernetes. Now, the main challenge we had is that in order to deploy Kubernetes pods, you need to handle non-unikernel containers as well, for example the pause container of the pod or any other sidecar container. To achieve this, we use runc: any generic container gets spawned using runc, and then the user-defined container, which is a unikernel in our case, is spawned using urunc inside the same pod. So let's see an example of how this looks. Here we will build a simple nginx unikernel using Unikraft. As you can see, it's really fast. And once the unikernel is built, we just copy the binary and some template files, and we can now create our Containerfile. As before, we copy the binary, we add the necessary urunc metadata as labels, and we push the image. Okay, we can run it with nerdctl to check that everything runs okay; it boots, everything is okay. So now let's see how the Kubernetes deployment YAML looks. It's pretty standard as well. We can just apply it, and as you can see, it just booted. We can also curl it and get a response. So this is how easy it is to use urunc to deploy unikernels in Kubernetes. So Tassos, if you would like to join me and show something even more elaborate. Thank you.

So, yeah, we saw that with urunc we can deploy stuff on Kubernetes: unikernels, Unikraft, rumprun, Solo5, whatever.
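To make the flow above concrete, here is a rough sketch of packaging and running such a unikernel. The registry name and file paths are placeholders, the label keys only illustrate the kind of metadata bima records for urunc (binary path, unikernel type, hypervisor), and the exact bima flags and the runtime and snapshotter names should be checked against the project documentation:

    # Containerfile: package a prebuilt Solo5/rumprun nginx unikernel as an OCI image
    FROM scratch
    COPY nginx.hvt /unikernel/nginx.hvt
    COPY conf/ /conf/
    COPY data/ /data/
    # illustrative labels: binary location, unikernel type and hypervisor for urunc
    LABEL "com.urunc.unikernel.binary"="/unikernel/nginx.hvt"
    LABEL "com.urunc.unikernel.unikernelType"="rumprun"
    LABEL "com.urunc.unikernel.hypervisor"="hvt"

    # build with bima (tag, Containerfile, build context) and push like any other image
    bima build -t harbor.example.com/demo/nginx-hvt:latest -f Containerfile .
    nerdctl push harbor.example.com/demo/nginx-hvt:latest

    # run it with nerdctl, selecting the urunc runtime and the devmapper snapshotter
    nerdctl run --rm \
        --runtime io.containerd.urunc.v2 \
        --snapshotter devmapper \
        harbor.example.com/demo/nginx-hvt:latest

The point is that nothing here is unikernel-specific from the user's perspective: it is a Containerfile, a registry push, and a run command with a different runtime.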
And the ultimate goal that we had in mind is to optimize the serverless computing approach. So we thought, let's take a popular serverless framework, Knative, which is built on Kubernetes, it's open source, it's platform-agnostic, and in Knative the user code is essentially a container; there's no sandboxing or isolation or stuff like that. The way that Knative works is roughly shown in this figure, the architecture. We've got clients on the left side that talk to an ingress controller; this is outside the Knative stack. The ingress controller talks to an activator, which is part of the Knative stack. The activator has a queue and talks to the autoscaler. The autoscaler sees that there is no deployment available for this function, so it decides to spawn a pod: it creates a deployment and spawns a Knative pod. Now, in this pod there is the queue-proxy container, which essentially handles metrics and manages the incoming requests and the responses, of course. And this queue-proxy container talks to a user container; this is the container that runs the actual user code. Now, this is for the first invocation. The invocations after that just go through the ingress controller; there's a mapping to the queue-proxy container and then directly to the user function. The queue-proxy container also pushes metrics to the autoscaler to scale the relevant Knative function pods up and down.

So if a user, sorry, if a cloud vendor, wanted to sandbox the user code running on their infrastructure, they could use generic container sandboxing mechanisms like Kata Containers or, I don't know, gVisor, or maybe something more elaborate, where they would sandbox the user container and the queue-proxy container, the whole Knative function pod, in a microVM. That would offer strong isolation, because the user code is sandboxed inside something that has hardware-assisted isolation, and it would offer fair scalability, because you have to provision the resources of the VM beforehand. Let me remind you that a sandboxed container in a microVM has to boot Linux, so there's a whole new stack: the kernel, a rootfs, containerd again, and stuff like that, and that ends up increasing the CPU and memory footprint. So we think that you cannot fit many sandboxed containers like these on a single node. So we thought, let's use urunc to spawn the Knative function pods for the user code. We get the benefits of the sandboxing approach, so we get strong isolation, and we get the same or even better scalability than generic containers, because unikernels are smaller, they have a smaller memory footprint, they have smaller images, so you might get even faster boot times.

And to show that this thing works, we have prepared a short demo. Let's see. So we build a simple Knative function. It's a Unikraft httpreply unikernel; we did not use nginx for this one because it's even faster than nginx. It just builds lwIP and a simple main.c file, which is the HTTP reply. We write the Containerfile, we copy the unikernel binary, nothing else is needed. We build the image using bima, we push the image to our registry, we run it to make sure that this thing boots, it boots of course, and then we create the descriptor for the service function.
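As an illustration, such a descriptor could look roughly like the following minimal sketch of a Knative Service that selects urunc as the runtime class. The name, image reference and scaling annotations are placeholders, and setting runtimeClassName in the pod template assumes a RuntimeClass named urunc is registered on the cluster and that Knative is configured to allow that field:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: httpreply
    spec:
      template:
        metadata:
          annotations:
            # scaling parameters; values are illustrative
            autoscaling.knative.dev/min-scale: "0"
            autoscaling.knative.dev/target: "10"
        spec:
          runtimeClassName: urunc   # assumes a RuntimeClass named urunc exists
          containers:
            - image: harbor.example.com/demo/httpreply-unikraft:latest
              ports:
                - containerPort: 8080

Applying a manifest like this registers the function with Knative, and requests then reach the unikernel through the ingress and the queue-proxy, as in the architecture described earlier.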
Internally we've got an ingress controller to handle FQDNs and HTTPS certificates, which is not shown here, but it's easy to show you if you want. So, in the descriptor for the Knative function, we define the name, we define some scaling parameters, we define the affinity, which is irrelevant in this case, we define the runtime class, which is urunc, and we also set the image that we built earlier. And we apply that, so it is registered as a Knative function, and if we do a curl to this address, we get the HTTP reply; it just echoes back the headers it received. That's the simple function, and if we inspect the pods, we can see that the pod booted and then terminates, because we have set a low scale-down timeout.

That's all from us. So, we think containers are great, but they lack isolation; in order to avoid the multi-tenancy issues due to that lack of isolation, cloud vendors revert back to VMs, to microVMs, and that leads to a bloated and redundant system stack. We think that unikernels could be a viable alternative for many use cases: serverless computing, edge computing, stuff like that. We have shown urunc, which we consider the missing component for bridging the gap between unikernels and containers, bringing unikernels closer to the cloud-native world. We have the code online, so you can scan the QR code, you can go on GitHub and check it out: urunc for the runtime and bima for the helper tool. We would like to mention that this work is part of two EU-funded projects, SERRANO and 5G-COMPLETE, and we are more than happy to take any questions. Let's see if you liked it or not.

For the... for the Unikraft execution, yes. For the rumprun execution, we use Solo5. Solo5, yes. Do you use KVM? Solo5 has two modes; they have a bit of a naming issue, Solo5, because they have named both the bindings, the unikernel bindings, and the tenders, the hypervisor part, Solo5. They've got a software mode, from their paper Unikernels as Processes, I think: there is Solo5 spt, which is the seccomp-enabled isolation, and there is Solo5 hvt, which is KVM. Essentially they just do the VM create, the memory is set, and it runs. Yes, yes. Yeah, the reexec stuff is pretty standard runc stuff. So if you take a look at the annotations, and I cannot see it from here, but anyway, on the last line you see hypervisor: qemu, unikernel type: rumprun. For the Unikraft example, the nginx one, the hypervisor is qemu, unikernel type Unikraft. For the Solo5 case, the first nginx example, the hypervisor was hvt, unikernel type rumprun. So Jorgos, do you want to take this? I think it would be pretty straightforward; you just have to create a new Firecracker... Yeah, with QEMU you just have command-line options, but for Firecracker you have to create a JSON file, networking is done differently there as well, you create a tap device for it and it's a bit different, so I'm guessing you need some sort of way to specify those things. You are opening a can of worms, you are... Regarding the first thing, I just wanted to share that it's Go, so it's Go packages: you implement the hypervisor interface, or whatever this thing is called in Go, and you implement your own. So you implement a different hypervisor package, just like that. For the networking and the tap stuff, this is really, really tricky. What people do usually, I don't know what you guys do, but what people do in the Kata Containers world is they have a tap device, they have the veth pair from the endpoint, and they connect the two.
They don't use a bridge, because it has overhead; they use tc redirect, which is fine. Now, if you put another unikernel there, or another sandboxed container, another VM, you have an issue, because how are you going to handle the traffic? There are no IPs; you have to do smart stuff with iptables. In Kata Containers there's no such issue, because the pod has all the containers inside, so it's one sandbox and many containers, one pod and many containers. In our case it's a bit different; in the unikernel case there's a roadblock. In the container world you have a network namespace and you have localhost, so all the containers talk over localhost and that's fine. In the unikernel world, localhost is a whole different story. So if you put a second unikernel in, you have to do really, really tricky stuff for the network. So it's complicated; we're trying to figure out how this should work. We have some initial implementation, but we think we should work a bit more on that, maybe together.

We have some numbers, but we wanted to validate them before we show them. The numbers look good; it's a first prototype, so it's not fully optimized. I would dare to say that it's almost as good as runc. There are cases where it is better, but some people on the team thought we should validate the results before we show them, so we won't show them. So what's your goal? Your personal target number? Can you tell us in, like, a few seconds? So we've got two goals. The first one is to spawn and host as many containers as possible on the same node, to show that unikernels have a really, really small footprint; that's one thing. The second thing is to show the cold boot time, which is kind of tricky to measure; it's really tricky, what people call cold boot time. And the third one is to show how fast the unikernel can handle I/O, so how fast it can handle the network. We don't have a specific number in mind, because there are a lot of numbers going around and I'm not entirely sure how the comparison will happen. There are milliseconds, there are a few milliseconds, there's a lot of stuff, but we will definitely go for the cold boot with something like Firecracker with Unikraft, because we haven't been there yet. We definitely want to explore the spt, the seccomp stuff, the gVisor stuff; we have no experience there, but we tried it out with Knative and it seemed great, but it had a bit of overhead compared to generic containers, so maybe we need to spend some more time there to understand a bit more. So there's a lot of open stuff to check. Can it host any existing program, since the kernel is not really compatible? Yeah, so that's another issue, another front that we're working on. In order to pack a user function into a unikernel, you have to have some kind of build system.
So you cannot run whatever you want. There's some work from Unikraft where you can take a container and, with the Linux compatibility layer, run a generic container as a Unikraft unikernel; we need to try that first, of course. The default way? The default way, okay, that's nice, so we can take that as-is and spawn a Knative function from a Docker container. I mean, this is something that you can do with Unikraft, but I'm wondering what your requirements would be from the greater unikernel community with respect to providing what you need to make this happen. Of course there are going to be more applications that are part of Unikraft, and we are going to work on expanding the platform support, the hypervisors and VMMs, but what would be some requirements from your side, as designers and providers, tying this up to the greater unikernel ecosystem, apart from more applications running and more platforms being supported? I think that one of the major challenges is the unification of the storage and network handling; this is a real pain. I mean, here we do some hacks to make that work with Unikraft and QEMU, because there is a shared filesystem there and that's fine; with Firecracker we cannot do that, we need devmapper; with Solo5, same thing. Storage and persistence, and networking. Storage is one thing, which is messy, and the other thing is the network, I mean the interfaces and the tc filters and what happens when you have another one; that's a bit tricky. So, in our presentation we have used Unikraft and we have used rumprun, but rumprun was running over Solo5, so essentially we are also using Solo5; at the moment those are the two that we support. Can you run multi-process applications in the containers? Yeah, so multi-process applications inside the same unikernel, I'm not sure; these guys are the experts. We are working towards that, but we are not there yet. Yeah, we can talk about that; there's some stuff there and I would just mess it up. But in terms of multiple containers per pod, we can do that, so we can have a generic container and a couple of unikernel containers, and they can all share the same, let's say, pod. Yeah, I know, but the tasks that you run there, that's different. There is also a problem with, for example, Postgres; it's notoriously multi-process, so Postgres we would have to actually port at this point on top of Unikraft, because it is multi-process. There is some work there, but it's multi-process. If you look at nginx, even for rumprun, it's just the master process; it's the same nginx. So it's very possible.
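For completeness, here is a minimal sketch of the Kubernetes side mentioned in the demo and in this last question: a RuntimeClass that points at the urunc handler and a Deployment that selects it. The handler and image names are placeholders and must match the runtime configured in containerd; inside such a pod, urunc delegates the pause and any sidecar containers to runc, so generic containers and the unikernel container coexist in the same pod:

    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: urunc
    handler: urunc            # must match the containerd runtime name
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-unikraft
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx-unikraft
      template:
        metadata:
          labels:
            app: nginx-unikraft
        spec:
          runtimeClassName: urunc
          containers:
            - name: nginx
              image: harbor.example.com/demo/nginx-unikraft:latest
              ports:
                - containerPort: 80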