Good afternoon, everyone. Let's begin. Our topic is Kata Containers. Some of you may have noticed that Kata was introduced in this morning's keynote, and we are part of the Kata team. I'm Xu Wang from Hyper, and this is Samuel from Intel. Hi. Yeah. I will give the background of Kata, and then we will give a more detailed introduction.

This talk has three parts. First, we will give an overview of the project. Then we will go into some technical details. And last, we will say something about the project itself and how you can contribute to it. Samuel is my co-speaker. First, I will give a brief introduction and the history of the project. Then Samuel will present the vision and the architecture of the current Kata project. After that, I will give some performance details of Kata itself, show how the Kata project, the virtualized container, works together with Kubernetes, and give a simple demo. And at the end, we will talk about contributing.

For the history: Kata Containers comes from two projects, from Hyper and from Intel. We both announced our projects in May 2015. Hyper.sh launched hyperd, and part of it is runV, which is similar to runC: runC runs containers, and with runV you run the container inside a VM. And Intel announced Clear Containers, I think, in the same week. Yeah, they're a real pair, like twin brothers. Doesn't look like it, but yeah.

In the past two years we had many interactions between the two projects, because they are quite similar. Audiences who knew both projects would ask: what's the difference between you? What are the strengths and weak points of each? We wanted to eliminate the differences and, by some means, make the two interchangeable, so people could just use virtualized container technology instead of having to choose or think about which one is better.

So we began the merge. It started pretty early, last year, maybe. Last year we proposed to make things more pluggable. At first it was the VM part, the guest agent; then we shared the same protocol between the VM and the host, and we also shared the OCI handling. This September we met in Portland, and we finally decided to accelerate the merge of the two projects. In the past three months we have been working together to push the project forward. And yesterday morning, at 7 AM, I think, we announced the merged project, and the two projects came together. And now, let me shift the mic to Samuel.

So, what do we want to do with Kata Containers? There are some technical aspects to the project, and there are also some non-technical ones. On the technical side, really, this is why we merged: because we have the same vision, and we are going for the same thing. What we want to do is run lightweight and fast VM-based containers. By VM-based containers, I mean each and every one of the containers is going to run in a full virtual machine. So the end goal is really to merge the two technologies, Clear Containers and runV, together in the same code base, under the same repo. We want to seamlessly integrate with Kubernetes. Today, we do that with Clear Containers, and we also do that with runV; the end goal, obviously, is the same for Kata with Kubernetes. It's going to be multi-architecture; it's not only about x86. Yeah, trust me. So today, Clear Containers supports x86, and runV supports more architectures.
And the Kata Containers implementation will support more than x86. We will also support more than KVM: we will support KVM, Xen, and, well, that's pretty much it. So yeah, that's the technical vision.

If you look at the two runtimes, they have a lot of features in common, but they also have quite a few features that are specific to one or the other. And really, the objective is to make all those features common and merge them into the same code base. Things like multi-architecture and multi-hypervisor support are something that runV is very good at. Things like direct device assignment, SR-IOV, and multi-OS support are things that Clear Containers supports today. I'm not going to go through all those features, but the idea is really to have this joint set of features merged together into one implementation, which is Kata Containers.

We also have non-technical goals. It's not only about merging code and working together. It's also about being a vendor-neutral project. Clear Containers, obviously, was Intel-branded. We want to go further than that: we want to be open, we want to be vendor-neutral, and we want to be under a neutral umbrella, and that's the OpenStack Foundation. Something really important to highlight here is that although the project is managed at the OpenStack Foundation, this is not an OpenStack software project. It has no dependency on any OpenStack software component. We're under the OpenStack Foundation, but we're not depending on the OpenStack code itself. That's important to highlight. And really, the final goal is to have everyone who is interested in using VM-based containers working under the same umbrella, which is Kata Containers.

OK, so let's get slightly more technical. If you look at containers today and how they're run in the cloud, this is a very high-level diagram. The idea is to show that your containers typically run inside the same virtual node: you spin up a virtual machine, and you run all your containers inside this virtual machine. One really important thing to notice here is that all those containers are sharing the same kernel. And really, the isolation between all the containers is a software construct: your containers are not reaching the other containers because the kernel prevents them from doing so. So it's a soft isolation, not a hard isolation.

What we propose with Kata Containers, or hypervisor-based containers, is to run each container inside its own virtual machine. One big difference here is that each container runs on top of its own kernel, so you don't share the kernel across containers anymore, and you have hardware isolation between all containers, no longer just software isolation. So it's all about security.

One thing that I want to say, which doesn't show in this diagram, is that in a Kubernetes context, since this is a Kubernetes conference, the isolation unit is not the container, it's the pod. When you run a Kubernetes pod, your pod is going to run in its own virtual machine, and each container inside the pod is going to run inside that virtual machine. So there's not going to be one virtual machine per container, but one per pod. If you run just a Docker container, it's going to be one virtual machine per container; but in the Kubernetes context, it's really one virtual machine per pod. So each container or pod is going to be hypervisor-isolated. And at that point, you have the same isolation level that you have with a virtual machine. Again, it's no longer software isolation, it's hardware isolation.
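To make the pod-per-VM point concrete, here is a minimal sketch (the pod name and images are just illustrative): with a VM-based runtime handling the pod, both containers below would share one lightweight VM and one guest kernel, while any other pod gets its own VM.

```sh
# Minimal sketch: a plain two-container pod spec. Under a VM-based
# runtime, both containers share a single lightweight VM (one guest
# kernel); a second pod would get a separate VM. Names are illustrative.
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: kata-demo
spec:
  containers:
  - name: web
    image: nginx
  - name: sidecar
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
EOF
```

Note that the spec itself is vanilla Kubernetes; where the VM boundary falls is purely a runtime decision.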
I mean, when we talk about virtual machines, people think about starting their good old legacy virtual machines in five minutes, these mega images that take gigabytes of memory, and so on. We don't want that with Kata Containers, obviously. We want them to boot really fast, and we want them to be very small. We're going to talk about this a bit later, but that's really the goal here: make it extremely fast to boot, and make it as small as possible.

And finally, we want Kata Containers to be seamlessly integrated with the rest of the container ecosystem: with Kubernetes, with Docker, with OpenStack, whatever your software stack is, we want them to be integrated transparently, so you won't have to change anything in your workflow. And really, the goal here, as Imad was saying in his keynote, is that today you basically have to choose between speed and isolation when you have to select between a container, a virtual machine, or running a container inside a virtual machine. What Kata is trying to address is being able to run with both speed and isolation. We are filling this gap, where you don't have to choose anymore: you can run Kata Containers and you get the typical hardware isolation of legacy VMs, and you get the same speed that you're used to with regular containers, or native containers, whatever you want to call them.

Some more technical details. These are all the components that make up Kata Containers today. We have the runtime, and we have a few other components. But before going into this a bit deeper, I really want to highlight that Kata Containers integrates at the level you are all familiar with: the runC level. To use Kata Containers, you don't have to replace Docker, and you don't have to modify Kubernetes. You just have to specify another runtime, something different from runC (a sketch of what that looks like follows below). Another point that I want to make clear is that it's not replacing runC; it's living alongside runC. You can have runC and Kata Containers living together, and you can run VM-based containers alongside namespace containers, and they will just work together. So we're not trying to replace anything. We're not trying to replace Docker, and we're not trying to modify Kubernetes to use Kata Containers. It's really important for us to integrate seamlessly with the whole container ecosystem.

So here you have all the higher-level components: Docker, Kubernetes, and OpenStack. On one side, they send or receive I/O: stdin, stderr, stdout. This is handled by a shim. All those components are actually talking to a shim, because they expect to talk to a process, not to a virtual machine. So we have a shim sitting between your software components, Kubernetes or Docker, and the actual virtual machine that holds the container. And the runtime is the one handling all the OCI commands and the OCI specification. One really important thing is that, yes, the runtime is OCI compliant: we talk OCI, and we get OCI commands and specifications from the higher levels of the stack.
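As an illustration of that integration point, here is a hedged sketch of registering an extra OCI runtime with Docker; the runtime name "kata" and the binary path are illustrative assumptions, not the project's official packaging.

```sh
# Sketch: register an additional OCI runtime with Docker. (This overwrites
# any existing daemon.json; merge by hand on a real host.) The "kata" name
# and the binary path are assumptions for illustration.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "kata": { "path": "/usr/local/bin/kata-runtime" }
  }
}
EOF
sudo systemctl restart docker

# runC stays the default; namespace and VM-based containers coexist:
docker run --rm ubuntu /bin/true                  # default runtime (runC)
docker run --rm --runtime kata ubuntu /bin/true   # VM-based runtime
```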
Then we have a proxy in this picture, because currently we're talking to the virtual machine through a serial interface; the red arrow that you see here is a serial link. The proxy, the shim, and the runtime talk gRPC, and the proxy basically multiplexes and demultiplexes everything over the serial interface with a component called yamux. Then it goes into the hypervisor and talks to the kernel, and we have an agent running inside the virtual machine that's actually responsible for spinning up all the containers and all the pods inside the virtual machine. You can think of the agent as a stripped-down, really minimal version of runC: we have a very, very small runC-like component running inside the virtual machine, and it receives OCI commands and specifications and spins up your containers inside the virtual machine.

Another version of this architecture, a simplified one, is based on vsock, which basically gives the virtual machine socket semantics toward the host. Then we don't need a proxy: the shim and the runtime talk directly to the hypervisor. Inside the virtual machine, supporting vsock is easy to do. But on the host side, vsock is a fairly recent addition to the kernel, so we can't just go full vsock, because we can't expect all the hosts and all the distros to actually support vsock today. But for those who do, you can just skip the proxy component and get this simplified architecture. So, you want to continue? Yeah.

Samuel gave you the whole picture and the architecture of the Kata Containers components. Now I will give some details on how it can run as fast as a container. From the Kubernetes point of view, when you launch a pod, one part of the work is preparing the image, the rootfs for the container, and the other part is preparing the sandbox for the pod. Traditional, existing containers do these serially, because a Linux container sandbox is quite fast to create, and the main time cost is the rootfs preparation. For the VM case, we accelerate it by running the two parts in parallel. We have done a lot of work on the lightweight VM so that it can boot in maybe 100 to 200 milliseconds. Yep. That still consumes some time, and at the same time, we prepare the rootfs. The upper part of the diagram is the sandbox preparation, and the bottom part is the volume and rootfs preparation. Then we use hot plug to put the two together and launch the containers inside the pod.

Yeah. So we hot plug a lot of things into the virtual machine. We start a really tiny virtual machine, and as we learn more about the pod that needs to be created, stuff is hot plugged into the virtual machine. Yes. Not only the rootfs and the volumes, but also networking, and even memory and CPU hot plug. So you can prepare the VM before you have the full specification, because you can hot plug a CPU in later. This is how we make it faster.

And we make it smaller. We use a minimal rootfs and kernel. Also, the Hyper team introduced the VM template technology, which is like a VM fork: all the VMs share the same read-only pages, the shared binary part, so you don't have to pay the memory tax for each VM's text segment. And Intel introduced a DAX-based technology for the VM, which is execute-in-place: you just memory-map the image on a memory device, so you don't need to allocate real memory for it, and that saves memory. And we can also use the KSM technology to do more aggressive memory saving here.
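As an aside, KSM (kernel samepage merging) is a stock Linux feature driven through sysfs; a minimal sketch of enabling it on a host and watching the deduplication happen could look like this (the tuning values are illustrative, not recommendations).

```sh
# Sketch: enable kernel samepage merging (KSM) on the host (run as root).
echo 1 > /sys/kernel/mm/ksm/run                # start the ksmd scanner
echo 1000 > /sys/kernel/mm/ksm/pages_to_scan   # pages scanned per wake-up
echo 20 > /sys/kernel/mm/ksm/sleep_millisecs   # pause between scan rounds

# Observe how many identical pages are currently shared across VMs:
cat /sys/kernel/mm/ksm/pages_sharing
```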
By using these technologies, our partner Huawei has tested this secure container technology, and it can reach about 10x the density of traditional VMs. There is still some overhead compared to Linux containers, but the density is much higher than with traditional VMs. I just want to highlight something here. Something we've noticed is that when you run really, really big workloads with KSM enabled, you actually get better density than with typical native containers, because we actually manage to deduplicate a lot of memory pages across virtual machines. So the overhead is not a fixed cost; sometimes the overhead is actually negative. Yeah. And if you have many CPU cores, you can pin the dedup job to specific cores, and that lets you save memory aggressively.

About the networking part: that's maybe one of the most significant differences between current containers and VMs, so we have a few different methods here. First, the Clear Containers team uses macvtap to bridge the veth pair to the tap device, and we also use some TC rules to redirect the traffic. Actually, you can do either. For runV, we also have a specific interface: if your CNI plugin knows it is dealing with a VM-based container, you can directly plug the tap device into the VM. CNI plugins usually don't care about VMs, so that's our problem. Yeah, but eventually they will.

On the storage part, we support volumes either from a block device or via 9pfs file system sharing. One thing to highlight here is that with virtio-blk, when you have block device storage on the host, the performance is actually much better than with 9pfs; it's pretty close to bare-metal, native performance. So it's a much better setup than not using block device storage on the host. Yeah, yes. Actually, for block devices, even for things like network block devices, the performance is comparable to the host. For 9pfs, there is some penalty: it's slower than native file systems, and it had some bugs in its POSIX semantics, but we have fixed some of them. Some of them, yeah.

And about the usage of Kata Containers: Kata introduces some changes to the existing hosted Kubernetes scheme. The left side is what we do currently: we set up separate Kubernetes clusters for different users, and each user has their own Kubernetes master running in their VMs. The right side is what Kata does, or what we are trying to do: we use Kata Containers, the VM, to isolate the pod itself, and use the networking part to give a multi-tenant network to the users. This is just like AWS Fargate: users work with the native Kubernetes APIs, but they don't need to manage the Kubernetes masters themselves, and they get a multi-tenant Kubernetes. So this gives you CPU and memory multi-tenancy. But for the networking part, if you just use your regular CNI plugin, it's still going to be a soft multi-tenancy implementation. So you need what Xu is going to talk about now.

And here, I have prepared a demo for this. There's another project called Stackube. It uses virtualized container technology to provide a multi-tenant Kubernetes distro. As Samuel mentioned before, Kata itself doesn't depend on any Kubernetes or OpenStack components. But the existing OpenStack components give us some convenient and vendor-neutral services, such as Neutron and Cinder, and since they are part of OpenStack, they provide a vendor-neutral interface. We can use these together with Kata Containers and Kubernetes to provide multi-tenant usage. Here is a simple workflow: from kubectl you reach the Kubernetes master's API server, and then you can allocate your tenants as custom resources, which is backed by OpenStack Keystone. You can allocate networks through Neutron, which provides a layer-2 isolated network for each tenant. And we use Frakti, which is a CRI server, to call Kata Containers or other runtimes to create the containers.
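Before the demo, here is a hedged sketch of what such tenant and per-tenant network objects might look like as Kubernetes custom resources; the API group, kinds, and field names below are assumptions for illustration, not a verified Stackube schema.

```sh
# Illustrative sketch only: a tenant plus its layer-2 network expressed
# as custom resources. The API group, kinds, and fields are assumptions,
# not the verified Stackube schema.
cat <<'EOF' | kubectl create -f -
apiVersion: stackube.kubernetes.io/v1
kind: Tenant
metadata:
  name: test1
spec:
  username: test1
  password: password
---
apiVersion: stackube.kubernetes.io/v1
kind: Network
metadata:
  name: test1-net
  namespace: test1
spec:
  cidr: 10.244.0.0/16
  gateway: 10.244.0.1
EOF
```

Because each tenant gets its own layer-2 network, a second tenant could reuse the very same CIDR, which is exactly the overlap shown in the demo.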
And let's do the demo itself. Where's my terminal? Ah, it's here. Too small? Yes, sure. So let's run the demo, just on Google Cloud; it's quite simple. Let's see, we have two YAMLs here. Still small? There, that's great. It's much bigger than mine. Can everyone see it? OK.

So, it's very simple. We will create a tenant called test1. And we just created it. I prepared a script here, so I can just paste it. It's created. You can see, on the last line, we have test1; it was actually created six seconds ago. And we can create another one; the second one is created. After we create the tenants, let's look at the network; give it a bit longer. You can find there is a network for the namespace test1, and you can see the address is 10.244.0.0. And for the other user, it's the same network address range, because we have independent layer-2 networks, so they can have overlapping IP address ranges.

And let's see what we can do. We can just create a pod that runs a single Ubuntu container. And let's do something really bad. Yeah, I think you have seen this line on Stack Overflow: a fork bomb (the exact one-liner is reproduced right after this demo). This line is quite beautiful: there are no letters in it at all. You can rename the colon to any word, but here, let's do something geeky. Oh yeah, that looks normal. But now you can't do anything. That's because it uses the pipe to fork, and fork, and fork again, until it exhausts the processes here, and it will kill the pod itself. But because it's running in a single VM, outside the VM everything is OK, and you can do anything, for example, kill the VM itself. So with the VM technology, even if some VM or some pod is hacked by others, it doesn't affect the host or the other pods. You can safely just kill it from the outside, and when it's deleted, the resources are reclaimed.

Next, let's run some pods. Uh-huh, the old one is still here; wait for it to be garbage collected. This will create things in test2. Let's see the pods. It will create a service, and create another one, and then we can see them. Yes, sure. So, for test1, we can see we have a pod for nginx. And we can also run a pod to check: we run a busybox, and from the busybox we try to reach the different services. Because we have a multi-tenant network, it can only reach its own: we can use wget, and the pod can reach itself, but it cannot reach the others; the lookup just fails with a bad name. So it's really a multi-tenant network.

You can see that everything I did as an operator is identical to standard Kubernetes. As we said, it's a vanilla Kubernetes with some plugins, and it can run on top of Kata Containers without any modification to Kubernetes itself. So any Kubernetes user should feel comfortable with it. And because time is limited, I will finish soon. Yeah, those are standard images; we just pull them from the registry. You don't need to modify the image; it's just standard Docker, and you don't need to do any modification to the images.
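For reference, the fork-bomb one-liner used in the demo above is the classic shell definition; it is safe to read, but it should only ever be run inside a disposable sandbox such as the demo's per-pod VM.

```sh
# The classic bash fork bomb: defines a function named ":" that pipes
# itself into itself in the background, then calls it. Each invocation
# forks two more, exhausting the PID table of whatever kernel it runs
# on -- here, only the pod's own guest kernel. Never run this on a host.
:(){ :|:& };:

# Equivalent, with the colon renamed to a readable word:
bomb() { bomb | bomb & }; bomb
```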
For what's next: in the first half of next year, we will have the 1.0 release, all the CRI integration support, and some OS vendor support. Can we take some questions? If you want to get involved, there's the GitHub repo; that's where the information is. Yeah, let's put up the information, then take some questions. I think we have a few minutes left. Questions from the audience? OK, this guy was first. The time is up, so we will only take maybe one or two questions. Yeah, just a couple of questions; we have a booth, so you can come and ask more questions there.

So, when you mentioned 9p versus virtio-blk into the VM, I was wondering how the image pull is done, and whether your local image cache is on the hypervisor. Is that configurable? Or is the choice between 9p and virtio-blk based on that?

No, really it's about the storage overlay that you use on the host, and everything is cached on the host. We don't cache the images in the VM itself; we just export them from the host into the VM.

Yeah, but I'm wondering how you pull those. Like, normally you do a docker pull, or...? Yeah, you still do that. So is one of your components actually doing that pull? No, the docker pull is done on the host. The CRI daemon will pull the images for you and prepare them, and depending on your configuration, it can store them as a block device or as a file system. OK, so there's still a dependency on containerd or dockerd or something, right? If you integrate with CRI-O, for example, there's no such dependency; if you integrate with containerd's CRI, yes. Yeah, with containerd it will depend on containerd; for CRI-O it's really CRI-O, and for Frakti it's Frakti. Those are all the different CRI daemons.

Another question? The last one. I'm actually wondering whether you give us the capability to manage these so-called lightweight VMs, the kernel and everything. I know that runV and hyperstart will get the kernel and initrd from /var/lib or somewhere. We have a use case where multiple kernels are required for different use cases, different pods.

So, that's a long story. But ideally, we would like OCI to provide us with the kernel and guest OS images, and this is something we're trying to push into OCI. But both runV and Clear Containers, and so Kata Containers, will be able to support multiple kernels per pod, so you can run each of your pods with a different kernel that you have provisioned on the node. Yeah, because the team comes from both runV and Clear Containers: what you got in runV, you can get in Kata Containers too.

Thank you. Thank you all. Thank you for your time. We have a booth if you want to ask more questions. Yeah.