Okay, next up we've got Thierry, who's going to be talking to us about what's below Kubernetes: demystifying container runtimes.

Hello, are you hearing me correctly? Yes, okay. So my name is Thierry Carrez, I work for the OpenStack Foundation, and today I want to talk to you about what's happening below Kubernetes, in the wonderful world of container runtimes. But first you may wonder why I'm here, because I work for the OpenStack Foundation, and obviously OpenStack has this focus on virtual machines, and with containers replacing VMs and such, why am I here? Well, first, OpenStack is more than just virtual machines; it's a collection of projects, and some of them are very container-oriented. Like, we have the Zun project that lets you run containers, or the Magnum project that gives you ready-to-use Kubernetes clusters. But second, the OpenStack Foundation is more than just OpenStack: we're a foundation that generally supports openly developed open infrastructure solutions, which is open source solutions for providing infrastructure. So we support OpenStack, obviously, but also projects like Airship, to declaratively provision infrastructure; Zuul, a cloud-native continuous integration system; StarlingX, which provides infrastructure for edge and IoT; and, more to the point of this talk, Kata Containers, which is a secure container runtime. And as part of my involvement with that project, I tried to make sense of what was happening in that area.

More specifically, the talks you will have in this room and in other devrooms at FOSDEM around Kubernetes are mostly focused on what's happening on top of Kubernetes: how you use it, what the APIs are, how you customize it, how you extend it with operators to deploy complex applications. This talk is more about what's happening below it, all the open infrastructure pieces that you need to run below it; more specifically, the space between Kubernetes and the Linux kernel. And what I discovered there, doing my investigation, is that it's a pretty complex mess of technologies, projects, and products, some of them overlapping, some others complementary. I had plenty of questions, like: do containerd and CRI-O overlap in some way? Is Kata Containers competing with Firecracker? Do CRI and OCI have anything in common? How many different meanings can "container runtime" have? (The answer is: a lot.) While trying to make sense of it, I started to draw a diagram, and while I was doing it for my personal usage, people told me it was actually useful to them. So this talk is the story behind the creation of that diagram and how I've updated it over time. I left enough time at the end so that we can have multiple questions, hopefully, and so you can all tell me how my diagram is wrong, what I forgot in there, or the obvious technology that is not there but should be. There should be plenty of time for blaming me at the end.

So five years ago, when we started this Kubernetes thing, the world used to be very simple. We had Kubernetes at the top, and it was calling Docker to create containers, and Docker did its magic with namespaces and cgroups, and things were perfect. But then we realized that was probably a bit too simple, and it probably gave a bit too much importance to Docker; this giant green box was not to everyone's taste. So we started to add interfaces.
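To make that "magic with namespaces and cgroups" a bit more concrete, here is a minimal Go sketch of the kernel primitive that all these runtimes ultimately build on: cloning a process into fresh namespaces. This is a toy illustration, not Docker's or runC's actual code; it assumes a Linux host and root privileges, and it skips cgroups, pivot_root, and everything else a real runtime must handle.

```go
// A toy illustration of namespace-based isolation. NOT how Docker or
// runC are implemented; it only shows the clone flags they build on.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	if len(os.Args) > 1 && os.Args[1] == "child" {
		// Inside the new namespaces: set a hostname to show the UTS
		// namespace took effect, then run a shell as "PID 1".
		if err := syscall.Sethostname([]byte("toy-container")); err != nil {
			fmt.Fprintln(os.Stderr, "sethostname:", err)
			os.Exit(1)
		}
		cmd := exec.Command("/bin/sh")
		cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintln(os.Stderr, "child:", err)
			os.Exit(1)
		}
		return
	}

	// Re-exec ourselves with new UTS, PID, and mount namespaces.
	cmd := exec.Command("/proc/self/exe", "child")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "parent:", err)
		os.Exit(1)
	}
}
```

Inside the resulting shell, `hostname` prints toy-container and `ps` shows the shell as PID 1, while the host is untouched; everything below is layers of interfaces over this kind of primitive.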
The first interface we added was OCI, which stands for Open Container Initiative. It was created in early 2015, so really in the proto-Kubernetes age, and it was really there to standardize the wild west of containers: everyone was doing containers in a slightly different way, and it was very confusing. So the OCI defined two specs. One is the runtime spec, which defines the primitives you can use to start, stop, pause, and destroy containers. The other is the image spec, which defines what a container bundle should look like in its binary form, so that it can be processed by an OCI runtime. So we took this and said: let's split Docker in two, have the container-running functions in the runC part, and have the Docker CLI and daemon doing all the processing of requests. So that was pretty cool.

Then another interface, the Container Runtime Interface (CRI), was added in late 2016. Here the idea was to have primitives to manage pod lifecycles: create a pod sandbox, destroy a pod sandbox, add a new container to that pod sandbox, that type of request; basically, what Kubernetes needs to do its pod stuff. Back then the problem was that there were two ways of running containers in Kubernetes, rkt on one side and Docker on the other, and each had its own code within Kubernetes to handle pod creation. That was obviously a bit confusing, and every time they had to change a thing on the Docker side, they had to replicate it on the rkt side. So it was clearly calling for an abstraction, an interface, which was added at that layer: CRI sits just below Kubernetes, giving orders to the CRI runtimes that sit below it.

So Docker had to reorganize a bit in order to submit itself to that. It actually split into three pieces, and I'm simplifying, because each box contains multiple pieces. The part of the daemon that actually runs containers was split out and called containerd, and to glue the CRI interface to containerd, a new project called cri-containerd, very creative name, was created as a shim between the two, between the needs of containerd and the needs of CRI. It was still pretty simple; the diagram was still readable.

But by that point, around 2017, the Kubernetes business was booming, so everyone wanted a piece of it, or at least wanted to further reduce Docker's influence on it. rkt had lost steam by then, probably because Docker showed a lot of willingness to adapt and split its components to respond to those interfaces; rkt was basically made irrelevant by Docker's adaptiveness. But there was still an area you could fill with something, because why would you, as a Kubernetes user, need cri-containerd and containerd just to reach an OCI runtime? There was room for a simpler, Kubernetes-specific component to bridge between CRI and OCI. That's where CRI-O was created: it's very Kubernetes-oriented, takes CRI on one side, and speaks OCI on the other. That's very convenient if you're not buying into the Docker ecosystem. And if you did not buy into the Docker ecosystem, then the containerd part you had to run to keep using the Docker CLI was a bit redundant, so there was space for a CLI tool that lets you test containers and pods outside of Kubernetes without having to run any of the Docker bits. That's where Podman was created, together with libpod; I don't want to oversimplify it. But then everyone was still using runC, so obviously that was too simple.
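To illustrate the runtime-spec side mentioned above, here is a hedged sketch that emits a minimal config.json, the file an OCI runtime such as runC (or Kata Containers) reads out of a bundle directory, using the official runtime-spec Go types. The field values are made up for the example, and a real bundle also needs a root filesystem unpacked next to the file.

```go
package main

import (
	"encoding/json"
	"os"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// A minimal OCI runtime configuration: which binary to run,
	// where the root filesystem lives, and a hostname.
	spec := specs.Spec{
		Version: specs.Version, // the spec version this file conforms to
		Process: &specs.Process{
			Args: []string{"/bin/sh"},
			Cwd:  "/",
			Env:  []string{"PATH=/usr/bin:/bin"},
		},
		Root: &specs.Root{
			Path:     "rootfs", // relative to the bundle directory
			Readonly: false,
		},
		Hostname: "example",
	}

	// Write config.json; any OCI runtime consumes exactly this format
	// from a bundle directory (e.g. `runc run` in that directory).
	f, err := os.Create("config.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	enc := json.NewEncoder(f)
	enc.SetIndent("", "  ")
	if err := enc.Encode(&spec); err != nil {
		panic(err)
	}
}
```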
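And the CRI side is literally a gRPC API. The sketch below, under the assumption that a CRI runtime such as containerd is listening on its usual socket and that the image has already been pulled through the image service, walks through the pod primitives described above: run a sandbox, create a container in it, start it. It is shaped like the real k8s.io/cri-api client, but error handling and most required config fields are trimmed, so treat it as annotated pseudocode.

```go
package main

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	ctx := context.Background()

	// Connect to whichever CRI runtime is on the node (containerd here;
	// CRI-O would listen on /var/run/crio/crio.sock instead).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	rt := runtimeapi.NewRuntimeServiceClient(conn)

	// 1. Create the pod sandbox: the pod itself (namespaces, cgroup
	//    parent, network), or a micro-VM if the handler is Kata.
	sandboxCfg := &runtimeapi.PodSandboxConfig{
		Metadata: &runtimeapi.PodSandboxMetadata{
			Name: "demo", Uid: "demo-uid", Namespace: "default",
		},
	}
	sb, err := rt.RunPodSandbox(ctx, &runtimeapi.RunPodSandboxRequest{
		Config: sandboxCfg,
	})
	if err != nil {
		panic(err)
	}

	// 2. Add a container to that sandbox (image assumed already pulled).
	ctr, err := rt.CreateContainer(ctx, &runtimeapi.CreateContainerRequest{
		PodSandboxId: sb.PodSandboxId,
		Config: &runtimeapi.ContainerConfig{
			Metadata: &runtimeapi.ContainerMetadata{Name: "app"},
			Image:    &runtimeapi.ImageSpec{Image: "docker.io/library/busybox:latest"},
			Command:  []string{"sleep", "3600"},
		},
		SandboxConfig: sandboxCfg,
	})
	if err != nil {
		panic(err)
	}

	// 3. ...and start it.
	if _, err := rt.StartContainer(ctx, &runtimeapi.StartContainerRequest{
		ContainerId: ctr.ContainerId,
	}); err != nil {
		panic(err)
	}
	fmt.Println("sandbox:", sb.PodSandboxId, "container:", ctr.ContainerId)
}
```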
Containers were becoming more seriously used, and people realized there was a need for stronger isolation between workloads, especially in the public cloud scenario, where your containers might be hosted next to someone else's containers, and some people don't have really good hygiene in the workloads they share. That's when everyone ended up discovering the dirty secret of containers: they're actually not very good at containing, at least not enough for sensitive workloads. As a result, in the real world, containers actually run in VMs. That's the dirty secret of containers. And when I say in the real world, I mean in Amazon Web Services, Google Cloud, Alibaba Cloud, Microsoft Azure: they all run containers under some form of VM isolation. Those proprietary clouds all have proprietary solutions to do that; the isolation is powered by pieces that are not open source.

So there was clearly a need at that point for an open source solution, an open infrastructure solution, to run Kubernetes pods within VMs on QEMU/KVM. Back then there was a company called Hyper that was creating a container runtime called runV, which allowed you to run containers in VMs, in QEMU/KVM VMs. They had a hyper CLI tool that you could call to run containers within micro-VMs. (Ten minutes left? That's plenty. That's perfect; I'll speak fast.) They also developed a CRI runtime called Frakti, which you might have heard of or not, that allowed you to run pods directly on runV, so those pods ran in VMs. Around the same time, Intel was working on Clear Containers, which was an OCI runtime to run containers on QEMU/KVM. There were really a lot of similarities between those two projects, runV and Clear Containers, so they merged into a neutral, openly developed project called Kata Containers, under the OpenStack Foundation. Kata Containers proposed to run pods of containers in micro-VMs under QEMU/KVM. And since Kata Containers was an OCI-compliant runtime, you could use Podman or Docker to run containers directly on Kata Containers, so you did not need that hyper CLI anymore. That's why it's removed from the diagram.

The other nice side effect of doing this project, this open source open infrastructure project, and it's more generally a great thing about open source, is that it encouraged those companies, the Googles and the Amazons, to also release their proprietary container isolation technology in an open source manner. They did not want to be displaced by an open source solution that would make whatever they were using less relevant. So Google released gVisor, which, at least in its ptrace mode, uses syscall filtering for container isolation; so they are not really using VMs. It has modes where it runs within VMs, but the ptrace mode, which is probably the most interesting one, actually does active syscall filtering. And Amazon released Firecracker, which is a highly opinionated virtual machine monitor, because they found that QEMU was way too wide in terms of what it supported, and they had a very narrow use case: running functions in micro-VMs. For AWS Lambda, they also run those functions in VMs, that's another dirty secret of functions, and Firecracker is how they run them securely.
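For a feel of what "active syscall filtering" means in gVisor's ptrace mode, here is a small sketch of the underlying kernel mechanism on Linux/amd64: a tracer that stops a child at every syscall boundary and inspects it. gVisor's Sentry does far more, it implements the syscalls itself rather than merely watching them, so this shows only the mechanism, not the design.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

func main() {
	// ptrace requires the tracer to stay on a single OS thread.
	runtime.LockOSThread()

	// Start the traced child; Ptrace:true stops it at its first instruction.
	cmd := exec.Command("/bin/true")
	cmd.SysProcAttr = &syscall.SysProcAttr{Ptrace: true}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	pid := cmd.Process.Pid
	var ws syscall.WaitStatus
	if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
		panic(err)
	}

	// Resume the child one syscall stop at a time (each syscall triggers
	// an entry stop and an exit stop), logging the syscall number.
	var regs syscall.PtraceRegs
	for {
		if err := syscall.PtraceSyscall(pid, 0); err != nil {
			panic(err)
		}
		if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
			panic(err)
		}
		if ws.Exited() {
			break
		}
		if err := syscall.PtraceGetRegs(pid, &regs); err != nil {
			break
		}
		// Orig_rax holds the syscall number on amd64. A sandbox like
		// gVisor would intercept and service it; we only print it.
		fmt.Fprintf(os.Stderr, "syscall %d\n", regs.Orig_rax)
	}
}
```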
Kata Containers evolved so that it could run on QEMU, or on Firecracker, or on NEMU, which is like a lighter version of QEMU, to actually run the VMs. So you can really run those pods in extremely simple VMs that boot in milliseconds. That's actually where the diagram started to get too complex, because there was a hole in there, and nature abhors a vacuum. So someone decided it would be a good idea to directly link containerd with Firecracker, bypassing the OCI interface, and directly connect the two. They created a piece of software called firecracker-containerd, also a very creative name, bypassing the OCI runtime. And yeah, how do I represent that? I had to add another dimension to this diagram. And then I learned that Kata Containers also plugs directly into containerd and CRI-O to leverage advanced features there, in addition to being an OCI-compliant runtime. Those CRI runtimes are actually developing advanced features that are smarter than what the OCI runtime interface mandates, so by plugging directly into CRI-O or directly into containerd, you can get better performance or more features. That's where they started to link more directly, also bypassing the OCI interface. It's still an OCI-compliant runtime, but it's also not an OCI-compliant runtime when you use the direct plug. That's where I stopped trying to represent it all in a single diagram.

So: you have CLI tools in green, CRI runtimes in purple, OCI runtimes in blue, and virtual machine monitors in red. And you can see which ones are actually complementary and which ones can be used as alternatives. I know this devroom is probably well versed in the wonderful world of container runtimes, but I hope that for some of you this diagram gives a better understanding of how those pieces fit together, which ones are complementary, which pieces fit together. I know it helped me. So, thank you for your attention, and we have time for questions. I'm pretty sure there are errors in this, so tell me.

So, there are many options; which ones do you use, which do you recommend using, and why?

So, from our perspective, we're producing mostly Kata Containers, so our problem is more to make sure that whatever is invented around it, we actually support it correctly. Most of the work has been done supporting the advanced features in CRI-O and containerd, and plugging into QEMU, NEMU, and Firecracker really quickly, so that things like firecracker-containerd don't get created; but that doesn't prevent people from creating code. So that's our perspective. Personally, I like going vertical: Kubernetes, CRI-O, Kata Containers, Firecracker, KVM, because that's where you get, I would say, the simplest stack. But I understand that people like to use the Docker toolset, and if you're a Docker shop, or if you also run simple containers rather than just Kubernetes, it actually makes sense to traverse the diagram in another direction, going through CRI, cri-containerd, containerd. And then whether or not you use Kata Containers or runC or gVisor or others is really about your sensitivity to the various security properties, because Kata Containers is not as fast as runC; it's more like tens of milliseconds on one side and hundreds of milliseconds on the other. So if you're at a level where you need 15-millisecond response times, maybe going Kata Containers on QEMU/KVM is not okay. Kata Containers on Firecracker might be, because it's more like 50 or 30 milliseconds. So it still has an overhead, but it depends completely on your situation: if you're running trusted workloads on a private Kubernetes cluster, you don't care that much about the noisy neighbors and the nosy neighbors. If you're a public cloud, you have to run something like Kata Containers, basically.
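That "traverse through containerd and pick your OCI runtime" is visible in containerd's Go client: when creating a container, you can name the runtime shim to use. A hedged sketch, assuming containerd is running and the Kata v2 shim is installed under its conventional name io.containerd.kata.v2 (not guaranteed on any given system):

```go
package main

import (
	"context"
	"fmt"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		panic(err)
	}
	defer client.Close()
	ctx := namespaces.WithNamespace(context.Background(), "default")

	// Pull and unpack an image through containerd.
	image, err := client.Pull(ctx, "docker.io/library/busybox:latest",
		containerd.WithPullUnpack)
	if err != nil {
		panic(err)
	}

	// Create the container, explicitly choosing the Kata runtime shim.
	// Swap in "io.containerd.runc.v2" and the same code runs on runC.
	container, err := client.NewContainer(ctx, "kata-demo",
		containerd.WithImage(image),
		containerd.WithNewSnapshot("kata-demo-snap", image),
		containerd.WithNewSpec(oci.WithImageConfig(image)),
		containerd.WithRuntime("io.containerd.kata.v2", nil),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("created container", container.ID(), "backed by a micro-VM")
}
```

The design point is that the runtime choice is just a string on the container, which is exactly what makes the "same cluster, different isolation per workload" pattern below possible.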
Other questions? Oh, someone at the back.

Where would you place LXD here?

Oh my. You, Stefan, can probably place LXD somewhere in there, because I'm pretty sure you can plug it into Kubernetes.

Yeah, so someone wrote a CRI for LXD and Kubernetes. That's LXE. (LX-what?) LXE, because someone just had to take the next letter, you know. So there's that thing, which would do Kubernetes, CRI, LXD, and then LXD can either go through LXC straight to the kernel for containers, or, these days, it can run virtual machines, which then go through QEMU and then the kernel. So LXD would kind of replace the entire middle layer.

Yeah, but you end up with LXD running LXC containers, right?

LXD, as of last week, can run virtual machines through QEMU as well.

Yeah. I know, I had to stop at one point. I think I stopped at the right moment. It still makes sense.

So, from the diagram, can we understand that the left part is the more secure one, and the right part is the one with the lowest latency?

Well, no. You have runC, which would run traditional containers; LXC would probably also be usable there. And the isolation increases as you go left, I would say. And there are other things: Nabla Containers is also a solution that does container isolation for pods; it just has had slightly less success in the Kubernetes world, I would say. But what we're seeing is people using runC and Kata Containers at the same time, and depending on the workload, they switch to one or the other. So you can basically run them in parallel and decide, workload by workload, whether you're switching to the secure runtime or the less secure one.

So next year, please 3D-print a diagram with the other options that are missing.

Yeah, really, it's a space where people like to reinvent things, so I don't trust that this diagram is getting any simpler in the future. Human nature. All right, well, thank you very much. Thanks.
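As a postscript to that last answer: the "run runC and Kata Containers in parallel and decide workload by workload" pattern is what Kubernetes formalized as RuntimeClass. A hedged client-go sketch, where the handler name "kata" is an assumption that must match the node's containerd or CRI-O configuration; pods that omit runtimeClassName keep using the default runtime (typically runC).

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	nodev1 "k8s.io/api/node/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	ctx := context.Background()
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// A RuntimeClass mapping the name "kata" to the CRI runtime handler
	// configured on the nodes (e.g. the Kata shim in containerd).
	rc := &nodev1.RuntimeClass{
		ObjectMeta: metav1.ObjectMeta{Name: "kata"},
		Handler:    "kata", // assumed handler name; cluster-specific
	}
	if _, err := clientset.NodeV1().RuntimeClasses().Create(ctx, rc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// An untrusted workload pinned to the VM-isolated runtime; a trusted
	// one would simply leave RuntimeClassName unset and run on runC.
	kata := "kata"
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "untrusted-app"},
		Spec: corev1.PodSpec{
			RuntimeClassName: &kata,
			Containers: []corev1.Container{
				{Name: "app", Image: "docker.io/library/nginx:latest"},
			},
		},
	}
	if _, err := clientset.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```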