Hello everybody. Welcome. Thank you very much for joining us, and thanks for sticking around until the last day of KubeCon. We all made it. I hope everyone's had a great one so far, and I hope it gets even better with this. My name is Peter Hunt. I'm a senior software engineer at Red Hat, working primarily on CRI-O, but also upstream in SIG Node and on runc, Podman, and other container-related technologies.

Hello everyone. Hope you've had a great KubeCon so far. My name is Urvashi Mohnani, and I'm a principal software engineer at Red Hat, working in the container space, mainly on Podman and Buildah, but I'm a CRI-O maintainer as well. So today we're here to talk to you about "CRI-O's Back." Essentially, we're going to give you a quick intro to CRI-O as well as some updates that have happened in the project since the last time we met.

I'm guessing some of you already know what CRI-O is, but for those of you who are new and joining us today: CRI-O is a lightweight daemon that implements the Kubernetes Container Runtime Interface. It's a CRI implementation that helps you run your containers in production securely, with good performance and with stability. CRI-O is compatible with the OCI spec, so it supports all OCI-compatible images, all OCI-compatible container runtimes, and all OCI-compatible registries. Some of the OCI-compatible container runtimes are runc, crun, Kata Containers, gVisor, and many more out there (a sample runtime-handler configuration is sketched at the end of this passage).

Here's an overview of what CRI-O is responsible for when you're spinning up your containers. The first thing it does is authenticate with the OCI registry and verify the image you're trying to pull; it can use Sigstore to do that now as well. Once all of that is done, it will go out and pull the image for you if you don't already have it downloaded, and it provisions the disk resources so the image can be written and stored on your disk. When it comes to pods, it creates the pod-level namespaces and delegates the network creation to the CNI, the Container Network Interface plugin that is used in Kubernetes. And then for containers, it translates the CRI request that we get from the kubelet into an OCI spec, so that it can be passed down to the OCI runtime to be started up. It monitors and redirects any output of the containers back to the kubelet, so the kubelet gets notified when the container is up and running, when it's down, and what happened; it also transfers the logs. And it provisions the disk resources that are needed for that as well.

So, why CRI-O? I think one of the biggest things is that CRI-O is made specifically for Kubernetes. It only works with Kubernetes; our focus is Kubernetes. If you try to plug CRI-O into any other container orchestration tool, it's probably not going to work; we haven't tested it. But we ensure that we optimize CRI-O's performance for whatever Kubernetes features are coming in. You can also enable experimental features with annotations. We track upstream Kubernetes very closely, so if we know there's something new coming in, we work on getting it into CRI-O before it may even be available in Kubernetes, if it's something that needs to go into the CRI, the Container Runtime Interface. And we rigorously test CRI-O: we have over 1,500 tests running, including the CRI-O test suites, Kubernetes test suites, and OpenShift test suites, and we also have some Kata Containers tests running with CRI-O.
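To make that runtime pluggability concrete, here is a minimal sketch of what runtime handlers look like in CRI-O's configuration. This is illustrative rather than from the talk: the table layout follows crio.conf's documented format, but the file name and binary paths are assumptions for a typical Linux install.

```toml
# /etc/crio/crio.conf.d/10-runtimes.conf -- illustrative drop-in file.
# Each [crio.runtime.runtimes.<name>] table registers one OCI runtime;
# pods select a non-default handler via a Kubernetes RuntimeClass.

[crio.runtime.runtimes.runc]
runtime_path = "/usr/bin/runc"   # path is an assumption; adjust per distro
runtime_type = "oci"

[crio.runtime.runtimes.crun]
runtime_path = "/usr/bin/crun"   # a lighter-weight alternative handler
runtime_type = "oci"
```

A sandboxed runtime such as Kata Containers would be registered the same way, typically with its own handler table and a VM-based runtime type.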
All of that testing ensures that whenever Kubernetes cuts a new release, or something else changes, we don't break Kubernetes. One great thing is that CRI-O versions march in lockstep with Kubernetes versions. So if you're using Kubernetes 1.26, you know that CRI-O 1.26 is what will work with it. You don't need a compatibility matrix to figure out which CRI-O goes with Kubernetes 1.26; it's the same version. And every time Kubernetes releases, we release at the same time; we just released 1.27 yesterday.

We also heavily focus on security with CRI-O, and we ensure that we reduce the attack surface as much as possible. One thing we do is enable fewer capabilities by default. If you go and check the containers running under CRI-O and look at which capabilities are enabled, you will see that they're not as extensive as you would probably see with Podman or Docker or other container tools (a sketch of these defaults follows at the end of this section). CRI-O only has functionality for exactly what is needed in a Kubernetes production cluster. You can't do builds with CRI-O, for example, and you can't push images to a registry, because none of that is needed when you're running your workloads in production. And we're quick to adopt new security knobs: we have a read-only mode to run your containers read-only, minimizing the attack surface, and the same goes for user namespaces. The goal is to make running containers in production as secure and boring as possible, so you don't have to worry about it.

Here are a few CRI-O stats. We have over 4,500 stars on GitHub. We have had over 180 releases so far, and we have synced up with 18 versions of Kubernetes. Over 7,500 commits, and we have 100 contributors on our repository right now. It's an open source project, and we're always looking for more contributors. If you're interested in trying it out and checking it out, please check out our repo, open issues, open PRs; we're always looking for all the help we can get. And we currently have 10 publicly listed adopters.

A few updates that have gone into CRI-O over the last two or three releases: in 1.26 we added support for the Node Resource Interface (NRI). NRI essentially lets you use plugins to carry out certain actions that are outside the scope of the CRI. I think it started with a focus on devices, but with community collaboration we're making it agnostic, so it can be used for different types of resources and not just be limited to devices. This is an example of the community in bloom: we've been working across containerd and CRI-O to get NRI done and supported. And we're hoping to replace CDI with NRI completely in future releases.

Next, FreeBSD is getting a lot of traction, so we are working on adding support for it. CRI-O has been mainly Linux-based, but that doesn't mean we cannot support FreeBSD, and there is demand for it. We're currently working on adding support using runj on FreeBSD. We have added some initial support to Podman and conmon (conmon is our container monitoring tool), and we're working towards CRI-O support. This is all currently very experimental, and as I mentioned earlier, community feedback and support are very welcome.
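Back to the capability defaults mentioned a moment ago: here is a minimal sketch of what the relevant crio.conf knobs look like. The option names follow CRI-O's documented configuration; treat the exact capability list as illustrative and check the defaults shipped with your CRI-O version.

```toml
[crio.runtime]
# Run all containers with a read-only root filesystem.
read_only = true

# Capabilities granted to containers by default. Anything not listed
# here must be requested explicitly via the pod's securityContext.
default_capabilities = [
    "CHOWN",
    "DAC_OVERRIDE",
    "FSETID",
    "FOWNER",
    "SETGID",
    "SETUID",
    "SETPCAP",
    "NET_BIND_SERVICE",
    "KILL",
]
```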
With that, Peter will go ahead with the case study now. Thank you.

So next up, we're going to talk a little bit about some other new things we've been working on, and I've written a story about how they all tie in for you as a cluster admin using CRI-O. Specifically, we're going to talk about mitigating machine-in-the-middle attacks, and some features that exist in CRI-O that will help you do that.

So imagine, if you will, that I'm a cluster admin, I have my very important app running somewhere, and I'm worried about supply chain attacks. Those are very hot right now, and people worry about them rightly so. We want our containers in production to be running the way we expect them to, and we don't want someone coming in and mucking around with them. The bad scenario here is some malicious entity coming in and intercepting, for instance, the container image pull, or messing with the registry a little bit. There are ways to mitigate this outside of the features I'll talk about, like pulling your image by its full manifest digest rather than going by tag. That's a very easy mitigation, but we're not always very diligent about it, and it makes updating images harder. So CRI-O provides some other options for extra security around that.

So the first thing we can do: we're running our cluster and we're worried about these machine-in-the-middle attacks, and the first step would be to catch someone actually trying this. Ideally we'd have some way to pay attention to the container and have the Kubernetes API alert us when there are unexpected processes doing unexpected things in our containers. And we can do that with seccomp. Seccomp is an interface the kernel exposes that lets user space control and observe which syscalls processes are calling. Historically it's been used to have the kernel kill a process that uses a syscall outside an allowed list, but on newer kernels there's also the option to have the kernel notify a user-space process when a syscall outside the specified list is used. CRI-O recently, as of 1.26 I believe, added experimental support for leveraging seccomp notify, to allow an end user to catch a syscall being run inside their container, and then not only kill the process but also let them know. I'll go over a little bit of how that works.

So we start off in CRI-O. CRI-O is going to run our container: pull the image, start the pod, and then eventually start the container. We take this first step and run the container. The container is running, doing its thing, chugging along, and suddenly there's a syscall we don't really expect. We have our list of syscalls that we expect our container to be using, and suddenly some malicious user was able to access the container and use a syscall that wasn't in that list. When that happens, the container attempts to make the syscall, that goes up to the kernel, and the kernel catches it and notifies CRI-O: hey, CRI-O, your container just used something you weren't expecting. CRI-O will then stop the container and emit both an event to the Kubernetes API and a metric to Prometheus.

So here we have the configuration we need to set this up. On the top left we have the seccomp profile that will be used.
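Here is a minimal sketch of what such an allowlist profile could look like. It follows the standard OCI/Docker seccomp profile JSON format; the specific syscall names are illustrative, not the ones from the slide:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": [
        "read", "write", "close", "fstat",
        "nanosleep", "futex", "exit", "exit_group"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Everything outside the allowed names falls through to the default action, and with the notifier annotation enabled, CRI-O sets up the plumbing to be told about those out-of-list calls rather than just having them fail silently.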
So we allow just this small set of syscalls, and we don't allow anything else. Then, in the CRI-O configuration, we have the default runtime, runc, with an allowed annotation for the seccomp notifier action. And in the pod spec, when we set the seccomp notifier action annotation to "stop", CRI-O will do all the internal plumbing to set up a routine inside CRI-O that listens to the kernel for any notification events. When the container does use one of those syscalls, as we saw over here, that watcher routine will catch it and respond to it (a sketch of this wiring follows below).

And here we have what the different events emitted from CRI-O look like. First, CRI-O will emit the reason it terminated the container: it'll say "seccomp killed", and it'll actually list the used syscalls that were the reason it was killed. It will wait a couple of moments before stopping the container, to allow the container to use a couple of extra syscalls, so that we catch all of them. It's possible in this case that the syscalls that were caught are ones we actually do expect, and that this was just an issue with the seccomp profile. So we attempt to collect for a little bit longer, to let that container use all of them, so we don't end up in a crash-loop-backoff situation. But eventually, either the action will be to update the seccomp profile because those syscalls are expected, or we'll want to do a little bit of diagnostics on what actually happened, on why this container was using syscalls we don't expect.
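A sketch of that wiring, under stated assumptions: the annotation key io.kubernetes.cri-o.seccompNotifierAction matches what CRI-O's documentation describes for this experimental feature, but double-check it against your CRI-O version; the pod name, image, and paths are otherwise illustrative.

```toml
# crio.conf: the runtime must explicitly allow the notifier annotation.
[crio.runtime.runtimes.runc]
runtime_path = "/usr/bin/runc"
runtime_type = "oci"
allowed_annotations = ["io.kubernetes.cri-o.seccompNotifierAction"]
```

```yaml
# Pod spec: opt this container in and point at the allowlist profile.
apiVersion: v1
kind: Pod
metadata:
  name: important-app          # hypothetical name
  annotations:
    io.kubernetes.cri-o.seccompNotifierAction: "stop"
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profile.json   # relative to the kubelet seccomp dir
  containers:
    - name: app
      image: registry.example.com/app@sha256:...   # pinned by digest
```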
For that kind of diagnosis, we can move to our next feature: checkpoint/restore. In Kubernetes, checkpoint/restore support was added as alpha in 1.25, and CRI-O has added support for it as well as an alpha feature. Under the hood there's a program called CRIU, whose name is very confusing for some people, myself included when I began: CRIU stands for Checkpoint/Restore In Userspace, as opposed to CRI-O, which is the Container Runtime Interface with OCI. Using CRIU, the Kubernetes stack is now able to take requests to checkpoint a running container. An admin can then inspect that checkpointed container, run it somewhere else, maybe restore it on another node where it's a little more sandboxed, or just poke around in the checkpointed blob and see what the container was up to, maybe catch some additional issues with it, and then hopefully be able to quickly fix that in production. The fact that it was checkpointed is hidden from the original container, so if there's a malicious actor doing something, they won't necessarily know the container is being checkpointed, which lets someone respond quickly but also quietly. And eventually a container could be restored into another CRI-O environment, or back onto a production node. As I mentioned, the feature is currently in alpha state, and we're looking for feedback on the user interface. We're also working with the container community on a unified format for the checkpoint image, the checkpointed blob, so that eventually you could even checkpoint on one runtime and restore on another, perhaps.

So here we have the user interface. It's pretty simple, though it's a bit convoluted right now: once we get to beta or graduation, the user interface will be better and we'll be able to go through the Kubernetes API server, but for now you have to make a request directly to the kubelet, so you have to know where it lives. At the top we have the URL you would request. The first part of it, localhost, is just the address of the server where the kubelet is running, where you want to checkpoint this container; then you hit the checkpoint endpoint; that's in the default namespace; "counters" is the name of the pod; and "counter" is the name of the specific container. Right now you're only able to checkpoint a single container, and maybe one day we'd extend that to checkpoint a whole pod, if this feature were extended to something like live migration.

And the way you restore it is that you create a CRI JSON blob (sketched below). If you've ever used crictl, you'll recognize what this looks like: it's the way to create a container with crictl. So you would maybe run this outside of a Kubernetes environment, but while running CRI-O, and use crictl to actually restore that container and inspect what's going on in it. The key things here: you have to give it a name, because all the objects need one, and then for the image, instead of specifying an OCI image path that would be pulled from a registry, you specify a path to the checkpoint archive. Under the hood, CRI-O will interpret that checkpoint archive and realize: this isn't an image I could pull, I just need to restore this container. So CRI-O can do that, and you can poke around inside it, see what's going on in an isolated place, and maybe catch a malicious actor who has intercepted your container or your image.

So we've done our investigation and mitigated the issue: we found the problem, fixed the permissions on things, or managed to migrate something; we've settled the incident. But now we want to look forward and ask ourselves, in a retrospective kind of way, what we can do to prevent this in the future. A great way to do that is to use signed images, to verify that an image you're pulling and running is the one you expect it to be. And for that, we'd like to recommend Sigstore. Sigstore has been a hot project for a year or two now, and CRI-O has support for verifying signatures using Sigstore. We've had the ability to verify signatures in general for a long time, but recently, as of CRI-O 1.26 or 1.27, we also have the option of verifying signatures against a Rekor log or a Fulcio CA. So after handling this whole incident, an admin could begin signing their images, and whenever CRI-O attempts to pull an image, it also verifies that image against a Rekor log, or checks that it was signed by a certificate from the Fulcio CA (a sample verification policy is sketched below). And the idea is that with keyless signing you don't really have to manage GPG keys and the like; we can have an easy user interface, and that's all integrated into CRI-O.
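Stepping back to the restore step for a moment, here's a rough sketch of that crictl container-config blob. The field names follow the CRI container config that crictl consumes; the container name and checkpoint path are illustrative (the kubelet writes checkpoint archives under /var/lib/kubelet/checkpoints by default, with a pod/namespace/container/timestamp naming scheme along these lines):

```json
{
  "metadata": {
    "name": "restored-counter"
  },
  "image": {
    "image": "/var/lib/kubelet/checkpoints/checkpoint-counters_default-counter-<timestamp>.tar"
  }
}
```

You would then create and start it against a running sandbox, along the lines of `crictl runp pod.json`, `crictl create <pod-id> container.json pod.json`, and `crictl start <container-id>`.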
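And to make the verification side concrete: with CRI-O, image signature policy comes from the containers-policy.json format. A minimal sketch, assuming a hypothetical registry name and a public-key-based sigstore check (the Fulcio/Rekor options for keyless verification live in the same stanza in newer releases):

```json
{
  "default": [{ "type": "insecureAcceptAnything" }],
  "transports": {
    "docker": {
      "registry.example.com/important-app": [
        {
          "type": "sigstoreSigned",
          "keyPath": "/etc/containers/cosign.pub",
          "signedIdentity": { "type": "matchRepository" }
        }
      ]
    }
  }
}
```

With a policy like this in /etc/containers/policy.json, pulls of that repository fail unless the signature verifies.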
And Podman, as of recently, also has support for pushing images with signatures. So if you want a complete-stack solution, you can use CRI-O and Podman together: sign your images with Podman and push them to the registry, and then CRI-O can verify them there. And with that, we have a complete stack of experimental features, features that we're looking forward to moving further along and developing, that can help an admin manage different levels of attacks on the stack and mitigate malicious users.

So, we went quite quickly, and that is the content we have. Thank you everyone for joining us today. We have a number of links here, and we'll post the slides on the schedule as well so you can access them directly. I'd like to take any questions if you have any. I'll leave this up for a little bit and then move to the QR code for any feedback you have; but do we have any questions in the audience now?

I'd like to ask one question about checkpointing. You talked about it from a security perspective, but what are the plans, maybe, for integrating it into job control? Like suspending pods and just resuming them, without any security angle, but for things like preemption?

Totally. So that's definitely on our radar. As many Kubernetes contributors in this room know, the process of getting a feature into Kubernetes is sometimes a slow one, because we try to be very deliberate about making sure all of the pieces work together. So the main engineer who's been working on this, Adrian, opted for a step-by-step approach to getting checkpoint/restore into Kubernetes. This is the first step: we have the API between the kubelet and CRI-O. Eventually maybe we'll have one at the API server level to allow the checkpointing to happen there, and then between the CRI implementations we're going to negotiate what the actual restore artifact should be, which will make it easier to pass that archive along to different places. Eventually, it is on our radar and within the scope of what we're imagining that it would be cool to do something like live migration, but we're taking it deliberately and slowly to make sure it's stable across the stack; there are a lot of pieces that need to move together. But definitely, if you look at KEP-2008, which was listed in the slide earlier, and leave your feedback in the enhancement itself saying this is something on your radar, that will help the Kubernetes contributors prioritize the work that's important for you. So if you have a specific use case in mind, we'd definitely like to hear it. Thank you. We've got another one over there.

Hi there. I wonder about the event-emitting part: will there be any mechanism implemented to limit the number of events emitted, like the huge number of events that could be emitted by CRI-O?

Sorry, can you reword that a little bit?

Yeah. A container can trigger a lot of syscalls, and therefore a lot of events, and emitting that huge number of events could potentially overwhelm Prometheus or some other services. And since resource management is done at the container level, and the runtime is part of that resource management, emitting the events could add a lot of overhead. So how do you deal with that, and will there be any mechanism to protect the node from dying under a flood of events?
Right. So the overhead for doing the seccomp notify pieces will be charged to the system, so it'll need to be calculated into the system reserve that the kubelet and CRI-O end up using. Something an admin can do to mitigate that is choose which containers they want these events emitted for. This is an annotation you have to specifically opt into, so if there are containers on your system that you're not as worried about, or you've already done this testing, or you're using Sigstore to verify the signatures on those images so they're more trusted, then you can reduce the number of containers that could hit this. We consider this situation, using a syscall outside the list of allowed syscalls, to be an abnormal one, so in designing this we did assume there would be a certain amount of overhead. We think that emitting the event and the metric will help developers and cluster admins catch this situation, and for the containers they opt into, that value proposition will end up being worth it. But definitely, reducing the set of opted-in containers will reduce the overhead; it's a trade-off between enhanced security and slightly better performance.

The other aspect to mention is that how many events we go through will depend on the restart behavior of the pod. We'll only emit one event for that specific container, because it'll be terminated, and that termination sticks if the container doesn't have a restart policy of Always or OnFailure. If you have a restart policy of Never, then you can do your investigation, expand the seccomp profile or mitigate the attack, and eventually there will be fewer events. So the idea is that this is a later step: ideally you'd already have figured out your seccomp profile for your container, so this won't be happening a whole lot. It'll just be for the special containers you're worried about, maybe in a multi-tenant environment, or ones that are very critical, so the scope is a little bit smaller. Thank you.

Do we have any other questions? Sure, yes, so we have the feedback QR code here. Thank you everyone for joining, and I hope you enjoy the rest of your KubeCon. Safe travels home.