Hello, everyone, and welcome to Pods and Circumstance, CRI-O's graduation celebration. I'm Sascha. I'm one of the maintainers of CRI-O, and I've also been involved in container runtime-related projects for a couple of years now, and I have the pleasure to be here today with Peter. Hey, everyone, thanks for coming. My name is Peter Hunt. I also work on CRI-O and SIG Node container runtime things. Appreciate everyone coming out here. You all are my favorite conference attendees for making it out, and also everyone on the virtual platform. And we're here to talk about CRI-O. As some of you have gathered, CRI-O has recently graduated in the CNCF, which we're very excited about. So we're going to start off talking a little bit about the process of graduation and some of our takeaways. Hopefully this can also be a template for other projects who are interested in graduation, covering some of the gotchas and difficulties we found along the way, and also some of the parts that are easier and are being fixed. So, yay, we graduated. We graduated over the summer. It was a pretty exciting accomplishment. It reinforces CRI-O's usefulness in production and its stability, so we're very excited about that. There were a number of nice articles written about it, and some nice quotes, some of them here. But yeah, we were really excited. It was a long process. We pretty much started the graduation process three years ago or so, and along the way we hit a couple of road bumps that, if avoided, might have allowed us to graduate a little bit earlier. So I'm just going to talk through some of those. I think one of the biggest hurdles in the graduation process was actually getting a security audit. Here's a handful of the requirements that you have to fulfill. Most of it is paperwork: you have to have a GOVERNANCE file, you have to have a MAINTAINERS file.
All of these things are pretty straightforward; you just have to do them. That's not too hard. There's the best practices badge, which is just a number of items that you have to complete as a project: having CI, having linters and so on. Having a license. Having committers from multiple organizations. We have long had committers from multiple organizations: the project is largely maintained by Red Hat, and we have maintainers or contributors from SUSE, IBM, and Intel, and we'll get into that a little bit more. Then you have the security audit, which is a big piece of it, and then you need the vote from the TOC, which is the one remaining hurdle after that. So, talking about one of our biggest challenges: this is all supposed to be easy, there aren't that many requirements, but we found a couple of challenges along the way. One of them was getting the security audit in the first place. Here's a little timeline. I've taken out some of the names for privacy reasons, but these are some of the correspondences, and you can see the timeline. We reached out in September of 2020 saying, hey, I think we're ready for a security audit, we're ready to start the process of graduation. This was with someone at the CNCF, and they were very helpful in starting the process, and they were eventually how we were able to get the audit. But we had one company try to do the audit, and there were some weird bureaucratic problems, so that stalled things for basically an entire year. Then finally, after we reached out again, they were able to connect us with OSTIF, the Open Source Technology Improvement Fund, which basically acts as a liaison between open source projects and the various resources those projects need. The folks there were incredibly helpful, and we're really appreciative of that. Eventually they did get us set up with the security audit.
Then in March 2022, a year and a half later, we managed to get the audit started, and finally it was completed. The audit went well: CRI-O was deemed to have a pretty good security posture. We're pretty happy with the findings they made, and they set up fuzzing for us. I think for projects who are getting prepared to go through this process, it's much more streamlined now that OSTIF is regularly involved in these conversations. So if you're a project looking for a security audit, reach out to the CNCF, and they'll probably connect you with OSTIF, who will connect you with security researchers. We were just caught at the beginning of that wave, so we had some challenges along the way, but we made it. A couple of other small paper cuts we had along the way: even though it's not explicitly spelled out in the requirements that the CNCF gives, you are asked to fill out a due diligence document, which combines basically all of the formal requirements and also adds a number of informal requirements. You make a pitch for your project: you describe what the project is, talk about its architecture, and say why you think it's ready for graduation. This was a requirement that came as a surprise to us, but it turned out that a number of other projects had already done this in the process of graduation. So again, it was just a little bit more paperwork, but something that came as a surprise. And then one final little surprise we hit along the way concerns the Technical Oversight Committee of the CNCF. They have set times that they're meeting, and before KubeCon, they stop meeting, which makes a bunch of sense.
They're super busy during this time, but we didn't realize it, and we missed a couple of deadlines as well because the communication wasn't super tight. Because of that, we also slipped; we actually maybe could have graduated a little bit earlier. But all in all, the process happened, we're graduated, and we're very excited about that. It's a good reinforcement of the stability of CRI-O and how well it's been running in production for, what, eight years now? Something like that, seven years. So we're very excited to be among the graduated projects, and I'll break out the champagne later. I just wanted to shout out a couple of things that we've been doing as a project. We're going to be talking more about features in a little bit, but we've started doing some mentorship programs, which has been helpful for getting some extra people on the project, and we're always looking for more contributors, as is practically every other project in the Kubernetes ecosystem. Having official avenues that are provided through the CNCF, like Google Summer of Code and LFX, the Linux Foundation's mentorship platform, is quite helpful. So if you're a project and you're looking for help, we found these to be good. We've had some help on conmon-rs, which Sascha will talk about a little bit, and we also have had help with CRI stats, which I am about to talk about. I don't know if anyone has been following the abstracts for the CRI-O update talks over the past couple of years, but pretty much every talk we have mentioned a CRI stats update, and pretty much every talk we have not actually talked about CRI stats. There's a little bit of lore behind that, and the reason is that it's taken a long time to do this process.
The general overview of the idea is that in SIG Node, we have an initiative to move stats collection from cAdvisor to the CRI implementation. There are a couple of reasons we'd want to do this. Mostly, cAdvisor's inclusion in the Kubernetes ecosystem is historical; it's almost like a phantom limb from the Docker days. There wasn't an entity doing this metrics collection regularly with Docker, so cAdvisor was introduced right at the beginning of Kubernetes, and it's just been around ever since; we added plugins for CRI-O and containerd along the way as those became supported options for the CRI. But the interactions are a little bit awkward: it has an HTTP API that it uses to hit both of the CRI implementations to get information behind the back of the kubelet, and it registers an inotify watch on the cgroup hierarchy to actually know when a container is created. This works fine, but it's also difficult to extend. It's kind of a monolithic program, and it's hard to extend, especially for different topologies of containers like Kata Containers, which run the container inside a VM on the host. cAdvisor is watching the cgroup hierarchy with inotify, but there isn't a cgroup hierarchy for a VM; from the host perspective, there isn't a cgroup created for that VM, it's totally separate. So cAdvisor doesn't have any insight into what those VMs are doing. We started this process a long time ago, originally in 2021, and the idea was just to take the stats collection and move it over, but it turned out that there were a number of difficulties. cAdvisor is actually pretty entrenched in the Kubernetes ecosystem, so the process of breaking things out and moving them into the CRI has been a little bit difficult. We finally have a design.
We came to that somewhere around Kubernetes 1.25 or 1.26, and now we've just been working on the CRI implementations to get support in there. So here are the gRPC calls, from the CRI API. These are the calls that the kubelet will use to call directly into CRI-O or containerd to get the stats. You can see on both sides there are two different kinds of stats, two different ways that the kubelet uses them. I'll go into a little bit of detail about that in a bit, but just think of the stats as a structured API that the kubelet uses for things like eviction, while the metrics are unstructured key-values that feed into Prometheus. Basically, how it works today is that cAdvisor does all the stats collection. It holds these large stat objects, registers inotify watches on the cgroups, and is continually reading them. It then writes directly to Prometheus through its /metrics/cadvisor endpoint, and this is how you get container stats through the entire ecosystem. The kubelet also requests the stats from cAdvisor and translates them into the stats summary API, which also feeds into the resource metrics API. These are structured stats where the kubelet basically has an API guarantee: these are the stats we will report for eviction, or for the scheduler to know how much space is on the node, and things like that. The kubelet also depends on cAdvisor for things that are not pod and container stats, like the node-level stats, and the eviction manager makes requests of it directly as well. The goal, and what we're basically moving to (I showed the CRI messages earlier), is that the stats summary API is going to be fulfilled by the pod sandbox stats request, and this request is going to return a structured API from CRI-O.
So you see here we have memory usage and CPU usage, and these are things that the kubelet will immediately interpret and feed to the various APIs reading from it. Then we have the metrics, which are more unstructured, and this is actually what I'm going to be talking about today. This is what's dragged on a little bit: the CRI metrics have been a little bit tricky to get going, because we want to keep the /metrics/cadvisor endpoint intact, but we also want the data to be able to come from the CRI, and we want it to happen efficiently. You'll notice (and actually in this picture I'm missing a little bit, I'm just noticing) that the sandbox metrics and the container metrics basically hold all of the values of the metrics, while the keys are registered through a separate API call, ListMetricDescriptors. So there are basically two separate request lines going down to the CRI. The reason for this is that we expect the metrics reported on a node to be pretty static once the node starts: the keys that the kubelet cares about will stay the same, and the values are what change. We wanted to reduce the amount of object creation, which I'll also talk about a little bit later as an optimization, and so we have this sort of double request flow. Everything you see here are the values of the metrics; the CRI implementation will use various libraries to read these, populate the objects, and return them up through the kubelet, which will proxy them through the /metrics/cadvisor endpoint. So even though the data won't be coming from cAdvisor, an end user who upgrades to use this feature will see the metrics as they always have, because ultimately these metrics have become kind of an unofficial API of Kubernetes.
Even though we've never promised anything about them, they have been there for so long that people just assume they're there. cAdvisor is still going to be around for node-level stats, so it's not totally going away; it's just going to be out of the business of doing pod and container stats. Here my colleague Sohan has opened a PR, and we've been working on this PR to get the pod and container stats, or specifically the metrics, from the CRI. There are a handful of challenges that we've come across along the way and some optimizations that we want to make. I don't know if anyone's ever profiled the kubelet and seen how heavy the metrics collection is in Kubernetes, but cAdvisor actually consumes a lot of CPU, and a lot of that is just creating these tiny little objects like container stats over and over and over again. So something we want to focus on in CRI-O specifically is reducing some of the overhead of metrics collection, and this is part of the hope and promise of CRI stats. Ideally, we would collect a metric once and then use it everywhere: instead of duplicating work between the stats and the metrics, we collect each value once, hold those objects, and return them. That's pretty straightforward. But we also have periodic collection: CRI-O collects on its own independent schedule, so an admin can configure how quickly the stats are refreshed, and thereby how much CPU is used in that process, independently of the kubelet's refresh frequency. The kubelet isn't waiting for CRI-O to go get the metrics; the two are collecting basically independently of each other, and the results are returned asynchronously. We're also hoping to have configuration for which metrics to collect.
It'll only be at the granularity of a cgroup controller, so you can get all the CPU metrics but maybe skip the network metrics, say because your pods just aren't doing any network things for some reason. You can tune CRI-O's behavior to not collect those metrics and reduce the memory and CPU that CRI-O is continuously using. And finally, one hope of ours is to reduce the object churn. A big part of the CPU usage of the metrics collection stack is actually the Go runtime's garbage collection of these tiny little objects that are being allocated over and over and over again. So our hope is that through the metrics collection process we can hold on to objects. The idea is basically to hold two lists of metrics simultaneously: one of them is the old list being returned over the CRI, and the other is the one we're updating. When we finish updating, we switch the two pointers. We're constantly holding these two lists, which allows us to write in place rather than constantly allocating new objects, and to optimize the CPU usage that way. So these are some of the things that we're hopeful about in the metrics collection. To be honest, we haven't finished it yet. I was hoping to have finished it by now, but we're aiming for 1.29, and it might slip to 1.30. It's a hard process, and because cAdvisor is so entrenched, we want to get it right. We're not going to rush in an incomplete solution; we want to get it correct, but we're hoping to land it as soon as possible, because it's been a long time and we're ready. We've also been working with the containerd community, advocating for them to add support; they have yet to do so as well.
Once both of us are finished with that, we're going to push the KEP to beta, and finally maybe one day GA it and drop cAdvisor from the business of collecting pod and container stats. So that's it for me. I'll hand it over to Sascha to talk about conmon-rs. Thank you. Yeah, let's speak a bit about conmon-rs. Just as a summary, we have conmon as the container monitoring tool in place right now, and it helps us watch the whole life cycle of the container. But we have been working on a Rust replacement, which is called conmon-rs, and we've had a working implementation probably since the beginning of the year, and we've released a bunch of new versions since KubeCon in Amsterdam. I think it's almost at the stabilization point now. We also have a new logo: just look at this tiny little crab and its little eye. It's more or less inspired by the all-seeing eye, so it should underline that it's a monitoring tool, but it should also highlight that it fits into our ecosystem of other projects like Podman, CRI-O, and others; it has the same shape. Other than that, we also worked on everything around conmon-rs. We added multi-architecture support for our static binaries, for AMD64 and PowerPC, and those are now integrated into the CRI-O static bundle and also the packages. This will give us the opportunity later on to switch directly over from conmon to conmon-rs. In theory, users could try conmon-rs right now; it works, and we have testing set up with conmon-rs, but we are still waiting for the integration in Podman. Other than that, there is also work happening as part of the LFX Linux Foundation mentoring program.
We are also working on different logging mechanisms, and there are some thoughts about directly forwarding the logs to external resources and things like that, but all those features don't have the highest priority right now, because we are waiting for conmon-rs 1.0, which will probably be released next year. We are kind of optimistic about that. Another big feature we are working on right now is Sigstore signature verification. So, why do we need it at all? Kubernetes has signed its container images since 1.24, which is a long time, and they also have signatures for every other artifact now, but we are mostly focusing on container images because that's what CRI-O can actually work with. Sigstore signatures for container images can already be verified by Podman, and the idea is to also integrate this feature into CRI-O, to be able to verify Sigstore signatures for container images. This is supported since CRI-O 1.28 for pulling container images. Other solutions which don't do it in the container runtime go for admission-controller-based verification; there's the policy controller available from Sigstore. But we are trying to go a different way and validate the signatures directly where the image gets pulled, and from our perspective this increases security. So how does it work? CRI-O has read policy.json from the beginning. This policy outlines the main rules for what is acceptable to CRI-O. Here we can see that the default is to reject everything, and then we define our transports. Docker is the standard transport for container images. And now we define, for example, a container image which is signed, and we specify that it should be Sigstore-signed and that it should match the repository, which basically means it matches all tags under this repository.
It also should match the OIDC issuer from GitHub and the subject email. The CA data from Fulcio and the Rekor public key data are stripped out here. Those are for the standard instances provided by Sigstore, which also means we have support for custom instances later on as well. So what is new in CRI-O 1.28 with respect to that? We added a new option called signature_policy_dir, which defines the root path for namespace-separated signature policies. CRI-O will look up this path; for example, if we have a default namespace, then it will look in this configured path for default.json. Kubernetes namespaces are unique, which is great, so we can just use a directory tree structure to lay out policies. This works right now on image pull, and we also use the global policy as a fallback, which is nice, because then administrators can build a hierarchy of policies and apply it to the actual cluster. And yeah, future work: right now it doesn't work on container creation. The kubelet pulls container images only when they are not already available on disk. When a container image is already available on disk, it just passes the container image ID down to CRI-O, and CRI-O should pick up this image. The issue with that is that we can't easily hook into that process, and we need signature verification on container creation if the image already exists on disk. This is something we are working on right now. Another big topic is custom resources on top of that abstraction. If we look at the policy controller from Sigstore, for example, they have custom resources in place which make it easier to manage those policies. Now we have a little demo, and we can probably make it full screen, showing how the signature verification actually works. Oh yeah, that's cool, thanks.
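A namespaced policy of the kind described above might look roughly like this. The repository, issuer, and email are illustrative values, and the base64 CA and Rekor key data are elided placeholders; the field names follow the containers policy.json format used by CRI-O and Podman:

```json
{
  "default": [{"type": "reject"}],
  "transports": {
    "docker": {
      "quay.io/crio/signed": [
        {
          "type": "sigstoreSigned",
          "fulcio": {
            "caData": "<base64-encoded Fulcio CA certificate>",
            "oidcIssuer": "https://github.com/login/oauth",
            "subjectEmail": "maintainer@example.com"
          },
          "rekorPublicKeyData": "<base64-encoded Rekor public key>"
        }
      ]
    }
  }
}
```

With the default set to reject, any image that does not match a listed repository, or matches but fails signature verification, is refused at pull time.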
We can assume that we have a running Kubernetes cluster with CRI-O 1.28; the latest release is CRI-O 1.28.2, by the way, released last week. We have a cluster up and running, a single-node cluster in our case. Now we have to ensure that CRI-O runs with Sigstore support enabled, and for that we have to ensure that the default configuration for the registries contains the use-sigstore-attachments option and that it defaults to true. Now we can think about creating a policy. The default policy in our case doesn't specify anything of interest; it just accepts everything for the docker transport and also for the docker-daemon transport, which are the default transports for container images. So we allow basically everything in our default policy. Now we can create a namespaced policy: we just create a default.json containing the example I showed you previously, which requires that the CRI-O signed image is Sigstore-signed and that it matches the repository along with the issuer and subject email. If we then try to pull the image while specifying a sandbox config, we can write a simple sandbox config containing the actual namespace, which is important, because otherwise it wouldn't match. Then we can use crictl to pull using that pod config and the image. We expect that this works, and it does work as intended, which is great. But if we now change the policy to match a different subject email, simply a wrong one, not the email which actually signed the image, then we have to remove the image and re-pull it, and we get the actual error message that the required email was not found. Which is interesting, and also as expected in that case. We can also do the same using a Kubernetes pod: the Kubernetes instance in the background is still running, and we just run a pod in the default namespace on the same container image.
Now we create the pod, and we get the actual error from the kubectl CLI. If we describe the pod, we see the same error: the image pull failed because the source image got rejected. We also added a couple of small enhancements to Kubernetes, like the SignatureValidationFailed error code. The cool thing about this is that it provides direct CLI feedback to end users. If we just change the image name from signed to unsigned (this image exists in the registry, but it's just not signed; it's a test image for such cases) and then apply the pod, we actually have a chance to see the signature validation failure in the image status, which is great. So this provides direct feedback to end users that the signature verification did not succeed. Thank you. Another thing we are working on is the CRI-O packaging efforts. Kubernetes announced back in August that the legacy package repositories for the APT and YUM package managers are frozen. The community-owned deb and RPM repositories are now built on the openSUSE Build Service (OBS), and they are already in production right now. CRI-O already used OBS for packaging, which is kind of cool, but we were building from the sources, maintaining dependencies like Golang versions, and also had to update the versions manually. That is a high effort, especially if we consider that we have to support many distributions and so on. Kubernetes does it with a different approach: they just use two repositories, one for debs and one for RPMs. And CRI-O, funnily enough, already has a static binary bundle which contains everything we need to run CRI-O. It gets automatically published for every release branch and also for tagged releases, it's reproducible by using Nix, and those static binaries can be used for the packages, which is exactly how Kubernetes does it.
So integrating into the existing OBS project layout was an actual low-hanging fruit. This means that all future CRI-O packages will be shipped as part of the officially supported infrastructure on pkgs.k8s.io. We added everything to a dedicated repository, and we have a daily reconciliation job which does the reconciliation for the pre-releases, meaning the release branches, and also for every tag. The only tag supported right now is the release we did last week, 1.28.2, so we can already test it, which is pretty cool. We also integrated a test pipeline for various distributions, and we have a staging and releasing process like the Kubernetes upstream repositories do; we are using the exact same tools to achieve a similar result. So cool, now let's demo it, just for the RPMs. We have a second demo in the talk for the deb packages, but we don't have to show it, because it's basically the same, just with a different package manager. We have different streams available right now: stable v1.29, which is empty; stable v1.28, which contains 1.28.2; and the pre-release streams for main, release-1.29, and release-1.28. Release-1.29 is main right now because we don't actually have a release-1.29 branch yet. So we just define the package stream as prerelease main in our demo, and this package stream should contain the latest version which is available, at least for that day. We have to define the Kubernetes repository for the RPMs as well, because that lives in a different directory structure: it lives under core stable, while the repository for CRI-O lives under addons cri-o. That's the layout for extensions to Kubernetes. And that's everything we have to do, and then we can simply install CRI-O together with kubeadm, kubectl, and the kubelet.
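On an RPM-based distribution, the two repository definitions described above might look roughly like this. The stream and version segments are assumptions for this sketch; the layout follows the pkgs.k8s.io directory structure mentioned in the talk:

```ini
# /etc/yum.repos.d/kubernetes.repo -- Kubernetes core packages (stable stream)
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/repodata/repomd.xml.key

# /etc/yum.repos.d/cri-o.repo -- CRI-O as a Kubernetes add-on (prerelease stream)
[cri-o]
name=CRI-O
baseurl=https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/rpm/repodata/repomd.xml.key
```

Switching streams (for example from prerelease main to a stable branch) would then only mean editing the path segments in the baseurl and gpgkey lines.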
We can see that we downloaded kubeadm, kubectl, the CNI plugins, and the kubelet at the latest version, 1.28.2, which was available during the demo, and then we have the 1.29 development version of CRI-O. Then we just have to start it and prepare the node: we have to disable swap, if we don't want to run Kubernetes with swap support; we have to load the br_netfilter kernel module; and then we have to enable IPv4 traffic forwarding. That's everything we have to do, and then we can run kubeadm init. kubeadm will use the cri-tools, which we already installed with the previous commands, to pre-pull the container images, and then it will actually bootstrap the node using CRI-O, because it finds the socket running in the background, which is pretty cool. If everything is up and running, then we can just check our cluster and verify that it is up and running. In this case, I will just remove the control-plane taint from the node, check for the deployment that CoreDNS is available, list all pods to see that they are up and running, and then we try out a test workload which just prints some test output. This is always a good test to check that the cluster actually works as intended, and in our case, it works. All right, let's go back to the presentation. There's another demo in the presentation for the Debian packages, but we can skip it for now because of timing. And exactly, thanks. Yeah, the demo sources are available on GitHub if you want to check them out, and the presentation's already uploaded to the event platform, so you can just test this yourself. For the future, we created a roadmap upstream, and everyone is invited to contribute to it.
We have basic ideas for a Rust-based NRI framework; we are also working on support for Wasm plugins which are directly loaded into CRI-O; we want to increase our release automation, and we are continuously working on that; and documentation enhancements are something we have now added to our roadmap so that everyone can contribute. Making CRI-O more portable to non-Linux platforms is also something which is currently a work in progress. You can always reach out to us via GitHub or the Slack channels, meaning the official Kubernetes Slack and the CRI-O channel, or you can just open an issue or a discussion item on the GitHub project. And with that, I would like to thank you all for listening. Enjoy the rest of KubeCon. Thank you.